2018/January/Data Engineering
From charlesreid1
Notes from January 2018 data engineering work.
This consists of review (rebooting the data engineering stuff) and coding (identifying relevant scenarios for data-engineering-scenarios).
pages
Review page: Google Cloud/Review
Project 1: 2018/January/Data Engineering/Scientific Data Processing
Project 2: 2018/January/Data Engineering/Big Data Text Processing
Project 3: 2018/January/Data Engineering/Cosmos
procedure
Expanding data-engineering-scenarios:
- Start with ready examples
- Work toward synthetic experimental data
- An imaginary factory... lots of widgets... Kubernetes/container engine... orchestrating a process
- Focus on a particular process or set of processes, and drill into it, use it to provide multiple angles on a single concept
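A rough sketch of what "synthetic experimental data" for the imaginary widget factory might look like, using only the Python standard library. Everything here is hypothetical and for illustration only: the `WidgetReading` fields, the station names, the temperature distribution, and the QA threshold are all assumptions, not part of any existing scenario.

```python
import random
from dataclasses import dataclass


@dataclass
class WidgetReading:
    """One synthetic measurement of a widget at a factory station (hypothetical schema)."""
    widget_id: int
    station: str
    temperature_c: float
    passed_qa: bool


def generate_readings(n_widgets, stations=("press", "paint", "pack"), seed=0):
    """Return one synthetic reading per widget per station."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    readings = []
    for widget_id in range(n_widgets):
        for station in stations:
            # Assumed distribution: temperatures scatter around 60 C
            temperature_c = rng.gauss(60.0, 5.0)
            # Assumed QA rule: a reading fails if it is 70 C or hotter
            passed_qa = temperature_c < 70.0
            readings.append(
                WidgetReading(widget_id, station, temperature_c, passed_qa)
            )
    return readings


if __name__ == "__main__":
    for reading in generate_readings(2):
        print(reading)
```

Data like this could then be pushed through whichever storage or processing tool a given scenario is meant to demonstrate, which keeps the focus on a single process viewed from multiple angles.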
Software tools list, with an (abstract) example for each: Google Cloud
- Storage/database/computation/GPUs vs CPUs/containerization
Software quality assurance: https://git.charlesreid1.com/charlesreid1/scientific-software
- 10 Best
- More informal
- Bullet points - things I've learned
- Apply style of later points to earlier points
- Github page - 10 things
- Clear out lorem ipsum (7-10)
links
links to notes
Notes review: GCDEC
- Case study - Google Cloud/Case Study
- 1 - GCDEC/Fundamentals/Notes
- 2 - GCDEC/Unstructured_Data/Notes
- 3a - GCDEC/BigQuery/Notes
- 3b - GCDEC/Dataflow/Notes
- 4a - GCDEC/Building_Tensorflow/Notes
- 4b - GCDEC/Deploying_Tensorflow/Notes
- 4c - GCDEC/Engineering_Tensorflow/Notes
- 5 - GCDEC/Streaming/Notes
links to codelabs
Google Codelabs:
- Main link - https://codelabs.developers.google.com/
- Kubernetes and Container Engine - https://codelabs.developers.google.com/codelabs/cloud-compute-kubernetes/index.html?index=..%2F..%2Findex#0
- Process Astronomy Data to Generate Images - https://codelabs.developers.google.com/codelabs/cloud-compute-the-cosmos/index.html?index=..%2F..%2Findex#0
- Kubernetes for Java apps - https://codelabs.developers.google.com/codelabs/cloud-springboot-kubernetes/index.html?index=..%2F..%2Findex#0
- Google Cloud Storage - https://codelabs.developers.google.com/codelabs/es003l-storage/index.html?index=..%2F..%2Findex
- Campaign finance with BigQuery - https://codelabs.developers.google.com/codelabs/cloud-bq-campaign-finance/index.html?index=..%2F..%2Findex#0
- Text processing with big data - https://codelabs.developers.google.com/codelabs/cloud-dataflow-starter/index.html?index=..%2F..%2Findex#0
- Recommendations ML - https://codelabs.developers.google.com/codelabs/cloud-accelerate-dataproc/index.html?index=..%2F..%2Findex#0
- Spark + OpenCV - https://codelabs.developers.google.com/codelabs/cloud-dataproc-opencv/index.html?index=..%2F..%2Findex
- Speech to Text - https://codelabs.developers.google.com/codelabs/cloud-speech-intro/index.html?index=..%2F..%2Findex#0
- Translate Text - https://codelabs.developers.google.com/codelabs/cloud-translation-intro/index.html?index=..%2F..%2Findex#0
Google Qwiklabs:
- Google Cloud Platform essentials - https://google.qwiklabs.com/quests/23?locale=en
- Scientific data processing - https://google.qwiklabs.com/quests/28?locale=en
- Data engineering - https://google.qwiklabs.com/quests/25?locale=en