| ==Review of Google Cloud and Data Engineering==
| |
|
| |
| <s>Review in preparation for interview:
| |
| * Components of workflow in cloud, analogies
| |
| * Open source tools used at each "step"
| |
| * Highlighting different workflows using repositories
| |
| * Quick/easy example: why so many database solutions? How to do basics?
| |
| * Specific challenges, software, workflow for genomics research</s>
| |
|
| |
| ==Review Notes Pages==
|
| |
|
| [[Google Cloud/Scientific Data Processing]] - doing the scientific data processing qwiklab
|
| |
|
| | Review page: [[Google Cloud/Review]] |
|
| |
|
| | ==Data Engineering Scenarios Review== |
|
| |
|
| ==Procedure==
| | Project 1: [[2018/January/Data Engineering/Scientific Data Processing]] |
| | |
| Software tools list, with an (abstract) example for each: [[Google Cloud]]
| |
| * Storage/database/computation/GPUs vs CPUs/containerization
| |
| | |
| Software quality assurance:
| |
| * GitHub page - 10 things
| |
| * Apply style of later points to earlier points
| |
| * Clear out lorem ipsum (7-10)
| |
| | |
| ===Links to Notes===
| |
| | |
| Notes review: GCDEC
| |
| * Case study - [[Google Cloud/Case Study]]
| |
| * 1 - [[GCDEC/Fundamentals/Notes]]
| |
| * 2 - [[GCDEC/Unstructured_Data/Notes]]
| |
| * 3a - [[GCDEC/BigQuery/Notes]]
| |
| * 3b - [[GCDEC/Dataflow/Notes]]
| |
| * 4a - [[GCDEC/Building_Tensorflow/Notes]]
| |
| * 4b - [[GCDEC/Deploying_Tensorflow/Notes]]
| |
| * 4c - [[GCDEC/Engineering_Tensorflow/Notes]]
| |
| * 5 - [[GCDEC/Streaming/Notes]]
| |
| | |
| ===Links to Code Labs===
| |
| | |
| Google Codelabs:
| |
| * Main link - https://codelabs.developers.google.com/
| |
| * Kubernetes and Container Engine - https://codelabs.developers.google.com/codelabs/cloud-compute-kubernetes/index.html?index=..%2F..%2Findex#0
| |
| * Process Astronomy Data to Generate Images - https://codelabs.developers.google.com/codelabs/cloud-compute-the-cosmos/index.html?index=..%2F..%2Findex#0
| |
| * Kubernetes for Java apps - https://codelabs.developers.google.com/codelabs/cloud-springboot-kubernetes/index.html?index=..%2F..%2Findex#0
| |
| * Google Cloud Storage - https://codelabs.developers.google.com/codelabs/es003l-storage/index.html?index=..%2F..%2Findex
| |
| * Campaign finance with BigQuery - https://codelabs.developers.google.com/codelabs/cloud-bq-campaign-finance/index.html?index=..%2F..%2Findex#0
| |
| * Text processing with big data - https://codelabs.developers.google.com/codelabs/cloud-dataflow-starter/index.html?index=..%2F..%2Findex#0
| |
| * Recommendations ML - https://codelabs.developers.google.com/codelabs/cloud-accelerate-dataproc/index.html?index=..%2F..%2Findex#0
| |
| * Spark + OpenCV - https://codelabs.developers.google.com/codelabs/cloud-dataproc-opencv/index.html?index=..%2F..%2Findex
| |
| * Speech to Text - https://codelabs.developers.google.com/codelabs/cloud-speech-intro/index.html?index=..%2F..%2Findex#0
| |
| * Translate Text - https://codelabs.developers.google.com/codelabs/cloud-translation-intro/index.html?index=..%2F..%2Findex#0
| |
|
| |
|
| Google Qwiklabs:
| | Project 2: [[2018/January/Data Engineering/Big Data Text Processing]] |
| * Google Cloud Platform essentials - https://google.qwiklabs.com/quests/23?locale=en
| |
| * Scientific data processing - https://google.qwiklabs.com/quests/28?locale=en
| |
| * Data engineering - https://google.qwiklabs.com/quests/25?locale=en
| |
|
| |
|
| | Project 3: [[2018/January/Data Engineering/Cosmos]] |
|
| |
|
| ==Flags==