Google Cloud: Difference between revisions
From charlesreid1
Revision as of 00:04, 12 September 2017
Notes for the Google Cloud Data Engineer (GCDE) certification.
The following list is based on the sample case study for the GCDE certification exam: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic
The case study focuses on a logistics company that tracks orders and shipments moving by rail, truck, aircraft, and ship.
Goals and Motivation
Goals:
- Implement a real-time inventory tracking system that reports shipment locations
- Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
- Predict delays in shipments
Requirements:
- Reliable, reproducible environment that scales
- Aggregated data in centralized data lake
- Historical data used to perform predictive analytics on future shipments
- Accurate tracking of worldwide shipments (proprietary technology)
- Improvement of business agility and speed of innovation via rapid provisioning of new resources
- Analysis and optimization for performance in the cloud
- Migration to cloud, if all other requirements met
Deeper reasoning:
- Inability to upgrade infrastructure hampering growth and efficiency
- Ineffective at moving data around
- Need to better understand where/who customers are, what they are shipping
- IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
- Penalties for late shipments and deliveries tie on-time performance directly to profitability and the bottom line
Technology Stack
Databases:
- SQL DB storing user data, static data
- Cassandra DB storing metadata, tracking messages
- Kafka servers for tracking-message aggregation and batch insert
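To make "aggregation and batch insert" concrete, here is a minimal sketch in plain Python. It does not use a Kafka client. The `TrackingMessage` fields, the batch size, and both function names are assumptions made for illustration; the case study does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingMessage:
    # Hypothetical fields; the case study does not define a message schema.
    shipment_id: str
    timestamp: int  # seconds since epoch
    location: str


def batch_messages(messages, batch_size):
    """Yield lists of at most batch_size messages, suitable for a bulk insert."""
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, partially filled batch.
        yield batch


def latest_by_shipment(batch):
    """Aggregate a batch down to the most recent message per shipment."""
    latest = {}
    for message in batch:
        current = latest.get(message.shipment_id)
        if current is None or message.timestamp > current.timestamp:
            latest[message.shipment_id] = message
    return latest
```

In a real deployment the messages would come from a consumer loop, and each batch would be written to the tracking store in a single operation instead of row by row.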
Applications:
- Customer frontend, middleware for orders and customs
- Tomcat for Java services
- Nginx for static content
- Batch servers (?)
Storage:
- iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
- Fibre Channel storage area network (SAN) for SQL server storage
- NAS (network attached storage) for image storage, logs, and backups
Analytics:
- Hadoop/Spark servers
- Core data lake
- Data analysis workloads
Miscellaneous servers:
- Jenkins
- Monitoring of servers
- Bastion hosts
- Security scanners
- Billing software