Google Cloud
From charlesreid1
Notes for the Google Cloud Data Engineer certification.
The following list is based on the sample case study for the GCDE certification exam: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic
The case study focuses on a logistics company tracking orders and shipments via rail, truck, aircraft, and ships.
Goals and Motivation
Goals:
- Implement a real-time inventory tracking system that tracks shipment locations
- Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
- Predict delays in shipments
Requirements:
- Reliable, reproducible environment that scales
- Aggregated data in centralized data lake
- Historical data used to perform predictive analytics on future shipments
- Accurate tracking of worldwide shipments (proprietary technology)
- Improvement of business agility and speed of innovation via rapid provisioning of new resources
- Analysis and optimization for performance in the cloud
- Migration to the cloud, if all other requirements are met
Deeper reasoning:
- Inability to upgrade infrastructure hampering growth and efficiency
- Ineffective at moving data around
- Need to better understand where/who customers are, what they are shipping
- IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
- Penalties for late shipments and deliveries tie on-time performance directly to profitability and the bottom line
Technology Stack
Databases:
- SQL DB storing user data, static data
- Cassandra DB storing metadata, tracking messages
- Kafka servers for tracking-message aggregation and batch insert
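As a rough illustration of the aggregation and batch-insert role listed for the Kafka servers, the sketch below buffers tracking messages and hands them to a sink in fixed-size batches. It uses only the Python standard library and no real Kafka or Cassandra client. The names TrackingMessage, BatchAggregator, sink, and batch_size, along with the message fields, are assumptions made for illustration; the case study does not specify any of them.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackingMessage:
    # Hypothetical message shape; the case study does not define the fields.
    shipment_id: str
    location: str
    timestamp: float


@dataclass
class BatchAggregator:
    # sink stands in for a batch insert into the tracking store (assumption).
    sink: Callable[[List[TrackingMessage]], None]
    batch_size: int = 3
    _buffer: List[TrackingMessage] = field(default_factory=list, init=False)

    def add(self, msg: TrackingMessage) -> None:
        # Buffer the message and flush once a full batch has accumulated.
        self._buffer.append(msg)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Hand a copy of the buffered messages to the sink, then reset.
        if self._buffer:
            self.sink(list(self._buffer))
            self._buffer.clear()


def run_demo() -> List[int]:
    # Collect batches in memory instead of writing to a database.
    batches: List[List[TrackingMessage]] = []
    agg = BatchAggregator(sink=batches.append, batch_size=3)
    for i in range(7):
        agg.add(TrackingMessage(f"S{i % 2}", "port", float(i)))
    agg.flush()  # flush the final partial batch
    return [len(b) for b in batches]
```

Calling run_demo() with seven messages and a batch size of three yields two full batches and one partial batch. A real deployment would likely also flush on a timer so that a slow trickle of messages does not sit in the buffer indefinitely.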
Applications:
- Customer frontend, middleware for orders and customs
- Tomcat for Java services
- Nginx for static content
- Batch servers (?)
Storage:
- iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
- Fibre Channel storage network for SQL Server storage
- NAS (network-attached storage) for image storage, logs, and backups
Analytics:
Miscellaneous servers:
- Jenkins
- Monitoring of servers
- Bastion hosts
- Security scanners
- Billing software