Google Cloud: Difference between revisions
From charlesreid1
Revision as of 23:59, 11 September 2017
Notes for the Google Cloud Data Engineer certification.
Technology stack
The following list is based on the sample case study for the GCDE certification exam: https://cloud.google.com/certification/guides/data-engineer/casestudy-flowlogistic
The case study focuses on a logistics company that tracks orders and shipments moving by rail, truck, aircraft, and ship.
Goals:
- Implement a real-time inventory tracking system that tracks shipment locations
- Perform data analytics on order and shipment logs (structured/unstructured data) to make decisions about deploying resources, targeting customers, and expanding into markets
- Predict delays in shipments
Requirements:
- Reliable, reproducible environment that scales
- Aggregated data in centralized data lake
- Historical data used to perform predictive analytics on future shipments
- Accurate tracking of worldwide shipments (proprietary technology)
- Improvement of business agility and speed of innovation via rapid provisioning of new resources
- Analysis and optimization for performance in the cloud
- Migration to the cloud, if all other requirements are met
Deeper reasoning:
- Inability to upgrade infrastructure hampering growth and efficiency
- Ineffective at moving data around
- Need to better understand where/who customers are, what they are shipping
- IT is too busy managing infrastructure to organize data/build analytics/implement tracking technology
- Penalties for late shipments and deliveries mean that on-time performance correlates directly with profitability and the bottom line
Data center description:
Databases:
- SQL DB storing user data, static data
- Cassandra DB storing metadata, tracking messages
- Kafka servers for tracking-message aggregation and batch inserts
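The aggregation and batch-insert role of the Kafka servers can be illustrated with a small sketch. This is not taken from the case study: the message shape (shipment ID, timestamp, location) and the function names `batch_messages` and `aggregate_latest` are hypothetical, and the code uses plain Python rather than a Kafka client, purely to show the pattern of grouping a stream into batches and collapsing each batch before writing it to a database.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical message shape (an assumption, not from the case study):
# (shipment_id, unix_timestamp, location)
TrackingMessage = Tuple[str, int, str]


def batch_messages(
    messages: Iterable[TrackingMessage], batch_size: int
) -> Iterator[List[TrackingMessage]]:
    """Group a stream of messages into lists of at most batch_size items."""
    batch: List[TrackingMessage] = []
    for message in messages:
        batch.append(message)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, possibly smaller, batch.
        yield batch


def aggregate_latest(batch: List[TrackingMessage]) -> Dict[str, TrackingMessage]:
    """Keep only the most recent message per shipment within one batch."""
    latest: Dict[str, TrackingMessage] = {}
    for shipment_id, timestamp, location in batch:
        current = latest.get(shipment_id)
        if current is None or timestamp > current[1]:
            latest[shipment_id] = (shipment_id, timestamp, location)
    return latest


stream = [
    ("S1", 100, "Rotterdam"),
    ("S2", 101, "Memphis"),
    ("S1", 105, "Hamburg"),
]
batches = list(batch_messages(stream, batch_size=2))
rows = aggregate_latest(stream)
```

In a real deployment the batches would come from a Kafka consumer and each aggregated batch would be written to the database in a single insert; both steps are left out here because the case study does not describe them.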
Applications:
- Customer frontend, middleware for orders and customs
- Tomcat for Java services
- Nginx for static content
- Batch servers (?)
Storage:
- iSCSI (Internet Small Computer Systems Interface) storage for VM hosts
- Fibre Channel storage area network (SAN) for SQL Server storage
- NAS (network-attached storage) for image storage, logs, and backups
Analytics:
- Hadoop/Spark servers
- Core data lake
- Data analysis workloads
Miscellaneous servers:
- Jenkins
- Monitoring of servers
- Bastion hosts
- Security scanners
- Billing software