Revision as of 22:35, 19 September 2017

Notes

Interesting question: why would Google be in the business of cloud computing?

Mission statement: to organize the world's information and make it accessible

The reason Google is in cloud computing: organizing the world's information and making it accessible requires a massive amount of infrastructure

1 out of every 5 CPUs that is produced in the world is bought by Google

Organizing information:

GFS and Hadoop:

GFS (2002) was originally an idea for organizing lots of files/information across large clusters; it in turn led to Hadoop HDFS, which is based on GFS
MapReduce came out of Google around 2004
But, by 2006, Google was no longer writing any MapReduce programs
Why?
MapReduce and HDFS require sharding - distributing your data set across a cluster - which means that the size of your data sets and the size of your cluster are intimately linked
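
To make the sharding point concrete, here is a minimal sketch, assuming a simple hash-modulo placement scheme (an illustration only, not how GFS or HDFS actually place blocks). The function names and record keys are hypothetical. It shows that when data placement depends on the number of nodes, resizing the cluster forces a large share of the data to move:

```python
import zlib

def shard_for(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def assign(keys, num_nodes):
    """Group keys by the node that would store them."""
    shards = {node: [] for node in range(num_nodes)}
    for key in keys:
        shards[shard_for(key, num_nodes)].append(key)
    return shards

def moved_fraction(keys, old_nodes, new_nodes):
    """Fraction of keys whose home node changes when the cluster is resized."""
    moved = sum(
        1 for k in keys if shard_for(k, old_nodes) != shard_for(k, new_nodes)
    )
    return moved / len(keys)

# Hypothetical data set of 10,000 record keys.
keys = [f"record-{i}" for i in range(10_000)]
shards = assign(keys, 4)
fraction = moved_fraction(keys, 4, 5)

print({node: len(items) for node, items in shards.items()})
print(f"{fraction:.0%} of records move when going from 4 to 5 nodes")
```

Because storage and compute live on the same nodes in this model, adding compute capacity means reshuffling data, which is one way to read the coupling described above.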

Google Data Technologies:

Various innovations coming out of Google are being released into Google Cloud

Elastic computing concept - you should be able to "instantaneously" scale out to as many machines as you need

Purpose of switching to the cloud:

Uptime, keeping hardware up and running
Making teams more efficient and effective
Having the entire Google data stack available to leverage the best software available

Spotify uses two products: PubSub and Dataflow. PubSub is a messaging system; Dataflow is a data pipeline tool.
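
As a rough illustration of how a messaging system and a data pipeline fit together, here is a toy in-memory sketch. It does not use the real PubSub or Dataflow APIs; the `InMemoryBroker` class, the `plays` topic, and the event fields are all hypothetical names chosen for the example.

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy stand-in for a messaging system: publishers and subscribers
    only share a topic name and never call each other directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(message)

def run_pipeline(events, min_seconds=30):
    """Toy pipeline: drop short plays, then count plays per track."""
    counts = defaultdict(int)
    for event in events:
        if event["seconds_played"] >= min_seconds:
            counts[event["track"]] += 1
    return dict(counts)

broker = InMemoryBroker()
received = []
broker.subscribe("plays", received.append)

# Hypothetical events published by some upstream service.
broker.publish("plays", {"track": "a", "seconds_played": 45})
broker.publish("plays", {"track": "b", "seconds_played": 10})
broker.publish("plays", {"track": "a", "seconds_played": 200})
broker.publish("plays", {"track": "b", "seconds_played": 31})

play_counts = run_pipeline(received)
print(play_counts)
```

The design point is the decoupling: the publisher knows nothing about the pipeline, so either side can be scaled or replaced independently.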

Using GCP big data products helps companies:

BigQuery: reducing 2.2 BILLION items to 20K items in <1 min (transformational promise of the cloud)

A functional view:

Why the forked approach? Google is trying to solve SEVERAL DIFFERENT problems

Changing where people are computing

Keep doing the same things you're doing already, but change where you do them
Each tool addresses different things that people are already doing on-premises (and would not require a change in CODE, just a change in LOCATION)
Cloud databases - (migrating DBs) Cloud SQL (relational databases), Cloud Datastore and Cloud Bigtable (NoSQL databases; Bigtable is a wide-column key-value store)
Storage platform - (migrating storage) Cloud Storage (Standard and Durable Reduced Availability storage classes)
Managed Hadoop/Spark/Pig/Hive - (migrating data processing) Cloud Dataproc

Providing speed, scalability, and reliability:

Want to provide scalable and reliable services (like Spotify)
Need to be able to justify using hundreds of machines for a few minutes, rather than a smaller number of machines that take much, much longer
Messaging - Cloud PubSub
Data Processing - Cloud Dataflow, Cloud Dataproc
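
The "hundreds of machines for a few minutes" argument can be checked with simple arithmetic. The sketch below assumes perfectly linear scaling and billing proportional to machine-time; both are simplifying assumptions, and the price per machine-hour is a made-up number.

```python
def machine_hours(machines: int, minutes: float) -> float:
    """Total machine time consumed by a job, in machine-hours."""
    return machines * minutes / 60

def job_cost(machines: int, minutes: float, price_per_machine_hour: float) -> float:
    """Cost of a job when billing is proportional to machine time (assumed)."""
    return machine_hours(machines, minutes) * price_per_machine_hour

PRICE = 0.10  # hypothetical price per machine-hour

# Same total work, split two ways (assumes perfect linear scaling).
small_cluster = job_cost(machines=10, minutes=600, price_per_machine_hour=PRICE)
large_cluster = job_cost(machines=600, minutes=10, price_per_machine_hour=PRICE)

print(f"10 machines for 10 hours:    {small_cluster:.2f}")
print(f"600 machines for 10 minutes: {large_cluster:.2f}")
```

Under these assumptions both runs consume 100 machine-hours and cost the same, but the second returns an answer 60 times sooner, which is the justification for elastic scaling.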

Changing how computation is done:

Utilizing tools provided by Google to do new things, analyze more data, analyze in a different way, build better models
Examples: analyze customer behavior, analyze factory floors
There are basically three use-cases that typically play out
Data exploration and business intelligence - Cloud Datalab, Cloud Data Studio
Data Warehouse for large-scale analytics - Google BigQuery
Machine learning - Cloud Machine Learning, Vision API, Speech API, Translate API

Summary: three principal use-cases for GCP

* Changing where computation happens - moving existing workloads to the cloud without changing code
* Scale up and reliability - making a service more scalable/reliable
* Transforming business - adding new ways to deal with more data

[[Category:Google Cloud]]
[[Category:Data Engineering]]