Google Cloud/BigQuery
From charlesreid1
What is it
BigQuery is a serverless data warehouse from Google Cloud. It provides petabyte-scale, column-oriented storage that can be queried using SQL, with query latency on the order of seconds. This makes it a very flexible warehouse that can be used as a source or a sink for all manner of data pipelines.
Installing
Gcloud
BigQuery Client Library
Google Cloud provides a long list of client libraries here: https://cloud.google.com/apis/docs/cloud-client-libraries
The Python API packages each component separately, and no single install pulls in every component. For example, if you want to use BigQuery, you have to install the BigQuery API components. If you want to use Pub/Sub, you have to install the Pub/Sub API components. Installing one does not necessarily install the other.
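Because each component is a separate package, it can be handy to check which ones are importable in the current environment. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration, and the module names assume the usual import paths of the google-cloud-bigquery and google-cloud-pubsub packages.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be found in this environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; this is raised when a
        # parent such as "google.cloud" is itself missing.
        return False


def report(module_names):
    """Map each module name to 'installed' or 'missing'."""
    return {
        name: "installed" if is_installed(name) else "missing"
        for name in module_names
    }


# Assumed import paths for the BigQuery and Pub/Sub client libraries.
COMPONENTS = ("google.cloud.bigquery", "google.cloud.pubsub_v1")
```

Calling report(COMPONENTS) returns a dictionary showing which of the two components are present, which makes it easy to see that installing one did not bring in the other.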
Python API
To use BigQuery from Python, you need to install the BigQuery client library for Python (the google-cloud-bigquery package). Use pip:
$ pip3 install --upgrade google-cloud-bigquery
Link/reference: https://cloud.google.com/bigquery/docs/reference/libraries
Also see: https://github.com/GoogleCloudPlatform/google-cloud-python
Specifically: https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/bigquery
Using
Using from gcloud
Using from Python
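As a rough sketch of what a query from Python can look like once google-cloud-bigquery is installed: the example below assumes Application Default Credentials and a default project are already configured, and it queries the public table bigquery-public-data.usa_names.usa_1910_2013 (assumed to have name, number, and state columns). The function names build_query and top_names are made up for illustration.

```python
def build_query(limit: int = 10) -> str:
    """Return SQL for the most common names in one state.

    The state is passed as the named query parameter @state rather than
    being interpolated into the SQL string.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, SUM(number) AS total\n"
        "FROM `bigquery-public-data.usa_names.usa_1910_2013`\n"
        "WHERE state = @state\n"
        "GROUP BY name\n"
        "ORDER BY total DESC\n"
        f"LIMIT {int(limit)}"
    )


def top_names(state: str, limit: int = 10):
    """Run the query and return a list of (name, total) tuples."""
    # Imported here so build_query() stays usable without the library.
    from google.cloud import bigquery

    # Assumes credentials and a default project are set up in the environment.
    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("state", "STRING", state),
        ]
    )
    # query() starts the job; result() waits for it and iterates the rows.
    rows = client.query(build_query(limit), job_config=job_config).result()
    return [(row["name"], row["total"]) for row in rows]
```

With credentials in place, top_names("TX", limit=5) would return the five most common names recorded for Texas. Note that queries against public datasets are still billed to (or counted against the free tier of) the project the client is using.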
Resources
The Google Cloud podcast has a nice episode talking about BigQuery under the hood: https://www.gcppodcast.com/post/episode-94-big-query-under-the-hood-with-tino-tereshko-and-jordan-tigani/