From charlesreid1

Revision as of 01:11, 24 October 2017 by Admin

What is it

BigQuery is a serverless data warehouse from Google Cloud. It offers petabyte-scale, column-oriented storage that can be queried with SQL, with query latency on the order of seconds. It is flexible enough to serve as either a source or a sink for many kinds of data pipelines.

Installing

Gcloud

BigQuery Client Library

There is a long list of client libraries for Google Cloud provided here: https://cloud.google.com/apis/docs/cloud-client-libraries

The Python API packages each service separately, so no single install gives you everything. To use BigQuery, you have to install the BigQuery API components; to use Pub/Sub, you have to install the Pub/Sub API components. Installing one does not necessarily install the other.
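One way to see this per-service packaging is to check which packages are present in the current environment. The sketch below is only an illustration. It assumes Python 3.8 or newer (for importlib.metadata) and uses the PyPI names google-cloud-bigquery and google-cloud-pubsub. The helper name installed_versions is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names; each Google Cloud service ships as its own package.
PACKAGES = ["google-cloud-bigquery", "google-cloud-pubsub"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not part of any Google library.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Report what is (and is not) installed in this environment.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

If only the BigQuery package has been installed, the Pub/Sub line should report "not installed".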

Python API

To use BigQuery from Python, you need to install the BigQuery client library, which is published as the google-cloud-bigquery package. Use pip:

$ pip3 install --upgrade google-cloud-bigquery

Link/reference: https://cloud.google.com/bigquery/docs/reference/libraries

Also see: https://github.com/GoogleCloudPlatform/google-cloud-python

Specifically: https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/bigquery

Using

Using from gcloud

Using from Python
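A minimal sketch of running a SQL query from Python might look like the following. It makes several assumptions not covered on this page: a reasonably recent google-cloud-bigquery release (one that provides Client.query), application default credentials with a default project already configured, and the public table bigquery-public-data.usa_names.usa_1910_2013 as sample data. The names run_query and SAMPLE_SQL are made up for this example.

```python
# Sample query against a public dataset (assumed here purely for illustration).
SAMPLE_SQL = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 5
"""


def run_query(sql, client=None):
    """Run a SQL query and return the result rows as a list of dicts.

    Hypothetical helper. If no client is passed, a bigquery.Client is
    created using application default credentials (an assumption).
    """
    if client is None:
        # Imported lazily so the module loads even without the library.
        from google.cloud import bigquery
        client = bigquery.Client()
    query_job = client.query(sql)   # starts the query job
    rows = query_job.result()       # blocks until the job finishes
    return [dict(row.items()) for row in rows]


# Example call (needs credentials and network access):
#   for row in run_query(SAMPLE_SQL):
#       print(row["name"], row["total"])
```

Passing the client in as an argument makes it possible to reuse one client across queries, or to substitute a stand-in object when testing without network access.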

Resources

The Google Cloud podcast has a nice episode talking about BigQuery under the hood: https://www.gcppodcast.com/post/episode-94-big-query-under-the-hood-with-tino-tereshko-and-jordan-tigani/
