From charlesreid1

Missed Question 1

Q: Referring to the diagram below, what database model does this most closely relate to?

GCDE Exam1.png

A: Relational database

Explanation: The most common model, the relational model sorts data into tables, also known as relations, each of which consists of columns and rows. Each column lists an attribute of the entity in question, such as price, zip code, or birth date. Together, the attributes in a relation are called a domain. A particular attribute or combination of attributes is chosen as a primary key that can be referred to in other tables, when it’s called a foreign key. Each row, also called a tuple, includes data about a specific instance of the entity in question, such as a particular employee. The model also accounts for the types of relationships between those tables, including one-to-one, one-to-many, and many-to-many relationships.
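The terms above can be made concrete with a toy schema, sketched here in Python's built-in sqlite3 (the employee/department tables are illustrative, not from the exam diagram): each table has a primary key, the foreign key in employee links to department, and the link is a one-to-many relationship.

```python
import sqlite3

# Minimal relational-model sketch: two tables (relations), each with rows
# (tuples) and columns (attributes). Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,   -- primary key
        name    TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key (one-to-many)
    )""")
conn.execute("INSERT INTO department VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee VALUES (10, 'Ada', 1)")

# Each row describes one instance of the entity; the join follows the key.
row = conn.execute("""
    SELECT e.name, d.name FROM employee e
    JOIN department d ON e.dept_id = d.dept_id""").fetchone()
print(row)  # ('Ada', 'Engineering')
```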


Missed Question 2

Q: Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs. Select the two correct answers. (Be careful.)

A1: BigQuery translates NULL ARRAY into empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two *distinct* values.

A2: BigQuery raises an error if the query result has ARRAYs which contain NULL elements, although such ARRAYs *can* be used inside the query.

Explanation: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types
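One way to internalize the distinction is a plain-Python analogy (this is not BigQuery itself; None stands in for a NULL ARRAY and [] for an empty ARRAY, and the two helper functions below are hypothetical illustrations of the documented behavior):

```python
# Inside a query, a NULL ARRAY and an empty ARRAY are *distinct* values,
# much like None vs [] in Python.
null_array, empty_array = None, []
assert null_array != empty_array

def serialize_array(value):
    """Mimics result serialization: a NULL ARRAY comes back as an empty ARRAY."""
    return [] if value is None else list(value)

# In the query *result*, the two become indistinguishable:
assert serialize_array(null_array) == serialize_array(empty_array) == []

def check_result(array):
    """Mimics the error raised when an ARRAY with NULL elements reaches the result."""
    if any(v is None for v in array):
        raise ValueError("ARRAY cannot contain NULL elements in query result")
    return array
```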


Missed Question 3

Q: There are three types of data models. Select three.

A1: Logical

A2: Conceptual

A3: Physical data

Explanation: There are three types of data models: conceptual, logical, and physical. The level of complexity and detail increases from conceptual to logical to physical. The conceptual model shows a very basic, high-level design, portraying only entity names and entity relationships. The logical model adds attributes, primary keys, and foreign keys in each entity. The physical data model shows table names, column names, column data types, primary keys, and foreign keys; this view elaborates how the model will actually be implemented in the database.


Missed Question 4

Q: __________ modeling is one of the methods of data modeling; it helps us store data in such a way that it is relatively easy to retrieve the data from the database.

A: Dimensional

Explanation: Dimensional modeling is a data modeling method that helps store data in such a way that it is relatively easy to retrieve from the database.
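A common form of dimensional modeling is the star schema: a central fact table keyed to surrounding dimension tables, so retrieval is a simple join. A minimal sketch in Python's sqlite3 (the sales/date/product table names are illustrative, not from the exam):

```python
import sqlite3

# Star-schema sketch: one fact table (sales_fact) referencing dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact  (
        date_key    INTEGER REFERENCES date_dim(date_key),
        product_key INTEGER REFERENCES product_dim(product_key),
        amount      REAL
    );
    INSERT INTO date_dim VALUES (1, '2024-01-01');
    INSERT INTO product_dim VALUES (1, 'Widget');
    INSERT INTO sales_fact VALUES (1, 1, 9.99);
""")

# Retrieval is easy: join the fact table to whichever dimensions you need.
total = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM sales_fact f JOIN product_dim p USING (product_key)
    GROUP BY p.name""").fetchone()
print(total)  # ('Widget', 9.99)
```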


Missed Question 5

Q: You can update a Cloud Dataproc cluster via three different methods. Which method is NOT correct? Select One.

A: TensorFlow.

Explanation: The GCP Console, the Google Cloud SDK, and the clusters.patch API request are all valid ways of updating a Dataproc cluster; TensorFlow is not.


Missed Question 6

Q: Which of the following rules does NOT apply when you use preemptible workers with a Cloud Dataproc cluster?

A: Local Disk Only

Explanation: The rules that apply to preemptible workers are: (1) preemptible-only clusters are not allowed, (2) they are for processing only, and (3) they have a fixed persistent disk size. They are NOT restricted to local disk only - they can access cloud storage, network storage, etc.

See https://cloud.google.com/dataproc/docs/concepts/preemptible-vms


Missed Question 7

Q: When we speak about DataFlow, we understand that each __________ of the Pipeline is applied to a PCollection; the result of apply() is another PCollection. Select One:

A: Transform.

Explanation: Each Transform of the Pipeline is applied to a PCollection; the result of apply() is another PCollection.
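The apply() chaining can be sketched with a toy class (plain Python, not the actual Beam/Dataflow SDK): because each Transform applied to a PCollection yields another PCollection, transforms compose into a pipeline.

```python
# Toy model of the Dataflow/Beam idea: a Transform maps one PCollection
# to a new PCollection, so apply() calls chain.
class PCollection:
    def __init__(self, elements):
        self.elements = list(elements)

    def apply(self, transform):
        # apply() returns a *new* PCollection produced by the Transform
        return PCollection(transform(self.elements))

lines = PCollection(["cloud dataflow", "cloud pubsub"])
words = lines.apply(lambda es: [w for e in es for w in e.split()])
upper = words.apply(lambda es: [w.upper() for w in es])
print(upper.elements)  # ['CLOUD', 'DATAFLOW', 'CLOUD', 'PUBSUB']
```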


Missed Question 8

Q: Your customer has stated that a large-scale analytics engine and a highly scalable data processing framework are essential to interactively explore and view large datasets in their application. They would like a solution for "parallel processing" of data from both mobile devices sending data to Cloud Pub/Sub and log files in GCP Cloud Storage. What solution in the GCP portfolio would be a great fit? (Select One)

A: Cloud DataFlow.

Explanation: Cloud DataFlow offers "parallel processing" of both stream and batch data.


Missed Question 9

Q: Based on the model diagram below, what type of data model does this most closely represent? Select One.

GCDE Exam2.png

A: Entity.

Explanation: Entity-relationship model - this model captures the relationships between real-world entities much like the network model, but it isn't as directly tied to the physical structure of the database. Instead, it's often used for designing a database conceptually. Here, the people, places, and things about which data points are stored are referred to as entities, each of which has certain attributes that together make up its domain. The cardinality, or relationship between entities, is mapped as well.
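The vocabulary above (entities, attributes forming a domain, cardinality) can be sketched with a couple of illustrative Python dataclasses; the Employee/Department names are hypothetical, not read off the exam diagram.

```python
from dataclasses import dataclass

# Conceptual ER sketch: entities carry attributes (their domain), and a
# relationship records the cardinality between two entities.
@dataclass
class Entity:
    name: str
    attributes: list  # together, these make up the entity's domain

@dataclass
class Relationship:
    source: Entity
    target: Entity
    cardinality: str  # e.g. "one-to-one", "one-to-many", "many-to-many"

employee = Entity("Employee", ["emp_id", "name", "hire_date"])
department = Entity("Department", ["dept_id", "name"])
works_in = Relationship(department, employee, "one-to-many")
print(works_in.cardinality)  # one-to-many
```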


Missed Question 10

Q: In the early 2000s, Google laid the foundation for its Big Data Strategy 1.0:

  • Design software with failure in mind
  • Use only commodity components
  • The cost of twice the amount of capacity should not be considerably more than the cost of twice the amount of hardware
  • Be consistent

These principles inspired new computation architectures.

What are the three architecture solutions that came out as a result of the Big Data Strategy? (NOTE: Don't confuse this with GCP's 2.0 strategy.) Select Three:

A1: MapReduce - a computing paradigm that divides problems into parallelized pieces across a cluster of machines

A2: BigTable - enables structured storage to scale out to multiple servers

A3: GFS - a distributed, cluster-based filesystem, GFS assumes that any disk can fail so data is stored in multiple locations

Explanation: MapReduce, BigTable, and GFS are the three architectures these principles inspired.
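The MapReduce paradigm can be sketched as a single-process toy word count: map each input shard to (key, value) pairs, shuffle by key, then reduce each group. (The real paradigm runs these phases in parallel across a cluster of machines; this sketch only shows the dataflow.)

```python
from collections import defaultdict

def map_phase(shard):
    # Map: emit a (word, 1) pair for every word in the shard
    return [(word, 1) for word in shard.split()]

def shuffle(pairs):
    # Shuffle: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final count
    return {key: sum(values) for key, values in groups.items()}

shards = ["big data big", "data cluster"]
mapped = [pair for shard in shards for pair in map_phase(shard)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1}
```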

Missed Question 11

Q: In BigQuery there are roles called Project Primitive roles. What are two characteristics of the Project Primitive roles? (Select Two)

A1: Roles are assigned by email address.

A2: Project owners can modify project roles

Explanation: Roles are assigned by email address for individual users, for groups, and for service accounts. Project owners can modify project roles, and these privileges are automatically granted to the project creator.


Missed Question 12

Q: You just reviewed the documentation for Cloud DataFlow. The Pipeline is executed on the cloud by a ________, and each step is elastically scaled. Select the proper answer.

A: Runner.

Explanation: The pipeline is executed on the cloud by a Runner, and each step is elastically scaled.



Missed Question 13

Q: You would like a service that automatically schedules recurring data loads from the source application into BigQuery and makes sure you have the latest data at your fingertips. The tool must also be flexible, allowing you to update your data with ad-hoc data loads based on user-specified date ranges. What toolset would that be? (Select One)

A: Google BigQuery Data Transfer Service

Explanation: See https://cloud.google.com/data-transfer/



Missed Question 14

Q: Legacy SQL types have an equivalent in standard SQL and vice versa. In some cases, the type has a different name. Which legacy SQL data type has a significantly different standard SQL equivalent? (That is, the data type's name differs between the two versions.) Select One:

A: RECORD

Explanation: Legacy SQL uses RECORD where standard SQL uses STRUCT. All of the other data types listed are the same.

Link: https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
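As a sketch of the mapping, a few legacy-to-standard pairs (drawn from the migration guide linked above) show RECORD as the renamed one:

```python
# Partial legacy-to-standard SQL type mapping; only RECORD changes name here.
LEGACY_TO_STANDARD = {
    "STRING": "STRING",
    "BYTES": "BYTES",
    "TIMESTAMP": "TIMESTAMP",
    "RECORD": "STRUCT",
}

# Filter down to the pairs whose names actually differ:
renamed = {k: v for k, v in LEGACY_TO_STANDARD.items() if k != v}
print(renamed)  # {'RECORD': 'STRUCT'}
```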
