Quizzes

BigQuery and Dataflow

I want to query a table, then query within the results of that query. Which of these is the BEST way to do this?

[ ] Write both queries in the BigQuery console. BigQuery automatically runs the second query on the results of the first.
[ ] Run the first query and export the results to a new BigQuery table. Then, run the second query on the exported table.
[ ] Use a subquery of the form SELECT ... FROM (SELECT …) …
[ ] Run the first query and save the result into a Pandas Dataframe. Then, slice the Dataframe.
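For reference, the `SELECT ... FROM (SELECT ...)` form named in the third option can be sketched as follows. This is a minimal illustration, not BigQuery itself: it uses Python's built-in `sqlite3` module as a local stand-in, and the `flights` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a local stand-in for BigQuery.
conn = sqlite3.connect(":memory:")
# Hypothetical table and data, invented for this illustration.
conn.execute("CREATE TABLE flights (airline TEXT, delay_minutes INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AA", 10), ("AA", 30), ("UA", 5), ("UA", 15), ("DL", 50)],
)

# Inner query: average delay per airline.
# Outer query: filters within the results of the inner query.
query = """
SELECT airline, avg_delay
FROM (
    SELECT airline, AVG(delay_minutes) AS avg_delay
    FROM flights
    GROUP BY airline
)
WHERE avg_delay > 15
ORDER BY airline
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The outer query only ever sees the rows produced by the inner one, so no intermediate table has to be exported or stored.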

Which of the following statements are true? (Select all 2 correct responses)

[ ] Dataflow executes Apache Beam pipelines
[ ] Dataflow transforms support both batch and streaming pipelines
[ ] Side-inputs in Dataflow are a way to export data from one pipeline to share with another pipeline
[ ] Map operations in a MapReduce can be performed by Combine transforms in Dataflow

Match each of the Dataflow terms with what they do in the life of a dataflow job:

Term	Definition
__ 1. Transform	A. Output endpoint for your pipeline
__ 2. PCollection	B. A data processing operation or step in your pipeline
__ 3. Sink	C. A set of data in your pipeline

Dataproc

Which of the following statements are true about Cloud Dataproc? (Select all 2 correct answers)

[ ] Lets you run Spark and Hadoop clusters with minimal administration
[ ] Streamlined API for Spark and Hadoop programming
[ ] Helps you create job-specific clusters without HDFS

Match each of the terms with what they do when setting up clusters in Cloud Dataproc:

Term	Definition
__ 1. Zone	A. Costs less but may not always be available
__ 2. Standard Cluster mode	B. Determines the Google data center where compute nodes will be located
__ 3. Preemptible	C. Provides 1 master and N workers

Cloud Dataproc provides the ability for Spark programs to separate compute & storage by:

[ ] Reading and writing data directly from/to GCS
[ ] Pre-copying data from GCS to persistent disk on cluster startup
[ ] Mirroring data on both GCS and HDFS
[ ] Setting individual zones for compute and storage

Which of the following will you typically NOT use an initialization action script for?

[ ] Copy over custom configuration files to the cluster
[ ] Install software libraries on the master
[ ] Install software libraries on the worker
[ ] Change the number of workers in the cluster

Streaming analytics

Which of the following does Dataflow offer to make it easy to create resilient streaming pipelines when working with unbounded data? (Select all 2 correct responses)

[ ] Ability to flexibly reason about time
[ ] Controls to ensure correctness
[ ] Global message bus to buffer messages
[ ] SQL support to query in-process results

Match the GCP product with its role when designing streaming systems

Product	Role
__ 1. Pub/Sub	A. Controls to handle late-arriving and out-of-order data
__ 2. Dataflow	B. Global message queue
__ 3. BigQuery	C. Latency on the order of milliseconds when querying against very large volumes of data
__ 4. Bigtable	D. Query data as it arrives from streaming pipelines

Which of the following about Cloud Pub/Sub is NOT true?

[ ] Pub/Sub simplifies systems by removing the need for every component to speak to every component
[ ] Pub/Sub connects applications and services through a messaging infrastructure
[ ] Pub/Sub stores your messages indefinitely until you request them

True or False?

Cloud Pub/Sub guarantees that messages are delivered in the order they were received

[ ] True
[ ] False

Which of the following about Cloud Pub/Sub topics and subscriptions are true? (Select all 2 correct responses)

[ ] 1 or more publisher(s) can write to the same topic
[ ] 1 or more subscriber(s) can request from the same subscription
[ ] Each topic will deliver ALL messages for a topic for each subscriber
[ ] Each topic MUST have at least 1 subscription

Which of the following delivery methods is ideal for subscribers needing close to real time performance?

[ ] Pull Delivery
[ ] Push Delivery

The Dataflow model provides constructs that map to the four questions relevant to any out-of-order data processing pipeline. Match each question with its construct:

Questions	Constructs
__ 1. What results are calculated?	A. Answered via Event-time windowing
__ 2. Where in event time are results calculated?	B. Answered via transformations
__ 3. When in processing time are results materialized?	C. Answered via Accumulation modes
__ 4. How do refinements of results relate?	D. Answered via Watermarks, triggers, and allowed lateness.
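To make the first two questions concrete, here is a minimal plain-Python sketch (not the Dataflow or Apache Beam API) in which a sum is computed per fixed window of event time. The 60-second window size, the `window_start` helper, and the sample events are all assumptions made for illustration. Watermarks, triggers, and accumulation modes are deliberately left out.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed fixed window size


def window_start(event_time):
    """Return the start of the fixed event-time window containing event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


# (event_time_seconds, value) pairs listed in arrival order,
# which differs from event-time order.
events = [(125, 1), (10, 2), (61, 3), (59, 4), (130, 5)]

sums = defaultdict(int)
for event_time, value in events:
    # "What" is computed: a sum. "Where" in event time: the element's window.
    sums[window_start(event_time)] += value

result = dict(sorted(sums.items()))
print(result)
```

Because each element is grouped by its own event timestamp rather than by when it arrived, the out-of-order arrival does not change the per-window totals.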

Which of the following are true about Cloud Bigtable? (Mark all 3 correct responses)

[ ] Offers very low-latency in the order of milliseconds
[ ] Ideal for >1TB data
[ ] Great for time-series data
[ ] Support for SQL

True or False?

Cloud Bigtable learns access patterns and attempts to distribute reads and storage across nodes evenly

[ ] True
[ ] False

Which of the following can help improve performance of Bigtable? (Select all 3 correct responses)

[ ] Change schema to minimize data skew
[ ] Clients and Bigtable are in same zone
[ ] Use HDD instead of SSD
[ ] Add more nodes

Cloud Machine Learning

What are the differences between precision and recall?
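As a starting point for answering, the standard definitions can be worked through on a small example: precision is the fraction of predicted positives that are truly positive, and recall is the fraction of true positives that were found. The labels below are made up for illustration, and `precision_recall` is a hypothetical helper, not a library function.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 4 actual positives, 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here two of the three predicted positives are correct (precision 2/3), while only two of the four actual positives are found (recall 1/2).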

Which (one) of these is NOT a good use case for an ML API?

[ ] Read scanned receipts
[ ] Transcribe support conversations
[ ] Identify images where your product is shown upside down
[ ] Identify scenes in a video library where there are aircraft

Finish this sentence:

Machine Learning is a way to derive insights from data by adjusting weights...

[ ] on a model function so outputs are close to labels.
[ ] of different rules to predict outcome.
[ ] of example inputs so that the total is equal to the output.
[ ] of training data outputs to predict the most likely one.

Which of these is a machine learning problem where the outcome to be predicted is a continuous number?

[ ] Clustering
[ ] Regression
[ ] Classification
[ ] Logistic regression

What is the role of a neuron in a neural network?

[ ] Combine its inputs to map part of a decision surface
[ ] Compute the softmax of the expected output
[ ] Normalize the input variables to lie within a certain range
[ ] Perform gradient descent

Which of the following definitions are true? (Select all 2 of the correct responses)

[ ] Batch is a small set of examples on which the gradient is computed
[ ] Gradient descent is a form of evaluating the performance of an ML model
[ ] Epoch refers to one complete pass through the training dataset
[ ] Feature engineering is how ML models learn complex data
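The terms batch, epoch, and gradient descent can be seen together in a minimal plain-Python training loop. This is a sketch under stated assumptions, not TensorFlow code: the toy data follows y = 2x + 1, and the learning rate, batch size, and epoch count are arbitrary choices for illustration.

```python
import random

random.seed(0)

# Toy dataset following y = 2x + 1 (assumed for illustration).
data = [(i / 10, 2 * (i / 10) + 1) for i in range(20)]

w, b = 0.0, 0.0       # model weights to be adjusted
learning_rate = 0.1   # arbitrary choice
batch_size = 4        # examples per gradient computation
num_epochs = 200      # complete passes over the dataset

for epoch in range(num_epochs):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of mean squared error, computed on this batch only.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        # Gradient descent step: move the weights against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(w, 3), round(b, 3))
```

Each inner iteration uses one batch, and each outer iteration is one epoch; the weights move toward values that bring the model's outputs close to the labels.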

TensorFlow is:

[ ] A fully managed machine learning service
[ ] A software framework for writing portable ML code
[ ] A hardware framework for executing ML models
[ ] A software framework for data processing

In tf.add(a,b), which one of these is a legal value for a?

[ ] np.array([5,3,8])
[ ] tf.constant([5,3,8])
[ ] tf.Session()
[ ] tf.constant(['hello', 'world'])

Which of these is a class that will do logistic regression?

[ ] LinearRegressor
[ ] LinearClassifier
[ ] DNNRegressor
[ ] DNNClassifier

Why is TextLineReader an efficient way to read data into TensorFlow?

[ ] It feeds data directly into the optimizer
[ ] It reads data directly into the graph
[ ] It caches data between epochs
[ ] It integrates well with BigQuery

The training data used in machine learning can often be enhanced by extraction of features from the raw data collected. This is referred to as:

[x] Feature Engineering
[ ] Feature Selection
[ ] Hyperparameter Tuning
[ ] Feature Mining

Which of these is a way of encoding categorical data?

[ ] layers.sparse_column_with_keys()
[ ] layers.real_valued_column()
[ ] layers.crossed_column()
[ ] layers.bucketized_column()

Which of these is a way of discretizing a continuous variable?

[ ] layers.sparse_column_with_keys()
[ ] layers.real_valued_column()
[ ] layers.crossed_column()
[ ] layers.bucketized_column()
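Discretizing a continuous variable means mapping each value to the index of the range it falls in. A minimal plain-Python sketch of that idea, using the standard `bisect` module rather than the `layers` API from the options, might look like this; the `bucketize` helper and the age boundaries are assumptions for illustration.

```python
import bisect


def bucketize(value, boundaries):
    """Return the index of the bucket that value falls into.

    With sorted boundaries [b0, b1, ...], bucket 0 holds values below b0,
    bucket 1 holds values in [b0, b1), and so on.
    """
    return bisect.bisect_right(boundaries, value)


age_boundaries = [18, 35, 65]  # hypothetical bucket boundaries
ages = [5, 18, 34.9, 35, 70]
buckets = [bucketize(age, age_boundaries) for age in ages]
print(buckets)
```

Three boundaries produce four buckets, so a model can learn a separate weight per age range instead of a single weight for the raw number.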

Cloud ML Engine: (Select all 2 correct responses)

[ ] Lets you train your TensorFlow machine learning models at scale.
[ ] Hosts trained models to make predictions
[ ] Hosts pretrained ML models for common use cases
[ ] Requires Cloud Datalab in order to run

In a model to classify X-ray images of legs as “broken” or “not broken”, which of these would normally be considered a hyperparameter? (Select all 2 correct responses)

[ ] Pixel values from the image
[ ] Age of patient
[ ] Number of layers in neural network
[ ] Number of gray levels into which to quantize image values
