Quizzes
BigQuery and Dataflow
I want to query a table, then query within the results of that query. Which of these is the BEST way to do this?
- [ ] Write both queries in the BigQuery console. BigQuery automatically runs the second query on the results of the first.
- [ ] Run the first query and export the results to a new BigQuery table. Then, run the second query on the exported table.
- [ ] Use a subquery of the form SELECT ... FROM (SELECT ...) ...
- [ ] Run the first query and save the result into a Pandas Dataframe. Then, slice the Dataframe.
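For reference, a minimal sketch of the subquery approach using the google-cloud-bigquery Python client; the public table and the filter threshold are only illustrative.

```python
# Sketch: query within the results of another query via a subquery in the FROM clause.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT name, total
FROM (
  SELECT name, SUM(number) AS total
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  GROUP BY name
)
WHERE total > 1000000
ORDER BY total DESC
"""

for row in client.query(sql).result():
    print(row.name, row.total)
```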
Which of the following statements are true? (Select all 2 correct responses)
- [ ] Dataflow executes Apache Beam pipelines
- [ ] Dataflow transforms support both batch and streaming pipelines
- [ ] Side-inputs in Dataflow are a way to export data from one pipeline to share with another pipeline
- [ ] Map operations in a MapReduce can be performed by Combine transforms in Dataflow
Match each of the Dataflow terms with what they do in the life of a dataflow job:
Term | Definition |
---|---|
__ 1. Transform | A. Output endpoint for your pipeline |
__ 2. PCollection | B. A data processing operation or step in your pipeline |
__ 3. Sink | C. A set of data in your pipeline |
- [ ] 1-B, 2-C, 3-A
- [ ] 1-C, 2-B, 3-A
- [ ] 1-A, 2-C, 3-B
- [ ] 1-B, 2-A, 3-C
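For reference, a minimal Apache Beam (Python SDK) sketch, not from the course labs, showing where a PCollection, a Transform (including a Map step), and a sink appear in a pipeline that Dataflow can execute; the gs:// paths are placeholders.

```python
# Sketch: a tiny batch pipeline; Dataflow runs the same Beam code with the DataflowRunner.
import apache_beam as beam

with beam.Pipeline() as p:                       # DirectRunner by default
    lines = p | 'Read' >> beam.io.ReadFromText('gs://my-bucket/input.txt')   # source -> PCollection
    lengths = lines | 'Len' >> beam.Map(len)     # a Transform: one processing step (a Map operation)
    lengths | 'Write' >> beam.io.WriteToText('gs://my-bucket/output')        # sink: output endpoint
```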
Dataproc
Which of the following statements are true about Cloud Dataproc? (Select all 2 correct answers)
- [ ] Lets you run Spark and Hadoop clusters with minimal administration
- [ ] Streamlined API for Spark and Hadoop programming
- [ ] Helps you create job-specific clusters without HDFS
Match each of the terms with what they do when setting up clusters in Cloud Dataproc:
Term | Definition |
---|---|
__ 1. Zone | A. Costs less but may not be available always |
__ 2. Standard Cluster mode | B. Determines the Google data center where compute nodes will be |
__ 3. Preemptible | C. Provides 1 master and N workers |
- [ ] 1-A, 2-B, 3-C
- [ ] 1-B, 2-C, 3-A
- [ ] 1-C, 2-A, 3-B
- [ ] 1-C, 2-B, 3-A
Cloud Dataproc provides the ability for Spark programs to separate compute & storage by:
- [ ] Reading and writing data directly from/to GCS
- [ ] Pre-copying data from GCS to persistent disk on cluster startup
- [ ] Mirroring data on both GCS and HDFS
- [ ] Setting individual zones for compute and storage
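For reference, a hedged PySpark sketch of how Spark jobs on Dataproc can read and write Cloud Storage directly through gs:// URIs instead of cluster HDFS; bucket, path, and column names are placeholders.

```python
# Sketch: compute (the cluster) stays separate from storage (GCS); no pre-copying to HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('gcs-io').getOrCreate()

df = spark.read.csv('gs://my-bucket/input/*.csv', header=True, inferSchema=True)
df.groupBy('some_column').count().write.parquet('gs://my-bucket/output/')   # placeholder column
```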
Which of the following will you typically NOT use an initialization action script for?
- [ ] Copy over custom configuration files to the cluster
- [ ] Install software libraries on the master
- [ ] Install software libraries on the worker
- [ ] Change the number of workers in the cluster
Streaming analytics
Dataflow offers the following that make it easy to create resilient streaming pipelines when working with unbounded data (Select all 2 correct responses)
- [ ] Ability to flexibly reason about time
- [ ] Controls to ensure correctness
- [ ] Global message bus to buffer messages
- [ ] SQL support to query in-process results
Match the GCP product with its role when designing streaming systems
Product | Role |
---|---|
__ 1. Pub/Sub | A. Controls to handle late-arriving and out-of-order data |
__ 2. Dataflow | B. Global message queue |
__ 3. BigQuery | C. Latency in the order of milliseconds when querying against overwhelming volume |
__ 4. Bigtable | D. Query data as it arrives from streaming pipelines |
- [ ] 1-A, 2-B, 3-D, 4-C
- [ ] 1-B, 2-A, 3-D, 4-C
- [ ] 1-C, 2-A, 3-D, 4-B
- [ ] 1-D, 2-A, 3-B, 4-C
Which of the following about Cloud Pub/Sub is NOT true?
- [ ] Pub/Sub simplifies systems by removing the need for every component to speak to every component
- [ ] Pub/Sub connects applications and services through a messaging infrastructure
- [ ] Pub/Sub stores your messages indefinitely until you request it
True or False?
Cloud Pub/Sub guarantees that messages delivered are in the order they were received
- [ ] True
- [ ] False
Which of the following about Cloud Pub/Sub topics and subscriptions are true? (Select all 2 correct responses)
- [ ] 1 or more publisher(s) can write to the same topic
- [ ] 1 or more subscriber(s) can request from the same subscription
- [ ] Each topic will deliver ALL messages for a topic for each subscriber
- [ ] Each topic MUST have at least 1 subscription
Which of the following delivery methods is ideal for subscribers needing close to real time performance?
- [ ] Pull Delivery
- [ ] Push Delivery
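For reference, a minimal sketch with the google-cloud-pubsub Python client: one publisher writes to a topic and one subscriber pulls from a subscription through the streaming (callback) API; project, topic, and subscription IDs are placeholders.

```python
# Sketch: publish to a topic, then receive through a subscription.
from google.cloud import pubsub_v1

project_id = 'my-project'            # placeholder

# One or more publishers can write to the same topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, 'my-topic')
publisher.publish(topic_path, b'hello').result()      # blocks until the message is accepted

# One or more subscribers can pull from the same subscription.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, 'my-subscription')

def callback(message):
    print(message.data)
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result(timeout=30)   # listen briefly for this demo
except Exception:                       # timing out just ends the demo
    streaming_pull.cancel()
```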
The Dataflow model provides constructs that map to the four questions that are relevant in any out-of-order data processing pipeline:
Questions | Constructs |
---|---|
__ 1. What results are calculated? | A. Answered via Event-time windowing |
__ 2. Where in event time are results calculated? | B. Answered via transformations |
__ 3. When in processing time are results materialized? | C. Answered via Accumulation modes |
__ 4. How do refinements of results relate? | D. Answered via Watermarks, triggers, and allowed lateness. |
- [ ] 1-A, 2-D, 3-C, 4-B
- [ ] 1-B, 2-A, 3-D, 4-C
- [ ] 1-C, 2-A, 3-D, 4-B
- [ ] 1-D, 2-B, 3-A, 4-C
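For reference, a hedged Apache Beam (Python SDK) sketch mapping the four questions onto code; it assumes events is a keyed, timestamped PCollection, and the window size, trigger delays, and lateness values are illustrative.

```python
# Sketch: what / where / when / how expressed as Beam constructs.
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import (AccumulationMode, AfterProcessingTime,
                                            AfterWatermark)

def windowed_sums(events):
    return (
        events
        # Where in event time: fixed 60-second event-time windows.
        | beam.WindowInto(
            window.FixedWindows(60),
            # When in processing time: at the watermark, with early/late firings.
            trigger=AfterWatermark(early=AfterProcessingTime(30),
                                   late=AfterProcessingTime(30)),
            allowed_lateness=600,
            # How refinements relate: later firings accumulate earlier results.
            accumulation_mode=AccumulationMode.ACCUMULATING)
        # What is calculated: a per-key sum (the transformation itself).
        | beam.CombinePerKey(sum)
    )
```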
Which of the following are true about Cloud Bigtable? (Mark all 3 correct responses)
- [ ] Offers very low-latency in the order of milliseconds
- [ ] Ideal for >1TB data
- [ ] Great for time-series data
- [ ] Support for SQL
True or False?
Cloud Bigtable learns access patterns and attempts to distribute reads and storage across nodes evenly
- [ ] True
- [ ] False
Which of the following can help improve performance of Bigtable? (Select all 3 correct responses)
- [ ] Change schema to minimize data skew
- [ ] Clients and Bigtable are in same zone
- [ ] Use HDD instead of SSD
- [ ] Add more nodes
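For reference, a hedged sketch with the google-cloud-bigtable Python client writing one time-series cell; the row key promotes the sensor id ahead of the timestamp so writes spread across tablets (the "minimize data skew" point above). Project, instance, table, and column-family names are placeholders.

```python
# Sketch: write one measurement with a skew-avoiding row key.
import datetime
from google.cloud import bigtable

client = bigtable.Client(project='my-project')
table = client.instance('my-instance').table('sensor-readings')

# Key = sensor id + timestamp, so recent writes land on many tablets,
# not all on the "latest timestamp" tablet.
ts = datetime.datetime.utcnow()
row_key = f'sensor-42#{int(ts.timestamp())}'.encode()

row = table.direct_row(row_key)
row.set_cell('metrics', b'temperature', b'21.5', timestamp=ts)
row.commit()
```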
Cloud Machine Learning
- What are the differences between precision and recall? (see the worked example below)
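A small worked example (not from the course) of the two definitions:

```python
# precision = TP / (TP + FP): of everything we flagged positive, how much was right?
# recall    = TP / (TP + FN): of everything truly positive, how much did we find?
tp, fp, fn = 8, 2, 4

precision = tp / (tp + fp)   # 8 / 10 = 0.80
recall = tp / (tp + fn)      # 8 / 12 ≈ 0.67

print(f'precision={precision:.2f} recall={recall:.2f}')
```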
Which (one) of these is NOT a good use case for a ML API?
- [ ] Read scanned receipts
- [ ] Transcribe support conversations
- [ ] Identify images where your product is shown upside down
- [ ] Identify scenes in a video library where there are aircraft
Finish this sentence:
Machine Learning is a way to derive insights from data by adjusting weights...
- [ ] on a model function so outputs are close to labels.
- [ ] of different rules to predict outcome.
- [ ] of example inputs so that the total is equal to the output.
- [ ] of training data outputs to predict the most likely one.
Which of these is a machine learning problem where the outcome to be predicted is a continuous number?
- [ ] Clustering
- [ ] Regression
- [ ] Classification
- [ ] Logistic regression
What is the role of a neuron in a neural network?
- [ ] Combine its inputs to map part of a decision surface
- [ ] Compute the softmax of the expected output
- [ ] Normalize the input variables to lie within a certain range
- [ ] Perform gradient descent
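For reference, a minimal NumPy sketch of a single neuron: a weighted sum of its inputs plus a bias, passed through an activation; the weights and inputs are illustrative.

```python
# Sketch: one neuron = weighted combination of inputs + non-linearity.
import numpy as np

def neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))   # sigmoid activation

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights learned during training
print(neuron(x, w, b=0.2))
```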
Which of the following definitions are true? (Select all 2 of the correct responses)
- [ ] Batch is a small set of examples on which gradient is computed
- [ ] Gradient descent is a form of evaluating the performance of a ML model
- [ ] Epoch refers to one complete pass through the training dataset
- [ ] Feature engineering is how ML models learn complex data
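For reference, a small NumPy sketch (not from the course) showing what batch, epoch, and gradient descent mean on a toy linear model.

```python
# Sketch: one epoch = one full pass over the data; each step uses one batch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)   # true w=3, b=1

w, b, lr, batch_size = 0.0, 0.0, 0.1, 20
for epoch in range(10):                         # 10 epochs
    for i in range(0, len(X), batch_size):      # one gradient step per batch
        xb, yb = X[i:i + batch_size, 0], y[i:i + batch_size]
        err = (w * xb + b) - yb
        w -= lr * (2 * err * xb).mean()         # gradient of mean squared error w.r.t. w
        b -= lr * (2 * err).mean()              # gradient w.r.t. b

print(w, b)   # approaches w≈3, b≈1
```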
TensorFlow is:
- [ ] A fully managed machine learning service
- [ ] A software framework for writing portable ML code
- [ ] A hardware framework for executing ML models
- [ ] A software framework for data processing
In tf.add(a,b), which one of these is a legal value for a?
- [ ] np.array([5,3,8])
- [ ] tf.constant([5,3,8])
- [ ] tf.Session()
- [ ] tf.constant(['hello', 'world'])
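For reference, a short sketch of tf.add written in TensorFlow 2.x eager style (the original course labs used TF 1.x graphs and Sessions); the values are arbitrary.

```python
# Sketch: adding two constant tensors.
import tensorflow as tf

a = tf.constant([5, 3, 8])
b = tf.constant([3, -1, 2])
print(tf.add(a, b))   # tf.Tensor([ 8  2 10], shape=(3,), dtype=int32)
```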
Which of these is a class that will do logistic regression?
- [ ] LinearRegressor
- [ ] LinearClassifier
- [ ] DNNRegressor
- [ ] DNNClassifier
Why is TextLineReader an efficient way to read data into TensorFlow?
- [ ] It feeds data directly into the optimizer
- [ ] It reads data directly into the graph
- [ ] It caches data between epochs
- [ ] It integrates well with BigQuery
The training data used in machine learning can often be enhanced by extraction of features from the raw data collected. This is referred to as:
- [x] Feature Engineering
- [ ] Feature Selection
- [ ] Hyperparameter Tuning
- [ ] Feature Mining
Which of these is a way of encoding categorical data?
- [ ] layers.sparse_column_with_keys()
- [ ] layers.real_valued_column()
- [ ] layers.crossed_column()
- [ ] layers.bucketized_column()
Which of these is a way of discretizing a continuous variable?
- [ ] layers.sparse_column_with_keys()
- [ ] layers.real_valued_column()
- [ ] layers.crossed_column()
- [ ] layers.bucketized_column()
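For reference, a hedged sketch using the newer tf.feature_column equivalents of the layers.* calls above (categorical_column_with_vocabulary_list for sparse_column_with_keys, numeric_column for real_valued_column); feature names, vocabularies, and bucket boundaries are illustrative.

```python
# Sketch: encoding a categorical feature, discretizing a continuous one, and crossing them.
import tensorflow as tf

# Categorical data encoded against an explicit vocabulary.
day = tf.feature_column.categorical_column_with_vocabulary_list(
    'day_of_week', ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'])

# A continuous variable discretized into buckets.
age = tf.feature_column.numeric_column('age')
age_buckets = tf.feature_column.bucketized_column(age, boundaries=[18, 25, 35, 50, 65])

# A crossed column capturing the interaction of the two.
day_x_age = tf.feature_column.crossed_column([day, age_buckets], hash_bucket_size=100)
```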
Cloud ML Engine: (Select all 2 correct responses)
- [ ] Lets you train your TensorFlow machine learning models at scale.
- [ ] Hosts trained models to make predictions
- [ ] Hosts pretrained ML models for common use cases
- [ ] Requires Cloud Datalab in order to run
In a model to classify X-ray images of legs as “broken” or “not broken”, which of these would normally be considered a hyperparameter? (Select all 2 correct responses)
- [ ] Pixel values from the image
- [ ] Age of patient
- [ ] Number of layers in neural network
- [ ] Number of gray levels into which to quantize image values