Quizzes

BigQuery and Dataflow

I want to query a table, then query within the results of that query. Which of these is the BEST way to do this?

  • [ ] Write both queries in the BigQuery console. BigQuery automatically runs the second query on the results of the first.
  • [ ] Run the first query and export the results to a new BigQuery table. Then, run the second query on the exported table.
  • [ ] Use a subquery of the form SELECT ... FROM (SELECT …) …
  • [ ] Run the first query and save the result into a pandas DataFrame. Then, slice the DataFrame.
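A minimal sketch of the nested-query shape `SELECT ... FROM (SELECT ...)`, using Python's built-in `sqlite3` module as a local stand-in for BigQuery. The `sales` table, its columns, and its rows are hypothetical; the point is only that the outer query filters the result set produced by the inner query in a single statement.

```python
import sqlite3

# In-memory database standing in for a BigQuery dataset (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 250), ("west", 75), ("west", 300), ("north", 40)],
)

# Inner query: aggregate per region. Outer query: filter and sort those results.
query = """
SELECT region, total
FROM (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
WHERE total > 100
ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('west', 375), ('east', 350)]
```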

Which of the following statements are true? (Select all 2 correct responses)

  • [ ] Dataflow executes Apache Beam pipelines
  • [ ] Dataflow transforms support both batch and streaming pipelines
  • [ ] Side-inputs in Dataflow are a way to export data from one pipeline to share with another pipeline
  • [ ] Map operations in a MapReduce can be performed by Combine transforms in Dataflow

Match each of the Dataflow terms with what they do in the life of a dataflow job:

Term                Definition
__ 1. Transform     A. Output endpoint for your pipeline
__ 2. PCollection   B. A data processing operation or step in your pipeline
__ 3. Sink          C. A set of data in your pipeline

  • [ ] 1-B, 2-C, 3-A
  • [ ] 1-C, 2-B, 3-A
  • [ ] 1-A, 2-C, 3-B
  • [ ] 1-B, 2-A, 3-C

Dataproc

Which of the following statements are true about Cloud Dataproc? (Select all 2 correct answers)

  • [ ] Lets you run Spark and Hadoop clusters with minimal administration
  • [ ] Streamlined API for Spark and Hadoop programming
  • [ ] Helps you create job-specific clusters without HDFS

Match each of the terms with what they do when setting up clusters in Cloud Dataproc:

Term                          Definition
__ 1. Zone                    A. Costs less but may not always be available
__ 2. Standard Cluster mode   B. Determines the Google data center where compute nodes will be
__ 3. Preemptible             C. Provides 1 master and N workers

  • [ ] 1-A, 2-B, 3-C
  • [ ] 1-B, 2-C, 3-A
  • [ ] 1-C, 2-A, 3-B
  • [ ] 1-C, 2-B, 3-A

Cloud Dataproc provides the ability for Spark programs to separate compute & storage by:

  1. [ ] Reading and writing data directly from/to GCS
  2. [ ] Pre-copying data from GCS to persistent disk on cluster startup
  3. [ ] Mirroring data on both GCS and HDFS
  4. [ ] Setting individual zones for compute and storage

Which of the following will you typically NOT use an initialization action script for?

  • [ ] Copy over custom configuration files to the cluster
  • [ ] Install software libraries on the master
  • [ ] Install software libraries on the worker
  • [ ] Change the number of workers in the cluster

Streaming analytics

Which of the following does Dataflow offer that make it easy to create resilient streaming pipelines when working with unbounded data? (Select all 2 correct responses)

  • [ ] Ability to flexibly reason about time
  • [ ] Controls to ensure correctness
  • [ ] Global message bus to buffer messages
  • [ ] SQL support to query in-process results

Match the GCP product with its role when designing streaming systems

Product          Role
__ 1. Pub/Sub    A. Controls to handle late-arriving and out-of-order data
__ 2. Dataflow   B. Global message queue
__ 3. BigQuery   C. Latency in the order of milliseconds when querying against overwhelming volume
__ 4. Bigtable   D. Query data as it arrives from streaming pipelines

  • [ ] 1-A, 2-B, 3-D, 4-C
  • [ ] 1-B, 2-A, 3-D, 4-C
  • [ ] 1-C, 2-A, 3-D, 4-B
  • [ ] 1-D, 2-A, 3-B, 4-C

Which of the following about Cloud Pub/Sub is NOT true?

  • [ ] Pub/Sub simplifies systems by removing the need for every component to speak to every component
  • [ ] Pub/Sub connects applications and services through a messaging infrastructure
  • [ ] Pub/Sub stores your messages indefinitely until you request them

True or False?

Cloud Pub/Sub guarantees that messages delivered are in the order they were received

  • [ ] True
  • [ ] False

Which of the following about Cloud Pub/Sub topics and subscriptions are true? (Select all 2 correct responses)

  • [ ] 1 or more publisher(s) can write to the same topic
  • [ ] 1 or more subscriber(s) can request from the same subscription
  • [ ] Each topic will deliver ALL messages for a topic for each subscriber
  • [ ] Each topic MUST have at least 1 subscription
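A toy, in-memory model of the topic/subscription relationship, useful for reasoning about the options above. The `Topic` and `Subscription` classes are hypothetical and are not the Pub/Sub client library; acknowledgements, redelivery, retention, and ordering are all ignored. It shows fan-out: every subscription attached to a topic gets its own copy of each message, while subscribers that share one subscription split that subscription's messages between them.

```python
from collections import deque


class Subscription:
    """Hypothetical stand-in for a subscription: a private queue of messages."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def pull(self):
        # Whichever subscriber pulls next takes the next message, so
        # subscribers sharing one subscription divide the stream.
        return self.queue.popleft() if self.queue else None


class Topic:
    """Hypothetical stand-in for a topic that fans out to its subscriptions."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []

    def subscribe(self, subscription):
        self.subscriptions.append(subscription)

    def publish(self, message):
        # Fan-out: each attached subscription receives its own copy.
        # In this toy model, a topic with no subscriptions keeps nothing.
        for sub in self.subscriptions:
            sub.queue.append(message)


topic = Topic("orders")
billing = Subscription("billing")
audit = Subscription("audit")
topic.subscribe(billing)
topic.subscribe(audit)

# Two different publishers write to the same topic.
topic.publish("order-1 from publisher A")
topic.publish("order-2 from publisher B")

# Two workers share the "billing" subscription; "audit" sees every message.
worker_1 = billing.pull()
worker_2 = billing.pull()
audit_messages = [audit.pull(), audit.pull()]
print(worker_1, "|", worker_2, "|", audit_messages)
```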

Which of the following delivery methods is ideal for subscribers needing close to real time performance?

  • [ ] Pull Delivery
  • [ ] Push Delivery

The Dataflow model provides constructs that map to the four questions that are relevant in any out-of-order data processing pipeline:

Questions                                                Constructs
__ 1. What results are calculated?                       A. Answered via event-time windowing
__ 2. Where in event time are results calculated?        B. Answered via transformations
__ 3. When in processing time are results materialized?  C. Answered via accumulation modes
__ 4. How do refinements of results relate?              D. Answered via watermarks, triggers, and allowed lateness

  • [ ] 1-A, 2-D, 3-C, 4-B
  • [ ] 1-B, 2-A, 3-D, 4-C
  • [ ] 1-C, 2-A, 3-D, 4-B
  • [ ] 1-D, 2-B, 3-A, 4-C
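A plain-Python sketch of the first two questions only: the "what" is a sum, and the "where" is a fixed event-time window. This is not the Apache Beam API; the 60-second window length and the event data are hypothetical, and the "when" (watermarks, triggers, allowed lateness) and "how" (accumulation) questions are not modeled. It shows why grouping by event time handles out-of-order arrival: a late event still lands in the window its timestamp belongs to.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed-window length


def window_start(event_time):
    """Map an event timestamp (in seconds) to the start of its fixed window."""
    return event_time - (event_time % WINDOW_SIZE)


def sum_per_window(events):
    """What: a sum. Where: fixed windows keyed by event time, not arrival order."""
    totals = defaultdict(int)
    for event_time, value in events:
        totals[window_start(event_time)] += value
    return dict(sorted(totals.items()))


# (event_time_seconds, value) listed in arrival order; the event stamped
# t=30 arrives late, after events that belong to the next window.
arrivals = [(5, 1), (65, 2), (70, 3), (30, 10), (125, 5)]
print(sum_per_window(arrivals))  # {0: 11, 60: 5, 120: 5}
```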

Which of the following are true about Cloud Bigtable? (Mark all 3 correct responses)

  • [ ] Offers very low latency, on the order of milliseconds
  • [ ] Ideal for >1TB data
  • [ ] Great for time-series data
  • [ ] Support for SQL

True or False?

Cloud Bigtable learns access patterns and attempts to distribute reads and storage across nodes evenly

  • [ ] True
  • [ ] False

Which of the following can help improve performance of Bigtable? (Select all 3 correct responses)

  • [ ] Change schema to minimize data skew
  • [ ] Clients and Bigtable are in same zone
  • [ ] Use HDD instead of SSD
  • [ ] Add more nodes
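"Change schema to minimize data skew" usually comes down to row-key design. Bigtable keeps rows sorted by row key, and rows that sort next to each other tend to be served by the same node, so keys that start with a timestamp funnel concurrent writes into one place. A minimal sketch of the "field promotion" idea in plain Python; the key layout, device names, and helper functions are hypothetical, and no Bigtable client is used.

```python
def timestamp_first_key(device_id, ts):
    # Leading timestamp: writes made at the same moment sort next to each
    # other, so they tend to hit the same region of the key space.
    return f"{ts:012d}#{device_id}"


def device_first_key(device_id, ts):
    # Field promotion: leading with the device ID spreads concurrent writes
    # while keeping each device's time series contiguous.
    return f"{device_id}#{ts:012d}"


devices = ["sensor-a", "sensor-b", "sensor-c"]
now = 1_700_000_000  # one burst of writes at the same second

burst_ts_first = sorted(timestamp_first_key(d, now) for d in devices)
burst_dev_first = sorted(device_first_key(d, now) for d in devices)

# The leading key component decides where each row sorts.
ts_prefixes = {k.split("#")[0] for k in burst_ts_first}
dev_prefixes = {k.split("#")[0] for k in burst_dev_first}
print(len(ts_prefixes), len(dev_prefixes))  # 1 3

# With device-first keys, one device's history is a single prefix scan.
history = [device_first_key("sensor-a", now + i) for i in range(3)]
print(all(k.startswith("sensor-a#") for k in history))  # True
```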

Cloud Machine Learning

  • What is the difference between precision and recall?
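Precision asks "of the items the model flagged as positive, how many really are?", i.e. TP / (TP + FP). Recall asks "of the items that really are positive, how many did the model find?", i.e. TP / (TP + FN). A minimal sketch in plain Python; the labels and predictions are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
# TP = 3, FP = 2, FN = 1
print(precision_recall(y_true, y_pred))  # (0.6, 0.75)
```

Here the model finds most real positives (high recall) but also raises some false alarms (lower precision); tightening the decision threshold usually trades recall for precision.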

Which (one) of these is NOT a good use case for an ML API?

  • [ ] Read scanned receipts
  • [ ] Transcribe support conversations
  • [ ] Identify images where your product is shown upside down
  • [ ] Identify scenes in a video library where there are aircraft

Finish this sentence:

Machine Learning is a way to derive insights from data by adjusting weights...

  • [ ] on a model function so outputs are close to labels.
  • [ ] of different rules to predict outcome.
  • [ ] of example inputs so that the total is equal to the output.
  • [ ] of training data outputs to predict the most likely one.
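A minimal sketch of "adjusting weights on a model function so outputs are close to labels", assuming a one-feature linear model, mean squared error as the measure of closeness, and plain full-batch gradient descent (the model, loss, learning rate, and data are illustrative choices). Because the output is a continuous number, this is also a small regression example.

```python
def train_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        preds = [w * x + b for x in xs]
        # Gradients of MSE = mean((pred - y)^2) with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Move each weight a small step against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # labels generated by y = 2x + 1
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```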

Which of these is a machine learning problem where the outcome to be predicted is a continuous number?

  • [ ] Clustering
  • [ ] Regression
  • [ ] Classification
  • [ ] Logistic regression

What is the role of a neuron in a neural network?

  • [ ] Combine its inputs to map part of a decision surface
  • [ ] Compute the softmax of the expected output
  • [ ] Normalize the input variables to lie within a certain range
  • [ ] Perform gradient descent
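A single neuron computes a weighted sum of its inputs plus a bias and passes it through an activation function; on its own it splits the input space along one boundary, and a network combines many such pieces into a more complex decision surface. A minimal sketch with a sigmoid activation; the weights and bias are hypothetical values picked by hand rather than learned.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed by a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked weights: output > 0.5 exactly when x1 + x2 > 1,
# so this neuron draws the line x1 + x2 = 1 through the input plane.
weights, bias = [4.0, 4.0], -4.0
print(neuron([1.0, 1.0], weights, bias) > 0.5)  # True
print(neuron([0.2, 0.3], weights, bias) > 0.5)  # False
```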

Which of the following definitions are true? (Select all 2 of the correct responses)

  • [ ] Batch is a small set of examples on which gradient is computed
  • [ ] Gradient descent is a form of evaluating the performance of a ML model
  • [ ] Epoch refers to one complete pass through the training dataset
  • [ ] Feature engineering is how ML models learn complex data
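A minimal sketch of how batches and epochs relate in a training loop: each batch is one gradient step, and one epoch is one full pass over every example. The dataset size, batch size, and epoch count are hypothetical, the gradient computation itself is left out, and real trainers usually shuffle the data each epoch.

```python
def iterate_minibatches(data, batch_size):
    """Yield consecutive slices of `data`; the last batch may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


dataset = list(range(10))  # 10 hypothetical training examples
batch_size = 4
num_epochs = 3

steps = 0
for epoch in range(num_epochs):
    seen = []
    for batch in iterate_minibatches(dataset, batch_size):
        # A real trainer would compute the gradient on `batch` here.
        seen.extend(batch)
        steps += 1
    # One epoch = every example visited exactly once.
    assert sorted(seen) == dataset

print(steps)  # 9: 3 batches per epoch (4 + 4 + 2 examples) x 3 epochs
```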

TensorFlow is:

  • [ ] A fully managed machine learning service
  • [ ] A software framework for writing portable ML code
  • [ ] A hardware framework for executing ML models
  • [ ] A software framework for data processing

In tf.add(a,b), which one of these is a legal value for a?

  • [ ] np.array([5,3,8])
  • [ ] tf.constant([5,3,8])
  • [ ] tf.Session()
  • [ ] tf.constant(['hello', 'world'])

Which of these is a class that will do logistic regression?

  • [ ] LinearRegressor
  • [ ] LinearClassifier
  • [ ] DNNRegressor
  • [ ] DNNClassifier

Why is TextLineReader an efficient way to read data into TensorFlow?

  • [ ] It feeds data directly into the optimizer
  • [ ] It reads data directly into the graph
  • [ ] It caches data between epochs
  • [ ] It integrates well with BigQuery

The training data used in machine learning can often be enhanced by extraction of features from the raw data collected. This is referred to as:

  • [x] Feature Engineering
  • [ ] Feature Selection
  • [ ] Hyperparameter Tuning
  • [ ] Feature Mining

Which of these is a way of encoding categorical data?

  • [ ] layers.sparse_column_with_keys()
  • [ ] layers.real_valued_column()
  • [ ] layers.crossed_column()
  • [ ] layers.bucketized_column()
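The idea behind encoding a categorical value against a fixed list of keys can be sketched in plain Python: look the value up in a known vocabulary and emit a one-hot vector. This is not the TensorFlow API; the vocabulary is hypothetical, and mapping unknown values to all zeros is just one possible convention.

```python
def one_hot(value, keys):
    """Encode `value` as a one-hot vector over a fixed vocabulary `keys`.

    Values not in the vocabulary map to an all-zeros vector here.
    """
    return [1 if value == key else 0 for key in keys]


keys = ["cash", "card", "voucher"]  # hypothetical vocabulary
print(one_hot("card", keys))     # [0, 1, 0]
print(one_hot("bitcoin", keys))  # [0, 0, 0]
```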

Which of these is a way of discretizing a continuous variable?

  • [ ] layers.sparse_column_with_keys()
  • [ ] layers.real_valued_column()
  • [ ] layers.crossed_column()
  • [ ] layers.bucketized_column()
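Discretizing a continuous variable means replacing each value with the index of the range it falls in. A minimal sketch with the standard-library `bisect` module; the age boundaries are hypothetical, and the convention that a value equal to a boundary goes into the upper bucket is a choice made here, not a claim about any TensorFlow function.

```python
import bisect


def bucketize(value, boundaries):
    """Return the bucket index for `value` given sorted `boundaries`.

    n boundaries define n + 1 buckets:
    (-inf, b0), [b0, b1), ..., [b_{n-1}, +inf).
    """
    return bisect.bisect_right(boundaries, value)


age_boundaries = [18, 35, 65]  # hypothetical bucket edges
print([bucketize(a, age_boundaries) for a in [10, 18, 40, 70]])  # [0, 1, 2, 3]
```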

Cloud ML Engine: (Select all 2 correct responses)

  • [ ] Lets you train your TensorFlow machine learning models at scale.
  • [ ] Hosts trained models to make predictions
  • [ ] Hosts pretrained ML models for common use cases
  • [ ] Requires Cloud Datalab in order to run

In a model to classify X-ray images of legs as “broken” or “not broken”, which of these would normally be considered a hyperparameter? (Select all 2 correct responses)

  • [ ] Pixel values from the image
  • [ ] Age of patient
  • [ ] Number of layers in neural network
  • [ ] Number of gray levels into which to quantize image values
