Dataproc

Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning. Cloud Dataproc automation helps you create clusters quickly, manage them easily, and reduce cost by turning them off when you don't need them.

Compared to traditional on-premises solutions or other cloud service offerings, Cloud Dataproc has a number of unique benefits:

  • Low cost — Cloud Dataproc is priced at only 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources you use. In addition to this low price, Cloud Dataproc clusters can include preemptible instances that have lower compute prices, reducing your costs even further. Cloud Dataproc charges you only for what you really use with minute-by-minute billing and a low, ten-minute-minimum billing period.
  • Super fast — Without Cloud Dataproc, it can take from 5 to 30 minutes to create Spark and Hadoop clusters on-premises or through IaaS providers. By comparison, Cloud Dataproc clusters are quick to start, scale, and shut down, with each of these operations taking 90 seconds or less, on average.
  • Integrated — Cloud Dataproc has built-in integration with other Google Cloud Platform services, such as BigQuery, Cloud Storage, Cloud Bigtable, Stackdriver Logging, and Stackdriver Monitoring, so you have more than just a Spark or Hadoop cluster—you have a complete data platform. For example, you can use Cloud Dataproc to effortlessly ETL terabytes of raw log data directly into BigQuery for business reporting.
  • Managed — Use Spark and Hadoop clusters without the assistance of an administrator or special software. You can easily interact with clusters and Spark or Hadoop jobs through the Google Cloud Platform Console, the Google Cloud SDK, or the Cloud Dataproc REST API. When you're done with a cluster, you can simply turn it off, so you don’t spend money on an idle cluster. You won’t need to worry about losing data, because Cloud Dataproc is integrated with Cloud Storage, BigQuery, and Cloud Bigtable.
  • Simple and familiar — You don’t need to learn new tools or APIs to use Cloud Dataproc, making it easy to move existing projects into Cloud Dataproc without redevelopment. Spark, Hadoop, Pig, and Hive are frequently updated, so you can be productive faster.
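
To make the pricing in the "Low cost" item concrete, here is a minimal sketch of the Cloud Dataproc fee calculation. The rate (1 cent per virtual CPU per hour), the per-minute billing, and the ten-minute minimum come from the text above. The function names are hypothetical, and rounding partial minutes up to the next whole minute is an assumption. The result covers only the Cloud Dataproc fee, not the underlying Compute Engine or storage charges.

```python
import math

# From the text above: 1 cent per virtual CPU per hour, billed per minute,
# with a ten-minute minimum billing period.
DATAPROC_RATE_USD_PER_VCPU_HOUR = 0.01
MIN_BILLED_MINUTES = 10


def billed_minutes(runtime_minutes: float) -> int:
    """Return the number of minutes billed for a cluster's runtime.

    Assumption (not stated in the text): partial minutes round up.
    """
    return max(MIN_BILLED_MINUTES, math.ceil(runtime_minutes))


def dataproc_fee_usd(total_vcpus: int, runtime_minutes: float) -> float:
    """Estimate the Cloud Dataproc fee only (other resources excluded)."""
    minutes = billed_minutes(runtime_minutes)
    return total_vcpus * DATAPROC_RATE_USD_PER_VCPU_HOUR * minutes / 60


if __name__ == "__main__":
    # A cluster with 24 vCPUs in total, running for 90 minutes.
    print(f"90-minute job: ${dataproc_fee_usd(24, 90):.2f}")
    # The same cluster running for 4 minutes is billed for 10 minutes.
    print(f"4-minute job:  ${dataproc_fee_usd(24, 4):.2f}")
```

Under these assumptions, short jobs are dominated by the ten-minute minimum, which is why turning a cluster off as soon as a job finishes keeps the fee small.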

Cloud Dataproc offers the following open-source compute engines and connectors to other GCP services:

  • Apache Spark 2.0.2

You can access Cloud Dataproc through the Google Cloud Platform Console, the Google Cloud SDK, or the Cloud Dataproc REST API.
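
As a rough illustration of the REST API route mentioned earlier, the sketch below builds the URL and JSON body for a cluster-creation request without sending it. The helper name, project ID, region, and cluster name are hypothetical placeholders. The endpoint path and field names follow the Cloud Dataproc v1 `clusters.create` method as far as I know, so check them against the current API reference before relying on them. A real call would also need an OAuth 2.0 access token, which is omitted here.

```python
import json

# Base URL of the Cloud Dataproc v1 REST API.
DATAPROC_ENDPOINT = "https://dataproc.googleapis.com/v1"


def build_create_cluster_request(project_id, region, cluster_name,
                                 num_workers=2, num_secondary_workers=0):
    """Return (url, body) for a POST that would create a cluster.

    Hypothetical helper: it only assembles the request and does not
    contact the service.
    """
    url = f"{DATAPROC_ENDPOINT}/projects/{project_id}/regions/{region}/clusters"
    body = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "masterConfig": {"numInstances": 1},
            "workerConfig": {"numInstances": num_workers},
        },
    }
    if num_secondary_workers > 0:
        # Secondary workers are the group that can hold the lower-priced
        # preemptible instances described above (assumed default behavior).
        body["config"]["secondaryWorkerConfig"] = {
            "numInstances": num_secondary_workers,
        }
    return url, body


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    url, body = build_create_cluster_request(
        "my-project", "us-central1", "demo-cluster", num_secondary_workers=2)
    print("POST", url)
    print(json.dumps(body, indent=2))
```

The Cloud SDK exposes the same operation as `gcloud dataproc clusters create`, and the Console offers it through a form, so the choice between the three is mostly a matter of how much you want to automate.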
