Cloud Dataproc access control

Dataproc permissions

Cloud Dataproc permissions allow users to perform specific actions on Cloud Dataproc clusters, jobs, and operations. For example, the dataproc.clusters.create permission allows a user to create Cloud Dataproc clusters in your project. You don't directly give users permissions; instead, you grant them roles, which have one or more permissions bundled within them.

The following tables list the permissions necessary to call Cloud Dataproc APIs (methods):

projects.regions.clusters methods     Required permissions
  Create                              dataproc.clusters.create
  Get                                 dataproc.clusters.get
  List                                dataproc.clusters.list
  Patch                               dataproc.clusters.update
  Delete                              dataproc.clusters.delete
  Diagnose                            dataproc.clusters.use

projects.regions.jobs methods         Required permissions
  Submit                              dataproc.jobs.create & dataproc.clusters.use
  Get                                 dataproc.jobs.get
  List                                dataproc.jobs.list
  Cancel                              dataproc.jobs.cancel
  Delete                              dataproc.jobs.delete

projects.regions.operations methods   Required permissions
  Get                                 dataproc.operations.get
  List                                dataproc.operations.list
  Delete                              dataproc.operations.delete
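As a sketch of how these permissions are typically granted in practice, a project role containing them can be bound to a user with the gcloud CLI. The project ID and user email below are placeholders:

```shell
# Grant the Dataproc Editor role (which bundles the create/get/list/
# update/delete permissions above) to a user in a project.
# "example-project" and "[email protected]" are hypothetical names.
gcloud projects add-iam-policy-binding example-project \
  --member="user:[email protected]" \
  --role="roles/dataproc.editor"
```

The command prints the project's updated IAM policy; the binding takes effect for subsequent API calls by that user.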

Dataproc roles

IAM roles are bundles of one or more permissions that you grant to users or groups to permit them to perform actions on Cloud Dataproc resources in your project.

The following table lists the Cloud Dataproc IAM roles and the permissions associated with each role:

Cloud Dataproc Role             Permissions
Dataproc/Dataproc Editor        dataproc.*.create
                                dataproc.*.get
                                dataproc.*.list
                                dataproc.*.delete
                                dataproc.*.update
                                dataproc.clusters.use
                                dataproc.jobs.cancel
                                compute.machineTypes.get
                                compute.machineTypes.list
                                compute.networks.get
                                compute.networks.list
                                compute.projects.get
                                compute.regions.get
                                compute.regions.list
                                compute.zones.get
                                compute.zones.list
Dataproc/Dataproc Viewer        dataproc.*.get
                                dataproc.*.list
                                compute.machineTypes.get
                                compute.regions.get
                                compute.regions.list
                                compute.zones.get
                                resourcemanager.projects.get
Dataproc/Dataproc Worker        dataproc.agents.*
(for service accounts only)     dataproc.tasks.*
                                logging.logEntries.create
                                monitoring.metricDescriptors.create
                                monitoring.metricDescriptors.get
                                monitoring.metricDescriptors.list
                                monitoring.monitoredResourceDescriptors.get
                                monitoring.monitoredResourceDescriptors.list
                                monitoring.timeSeries.create
                                storage.buckets.get
                                storage.objects.create
                                storage.objects.get
                                storage.objects.list
                                storage.objects.update
                                storage.objects.delete

Notes:

  • "*" signifies "clusters," "jobs," or "operations," except that dataproc.jobs.update is not currently supported, and the only permissions associated with dataproc.operations are get, list, and delete.
  • The compute permissions listed above are needed or recommended to create and view Cloud Dataproc clusters when using the Google Cloud Platform Console or the Cloud SDK gcloud command-line tool.
  • To allow a user to upload files, grant the Storage Object Creator role. To allow a user to view job output, grant the Storage Object Viewer role. Note that granting either of these Storage roles gives the user the ability to access any bucket in the project.
  • A user must have the monitoring.timeSeries.list permission in order to view graphs on the Google Cloud Platform Console → Dataproc → Cluster details Overview tab.
  • A user must have the compute.instances.list permission in order to view instance status and the master instance SSH menu on the Google Cloud Platform Console → Dataproc → Cluster details VM Instances tab.
  • To create a cluster with a user-specified service account, the specified service account must have all permissions granted by the Dataproc Worker role.
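As a sketch of the last point, granting the Dataproc Worker role to a user-specified service account can be done with a single IAM policy binding. The project ID and service account name below are placeholders:

```shell
# Give a user-managed service account the Dataproc Worker role so that
# clusters created with it can report logs, metrics, and task status.
# "example-project" and "cluster-sa" are hypothetical names.
gcloud projects add-iam-policy-binding example-project \
  --member="serviceAccount:cluster-sa@example-project.iam.gserviceaccount.com" \
  --role="roles/dataproc.worker"
```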

Project Roles

You can also set permissions at the project level by using the IAM Project roles. Here is a summary of the permissions associated with IAM Project roles:

  • Project Viewer: All project permissions for read-only actions that preserve state (get, list)
  • Project Editor: All Project Viewer permissions plus all project permissions for actions that modify state (create, delete, update, use, cancel)
  • Project Owner: All Project Editor permissions plus permissions to manage access control for the project (get/set IamPolicy) and to set up project billing

IAM Roles and Cloud Dataproc Operations Summary

The following table summarizes the Cloud Dataproc operations available based on the role granted to the user, with caveats noted.

Operation               Project Editor   Project Viewer   Cloud Dataproc Editor   Cloud Dataproc Viewer
Create cluster          Yes              No               Yes                     No
List clusters           Yes              Yes              Yes                     Yes
Get cluster details     Yes              Yes              Yes (1, 2)              Yes (1, 2)
Update cluster          Yes              No               Yes                     No
Delete cluster          Yes              No               Yes                     No
Submit job              Yes              No               Yes (3)                 No
List jobs               Yes              Yes              Yes                     Yes
Get job details         Yes              Yes              Yes (4)                 Yes (4)
Cancel job              Yes              No               Yes                     No
Delete job              Yes              No               Yes                     No
List operations         Yes              Yes              Yes                     Yes
Get operation details   Yes              Yes              Yes                     Yes
Delete operation        Yes              No               Yes                     No

Service accounts

A service account is a special account that can be used by services and applications running on your Compute Engine instance to interact with other GCP APIs.

Applications can use service account credentials to authorize themselves to a set of APIs and perform actions within the permissions granted to the service account and virtual machine instance.

Specifying a user-managed service account when creating a Cloud Dataproc cluster allows you to use that service account for the virtual machines in that cluster. If a service account is not specified, Cloud Dataproc virtual machines will use the default Google-managed Compute Engine service account.

Specifying a user-managed service account when creating a Dataproc cluster allows you to create and use clusters with fine-grained access to and control over Cloud resources. Using different user-managed service accounts with different Cloud Dataproc clusters allows each cluster to have different access to Cloud resources.

Important notes
  • Service accounts can only be set when a cluster is created.
  • Once set, the service account used for a cluster cannot be changed.
  • Make sure that service accounts have appropriate scopes and IAM roles for your needs.
  • Service accounts used with Cloud Dataproc must have the Dataproc Worker role (or have all the permissions granted by the Dataproc Worker role).
  • Compute Engine virtual machines used in Dataproc clusters still need specific access scopes.
Specifying a custom service account:

When creating a new cluster, you can use the following gcloud command-line tool argument to set the service account:

--service-account=[SERVICE-ACCOUNT-NAME]@[PROJECT_ID].iam.gserviceaccount.com
The Google Cloud IAM service account to be authenticated as.
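Putting the flag in context, a cluster-creation command might look like the following sketch. The cluster name, region, project, and service account are placeholders:

```shell
# Create a cluster whose VM instances run as a user-managed service
# account instead of the default Compute Engine service account.
# "example-cluster", "us-central1", "cluster-sa", and "example-project"
# are hypothetical names.
gcloud dataproc clusters create example-cluster \
  --region=us-central1 \
  --service-account=cluster-sa@example-project.iam.gserviceaccount.com
```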

When creating a Dataproc cluster through the REST API, specify the service account in the serviceAccount field of the GceClusterConfig parameter. Custom service accounts need permissions equivalent to the following IAM roles:

  • roles/logging.logWriter
  • roles/storage.objectAdmin
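As a sketch of the REST request body, the serviceAccount field sits inside the cluster's gceClusterConfig. The cluster, project, and service account names below are placeholders:

```json
{
  "clusterName": "example-cluster",
  "config": {
    "gceClusterConfig": {
      "serviceAccount": "cluster-sa@example-project.iam.gserviceaccount.com"
    }
  }
}
```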
Default Google Compute Engine service account:

When created, Google Compute Engine virtual machines can be configured to use a specific service account. If a service account is not specified, the default Compute Engine service account is used:

[PROJECT_NUMBER][email protected]

When created with the default service account, an instance is automatically granted a default set of access scopes.

Default Dataproc service account scopes

To specify the service account scopes for Dataproc cluster instances, use the serviceAccountScopes[] field of the GceClusterConfig parameter. A base set of scopes is always included, and if no custom scopes are specified, additional default scopes are provided as well.

Remember that access scopes are limited to the service to which they apply. For example, if a Cloud Dataproc cluster has been granted only the https://www.googleapis.com/auth/devstorage.full_control scope for Google Cloud Storage, then it can't use that scope to make requests to BigQuery.
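Custom scopes can also be set from the command line with the --scopes flag of gcloud dataproc clusters create. The sketch below uses placeholder names, and assumes gcloud's short scope aliases (such as storage-full and bigquery) in place of full scope URIs:

```shell
# Create a cluster whose instances carry explicit access scopes for
# Cloud Storage and BigQuery, in addition to the always-included base set.
# "example-cluster" and "us-central1" are hypothetical values.
gcloud dataproc clusters create example-cluster \
  --region=us-central1 \
  --scopes=storage-full,bigquery
```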
