Cloud Dataproc access control
Dataproc permissions
Cloud Dataproc permissions allow users to perform specific actions on Cloud Dataproc clusters, jobs, and operations. For example, the dataproc.clusters.create permission allows a user to create Cloud Dataproc clusters in your project. You don't directly give users permissions; instead, you grant them roles, which have one or more permissions bundled within them.
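For example, you might grant a user the predefined Dataproc Editor role with the gcloud command-line tool. This is a minimal sketch; the project ID and user email below are placeholders:

    # Grant the predefined Dataproc Editor role (placeholder project and user).
    gcloud projects add-iam-policy-binding example-project \
        --member=user:jane@example.com \
        --role=roles/dataproc.editor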
The following tables list the permissions necessary to call Cloud Dataproc APIs (methods):
| projects.regions.clusters methods | Required permissions |
|---|---|
| Create | dataproc.clusters.create |
| Get | dataproc.clusters.get |
| List | dataproc.clusters.list |
| Patch | dataproc.clusters.update |
| Delete | dataproc.clusters.delete |
| Diagnose | dataproc.clusters.use |
| projects.regions.jobs methods | Required permissions |
|---|---|
| Submit | dataproc.jobs.create & dataproc.clusters.use |
| Get | dataproc.jobs.get |
| List | dataproc.jobs.list |
| Cancel | dataproc.jobs.cancel |
| Delete | dataproc.jobs.delete |
| projects.regions.operations methods | Required permissions |
|---|---|
| Get | dataproc.operations.get |
| List | dataproc.operations.list |
| Delete | dataproc.operations.delete |
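To check which of these permissions a predefined role bundles, you can describe the role with the gcloud command-line tool, as in this sketch (the Dataproc Editor role is used as an illustration):

    # List the permissions bundled in the Dataproc Editor role.
    gcloud iam roles describe roles/dataproc.editor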
Dataproc roles
IAM roles are bundles of one or more permissions that you grant to users or groups to permit them to perform actions on Cloud Dataproc resources in your project.
The following table lists the Cloud Dataproc IAM roles and the permissions associated with each role:
| Cloud Dataproc Role | Permissions |
|---|---|
| Dataproc/Dataproc Editor | dataproc.*.create |
| | dataproc.*.get |
| | dataproc.*.list |
| | dataproc.*.delete |
| | dataproc.*.update |
| | dataproc.clusters.use |
| | dataproc.jobs.cancel |
| | compute.machineTypes.get |
| | compute.machineTypes.list |
| | compute.networks.get |
| | compute.networks.list |
| | compute.projects.get |
| | compute.regions.get |
| | compute.regions.list |
| | compute.zones.get |
| | compute.zones.list |
| Dataproc/Dataproc Viewer | dataproc.*.get |
| | dataproc.*.list |
| | compute.machineTypes.get |
| | compute.regions.get |
| | compute.regions.list |
| | compute.zones.get |
| | resourcemanager.projects.get |
| Dataproc/Dataproc Worker (for service accounts only) | dataproc.agents.* |
| | dataproc.tasks.* |
| | logging.logEntries.create |
| | monitoring.metricDescriptors.create |
| | monitoring.metricDescriptors.get |
| | monitoring.metricDescriptors.list |
| | monitoring.monitoredResourceDescriptors.get |
| | monitoring.monitoredResourceDescriptors.list |
| | monitoring.timeSeries.create |
| | storage.buckets.get |
| | storage.objects.create |
| | storage.objects.get |
| | storage.objects.list |
| | storage.objects.update |
| | storage.objects.delete |
Notes:
- "*" signifies "clusters," "jobs," or "operations," except that dataproc.jobs.update is not currently supported, and the only permissions associated with dataproc.operations are get, list, and delete.
- The compute permissions listed above are needed or recommended to create and view Cloud Dataproc clusters when using the Google Cloud Platform Console or the Cloud SDK gcloud command-line tool.
- To allow a user to upload files, grant the Storage Object Creator role. To allow a user to view job output, grant the Storage Object Viewer role (see the example after these notes). Note that granting either of these Storage roles gives the user the ability to access any bucket in the project.
- A user must have monitoring.timeSeries.list permission in order to view graphs on the Google Cloud Platform Console→Dataproc→Cluster details Overview tab.
- A user must have compute.instances.list permission in order to view instance status and the master instance SSH menu on the Google Cloud Platform Console→Dataproc→Cluster details VM Instances tab.
- To create a cluster with a user-specified service account, the specified service account must have all permissions granted by the Dataproc Worker role.
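For instance, a sketch of granting the Storage Object Viewer role so a user can view job output (the project ID and user email are placeholders):

    # Grant read access to Cloud Storage objects in the project (placeholder names).
    gcloud projects add-iam-policy-binding example-project \
        --member=user:jane@example.com \
        --role=roles/storage.objectViewer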
Project Roles
You can also set permissions at the project level by using the IAM Project roles. Here is a summary of the permissions associated with IAM Project roles:
| Project Role | Permissions |
|---|---|
| Project Viewer | All project permissions for read-only actions that preserve state (get, list) |
| Project Editor | All Project Viewer permissions plus all project permissions for actions that modify state (create, delete, update, use, cancel) |
| Project Owner | All Project Editor permissions plus permissions to manage access control for the project (get/set IamPolicy) and to set up project billing |
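To audit which members currently hold these project roles, you can inspect the project's IAM policy, as in this sketch (the project ID is a placeholder):

    # Print the project's IAM policy, including role bindings (placeholder project).
    gcloud projects get-iam-policy example-project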
IAM Roles and Cloud Dataproc Operations Summary
The following table summarizes the Cloud Dataproc operations available based on the role granted to the user, with caveats noted.
| Operation | Project Editor | Project Viewer | Cloud Dataproc Editor | Cloud Dataproc Viewer |
|---|---|---|---|---|
| Create cluster | Yes | No | Yes | No |
| List clusters | Yes | Yes | Yes | Yes |
| Get cluster details | Yes | Yes | Yes1, 2 | Yes1, 2 |
| Update cluster | Yes | No | Yes | No |
| Delete cluster | Yes | No | Yes | No |
| Submit job | Yes | No | Yes3 | No |
| List jobs | Yes | Yes | Yes | Yes |
| Get job details | Yes | Yes | Yes4 | Yes4 |
| Cancel job | Yes | No | Yes | No |
| Delete job | Yes | No | Yes | No |
| List operations | Yes | Yes | Yes | Yes |
| Get operation details | Yes | Yes | Yes | Yes |
| Delete operation | Yes | No | Yes | No |
Service accounts
A service account is a special account that can be used by services and applications running on your Compute Engine instance to interact with other GCP APIs.
Applications can use service account credentials to authorize themselves to a set of APIs and perform actions within the permissions granted to the service account and virtual machine instance.
If you specify a user-managed service account when creating a Cloud Dataproc cluster, the virtual machines in that cluster use that service account; if no service account is specified, the cluster's virtual machines use the default Google-managed Compute Engine service account. Specifying a user-managed service account lets you create clusters with fine-grained access to and control over Cloud resources, and using a different user-managed service account for each Cloud Dataproc cluster gives each cluster its own level of access.
Important notes
- Service accounts can only be set when a cluster is created.
- Once set, the service account used for a cluster cannot be changed.
- Make sure that service accounts have appropriate scopes and IAM roles for your needs.
- Service accounts used with Cloud Dataproc must have the Dataproc Worker role (or have all the permissions granted by the Dataproc Worker role); see the example after this list.
- Compute Engine virtual machines used in Dataproc clusters still need specific access scopes.
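For example, a sketch of granting the Dataproc Worker role to a user-managed service account before using it with a cluster (the project ID and service account name are placeholders):

    # Allow the service account to act as a Dataproc worker (placeholder names).
    gcloud projects add-iam-policy-binding example-project \
        --member=serviceAccount:example-sa@example-project.iam.gserviceaccount.com \
        --role=roles/dataproc.worker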
Specifying a custom service account:
When creating a new cluster, you can set the service account with the following gcloud command-line tool argument:
--service-account=[SERVICE-ACCOUNT-NAME]@[PROJECT_ID].iam.gserviceaccount.com
This is the Google Cloud IAM service account that the cluster's virtual machines authenticate as.
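Putting it together, a minimal sketch of a cluster-creation command (the cluster, project, and service account names are placeholders):

    # Create a cluster whose VMs run as a user-managed service account (placeholder names).
    gcloud dataproc clusters create example-cluster \
        --service-account=example-sa@example-project.iam.gserviceaccount.com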
When creating a Dataproc cluster through the REST API, specify the service account in the serviceAccount field of the GceClusterConfig parameter. Custom service accounts need permissions equivalent to the following IAM roles (see the sketch after this list):
- roles/logging.logWriter
- roles/storage.objectAdmin
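For example, a sketch of granting these role equivalents to the custom service account (the project ID and service account name are placeholders):

    # Grant log-writing and Cloud Storage object access (placeholder names).
    gcloud projects add-iam-policy-binding example-project \
        --member=serviceAccount:example-sa@example-project.iam.gserviceaccount.com \
        --role=roles/logging.logWriter
    gcloud projects add-iam-policy-binding example-project \
        --member=serviceAccount:example-sa@example-project.iam.gserviceaccount.com \
        --role=roles/storage.objectAdmin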
Default Google Compute Engine service account:
When created, Google Compute Engine virtual machines can be configured to use a specific service account. If a service account is not specified, the default Compute Engine service account is used:
[PROJECT_NUMBER]-compute@developer.gserviceaccount.com
When created with the default service account, an instance automatically has the following access scopes:
- Read-only access to Google Cloud Storage: https://www.googleapis.com/auth/devstorage.read_only
- Write access to write Compute Engine logs: https://www.googleapis.com/auth/logging.write
- Write access to publish metric data to your Google Cloud projects: https://www.googleapis.com/auth/monitoring.write
- Read-only access to Service Management features required for Google Cloud Endpoints (Alpha): https://www.googleapis.com/auth/service.management.readonly
- Read/write access to Service Control features required for Google Cloud Endpoints (Alpha): https://www.googleapis.com/auth/servicecontrol
Default Dataproc service account scopes
To specify service account scopes for Dataproc cluster instances, use the serviceAccountScopes[] field of the GceClusterConfig parameter (or the --scopes flag of the gcloud command-line tool; see the sketch after the lists below). The following base set of scopes is always included:
- https://www.googleapis.com/auth/cloud.useraccounts.readonly
- https://www.googleapis.com/auth/devstorage.read_write
- https://www.googleapis.com/auth/logging.write
If no custom scopes are specified, the following defaults are also provided:
- https://www.googleapis.com/auth/bigquery
- https://www.googleapis.com/auth/bigtable.admin.table
- https://www.googleapis.com/auth/bigtable.data
- https://www.googleapis.com/auth/devstorage.full_control
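For example, a sketch of creating a cluster with a custom scope via the gcloud command-line tool (the cluster name and scope choice are illustrative; per the lists above, specifying a custom scope replaces these defaults, while the base scopes are still included):

    # Create a cluster whose VMs can call BigQuery (placeholder cluster name).
    gcloud dataproc clusters create example-cluster \
        --scopes=https://www.googleapis.com/auth/bigquery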
Remember that access scopes are limited to the service to which they apply. For example, if a Cloud Dataproc cluster has been granted only the https://www.googleapis.com/auth/devstorage.full_control scope for Google Cloud Storage, it can't use that scope to make requests to BigQuery.