Terraform failed to acquire state lock: 403: Access denied., forbidden
2022-11-17
Categories: DevOps Programming
Problem
We store Terraform state in GCS. For some reason, we got this error intermittently when running on Bitbucket Pipelines:
╷
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│ * writing
│ "gs://bucket/path/to/default.tflock"
│ failed: googleapi: Error 403: Access denied., forbidden
│ * storage: object doesn't exist
│
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
╵
Re-running the pipeline three or four times usually makes it succeed.
Troubleshooting
I tried to reproduce this issue on my own machine, but I could not. Note that if the state is already locked by another pipeline, the error looks different:
│ Error: Error acquiring the state lock
│
│ Error message: writing
│ "gs://bucket/path/to/default.tflock"
│ failed: googleapi: Error 403: Access denied., forbidden
│ Lock Info:
│ ID: 1668249141203541
│ Path: gs://bucket/path/to/default.tflock
│ Operation: OperationTypePlan
│ Who: root@de3c91cdae61
│ Version: 1.4.0
│ Created: 2022-11-12 10:32:21.161498675 +0000 UTC
│ Info:
And if the service account does not have permission on that bucket, we get yet another error:
╷
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│ * writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403:
│ username@project.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied
│ on resource (or it may not exist)., forbidden
│ * googleapi: got HTTP response code 403 with body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access
│ denied.</Message><Details>username@project.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission
│ 'storage.objects.get' denied on resource (or it may not exist).</Details></Error>
Moreover, I could not think of any reason why a service account would lose its permissions at random.
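The three messages above can be told apart with a few string checks. This is a rough heuristic derived only from the logs in this post, not from any documented error format; classifyLockError and the Cause constants are hypothetical names.

```go
package main

import "strings"

// Cause is a hypothetical label for what a lock error most likely means.
type Cause string

const (
	CauseLockHeld   Cause = "lock held by another pipeline"
	CauseMissingIAM Cause = "service account lacks an IAM permission on the bucket"
	CauseBare403    Cause = "bare 403 with no principal named: check the runner's credentials"
	CauseUnknown    Cause = "unknown"
)

// classifyLockError applies rules taken only from the three messages in this
// post. Order matters: a held-lock message can also contain a 403.
func classifyLockError(msg string) Cause {
	switch {
	case strings.Contains(msg, "Lock Info:"):
		return CauseLockHeld
	case strings.Contains(msg, "does not have storage.objects."):
		return CauseMissingIAM
	case strings.Contains(msg, "Error 403: Access denied., forbidden"):
		return CauseBare403
	default:
		return CauseUnknown
	}
}
```

The third case is the one from the Problem section: no principal and no missing permission are named, which is why the usual IAM explanation did not fit.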
Re-reading the documentation, I noticed this:
IAM Changes to buckets are eventually consistent and may take upto a few minutes to take effect. Terraform will return 403 errors till it is eventually consistent.
I suspected that something was changing the IAM permissions, but our DevOps team confirmed that there was no such cron job.
Here are the other things I tried, none of which helped:
- turning on debug logging with TF_LOG=DEBUG
- updating terraform to the latest version, 1.3.4
I also took a deeper look at the source code:
- terraform/internal/backend/remote-state/gcs/client.go
- google-cloud-go/storage/writer.go
- google-cloud-go/storage/http_client.go
- google-api-go-client/storage/v1/storage-gen.go#L9997
- google-api-go-client/storage/v1/storage-gen.go#L9948
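and added some debug code to print the credentials in use:
// Temporary debug code placed inside doRequest in storage-gen.go. It needs the
// imports context, encoding/json, log and golang.org/x/oauth2/google.
// Look up the same Application Default Credentials the client would use.
credential, err := google.FindDefaultCredentials(context.Background())
if err != nil {
	return nil, err
}
// Print the whole Credentials struct (project ID, token source, ...).
log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential: %+v", credential)
// Print the raw key file; empty when the credentials come from the environment
// (for example the GCE metadata server) rather than from a JSON key.
log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential.JSON: %s", string(credential.JSON))
// Decode the key file to see fields such as client_email (skipped if it is empty).
var data map[string]interface{}
if err := json.Unmarshal(credential.JSON, &data); err == nil {
	log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential: %v", data)
}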
From that output, I could tell that our runners run on Google Compute Engine.
Re-reading the documentation one more time, I found this:
If you are running terraform on Google Cloud, you can configure that instance or cluster to use a Google Service Account. This will allow Terraform to authenticate to Google Cloud without having to bake in a separate credential/authentication file. Make sure that the scope of the VM/Cluster is set to cloud-platform.
It turned out that one of our runners did not have that access scope:
"scopes": [
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring.write",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/trace.append"
]
Solution
Edit the VM and make sure that it has the cloud-platform scope (Compute Engine only lets you change a VM's access scopes while it is stopped):
"scopes": [
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring.write",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/trace.append"
]
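To catch a runner like this before Terraform does, the scope check can run as an early pipeline step. A minimal sketch, assuming the scope list has already been fetched and that cloud-platform, devstorage.read_write and devstorage.full_control are the scopes that permit object writes; gcsWriteScopes and canWriteGCS are hypothetical names. Keep in mind that access scopes only cap what the VM's credentials may do: the service account still needs the matching IAM permissions on the bucket.

```go
package main

// gcsWriteScopes lists the OAuth scopes assumed to allow writing GCS objects.
var gcsWriteScopes = map[string]bool{
	"https://www.googleapis.com/auth/cloud-platform":          true,
	"https://www.googleapis.com/auth/devstorage.read_write":   true,
	"https://www.googleapis.com/auth/devstorage.full_control": true,
}

// canWriteGCS reports whether any granted scope would let the VM's token
// create objects such as default.tflock. IAM is still checked separately.
func canWriteGCS(scopes []string) bool {
	for _, s := range scopes {
		if gcsWriteScopes[s] {
			return true
		}
	}
	return false
}
```

Applied to the two lists above, the original runner's scopes fail the check (devstorage.read_only can read the lock file but not create it), and the edited list passes.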
Happy debugging!
Quan Tong