Terraform failed to acquire state lock: 403: Access denied., forbidden
2022-11-17
Categories: DevOps Programming
Problem
We store our Terraform state in GCS. For some reason, we randomly got this error when running on Bitbucket Pipelines:
╷
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│ * writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
│ * storage: object doesn't exist
│
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
╵
The pipeline would usually succeed after being re-run three or four times.
Troubleshooting
I tried to reproduce this issue locally, but could not. Note that if the state is already locked by another pipeline, the error looks different:
│ Error: Error acquiring the state lock
│
│ Error message: writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
│ Lock Info:
│   ID: 1668249141203541
│   Path: gs://bucket/path/to/default.tflock
│   Operation: OperationTypePlan
│   Who: root@de3c91cdae61
│   Version: 1.4.0
│   Created: 2022-11-12 10:32:21.161498675 +0000 UTC
│   Info:
And if the service account does not have permission on that bucket, we get yet another error:
╷
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│ * writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403: username@project.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., forbidden
│ * googleapi: got HTTP response code 403 with body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>username@project.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on resource (or it may not exist).</Details></Error>
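To tell these three failures apart when scanning pipeline logs, a small helper can match on the text that distinguishes them. This is only a minimal sketch in Go, assuming the messages look like the ones shown above; the function name classify and the category labels are made up for illustration and are not part of Terraform.

```go
package main

import (
	"fmt"
	"strings"
)

// lockFailure is a hypothetical label for the kind of lock error seen in a log.
type lockFailure string

const (
	lockHeld       lockFailure = "lock is held by another run"
	iamDenied      lockFailure = "service account lacks an IAM permission"
	bareDenied     lockFailure = "bare 403 without a stated cause"
	unknownFailure lockFailure = "not a recognized lock error"
)

// classify guesses which of the three cases a Terraform log excerpt belongs to.
// It is a heuristic based only on the messages quoted in this post.
func classify(output string) lockFailure {
	switch {
	case !strings.Contains(output, "Error acquiring the state lock"):
		return unknownFailure
	case strings.Contains(output, "Lock Info:"):
		// Terraform prints the lock holder when the lock file already exists.
		return lockHeld
	case strings.Contains(output, "does not have storage.objects."):
		// An IAM denial names the missing permission explicitly.
		return iamDenied
	case strings.Contains(output, "Error 403: Access denied."):
		// No lock info and no permission name: the case this post is about.
		return bareDenied
	default:
		return unknownFailure
	}
}

func main() {
	sample := "Error: Error acquiring the state lock\n" +
		"Error message: 2 errors occurred:\n" +
		"* writing \"gs://bucket/path/to/default.tflock\" failed: googleapi: Error 403: Access denied., forbidden\n" +
		"* storage: object doesn't exist\n"
	fmt.Println(classify(sample))
}
```

The order of the checks matters: the "lock held" output also contains "Error 403: Access denied.", so the bare 403 case has to be tested last.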
Moreover, I could not think of a reason why a service account would lose its permissions at random.
While re-reading the documentation, this note caught my attention:
IAM Changes to buckets are eventually consistent and may take upto a few minutes to take effect. Terraform will return 403 errors till it is eventually consistent.
I suspected that something was changing the IAM permissions, but our DevOps team confirmed that there is no such cron job.
Here are the other things I tried, none of which helped:
- turn on debug by using TF_LOG=DEBUG
- update terraform to the latest version, 1.3.4
I then took a deeper look at the source code:
- terraform/internal/backend/remote-state/gcs/client.go
- google-cloud-go/storage/writer.go
- google-cloud-go/storage/http_client.go
- google-api-go-client/storage/v1/storage-gen.go#L9997
- google-api-go-client/storage/v1/storage-gen.go#L9948
and added some debug code to print the credential:
	// Debug only: credential.JSON may contain a private key, so remove this after troubleshooting.
	// Needs the context, encoding/json and log packages plus golang.org/x/oauth2/google.
	credential, err := google.FindDefaultCredentials(context.Background())
	if err != nil {
		return nil, err
	}
	log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential: %+v", credential)
	log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential.JSON: %s", string(credential.JSON))
	// Decode the JSON so the individual fields are easier to read in the log.
	var data map[string]interface{}
	if err := json.Unmarshal(credential.JSON, &data); err == nil {
		log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential: %v", data)
	}
I also knew that our runners run on Google Compute Engine.
Re-reading the documentation one more time:
If you are running terraform on Google Cloud, you can configure that instance or cluster to use a Google Service Account. This will allow Terraform to authenticate to Google Cloud without having to bake in a separate credential/authentication file. Make sure that the scope of the VM/Cluster is set to cloud-platform.
I found out that one of our runners did not have that access scope. For Cloud Storage it only had devstorage.read_only, so writing the lock file failed whenever a job landed on that runner:
"scopes": [
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring.write",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/trace.append"
]
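The scopes of a runner can also be checked from inside the VM by asking the Compute Engine metadata server. The following is a minimal sketch in Go, assuming the runner uses the instance's default service account; the helper names (parseScopes, hasScope, fetchScopes) and the "live" argument are made up for this example. Without arguments it only checks a sample list copied from the misconfigured runner.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// Scope that the Terraform documentation asks for on the VM or cluster.
const cloudPlatformScope = "https://www.googleapis.com/auth/cloud-platform"

// Metadata server path that lists the scopes of the default service account, one per line.
const scopesURL = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"

// parseScopes splits the whitespace-separated metadata response into a slice.
func parseScopes(body string) []string {
	return strings.Fields(body)
}

// hasScope reports whether want is one of the granted scopes.
func hasScope(scopes []string, want string) bool {
	for _, s := range scopes {
		if s == want {
			return true
		}
	}
	return false
}

// fetchScopes asks the metadata server for the scopes granted to this VM.
func fetchScopes(url string) ([]string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// The metadata server rejects requests that do not carry this header.
	req.Header.Set("Metadata-Flavor", "Google")

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("metadata server returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseScopes(string(body)), nil
}

func main() {
	// Sample input: some of the scopes found on the misconfigured runner.
	scopes := parseScopes(`https://www.googleapis.com/auth/devstorage.read_only
https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/monitoring.write`)

	// Pass "live" as the first argument to query the metadata server instead.
	if len(os.Args) > 1 && os.Args[1] == "live" {
		var err error
		scopes, err = fetchScopes(scopesURL)
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot read scopes:", err)
			os.Exit(1)
		}
	}
	fmt.Println("cloud-platform scope present:", hasScope(scopes, cloudPlatformScope))
}
```

Running it with the live argument as a first pipeline step on every runner would have pointed at the odd one out much faster than reading the Terraform debug logs.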
Solution
Edit the VM and make sure that it has the cloud-platform scope (access scopes can only be changed while the instance is stopped):
"scopes": [
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring.write",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/trace.append"
]
Happy debugging!