"Life is all about sharing. If we are good at something, pass it on." - Mary Berry

Terraform failed to acquire state lock: 403: Access denied., forbidden

2022-11-17

Categories: DevOps Programming

Problem

We store our Terraform state in GCS. For some reason, we randomly got this error when running on Bitbucket Pipelines:

```
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│   * writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
│   * storage: object doesn't exist
│
│
│
│ Terraform acquires a state lock to protect the state from being written
│ by multiple users at the same time. Please resolve the issue above and try
│ again. For most commands, you can disable locking with the "-lock=false"
│ flag, but this is not recommended.
```

Re-running the pipeline three or four times eventually succeeds.

Troubleshooting

I tried to reproduce this issue locally, but I could not. Note that if the state is already locked by another pipeline, the error looks different:

```
│ Error: Error acquiring the state lock
│
│ Error message: writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
│ Lock Info:
│   ID:        1668249141203541
│   Path:      gs://bucket/path/to/default.tflock
│   Operation: OperationTypePlan
│   Who:       root@de3c91cdae61
│   Version:   1.4.0
│   Created:   2022-11-12 10:32:21.161498675 +0000 UTC
│   Info:
```

And if the service account does not have permission on the bucket, we get yet another error:

```
│ Error: Error acquiring the state lock
│
│ Error message: 2 errors occurred:
│   * writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403:
│ username@project.iam.gserviceaccount.com does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied
│ on resource (or it may not exist)., forbidden
│   * googleapi: got HTTP response code 403 with body: <?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access
│ denied.</Message><Details>username@project.iam.gserviceaccount.com does not have storage.objects.get access to the Google Cloud Storage object. Permission
│ 'storage.objects.get' denied on resource (or it may not exist).</Details></Error>
```
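The three outputs above can be told apart by a few substrings: a lock held by another run prints "Lock Info:", a missing IAM permission names the exact storage.objects permission, and the error we actually hit is a bare "Access denied., forbidden". As a rough triage aid, here is a minimal Go sketch, assuming the Terraform output has been captured as a string. The helper classifyLockError and its labels are hypothetical, and the substring matching is a heuristic based only on the sample messages in this post, not on any documented Terraform output format.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyLockError is a hypothetical helper that guesses which of the
// "Error acquiring the state lock" variants shown above an output matches.
// The substrings come from the samples in this post; Terraform does not
// guarantee this format.
func classifyLockError(output string) string {
	switch {
	case !strings.Contains(output, "Error acquiring the state lock"):
		return "not a state lock error"
	case strings.Contains(output, "Lock Info:"):
		// Another run holds the lock; its ID, Who and Created fields are printed.
		return "lock held by another run"
	case strings.Contains(output, "does not have storage.objects"):
		// IAM explicitly denies the service account a storage permission.
		return "missing IAM permission"
	case strings.Contains(output, "Access denied., forbidden"):
		// A bare 403 without permission details: the case investigated here.
		return "bare 403"
	default:
		return "unknown"
	}
}

func main() {
	sample := `Error: Error acquiring the state lock
Error message: 2 errors occurred:
	* writing "gs://bucket/path/to/default.tflock" failed: googleapi: Error 403: Access denied., forbidden
	* storage: object doesn't exist`
	fmt.Println(classifyLockError(sample)) // bare 403
}
```

The "Lock Info:" check comes first because the locked-by-another-run output also contains "Access denied., forbidden".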

Besides, I could not think of any reason why a service account would lose its permissions only some of the time.

Re-reading the documentation, I noticed this note:

IAM Changes to buckets are eventually consistent and may take upto a few minutes to take effect. Terraform will return 403 errors till it is eventually consistent.

I suspected that something was changing the IAM permissions, but our DevOps team confirmed that there is no such cron job.

Here are the other things I tried, none of which helped:

I took a deeper look at the source code:

and added some debug code to print the credentials in use:

```go
// Debug code added inside doRequest in google-api-go-client's storage/v1 package.
// Needs the context, encoding/json, log and golang.org/x/oauth2/google imports.
// Warning: credential.JSON can contain a service account private key.
credential, err := google.FindDefaultCredentials(context.Background())
if err != nil {
	return nil, err
}
log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential: %+v", credential)
log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential.JSON: %s", string(credential.JSON))
var data map[string]interface{}
if err := json.Unmarshal(credential.JSON, &data); err == nil {
	log.Printf("[TRACE] google-api-go-client/storage/v1: doRequest: credential: %v", data)
}
```
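Dumping credential.JSON as above is handy but will print a private key if the credentials come from a service account key file. A safer variant summarizes only the credential type and client email. This is a minimal sketch with a hypothetical helper, describeCredentialJSON, that takes the same credential.JSON bytes. It assumes, as I understand golang.org/x/oauth2/google, that credentials obtained from the Compute Engine metadata server carry no JSON at all, so an empty value points at the VM's own service account and access scopes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// describeCredentialJSON is a hypothetical helper that summarizes the JSON
// behind Application Default Credentials without printing secrets such as
// private_key.
func describeCredentialJSON(raw []byte) string {
	if len(raw) == 0 {
		// Assumption: no JSON usually means ADC fell back to the GCE metadata
		// server, so the VM's service account and access scopes apply.
		return "no JSON: probably the GCE metadata server"
	}
	// Decode only the non-secret fields we want to log.
	var cred struct {
		Type        string `json:"type"`
		ClientEmail string `json:"client_email"`
	}
	if err := json.Unmarshal(raw, &cred); err != nil {
		return "unparseable credential JSON: " + err.Error()
	}
	if cred.ClientEmail != "" {
		return cred.Type + " " + cred.ClientEmail
	}
	return cred.Type
}

func main() {
	fmt.Println(describeCredentialJSON(nil))
	key := `{"type":"service_account","client_email":"username@project.iam.gserviceaccount.com","private_key":"<redacted>"}`
	fmt.Println(describeCredentialJSON([]byte(key)))
}
```

Inside the patched doRequest, the three log lines could then be replaced by a single `log.Printf("[TRACE] credential: %s", describeCredentialJSON(credential.JSON))`.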

and found out that our runners are running on Google Compute Engine.

Re-reading the documentation once more, I found this:

If you are running terraform on Google Cloud, you can configure that instance or cluster to use a Google Service Account. This will allow Terraform to authenticate to Google Cloud without having to bake in a separate credential/authentication file. Make sure that the scope of the VM/Cluster is set to cloud-platform.

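It turned out that one of our runners did not have that access scope. For Cloud Storage it only had devstorage.read_only, so the OAuth token issued to that VM could not create objects in the bucket, and writing the lock file failed with a 403 whenever a pipeline happened to run there. That is why the error seemed random: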

```json
"scopes": [
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring.write",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/trace.append"
]
```

Solution

Edit the VM and make sure it has the cloud-platform scope (access scopes can only be changed while the VM is stopped):

```json
"scopes": [
  "https://www.googleapis.com/auth/cloud-platform",
  "https://www.googleapis.com/auth/devstorage.read_only",
  "https://www.googleapis.com/auth/logging.write",
  "https://www.googleapis.com/auth/monitoring.write",
  "https://www.googleapis.com/auth/service.management.readonly",
  "https://www.googleapis.com/auth/servicecontrol",
  "https://www.googleapis.com/auth/trace.append"
]
```
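To catch a misconfigured runner before Terraform even starts, a pipeline step could check the VM's access scopes first. Below is a minimal Go sketch, assuming the Compute Engine metadata server is reachable from the pipeline step. It reads the default service account's scopes from the metadata server's scopes endpoint (which requires the Metadata-Flavor: Google header). The helpers parseScopes, canWriteGCS and fetchScopes are hypothetical names. Keep in mind that scopes only cap what the VM's token may do; the service account still needs IAM permissions such as storage.objects.create on the bucket.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// Metadata server endpoint listing the scopes of the VM's default service account.
const scopesURL = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"

// OAuth scopes that allow writing objects to Cloud Storage.
var gcsWriteScopes = map[string]bool{
	"https://www.googleapis.com/auth/cloud-platform":          true,
	"https://www.googleapis.com/auth/devstorage.read_write":   true,
	"https://www.googleapis.com/auth/devstorage.full_control": true,
}

// parseScopes splits the metadata server's newline-separated response.
func parseScopes(body string) []string {
	var scopes []string
	for _, line := range strings.Split(body, "\n") {
		if s := strings.TrimSpace(line); s != "" {
			scopes = append(scopes, s)
		}
	}
	return scopes
}

// canWriteGCS reports whether any scope lets the VM's token write GCS objects,
// which Terraform needs in order to create the .tflock file.
func canWriteGCS(scopes []string) bool {
	for _, s := range scopes {
		if gcsWriteScopes[s] {
			return true
		}
	}
	return false
}

// fetchScopes queries the metadata server; it only works on a GCE VM.
func fetchScopes() ([]string, error) {
	req, err := http.NewRequest(http.MethodGet, scopesURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata-Flavor", "Google") // required by the metadata server
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("metadata server returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseScopes(string(body)), nil
}

func main() {
	scopes, err := fetchScopes()
	if err != nil {
		fmt.Println("could not read scopes (not on GCE?):", err)
		return
	}
	if !canWriteGCS(scopes) {
		fmt.Println("this VM's access scopes do not allow writing to GCS; Terraform state locking will fail")
		os.Exit(1)
	}
	fmt.Println("VM access scopes allow writing to GCS")
}
```

Exiting non-zero only when the scopes are definitely read-only, and not when the metadata server is unreachable, keeps the check from breaking pipelines that run outside GCE.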

Happy debugging!

Tags: terraform gcs golang
