
[azure] Pod fails to obtain token when 2 MSIs are assigned to VM scale set #1548

Closed
mcmx opened this issue Apr 28, 2020 · 24 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mcmx
mcmx commented Apr 28, 2020

What happened: Pod fails to refresh token when more than 1 user assigned Managed Identity is associated with the Virtual Machine Scale Set.

time="2020-04-28T21:47:44Z" level=error msg="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/6100196f-6f28-4a23-9ab0-2e6ccc491527/resourceGroups/EastUS-EQS-AKS-SYSENG/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}"

What you expected to happen: Use the correct identity, or at least emit an error message indicating that the identity to use must be specified explicitly.

How to reproduce it (as minimally and precisely as possible):
Start an external-dns pod in a cluster whose Virtual Machine Scale Set has two user-assigned identities.

Anything else we need to know?:
Removing either of the assigned identities makes it work; preferably remove the one that doesn't have permissions on the DNS zone you want to manage.

azure.json:
{
"tenantId": "cb333d92-redacted-bd",
"subscriptionId": "6100196f-redacted-27",
"resourceGroup": "EastUS-EQS-AKS-SYSENG",
"useManagedIdentityExtension": true
}

Adding "userAssignedIdentityID": "404a9933-redacted-582" to azure.json didn't work either (though I'm not sure what this option is used for).
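For background (not from this thread, but documented Azure behavior): the MSI extension path ultimately requests tokens from the Azure Instance Metadata Service (IMDS), and when a VM/VMSS has more than one user-assigned identity the token request must name one explicitly via client_id. A minimal sketch of how the request URL differs, with a hypothetical helper name and a placeholder GUID:

```python
from urllib.parse import urlencode

# Azure Instance Metadata Service (IMDS) token endpoint, reachable only
# from inside an Azure VM / VM scale set instance.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"

def build_imds_token_url(resource, client_id=None, api_version="2018-02-01"):
    """Build the IMDS token request URL (illustrative helper).

    With no client_id, IMDS has to pick an identity itself; when two or
    more user-assigned identities are attached to the scale set it cannot,
    and responds 400 "Identity not found". Passing the identity's client
    ID (the value azure.json calls userAssignedIdentityID) disambiguates.
    """
    params = {"api-version": api_version, "resource": resource}
    if client_id:
        params["client_id"] = client_id
    return IMDS_TOKEN_ENDPOINT + "?" + urlencode(params)

# Ambiguous request -- fails with 400 when several identities are attached:
ambiguous = build_imds_token_url("https://management.azure.com/")

# Disambiguated request (placeholder GUID -- substitute your identity's
# client ID):
explicit = build_imds_token_url(
    "https://management.azure.com/",
    client_id="00000000-0000-0000-0000-000000000000",
)
```

The real HTTP call would also need the `Metadata: true` header; the sketch only shows how the query string changes.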

Environment:

  • External-DNS version (use external-dns --version): v20200401-v0.7.1
  • DNS provider: Azure
  • Others:
@mcmx mcmx added the kind/bug Categorizes issue or PR as related to a bug. label Apr 28, 2020
@Gemakk
Gemakk commented May 12, 2020

Even two user-assigned identities, both with the proper permissions (Read on the resource group, Contributor on the DNS zone), cause this error.

@adamrushuk

I'm also getting this error when using AKS with managed identity auth enabled (which adds two other user-assigned managed identities).

Is there a workaround until this is fixed?

@digitamind

I'm still getting this error. Is there a workaround for now?

@digitamind
digitamind commented Jul 19, 2020

As explained above, this issue occurs when more than one user-assigned Managed Identity is associated with the Virtual Machine Scale Set. Specifying a particular user-assigned managed identity in the azure.json that is then used to create the secret resolved the issue for me.

{
"tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"resourceGroup": "resource-group-name",
"useManagedIdentityExtension": true,
"userAssignedIdentityID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 17, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 16, 2020
@seanmalloy
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 18, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 16, 2021
@darkn3rd
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 23, 2021
@darkn3rd
Contributor

@digitamind I was following the current Azure guide and I am not sure how to get the userAssignedIdentityID. Where do we get this from?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 23, 2021
@etiennetremel

@darkn3rd if you have AAD Pod Identity set up in the cluster, you can use the following configuration to bind external-dns to the managed identity:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: my-external-dns
  namespace: kube-system
spec:
  type: 0
  resourceID: /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-group/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my-external-dns

  # az ad sp list --filter "displayName eq 'my-external-dns'" -o tsv --query '[].appId'
  clientID: 00000000-0000-0000-0000-000000000000
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: my-external-dns-binding
  namespace: kube-system
spec:
  azureIdentity: my-external-dns
  selector: my-external-dns

And for the external-dns pod, add the following label: aadpodidbinding: my-external-dns.

@IanMoroney

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 22, 2021
@IanMoroney

This also happens when using "useManagedIdentityExtension": true, which means you are forced to avoid that option and instead spin up your own user-assigned identity and use that.

This issue needs resolving so that we can use the "useManagedIdentityExtension": true option.

@IanMoroney

@njuettner,
Would it be possible to use this example to fix the problem?
https://github.com/Azure/aad-pod-identity/blob/master/test/image/identityvalidator/keyvault.go#L42-L48

In Azure/aad-pod-identity#778 (comment) it was stated that the code above was used to solve this issue.
I wonder whether the AZURE_CLIENT_ID environment variable is not getting set and needs to be.
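If that hypothesis is right, the fix would amount to a lookup along these lines. This is an illustrative sketch, not external-dns's actual code; resolve_msi_client_id and its precedence order are assumptions:

```python
import os

def resolve_msi_client_id(azure_json):
    """Choose which user-assigned identity to request a token for.

    Precedence (assumed, for illustration): the explicit
    userAssignedIdentityID from azure.json wins, then the AZURE_CLIENT_ID
    environment variable, then None -- which leaves IMDS to decide, and
    that only works when a single identity is assigned to the scale set.
    """
    return (azure_json.get("userAssignedIdentityID")
            or os.environ.get("AZURE_CLIENT_ID")
            or None)
```

Either source would give the token request an unambiguous identity, which is what the "Identity not found" error suggests is missing.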

@MCKLMT
MCKLMT commented Aug 6, 2021

Hello all, any news on this issue? My customer is concerned about it.

@darkn3rd
Contributor

@MCKLMT I do not have this issue when using AAD Pod Identity. I have the default assigned managed identity, and one that I created, which grants access to Azure DNS. I am using the preview feature to automate the process, in case that is what makes things work.

@marley-ma

As explained above, this issue occurs when more than one user-assigned Managed Identity is associated with the Virtual Machine Scale Set. Specifying a particular user-assigned managed identity in the azure.json that is then used to create the secret resolved the issue for me.

{
"tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"resourceGroup": "resource-group-name",
"useManagedIdentityExtension": true,
"userAssignedIdentityID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

For those who are still struggling with this, please follow @digitamind's suggestion. Thanks buddy, you saved my day!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@talha0324

We are still having this issue. Are there any fixes or workarounds for this?
