
[azure] Pod fails to obtain token when 2 MSIs are assigned to VM scale set #1548

Closed
mcmx opened this issue Apr 28, 2020 · 24 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mcmx
mcmx commented Apr 28, 2020

What happened: Pod fails to refresh token when more than 1 user assigned Managed Identity is associated with the Virtual Machine Scale Set.

time="2020-04-28T21:47:44Z" level=error msg="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/6100196f-6f28-4a23-9ab0-2e6ccc491527/resourceGroups/EastUS-EQS-AKS-SYSENG/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=400 -- Original Error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Identity not found"}"

What you expected to happen: Use the correct identity, or at least emit an error message indicating that the identity to use must be specified explicitly.

How to reproduce it (as minimally and precisely as possible):
Start an external-dns pod in a cluster whose Virtual Machine Scale Set has two user-assigned identities.

Anything else we need to know?:
Removing either of the assigned identities makes it work; preferably remove the one that doesn't have permissions on the DNS zone you want to manage.

azure.json:
{
"tenantId": "cb333d92-redacted-bd",
"subscriptionId": "6100196f-redacted-27",
"resourceGroup": "EastUS-EQS-AKS-SYSENG",
"useManagedIdentityExtension": true
}

Adding "userAssignedIdentityID": "404a9933-redacted-582" to azure.json didn't work either (though I'm not sure what this option is used for).
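For background (not from this thread, but documented Azure behavior): the MSI extension path ultimately requests tokens from the Azure Instance Metadata Service (IMDS), and when a VM/VMSS has more than one user-assigned identity the token request must name one explicitly via client_id. A minimal sketch of how the request URL differs, with a hypothetical helper name and a placeholder GUID:

```python
from urllib.parse import urlencode

# Azure Instance Metadata Service (IMDS) token endpoint, reachable only
# from inside an Azure VM / VM scale set instance.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"

def build_imds_token_url(resource, client_id=None, api_version="2018-02-01"):
    """Build the IMDS token request URL (illustrative helper).

    With no client_id, IMDS has to pick an identity itself; when two or
    more user-assigned identities are attached to the scale set it cannot,
    and responds 400 "Identity not found". Passing the identity's client
    ID (the value azure.json calls userAssignedIdentityID) disambiguates.
    """
    params = {"api-version": api_version, "resource": resource}
    if client_id:
        params["client_id"] = client_id
    return IMDS_TOKEN_ENDPOINT + "?" + urlencode(params)

# Ambiguous request -- fails with 400 when several identities are attached:
ambiguous = build_imds_token_url("https://management.azure.com/")

# Disambiguated request (placeholder GUID -- substitute your identity's
# client ID):
explicit = build_imds_token_url(
    "https://management.azure.com/",
    client_id="00000000-0000-0000-0000-000000000000",
)
```

The real HTTP call would also need the `Metadata: true` header; the sketch only shows how the query string changes.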

Environment:

  • External-DNS version (use external-dns --version): v20200401-v0.7.1
  • DNS provider: Azure
  • Others:
@mcmx mcmx added the kind/bug Categorizes issue or PR as related to a bug. label Apr 28, 2020
@Gemakk
Gemakk commented May 12, 2020

Even two user-assigned identities, both with the proper permissions (Read on the resource group, Contributor on the DNS zone), cause this error.

@adamrushuk

I'm also getting this error when using AKS with managed identity auth enabled (which adds two other user-assigned managed identities).

Is there a workaround until this is fixed?

@digitamind

I'm still getting this error. Is there a workaround for now?

@digitamind
digitamind commented Jul 19, 2020

As explained above, this issue occurs when more than one user-assigned Managed Identity is associated with the Virtual Machine Scale Set. Specifying a particular user-assigned managed identity in the azure.json that is then used to create the secret resolved the issue for me.

{
"tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"resourceGroup": "resource-group-name",
"useManagedIdentityExtension": true,
"userAssignedIdentityID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 17, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 16, 2020
@seanmalloy
Member

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 18, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 16, 2021
@darkn3rd
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 23, 2021
@darkn3rd
Contributor

@digitamind I was following the current Azure guide and I am not sure how to get the userAssignedIdentityID. Where do we get this from?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 24, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 23, 2021
@etiennetremel

@darkn3rd if you have AAD Pod Identity set up in the cluster, you can use the following configuration to bind external-dns to the managed identity:

apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
  name: my-external-dns
  namespace: kube-system
spec:
  type: 0
  resourceID: /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-group/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my-external-dns

  # az ad sp list --filter "displayName eq 'my-external-dns'" -o tsv --query '[].appId'
  clientID: 00000000-0000-0000-0000-000000000000
---
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
  name: my-external-dns-binding
  namespace: kube-system
spec:
  azureIdentity: my-external-dns
  selector: my-external-dns

And for the external-dns pod, add the following label: aadpodidbinding: my-external-dns.

@IanMoroney

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 22, 2021
@IanMoroney

This also happens when using "useManagedIdentityExtension": true, which means you are forced to avoid that option and instead spin up your own user-assigned identity and use that.

This issue needs resolving so that we can use the "useManagedIdentityExtension": true option.

@IanMoroney

@njuettner,
Would it be possible to use this example to fix the problem?
https://github.com/Azure/aad-pod-identity/blob/master/test/image/identityvalidator/keyvault.go#L42-L48

In Azure/aad-pod-identity#778 (comment) it was stated that the code above was used to solve this issue.
I wonder whether the AZURE_CLIENT_ID environment variable is not getting set and needs to be.
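If that hypothesis is right, the fix would amount to a lookup along these lines. This is an illustrative sketch, not external-dns's actual code; resolve_msi_client_id and its precedence order are assumptions:

```python
import os

def resolve_msi_client_id(azure_json):
    """Choose which user-assigned identity to request a token for.

    Precedence (assumed, for illustration): the explicit
    userAssignedIdentityID from azure.json wins, then the AZURE_CLIENT_ID
    environment variable, then None -- which leaves IMDS to decide, and
    that only works when a single identity is assigned to the scale set.
    """
    return (azure_json.get("userAssignedIdentityID")
            or os.environ.get("AZURE_CLIENT_ID")
            or None)
```

Either source would give the token request an unambiguous identity, which is what the "Identity not found" error suggests is missing.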

@MCKLMT
MCKLMT commented Aug 6, 2021

Hello all, any news on this issue? My customer is concerned about it.

@darkn3rd
Contributor

@MCKLMT I do not have this issue when using AAD Pod Identity. I have the default assigned managed identity, and one that I created, which grants access to Azure DNS. I am using the preview feature to automate the process, in case that is what makes things work.

@marley-ma

As explained above, this issue occurs when more than one user-assigned Managed Identity is associated with the Virtual Machine Scale Set. Specifying a particular user-assigned managed identity in the azure.json that is then used to create the secret resolved the issue for me.

{
"tenantId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"resourceGroup": "resource-group-name",
"useManagedIdentityExtension": true,
"userAssignedIdentityID": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

For those who are still struggling with this, please follow @digitamind's suggestion. Thanks buddy, you saved my day!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@talha0324

We are still having this issue. Are there any fixes or workarounds for this?
