[go: nahoru, domu]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frequent reconciliations while waiting for external resource deletion #305

Open
sergenyalcin opened this issue Nov 26, 2021 · 6 comments
Open
Assignees
Labels
wontfix This will not be worked on

Comments

@sergenyalcin
Copy link
Member

What happened?

While trying to delete a Managed Resource that I created on provider-jet-azure, I observed that the resource's controller reconciles very often. To see if this behavior is normal, I debug provider-jet-azure and observed that in the Reconcile function of crossplane-runtime, we set the Requeue value of the Result object to true after the deletion requested.

return reconcile.Result{Requeue: true}, errors.Wrap(r.client.Status().Update(ctx, managed), errUpdateManagedStatus)

When I realized this situation, I expected to observe the same behavior in native providers and I did a debugging in provider-azure. I observed the same situation for native providers (frequent reconcile). However, after the second or third reconcile function, the sdk's get call no longer finds the resource in the cloud, so reconciling is not repeated too many times. Therefore, this is not a problem for native providers, as it does not reconcile too many times despite frequent reconciliations. However, since the deletion process in jet-providers takes a long time due to the terraform binary, we observe reconcile operation 5-10 times in a second for a few minutes.

2021-11-25T20:58:27.733+0300    DEBUG   provider-jet-azure      Reconciling     {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration"}
2021-11-25T20:58:27.734+0300    DEBUG   provider-jet-azure      Successfully requested deletion of external resource    {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration", "uid": "6854b1e0-cc0c-4020-b2a7-a80be394acf0", "version": "1295367", "external-name": "max_wal_senders", "deletion-timestamp": "2021-11-25 20:58:08 +0300 +03"}
2021-11-25T20:58:27.734+0300    DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"Configuration","name":"example-psql-configuration","uid":"6854b1e0-cc0c-4020-b2a7-a80be394acf0","apiVersion":"postgresql.azure.jet.crossplane.io/v1alpha1","resourceVersion":"1295367"}, "reason": "DeletedExternalResource", "message": "Successfully requested deletion of external resource"}
2021-11-25T20:58:27.741+0300    DEBUG   provider-jet-azure      Reconciling     {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration"}
2021-11-25T20:58:27.741+0300    DEBUG   provider-jet-azure      Successfully requested deletion of external resource    {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration", "uid": "6854b1e0-cc0c-4020-b2a7-a80be394acf0", "version": "1295367", "external-name": "max_wal_senders", "deletion-timestamp": "2021-11-25 20:58:08 +0300 +03"}
2021-11-25T20:58:27.741+0300    DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"Configuration","name":"example-psql-configuration","uid":"6854b1e0-cc0c-4020-b2a7-a80be394acf0","apiVersion":"postgresql.azure.jet.crossplane.io/v1alpha1","resourceVersion":"1295367"}, "reason": "DeletedExternalResource", "message": "Successfully requested deletion of external resource"}
2021-11-25T20:58:27.751+0300    DEBUG   provider-jet-azure      Reconciling     {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration"}
2021-11-25T20:58:27.751+0300    DEBUG   provider-jet-azure      Successfully requested deletion of external resource    {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration", "uid": "6854b1e0-cc0c-4020-b2a7-a80be394acf0", "version": "1295367", "external-name": "max_wal_senders", "deletion-timestamp": "2021-11-25 20:58:08 +0300 +03"}
2021-11-25T20:58:27.751+0300    DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"Configuration","name":"example-psql-configuration","uid":"6854b1e0-cc0c-4020-b2a7-a80be394acf0","apiVersion":"postgresql.azure.jet.crossplane.io/v1alpha1","resourceVersion":"1295367"}, "reason": "DeletedExternalResource", "message": "Successfully requested deletion of external resource"}
2021-11-25T20:58:27.759+0300    DEBUG   provider-jet-azure      Reconciling     {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration"}
2021-11-25T20:58:27.760+0300    DEBUG   provider-jet-azure      Successfully requested deletion of external resource    {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration", "uid": "6854b1e0-cc0c-4020-b2a7-a80be394acf0", "version": "1295367", "external-name": "max_wal_senders", "deletion-timestamp": "2021-11-25 20:58:08 +0300 +03"}
2021-11-25T20:58:27.760+0300    DEBUG   controller-runtime.manager.events       Normal  {"object": {"kind":"Configuration","name":"example-psql-configuration","uid":"6854b1e0-cc0c-4020-b2a7-a80be394acf0","apiVersion":"postgresql.azure.jet.crossplane.io/v1alpha1","resourceVersion":"1295367"}, "reason": "DeletedExternalResource", "message": "Successfully requested deletion of external resource"}
2021-11-25T20:58:27.767+0300    DEBUG   provider-jet-azure      Reconciling     {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration"}
2021-11-25T20:58:27.767+0300    DEBUG   provider-jet-azure      Successfully requested deletion of external resource    {"controller": "managed/postgresql.azure.jet.crossplane.io/v1alpha1, kind=configuration", "request": "/example-psql-configuration", "uid": "6854b1e0-cc0c-4020-b2a7-a80be394acf0", "version": "1295367", "external-name": "max_wal_senders", "deletion-timestamp": "2021-11-25 20:58:08 +0300 +03"}

How can we reproduce it?

Firstly, you should create a MR in provider-jet-azure. After the external resource provisioned and the state of MR is ready, you should delete the MR. Then you can check the provicer-jet-azure logs and see the frequent reconciliations.

What environment did it happen in?

Crossplane version: crossplane-runtime v0.15.1-0.20211004150827-579c1833b513

@sergenyalcin sergenyalcin changed the title Frequent reconciliation in jet provider while deletion Frequent reconciliation while deletion Nov 26, 2021
@sergenyalcin sergenyalcin changed the title Frequent reconciliation while deletion Frequent reconciliations while deletion Nov 26, 2021
@muvaf
Copy link
Member
muvaf commented Nov 26, 2021

We're seeing a similar behavior in AWS RDS Instance or EKS clusters as well where deletion takes ~3-10 minutes. In Jet providers, we have callback mechanisms for async calls, so even if we don't requeue after deletion call, we will get an event once it's finished but for native ones, there is no way to know if the deletion is completed unless you run Observe.

I think using RequeueAfter: time.Second * 1 could be a long way, at least saving us from ms-level intervals.

@negz negz self-assigned this Nov 28, 2021
@negz
Copy link
Member
negz commented Dec 1, 2021

I think in the jet providers this could be in part because we're missing a layer of rate limiting. Most Crossplane providers use a rate limiter that applies exponential backoff from 1 to 60 seconds per individual object, and is further constrained by a global token bucket rate limiter that tries to keep overall requeues at 1 per second (allowing bursts of up to 10).

You can see that at https://github.com/crossplane/provider-gcp/blob/cf9303bdf/pkg/controller/database/cloudsql.go#L77

It seems like the jet providers are only using the global rate limiter - not the managed resource scoped one, per:

Note that the global rate limiter is never wrapped with a NewDefaultManagedRateLimiter.

That said, I would expect the global rate limiter alone to level out to reconciling once per second (across all managed resources) with a maximum burst of 10 reconciles per second. Or put otherwise, seeing ~10 reconciles for roughly the first second makes sense to me but after that I'd expect it to happen less frequently.

I would expect that native providers - and jet providers if they started using the DefaultManagedRateLimiter - should be limited to one requeue per second per kind of managed resource.

Note though that I'm in the (slow) process of changing this all per crossplane/crossplane#2595.

@negz negz changed the title Frequent reconciliations while deletion Frequent reconciliations while waiting for external resource deletion Dec 1, 2021
@ulucinar
Copy link
Contributor
ulucinar commented Dec 1, 2021

Then what about consuming crossplane-runtime v1.6 in terrajet-based providers and adding the --max-reconcile-rate command-line option to these providers? We should also consider the same for the https://github.com/crossplane-contrib/provider-jet-template repo.

@negz
Copy link
Member
negz commented Dec 2, 2021

@ulucinar I agree we should update both. #285 might block that though - I'm currently looking into it.

@muvaf
Copy link
Member
muvaf commented Dec 6, 2021

@negz How would #285 block? I don't think Terrajet is affected by it anymore.

@stale
Copy link
stale bot commented Aug 13, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Aug 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants