Proposal: More pause options for disaster recovery control #360

Open
chlunde opened this issue Oct 6, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@chlunde
chlunde commented Oct 6, 2022

Just writing down two related ideas here.

What problem are you facing?

Disaster recovery or migrating resources to other clusters is hard and scary.

How could Crossplane help solve your problem?

During migration or disaster recovery, it will be difficult to set "pause" on all resources. It would be nice to be able to pause an entire provider, for example with a CLI argument such as --pause.

It would also be nice to have a pause option which would Observe but not Create/Update/Delete. This would give an operator confidence in what kinds of actions would run when the cluster is unpaused. This might be a different CLI option or annotation.
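
To make the second idea more concrete, here is a minimal sketch of what an observe-only mode could look like. Everything in it is hypothetical: `Mode`, `FakeExternalClient`, and `reconcile` are illustrative names, not Crossplane APIs, and the logic is written in Python only for brevity (real providers are written in Go). The point is that the external resource is always observed, while Create/Update/Delete are reported instead of executed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ACTIVE = "active"         # observe and mutate
    OBSERVE_ONLY = "observe"  # observe, but only report intended mutations
    PAUSED = "paused"         # do nothing at all


@dataclass
class FakeExternalClient:
    """In-memory stand-in for a cloud API (hypothetical)."""
    resources: dict = field(default_factory=dict)

    def observe(self, name):
        return self.resources.get(name)

    def create(self, name, spec):
        self.resources[name] = dict(spec)

    def update(self, name, spec):
        self.resources[name] = dict(spec)

    def delete(self, name):
        self.resources.pop(name, None)


def reconcile(client, name, desired, mode):
    """Return the action taken, or 'would-<action>' in observe-only mode.

    A desired value of None means the resource should not exist.
    """
    if mode is Mode.PAUSED:
        return "skipped"

    observed = client.observe(name)
    if desired is None:
        if observed is None:
            return "up-to-date"
        action = "delete"
    elif observed is None:
        action = "create"
    elif observed != desired:
        action = "update"
    else:
        return "up-to-date"

    if mode is Mode.OBSERVE_ONLY:
        # Report the intent without touching the external system.
        return f"would-{action}"

    if action == "delete":
        client.delete(name)
    else:
        getattr(client, action)(name, desired)
    return action
```

An operator could run every resource through `Mode.OBSERVE_ONLY` first, review the reported `would-*` actions, and only then switch to `Mode.ACTIVE`.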

@bobh66
Contributor
bobh66 commented Oct 6, 2022

Another way to completely disable a provider is to set replicas to 0 in the provider's ControllerConfig

@luebken
luebken commented Nov 3, 2022

@chlunde so we have two options for disaster recovery use-cases:

  1. As @bobh66 mentioned, setting the replicas to 0 for the ControllerConfig.
  2. Setting the pause annotation for specific resources: https://crossplane.io/docs/v1.10/concepts/managed-resources.html#pausing-reconciliations.

Would that be sufficient for your use-cases? If not, would you mind elaborating on why not?
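
For reference, a rough sketch of what the two options amount to, expressed as plain manifest dictionaries. The helper names (`scaled_down_controller_config`, `pause_all`) are hypothetical and not part of any Crossplane tooling. The sketch assumes the `pkg.crossplane.io/v1alpha1` ControllerConfig kind and the `crossplane.io/paused` annotation from the linked docs. The Provider would still need to reference the ControllerConfig through its `controllerConfigRef` for option 1 to take effect.

```python
import copy

PAUSE_ANNOTATION = "crossplane.io/paused"


def scaled_down_controller_config(name):
    """Option 1: a ControllerConfig manifest that runs zero provider replicas."""
    return {
        "apiVersion": "pkg.crossplane.io/v1alpha1",
        "kind": "ControllerConfig",
        "metadata": {"name": name},
        "spec": {"replicas": 0},
    }


def pause_all(manifests):
    """Option 2: return copies of the manifests with the pause annotation set.

    The input manifests are left untouched so the originals can be kept
    for later comparison.
    """
    paused = []
    for manifest in manifests:
        manifest = copy.deepcopy(manifest)
        metadata = manifest.setdefault("metadata", {})
        annotations = metadata.get("annotations") or {}
        annotations[PAUSE_ANNOTATION] = "true"
        metadata["annotations"] = annotations
        paused.append(manifest)
    return paused
```

With thousands of managed resources, option 2 means touching every object, which is the part that becomes laborious during a restore.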

@chlunde
Author
chlunde commented Nov 9, 2022

@luebken my main worry when handling use cases such as

  • restoring a cluster (recreate, partial restore, go back in time for a namespace) with thousands of managed resources
  • restoring an external resource from backup and then restoring and re-attaching it to a managed resource

would be that, due to some unforeseen issue:

  • many resources are created twice. For example, due to generateName we get role-HASH2 when we already had role-HASH1. This could happen when restoring just a claim if the composition rendering does not use a predictable name/external-name.
  • resources are garbage collected and then deleted if we only restore managed resources without their claims

So I would like to pause Create/Update/Delete but not Observe, to ensure everything is as expected. Pause (as implemented today) does not give the kind of comfort that a terraform plan does, but this might.
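
As a rough illustration of the plan-like output I have in mind, here is a minimal sketch. It assumes desired and observed state are available as dictionaries keyed by external name. `plan` and `likely_duplicates` are hypothetical helpers, not existing Crossplane functionality, and matching on a shared name prefix is only a heuristic for the generateName case above.

```python
def plan(desired, observed):
    """Compare desired and observed state without changing anything.

    Both arguments map external name -> spec. Returns a list of
    (action, name) tuples, where action is 'create', 'update' or 'orphan'
    (an external resource that no managed resource points at).
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name))
        elif observed[name] != spec:
            actions.append(("update", name))
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("orphan", name))
    return actions


def name_prefix(name, sep="-"):
    """Strip the last separator-delimited part, e.g. 'role-HASH1' -> 'role'."""
    return name.rsplit(sep, 1)[0]


def likely_duplicates(actions):
    """Pair each planned create with orphans that share its name prefix."""
    orphans = {}
    for action, name in actions:
        if action == "orphan":
            orphans.setdefault(name_prefix(name), []).append(name)
    return [
        (name, existing)
        for action, name in actions
        if action == "create"
        for existing in orphans.get(name_prefix(name), [])
    ]
```

A planned create of role-HASH2 next to an orphaned role-HASH1 is exactly the kind of thing I would want to see before any Create/Update/Delete is allowed to run.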


3 participants