Releases: litmuschaos/litmus

2.1.0

14 Sep 15:00
47263dc

New Features & Enhancements

  • Introduces azure-disk-loss experiment, which detaches the virtual disk from an Azure instance

  • Introduces pod-network-partition experiment, which blocks Ingress and Egress traffic of the target application. It provides the option to specify the pod-selector, namespace-selector, and ports.

  • Introduces the ability to provide default application status checks as a tunable for the pod-specific experiments from the ChaosEngine. It is helpful for the scenarios where we don’t want to validate the application status as a mandatory check during pre & post chaos.

  • Enhances the GCP VM Instance Stop experiment for VMs that are part of an Autoscaling Group: the experiment now waits for the VMs to stop fully before proceeding.

  • Enhancements for the cmdProbe to run the probe pod with the host network.

  • Enhance probe pod functionality where the experiment serviceAccount is now passed to the probe pod. It is helpful for the scenarios where the probe pod needs specific RBAC permissions for the execution.

  • The litmuschaos_awaited_experiment metric contains an injection_time label. This release makes the label tunable via the INJECTION_TIME_FILTER ENV, so that it can be used for annotations inside the interleaving dashboards.

  • Enhances the chaos-scheduler to specify the minutes and hours separately. This makes it possible to schedule chaos at the nth minute of every x hours, which helps staggered schedules avoid collisions.

  • Adds the ability to search experiments based on certain keywords in Chaos Hub. Added all the related keywords for individual experiments.

  • Improves the error handling in the chaos-operator for chaosengine status nil check

  • Removes the init container which modifies the permissions of the docker socket path, and instead provides the docker socket path inside the DOCKER_HOST ENV for the Pumba Lib.

  • Litmus metrics are now exported only for active ChaosEngines. The values of existing metrics are no longer overridden once a ChaosEngine reaches the completed state.

  • Enhances the e2e pipeline by adding debug steps and using shared runners to run tests in parallel. It also allows running a test with a custom service account to validate the specific set of permissions required for an experiment.
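The staggered scheduling mentioned in the chaos-scheduler item above (chaos at the nth minute of every x hours) can be illustrated with a small sketch. This is not the scheduler's implementation: the function name staggered_runs and its parameters are hypothetical, and the sketch assumes hours are counted from the start of the window rather than from midnight.

```python
from datetime import datetime, timedelta


def staggered_runs(start, end, every_nth_hour, minute_of_hour):
    """Return run times in [start, end) at `minute_of_hour` past every
    `every_nth_hour`-th hour, counting hours from `start` (an assumption)."""
    if every_nth_hour < 1:
        raise ValueError("every_nth_hour must be >= 1")
    if not 0 <= minute_of_hour <= 59:
        raise ValueError("minute_of_hour must be in 0..59")

    # First candidate: the requested minute within the starting hour.
    candidate = start.replace(minute=minute_of_hour, second=0, microsecond=0)
    if candidate < start:
        # The requested minute has already passed; wait for the next interval.
        candidate += timedelta(hours=every_nth_hour)

    runs = []
    while candidate < end:
        runs.append(candidate)
        candidate += timedelta(hours=every_nth_hour)
    return runs


# Two schedules with the same hourly interval but different minute offsets.
window_start = datetime(2021, 9, 14, 10, 0)
window_end = datetime(2021, 9, 14, 16, 0)
schedule_a = staggered_runs(window_start, window_end, every_nth_hour=2, minute_of_hour=0)
schedule_b = staggered_runs(window_start, window_end, every_nth_hour=2, minute_of_hour=30)

# The two schedules never fire at the same instant.
collisions = set(schedule_a) & set(schedule_b)
```

Giving each schedule a distinct minute offset keeps the runs apart even when they share the same hourly interval.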

Major Bug Fixes

  • Fixes chaos-result name generation where the experiment-name and instance-id are passed inside the helper pods, which are used to generate the chaos-result name.
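Why the helper pods need both values can be shown with a short sketch. The exact naming format below (engine name, experiment name and instance id joined with hyphens) is an assumption for illustration, and chaos_result_name is a hypothetical helper, not a function from the Litmus code base.

```python
def chaos_result_name(experiment_name, engine_name="", instance_id=""):
    """Build a chaos-result name from its parts (assumed format)."""
    # Prefix with the engine name when one is set.
    name = f"{engine_name}-{experiment_name}" if engine_name else experiment_name
    # Append the instance id so parallel runs get distinct results.
    if instance_id:
        name = f"{name}-{instance_id}"
    return name


# The experiment pod and the helper pod must derive the same name;
# that only works if the helper receives the same inputs.
from_experiment = chaos_result_name("pod-delete", "nginx-chaos", "abc123")
from_helper = chaos_result_name("pod-delete", "nginx-chaos", "abc123")

# A helper that is missing the instance id would compute a different name.
helper_without_id = chaos_result_name("pod-delete", "nginx-chaos")
```

If a helper pod lacks one of the inputs, it computes a different name and cannot address the same chaos-result.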

Installation

kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v2.1.0.yaml

Verify your installation

  • Verify if the chaos operator is running
    kubectl get pods -n litmus

  • Verify if chaos CRDs are installed
    kubectl get crds | grep chaos

For more details refer to the documentation at Docs

2.0.0

11 Aug 11:39
e5999d5

Introduction

Version 2.0.0 brings newer capabilities to the LitmusChaos platform, enabling a more efficient practice of chaos engineering. The major version upgrade is being carried out to reflect significant improvements and new features in the platform - many of which were introduced & curated across several preceding 2.0 beta releases with community feedback. Thanks to all the early adopters & beta testers for your continued support. Some of these changes, especially newer experiments and observability improvements, have been made available in 1.x too.

Litmus 1.x brought a cloud-native approach to chaos engineering, covering the definition and execution of chaos intent, along with a ready set of experiments maintained in the ChaosHub. Along the way, newer requirements were incorporated into the project, most notably around a centralized management approach for managing chaos across environments (K8s clusters and cloud instances) and the ability to define workflows to stitch together multiple experiments as part of a complex scenario.

The 2.0 GA release brings these features into the mainstream, having been validated for their usefulness & architecture. Subsequent improvements to these will be carried out in 2.x releases. Some salient features are described briefly in the sections below:

Chaos Center

  • A chaos control plane or portal which provides centralized management of chaos operations on multiple clusters across datacenters/cloud. The control plane carries out experiments through agents installed on the registered clusters.
  • Comprises documented APIs that can be used to invoke chaos programmatically
  • Provides visualization capabilities and analytics around chaos execution.
  • Supports a project-teams-users structure to enable collaboration within teams for chaos operations.

Litmus Workflows

  • Introduces chaos workflows - to (a) automate dependency setup (b) aid creation of complex chaos scenarios with multiple faults (c) support definition of load/validation jobs along with chaos injection
  • Provides flexibility in creating/running workflows in different ways - via templates, from an integrated hub, and custom uploads.

Multi-Tenancy

  • Supports setup (control plane & agents) and execution of chaos experiments in both: cluster-scoped and namespace-scoped modes to help operations in shared clusters with a self-service model

Observability & Steady State Hypothesis Validation

  • Provides an increased set of Prometheus metrics with additional filters - which can be used for instrumenting application dashboards to observe chaos impact
  • Provides a diverse set of probes to automate validation of steady-state hypothesis - thereby improving the efficiency of running automated chaos experiments

GitOps for Chaos

  • Integrates with Git-based SCM to provide a single-source-of-truth for chaos artifacts (workflows), such that changes are synchronized bi-directionally between the git source and the chaos center - thereby pulling the latest artifact for execution.
  • Provides an event-tracker microservice to automatically launch “subscribed” chaos workflows upon app upgrades effected by GitOps tools like ArgoCD, Flux

Non-Kubernetes Chaos

Adds experiments to inject chaos on infrastructure (cloud) resources such as VMs/instances and disks (AWS, GCP, Azure, VMWare) - irrespective of whether they host a Kubernetes cluster or not.

Release Cadence & Versioning

The release cadence & naming conventions continue to adhere to the principles followed thus far in the Litmus project: the monthly minor version releases (2.x.0) will happen on the 15th, with patch releases/hotfixes going into 2.x.x, on a need/demand basis. The 1.x version will be stopped at 1.13.x (1.13.8 at this point) and further patches will be made only upon request/community need.

Backward Compatibility

Having said that, Litmus 2.x completely lends itself to the 1.x mode of execution the users are familiar with, i.e., you could still continue to deploy the latest chaos-operator deployment in admin/namespace mode, pull ChaosExperiment templates/CRs from the ChaosHub & trigger chaos by applying the ChaosEngine CR. The latest chaos-exporter & chaos-scheduler will continue to be operable as they are. However, the introduction of the Chaos-Center (also commonly referenced as Litmus Portal by the beta test community) simplifies the above process greatly while giving you additional nuts & bolts.

Migration from 1.x to 2.x

To make use of the Chaos Center and other capabilities of Litmus 2.0, please remove any existing ChaosEngines, uninstall the chaos operator deployment & follow the Litmus 2.x installation instructions.

If you would like to consume just the backend infrastructure components (chaos operator, crds et al), please follow the regular procedure in applying the latest operator manifest or start using the operator helm chart to allow for subsequent helm upgrades.

If you are a beta user on 2.0.0-beta9, follow the upgrade procedure to start using the Litmus 2.0 GA build.

Documentation

The documentation has undergone considerable changes - in terms of content and structure and it continues to undergo improvements as of the 2.0 release. We expect that a few more iterations are needed to sort out the Information Architecture.

The installation details for the 2.0 platform along with detailed introductions to concepts, architecture as well as a user guide are now available at https://docs.litmuschaos.io/

The latest chaos experiment details along with chaos custom resource schema specifications (tunables, examples, etc.,) and detailed FAQs & troubleshooting info can be found in https://litmuschaos.github.io/litmus/

For those continuing to use 1.x releases, please note that the docs are now moved to: https://v1-docs.litmuschaos.io/

Misc (monthly changelog from 15/07/2021 to 15/08/2021)

Notes on changes to control plane (chaos center) since 2.0.0-beta9

  • Added new API routes to check the status of the authentication server and to update the user details
  • Added an API to terminate chaos workflow
  • Added namespace scope support for event tracker
  • Bug fixes/enhancements in the frontend
  • Fixes a typo in the nodeSelector schema key
  • Adheres to correct schema in the steady-state validation wizard for Litmus Probes
  • Fixes the inability to login/authenticate after upgrade of chaos-center

Notes on changes to backend execution infrastructure (chaos operator, experiments) since 1.13.8

  • Supports VMs belonging to scale-sets (VMSS) as target resources in the Azure instance stop experiment
  • Fixes the limitation/inability to perform abort operations in the namespaced mode of operation in a chaos operator.
  • Fixes an issue (edge case in scaled scenarios) within the abort functionality for the “exec” based chaos experiments (pod-cpu-hog-exec & pod-memory-hog-exec) wherein chaos injection continues to occur even after an abort is issued.
  • Adds fix to fail faster when helper pods do not run successfully in an experiment (fail immediately upon identifying helper failure instead of waiting for the customary statusCheckTimeout of 180s, as the helper pods are usually brought up with restartNever policy)
  • Fixes the inability of certain experiments (pod-cpu-hog-exec, pod-memory-hog-exec, pod-dns-error and pod-dns-spoof) to select targets serially for cases where 0 < PODS_AFFECTED_PERC <= 100.
  • Adds a condition to error out/call out the engine schema when neither the .spec.appinfo.applabel nor TARGET_PODS env are specified.
  • Adds the missing ability to perform an auxiliary application health check in the node-memory-hog experiment, and the missing support for specifying multiple target nodes via a comma-separated list in the node-cpu-hog, node-memory-hog & node-io-stress experiments.
  • Fixes a regression in recent 1.13.x experiments wherein the .spec.appinfo.appkind is mandated (in order to derive the parent controller name for pods - as this is used to patch the chaosresult status with target info) even when .spec.annotationCheck is set to false. With this fix, you will be able to see the older behavior wherein appkind can be left empty for cases where annotationCheck is set to false in the ChaosEngine CR.
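As an illustration of the PODS_AFFECTED_PERC item above, the following sketch turns a percentage into a target list. It is a simplified model, not the experiment code: the function select_targets is hypothetical, the rounding (integer division with a floor of one pod) is an assumption, and targets are taken in list order to keep the result deterministic.

```python
def select_targets(pods, pods_affected_perc):
    """Pick the pods to target for a given percentage (simplified model)."""
    if not 0 < pods_affected_perc <= 100:
        raise ValueError("pods_affected_perc must satisfy 0 < value <= 100")
    # Assumed rounding: integer percentage of the pod count, at least one pod.
    count = max(1, len(pods) * pods_affected_perc // 100)
    return pods[:count]


candidates = ["app-0", "app-1", "app-2", "app-3"]
half = select_targets(candidates, 50)
at_least_one = select_targets(candidates, 10)

# Serial mode: chaos is injected into one target at a time, in order.
injection_order = [pod for pod in select_targets(candidates, 100)]
```

In serial mode the selected targets are processed one after another, so the order of the returned list is also the order of injection in this model.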

2.0.0-RC1

09 Aug 12:02
cbb6f09
Pre-release
Fixed docs versioning (#3106)

Signed-off-by: Amit Kumar Das <amit@chaosnative.com>

2.0.0-Beta9

15 Jul 17:58
7ac0cfa

Major Updates

  • Refactored the authentication server to reduce the network latency and added support for enabling/disabling users
  • Added a new pre-defined chaos workflow to check the resiliency of the Bank of Anthos application
  • Enhanced the analytics module with API and user interface optimization
  • Added a usage module in the litmusportal for admin users to check the agent and workflow usage across all accounts
  • Replaced all the privileged permissions with minimal permissions in the cluster and namespace manifest
  • Built a mirror docker image of the frontend application with the base path “/litmuschaos”, which can be used in ingress paths
  • Optimized the UI rendering by scoping out the header and sidebar scaffold to a global level and enhanced the user experience with the support of code-splitting at the component level.
  • Added support for multi-architecture (ARM64 and AMD64) docker images for the Argo workflow executor and Argo workflow controller
  • Enhanced the E2E pipeline with parallel builds to speed up testing, using a comment-driven approach

Minor Updates

  • Added node Selector as a configurable environment variable for agent manifests
  • Added support to tune chaos workflow environment variables from the user interface
  • Upgraded the go version to 1.16, litmus UI to 1.4.0, and MyHub go pkg to a stable version of go-git

NOTE: With the revamp of the authentication module and the addition of the new Deactivate/Activate User feature, we have made some changes to the Auth schema and module. This has been done to reduce network latency and optimize the user experience. With these changes, users who have been running the Portal previously won't be able to log in after upgrading. We are working on a streamlined upgrade/migration process; for now, we suggest that you set up the portal again when upgrading to Beta9.

Installation

Applying k8s manifest

Litmus-2.0.0-Beta9 (Stable) cluster scope manifest

kubectl apply -f https://litmuschaos.github.io/litmus/2.0.0-Beta/litmus-2.0.0-Beta.yaml

Or

Litmus-2.0.0-Beta9 (Stable) namespace scope manifest.

export LITMUS_PORTAL_NAMESPACE="litmus"
kubectl create ns ${LITMUS_PORTAL_NAMESPACE}
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/litmus/master/litmus-portal/litmus-portal-crds.yml
curl https://raw.githubusercontent.com/litmuschaos/litmus/master/docs/2.0.0-Beta/litmus-namespaced-2.0.0-Beta.yaml --output litmus-portal-namespaced-k8s-template.yml
envsubst < litmus-portal-namespaced-k8s-template.yml > ${LITMUS_PORTAL_NAMESPACE}-ns-scoped-litmus-portal-manifest.yml
kubectl apply -f ${LITMUS_PORTAL_NAMESPACE}-ns-scoped-litmus-portal-manifest.yml -n ${LITMUS_PORTAL_NAMESPACE}

2.0.0-Beta8

15 Jun 16:57
64a8b12

Major Updates

  • Added chaos scheduler dependency in the control plane server to support cron scheduling.
  • Added support for ARM64 architecture for Litmus control plane and agent plane components
  • Added a warning that is displayed for long-running workflows after 20 mins, and added support for deleting and syncing long-running chaos workflows.
  • Added support for using predefined workflows from the MyHubs connected to the project.
  • Optimised the workflow’s graphql queries with support of pagination, sorting, and filtering.
  • Introduced chaos engine as a standalone workflow which makes it independent of Argo Workflow.
  • Added podGC strategy along with revert chaos to remove workflow related artifacts after the completion of workflow
    Note: Logs won’t be accessible if the revert chaos step is enabled.
  • Added new graph components (stack-bar graph, line-area graph, and radial charts) in the litmus-ui NPM package
  • Restructured the subscriber directory and fixed kubeobject bugs in the namespace mode.
  • Redesigned analytics dashboards with the new UI components and enhanced some of the existing features like workflow comparison, query manipulation, and dashboard operation. Also optimised the GraphQL APIs to speed up real-time responses.

Minor Updates

  • Removed Argo server deployment and its dependencies from the agent plane list
  • Enhanced the user interface of workflow editor in the litmus portal.

2.0.0-Beta7

17 May 11:35
e728319
Pre-release
Added CRD for Event-tracker (#2812)

Signed-off-by: Jonsy13 <vedant.shrotria@chaosnative.com>

2.0.0-Beta6

15 May 18:34
8c3d20a

Major Updates

  • Added MongoDB go-interface and refactored the database operations and structure to accommodate the test cases easily.
  • Support for adding custom container image registry to chaos workflow manifest.
  • Enhanced the performance of the analytics APIs with memory caching and added APIs to fetch labels and values for a Prometheus series.
  • Added support for changing the sequence of the workflow steps by drag and drop, with the changes reflected live in the DAG.
  • Enhanced the workflow graph to show other node phases such as Omitted, Skipped, and Error for a good user experience.
  • Enhanced the verify and commit page to allow users to have a final review and edit their workflow details before scheduling the same.
  • Fixed bugs in some user management operations and refactored the teaming APIs to improve performance.
  • Enhanced the litmusportal user interface to speed up the onboarding process.

Minor Updates

  • Added support for a liveness check of the dependent applications in the agent plane before going active.
  • AirGapped support for the pre-defined workflows by moving the fetching logic to the backend.
  • Added instance-id label in the chaos workflow manifest to avoid multiple scheduling in the multi-Argo server cluster.
  • Added validations for workflow name, GitHub URL, and different probe inputs.

2.0.0-Beta5

30 Apr 21:30
59904a4
Pre-release
Minor SA fix in eventtracker (namespace) (#2760)

Signed-off-by: Raj Das <mail.rajdas@gmail.com>

2.0.0-Beta4

20 Apr 19:49
7494b0b

Major Updates

  • Fixes the inability to successfully register the agents/targets when litmus portal server is brought up with loadbalancer/nodeport service type
  • Makes MyHub source configurable by branch so that latest stable versions of experiments are pulled for custom & predefined workflows
  • Updates the chaos operator dependencies on the subscriber to make use of the latest api changes for chaos resources
  • Updates the chaos operator, runner & exporter image tunables/ENVs in the subscriber so that the latest stable versions are installed on the targets
  • Updates Okteto dev setup instructions to reflect latest image versions and changes in specification (env) as well as instructions
  • Updates the chaosengine CRD validation schema for annotation injection in the manifests maintained & installed by the subscriber

Minor Updates

  • Improves the icons for revert chaos and workflow scheduling
  • Optimizes the teaming code to remove redundant conditions
  • Improved styling & background adopted from litmus-ui

2.0.0-Beta3

15 Apr 18:48
aff0fef

Litmus 2.0.0-Beta3

Major Updates

  • Support for policy-based control of the event tracker: users can define their own policy using a JMESPath query, and the event tracker reacts to application changes based on that policy.
  • Enhanced UI for workflow scheduling, which gives users the ability to tune annotations, target application details (application namespace, labels, and kind), and probe data from the user interface.
  • New UI for workflow visualization for showing information about workflow and nodes in a better way.
  • Made the onboarding process easier for users through the new UI.
  • Enhanced the homepage to show information like Recent workflow runs, Agent details, and Project details.
  • Shifted project switching from a Redux-based technique to a URL-based technique to avoid caching problems.
  • Migrated CircleCI to GitHub workflow and enhanced the continuous integration of the project.
  • Enhanced the analytics module in terms of UI and computation
  • Enhanced the browse workflows table to show resilience score and the total number of experiments passed for the listed workflows.
  • Support role-based access control in the backend for handling authorization for all requests.
  • Support for storing scheduled workflow templates and adding some new podtato-head predefined workflow templates

Minor Updates

  • Improved the Better Code Hub (BCH) score
  • Optimized the frontend by shifting the resiliency score calculation to the backend.
  • Restructured the directory structure for settings in the frontend to modularise the code.
  • Support for a reinstall of litmus agents by moving the litmus-portal-config configmap independent of the subscriber.
  • Support for Ingress and Load balancer network type for connecting external agents with Litmus Portal. Based on the server service type, it will generate the endpoint for the external agent.
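The release notes do not spell out how the resiliency score is computed. As a rough sketch, assuming it is a weighted average of per-experiment success percentages (the function resilience_score and the input shape are hypothetical), the backend calculation could look like this:

```python
def resilience_score(experiments):
    """Weighted average of success percentages.

    `experiments` is a list of (weight, success_percentage) pairs;
    this input shape and the formula are assumptions for illustration.
    """
    total_weight = sum(weight for weight, _ in experiments)
    if total_weight == 0:
        # No weighted experiments: report a score of zero.
        return 0.0
    weighted_sum = sum(weight * success for weight, success in experiments)
    return weighted_sum / total_weight


# Two experiments: one fully passing with weight 10, one at 40% with weight 5.
score = resilience_score([(10, 100), (5, 40)])
```

Computing the score once in the backend means every client sees the same value and the frontend only has to render it.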