When thinking through possible failure scenarios, it quickly becomes apparent that however long you think about it, in practice the one scenario you did not think of will come along. Perhaps because you know how the application works and are subconsciously influenced by that, or simply because there are too many complex scenarios to consider. To counter this and test the application more thoroughly, you can inject chaos. This is often described in terms of Chaos Monkeys: you let them loose in your application without knowing in advance what they will get up to.
FOUR FORMS OF CHAOS
Simmy builds on Polly and follows the same approach: you define rules that can be used on their own or combined with each other.
There are currently four forms of chaos:
- errors (exceptions): instead of processing the call successfully, an exception is thrown for a share of the calls;
- result: instead of the expected, correct result, something completely different is returned;
- delay (latency): by building in different delays, you can test how systems handle slow connections to external services or databases;
- behavior: this makes it possible to inject additional behavior into the application. An example from Simmy’s documentation even drops database tables before a call is processed, in order to mimic extreme situations.
For each form of chaos, you specify whether the rule is enabled and for what percentage of calls it should be applied (the injection rate).
A typical example of ‘exception’ chaos: for 50% of all requests for related products, an ‘InvalidDataException’ is thrown instead of a result being returned.
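As a rough, library-agnostic illustration of such a rule (written in Python, so this is not Simmy's actual API), the sketch below wraps a call so that about half of the invocations raise an exception. The names `with_exception_chaos`, `get_related_products` and `InvalidDataError` are hypothetical; the last one only stands in for .NET's `InvalidDataException`.

```python
import random


class InvalidDataError(Exception):
    """Hypothetical stand-in for .NET's InvalidDataException."""


def with_exception_chaos(func, make_fault, injection_rate,
                         enabled=lambda: True, rng=random.random):
    """Wrap func so that a share of the calls raises a fault instead of running.

    rng is injectable so the behavior can be made deterministic in tests.
    """
    def wrapper(*args, **kwargs):
        # The rule fires only when it is switched on and the random draw
        # falls below the configured injection rate.
        if enabled() and rng() < injection_rate:
            raise make_fault()
        return func(*args, **kwargs)
    return wrapper


def get_related_products(product_id):
    # Hypothetical service call; a real one would query a database or API.
    return [f"{product_id}-accessory", f"{product_id}-bundle"]


# Roughly 50% of the calls now fail with InvalidDataError.
chaotic_get_related_products = with_exception_chaos(
    get_related_products,
    make_fault=lambda: InvalidDataError("chaos monkey was here"),
    injection_rate=0.5,
)
```

Because the `enabled` check is evaluated on every call, the same wrapper can stay in place permanently and simply do nothing while the rule is switched off.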
WHAT TO DO IN THE EVENT OF CHAOS
Injecting chaos is, of course, not a goal in itself. The goal is to be able to withstand both anticipated and unanticipated problems. A couple of familiar examples: if you are shopping at Amazon and the service that suggests alternative offers has a glitch, it is still possible to place an order. You may not see any alternatives, or you may see a few generic, well-selling products instead. Chances are you won’t even notice. With Netflix, you may temporarily be unable to see the latest releases or the top 10 while still being able to watch your favorite series. In both cases, the service as a whole does not go down and the most important functionality continues to work. This will not always be possible, but by thinking carefully, testing and developing alternatives, you are constantly working to improve your services.
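The kind of graceful degradation described above can be sketched as a simple fallback: when the related-products call fails, a default list is shown and the rest of the page keeps working. This is a minimal Python sketch under assumed names (`related_products_or_fallback`, `BESTSELLERS` and both service functions are hypothetical); in a Polly-based application a fallback policy would typically play this role.

```python
BESTSELLERS = ["bestseller-1", "bestseller-2"]  # hypothetical default offer


def broken_service(product_id):
    # Simulates the related-products service being down.
    raise RuntimeError("related-products service unavailable")


def healthy_service(product_id):
    return [f"{product_id}-accessory"]


def related_products_or_fallback(fetch, product_id, fallback):
    """Return related products, or a safe default when the call fails."""
    try:
        return fetch(product_id)
    except Exception:
        # Ordering must keep working, so a failure here degrades to a
        # default list instead of surfacing as an error to the customer.
        return list(fallback)


print(related_products_or_fallback(broken_service, "p1", BESTSELLERS))
print(related_products_or_fallback(healthy_service, "p1", BESTSELLERS))
```

Catching every exception is a deliberate simplification here; a real application would usually limit the fallback to the failures it expects from that particular service.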
Switching chaos on and off can be done in several ways: in the application’s code, in configuration files, remotely through external web services or even in real time with Azure App Configuration. The options are too diverse to discuss them all here; check the source references, the sample project provided or Simmy’s documentation for all possible options. It is especially important to think carefully about how you turn Chaos Monkeys on and off. Are you testing in production? Then you probably want to be able to intervene immediately if things go wrong. In other environments, by contrast, it may be more important to be able to set the degree of chaos, so that manual testing quickly reveals its consequences.
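One way to picture such a switch is a small settings object that is consulted on every call, so that a change takes effect immediately. The Python sketch below is an illustration only: `ChaosSettings` and `should_inject` are hypothetical names, and in a real setup the values would be refreshed from a configuration file or a remote configuration service rather than set in code.

```python
import random
from dataclasses import dataclass


@dataclass
class ChaosSettings:
    enabled: bool = False         # master switch, off by default
    injection_rate: float = 0.0   # share of calls to disturb, 0.0 to 1.0


def should_inject(settings, rng=random.random):
    # The settings are read on every call, so flipping the switch works
    # without restarting the application.
    return settings.enabled and rng() < settings.injection_rate


settings = ChaosSettings(enabled=True, injection_rate=0.25)

# Kill switch for production: one assignment stops all further chaos.
# settings.enabled = False
```

In a test environment the same object can be used the other way around, for example by raising `injection_rate` so that manual testers run into the chaos sooner.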
How do you actually test chaos? Fixed test paths and step-by-step plans will probably not help, because you never know whether or not chaos occurred during a test. Measuring is knowing, especially when it comes to chaos in production. Does the number of orders decrease significantly? Are predefined limits exceeded? Is the number of recorded errors increasing? These are all measurements that help determine whether everything continues to work as it should. Chances are that many of them are already in place for monitoring new releases and managing existing software.
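A check on such a predefined limit can be very small. The Python sketch below compares the error rate measured during a chaos experiment with a baseline and signals when the difference exceeds a chosen margin. The function names and the margin of 5 percentage points are assumptions for illustration; real limits would come from your own monitoring.

```python
def error_rate(errors, requests):
    # Avoid dividing by zero when no traffic was measured.
    return errors / requests if requests else 0.0


def chaos_should_stop(errors, requests, baseline_rate, max_increase=0.05):
    """True when the error rate exceeds the baseline by more than max_increase."""
    return error_rate(errors, requests) - baseline_rate > max_increase


# 8 errors in 100 requests against a 2% baseline: the limit is exceeded.
print(chaos_should_stop(errors=8, requests=100, baseline_rate=0.02))
# 5 errors in 100 requests stays within the margin.
print(chaos_should_stop(errors=5, requests=100, baseline_rate=0.02))
```

The same pattern applies to business metrics such as the number of orders: compare the value during the experiment with what is normal, and switch the chaos off when the gap becomes too large.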
If you can turn rules on and off yourself, you can also decide when it is a good time to introduce chaos: when it is quiet or precisely when it is busy, after new releases or perhaps even at random, announced in advance or without the teams being aware.