Hello fellow Hakkers,
If you’re working with Scala and asynchronous I/O – especially in distributed computing – then you should quickly get familiar with the circuit breaker pattern.
Akka has a simple circuit breaker implementation and it has absolutely nothing to do with Actors. If you’re using Scala futures, you would almost certainly benefit from the implementation of circuit breakers.
Let’s look at what happens when we talk to a service in most normal circumstances.
You may be sending requests to a database or another service, and that service might have a known or estimated maximum throughput – say 100 requests per second per instance.
Then you have the consumers of that service, which could be live users. The throughput of the input is unknown and can potentially fluctuate 100x throughout the day. If one day your article shows up on Hacker News, you might see 1000x more traffic than you anticipated.
Normally that fluctuation is tolerable – you’ll end up queuing some requests, and they wait until your system is able to get to them. Let’s say we want to parse articles for an iPhone reader application. You might have a slow consumer somewhere in the data pipeline.
When a large number of requests comes in, the messages start to queue up, and each message takes longer and longer to process.
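To make the queue build-up concrete, here is a small sketch in plain Scala. The rates are made up for illustration (a consumer that handles 100 req/s while traffic arrives at 150 req/s), and the model assumes both rates are constant, which real traffic never is.

```scala
// Sketch only: the rates below are hypothetical and assumed constant.
object BacklogSketch {
  /** Queue length at the end of each second. */
  def backlog(arrivalsPerSec: Int, servedPerSec: Int, seconds: Int): Seq[Int] = {
    val lengths = (1 to seconds).scanLeft(0) { (queued, _) =>
      math.max(0, queued + arrivalsPerSec - servedPerSec)
    }
    lengths.tail // drop the initial empty queue
  }

  /** Seconds a newly queued request waits before the consumer reaches it. */
  def waitSeconds(queued: Int, servedPerSec: Int): Double =
    queued.toDouble / servedPerSec

  def main(args: Array[String]): Unit =
    backlog(150, 100, 10).zipWithIndex.foreach { case (q, i) =>
      println(f"t=${i + 1}%2ds queued=$q%4d wait=${waitSeconds(q, 100)}%.1fs")
    }
}
```

With these numbers the queue grows by 50 messages every second, so after ten seconds a new request sits behind 500 others and waits about 5 seconds before anyone looks at it.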
Eventually your slow consumer will take so long that users get impatient and start hitting retry, which only compounds the issue. Services will time out and users will wait tens of seconds just to see errors.
This is a violation of our responsibility to be responsive in all cases.
Ultimately your consumer might end up storing or attempting to process so many messages that it runs out of memory and crashes.
This is a violation of our responsibility to build resilient systems.
Enter the circuit breaker.
If, instead of accepting messages even when downstream systems are slow to respond, we temporarily fail requests immediately and tell users to come back later, the problem starts to look different.
Now the downstream services can “catch up” and heal so we have two benefits:
– We don’t make users wait for an error (responsive)
– We let our systems heal (resilient)
If the requests are going to time out and fail anyway, we might as well not compound the problem by piling more requests on.
A circuit breaker monitors timeouts and failures between the producer and consumer of requests. Normally it does not affect behaviour; this is the “closed” state of the circuit breaker.
Then, if a downstream consumer gets overwhelmed and many errors or timeouts occur, the circuit breaker opens and returns failure messages to the user quickly.
For timeouts, you can roughly estimate the recovery period from the measured maximum throughput of the downstream system and the timeout latency threshold. Ignoring all dependencies, if the system can process 100 req/s and the timeout is 5 seconds, there should be about 500 messages queued (100 req/s × 5 s), so it will take about 5 seconds to recover once new requests stop arriving.
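The same back-of-the-envelope estimate can be written down as a couple of functions. This is only a sketch: the names `maxQueued` and `drainTime` are mine, and it assumes the downstream system keeps its full throughput while draining and receives no new requests in the meantime.

```scala
import scala.concurrent.duration._

// Sketch only: helper names are hypothetical, and throughput is assumed constant.
object ResetTimeoutSketch {
  /** Rough upper bound on requests still queued when callers start timing out. */
  def maxQueued(throughputPerSec: Int, callTimeout: FiniteDuration): Long =
    throughputPerSec * callTimeout.toSeconds

  /** Time needed to drain that backlog if no new requests arrive. */
  def drainTime(queued: Long, throughputPerSec: Int): FiniteDuration =
    (queued * 1000 / throughputPerSec).millis

  def main(args: Array[String]): Unit = {
    val queued = maxQueued(100, 5.seconds)
    println(s"queued=$queued, drain=${drainTime(queued, 100).toSeconds}s")
  }
}
```

In practice you would probably pad the result a little, since a system that has just been overloaded rarely runs at its measured maximum straight away.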
After waiting a period of time for recovery, the circuit breaker changes state to “half-open” and tries a single request to see whether the downstream system has recovered. All other requests still fail with an exception.
The response to the sampled request is evaluated. If the request fails, we assume the system has not recovered yet, and we flip back to open. If it appears to have recovered, then we can fully close the circuit breaker and continue as usual.
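The three states can be sketched in a few lines of plain Scala. To be clear, this is not Akka's implementation: it is a simplified, single-threaded, synchronous illustration, it counts thrown exceptions only (no call timeouts), and every name in it (`Breaker`, `OpenCircuitException`, the injectable `now` clock) is made up for the example.

```scala
import scala.util.{Failure, Success, Try}

// Sketch only: not thread-safe, no call timeouts, all names are hypothetical.
object BreakerSketch {
  sealed trait State
  case object Closed extends State
  case object Open extends State
  case object HalfOpen extends State

  final class OpenCircuitException extends RuntimeException("circuit breaker is open")

  final class Breaker(maxFailures: Int, resetTimeoutMs: Long,
                      now: () => Long = () => System.currentTimeMillis()) {
    private var current: State = Closed
    private var failures = 0
    private var openedAt = 0L

    /** Current state; an open breaker becomes half-open once the reset timeout has passed. */
    def state: State = {
      if (current == Open && now() - openedAt >= resetTimeoutMs) current = HalfOpen
      current
    }

    def call[A](body: => A): Try[A] = state match {
      case Open =>
        Failure(new OpenCircuitException) // fail fast, `body` is never evaluated
      case closedOrHalfOpen =>
        val result = Try(body)
        result match {
          case Success(_) =>
            failures = 0
            current = Closed // a successful probe closes the breaker again
          case Failure(_) =>
            failures += 1
            // a failed probe, or too many consecutive failures, opens the breaker
            if (closedOrHalfOpen == HalfOpen || failures >= maxFailures) {
              current = Open
              openedAt = now()
            }
        }
        result
    }
  }
}
```

Because calls here are synchronous, the half-open probe always finishes before the next call arrives; a concurrent version would have to let exactly one probe through and fail the rest. Akka's own `akka.pattern.CircuitBreaker` is configured along similar lines, with a maximum failure count, a call timeout and a reset timeout, and it wraps futures rather than plain blocks.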
I produced a small Scala/Akka example here which demonstrates a message producer producing messages faster than the consumer can handle them, and shows how the circuit breaker protects against long response times and a total meltdown (e.g. out-of-memory errors). While the Akka toolkit contains a circuit breaker out of the box, you can certainly implement the circuit breaker pattern in any language and system you are using.