Murtaza Ali Laghari
May 22, 2021


Why chaos is sometimes needed — Chaos Engineering — Royal Cyber


Have you ever thought about “what-if” scenarios in your organization? Suppose you are a successful retailer and, on Black Friday (a day of massive sales around the world, especially in the US), your website goes down. You are probably thinking it won’t happen, right? Wrong. It has happened: J.Crew, Ulta, and Lululemon all went through it. Read more here: https://www.ciodive.com/news/black-friday-traffic-brings-down-j-crew-ulta-sites-among-other-retailers/542926/. The thing is, no one was expecting traffic, server, or network issues within their organization, and because no proactive approach was in place, they missed out on millions of dollars and on customers who went elsewhere. That proactive approach is called Chaos Engineering.

  • What exactly is chaos engineering? Chaos engineering is the discipline of running planned, systematic experiments on a system to reveal its weaknesses. What kind of weaknesses? As I mentioned in the previous paragraph, it answers the “what-ifs”: what if the system fails? Now, no one likes chaos. As the old saying goes, “If it ain’t broke, don’t fix it.” However, times have changed, and technology is advancing at a tremendous pace. Nowadays you need not only a backup data center but also failover applications and failover data centers, and chaos engineering is how you verify that this kind of availability actually holds up when something breaks.
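To make “planned and systematic” a little more concrete, here is a minimal Python sketch of a single experiment. It assumes a toy service made of three interchangeable replicas behind a load balancer; the replica names, the 99% threshold, and the `health_checks` flag are hypothetical illustrations, not any real tool’s API. The experiment follows the usual shape: measure a steady state, state a hypothesis, inject a fault, and compare.

```python
import random


def success_rate(replicas, down, health_checks, requests=1000, seed=0):
    """Fraction of simulated requests that succeed (toy model).

    Without health checks, the load balancer may route a request to a dead
    replica, and that request fails. With health checks, dead replicas are
    skipped.
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(requests):
        pool = [r for r in replicas if r not in down] if health_checks else replicas
        if pool and rng.choice(pool) not in down:
            ok += 1
    return ok / requests


def run_experiment(replicas, health_checks, threshold=0.99):
    # 1. Steady state: measure the success rate with nothing broken.
    baseline = success_rate(replicas, down=set(), health_checks=health_checks)
    # 2. Hypothesis: the success rate stays above `threshold` if one replica dies.
    # 3. Inject the fault: take down one randomly chosen replica.
    victim = random.Random(42).choice(replicas)
    during = success_rate(replicas, down={victim}, health_checks=health_checks)
    # 4. Compare: did the system stay within tolerance?
    return baseline, during, during >= threshold


print(run_experiment(["a", "b", "c"], health_checks=True))   # hypothesis holds
print(run_experiment(["a", "b", "c"], health_checks=False))  # hypothesis fails
```

With health checks turned off, the success rate drops to roughly two thirds once a replica dies, so the hypothesis fails. The experiment has revealed a weakness before a real outage did.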

Netflix pioneered chaos engineering because it understood its importance. Many organizations want to reach the maturity level of Netflix and Amazon, but they need to ask themselves whether they are adopting the same practices. Blockbuster 2.0, aka Netflix, gained tremendous benefits from chaos engineering and approached the practice methodically. Netflix created what is known as the Simian Army (learn more at https://www.gremlin.com/chaos-monkey/the-simian-army/), and these are some of its most famous tools:

1. Chaos Monkey — Chaos Monkey was the first tool Netflix created. What it did was amazing: it randomly terminated production instances to see how the rest of the system would react. Would the system still work? Would response times change? Chaos Monkey gave Netflix answers to many of these questions.

2. Chaos Gorilla — Chaos Gorilla targeted availability zones. An availability zone is essentially one or more isolated data centers within a region, and AWS operates many regions, each containing multiple availability zones. Chaos Gorilla took out an entire availability zone and evaluated how the system reacted. Was there a backup? Was the system still running? Were there any changes to the network? These are the questions Chaos Gorilla answered for Netflix.

3. Chaos Kong — Named after King Kong, Chaos Kong simulates the biggest failure of all: an outage of an entire region. What would happen then?
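The three tools differ mainly in blast radius: one instance, one availability zone, or one whole region. The toy Python model below sketches that difference. It assumes a fleet described as region, then availability zone, then instance IDs, and it treats the service as “up” if any instance survives. The function names mirror the Simian Army tools, but they are hypothetical simulations, not Netflix’s code, and the instance IDs are made up.

```python
import random

# Hypothetical fleet layout: region -> availability zone -> instance IDs.
Fleet = dict[str, dict[str, list[str]]]


def all_instances(fleet: Fleet) -> list[str]:
    return [i for azs in fleet.values() for ids in azs.values() for i in ids]


def chaos_monkey(fleet: Fleet, rng: random.Random) -> set[str]:
    """Kill one random instance (smallest blast radius)."""
    return {rng.choice(all_instances(fleet))}


def chaos_gorilla(fleet: Fleet, rng: random.Random) -> set[str]:
    """Kill every instance in one random availability zone."""
    region = rng.choice(sorted(fleet))
    az = rng.choice(sorted(fleet[region]))
    return set(fleet[region][az])


def chaos_kong(fleet: Fleet, rng: random.Random) -> set[str]:
    """Kill every instance in one random region (largest blast radius)."""
    region = rng.choice(sorted(fleet))
    return {i for ids in fleet[region].values() for i in ids}


def survives(fleet: Fleet, killed: set[str]) -> bool:
    """Toy availability check: the service is up if any instance is left."""
    return any(i not in killed for i in all_instances(fleet))


# One region with two AZs, versus the same plus a second region.
single_region: Fleet = {"us-east-1": {"us-east-1a": ["i-1", "i-2"], "us-east-1b": ["i-3"]}}
multi_region: Fleet = {**single_region, "eu-west-1": {"eu-west-1a": ["i-4"]}}

rng = random.Random(7)
for attack in (chaos_monkey, chaos_gorilla, chaos_kong):
    print(attack.__name__,
          survives(single_region, attack(single_region, rng)),
          survives(multi_region, attack(multi_region, rng)))
# chaos_monkey True True
# chaos_gorilla True True
# chaos_kong False True
```

A deployment spread across two availability zones shrugs off Chaos Monkey and Chaos Gorilla but not Chaos Kong. Only the multi-region fleet survives all three, which is exactly the kind of finding these tools were built to surface.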

Although Netflix led the way in chaos engineering, Facebook and Amazon soon followed. Facebook started Facebook Storm, which answered questions like what would happen if its data centers went down, and Amazon ran GameDays, which deliberately took down servers to see how the system reacted.

So, what are the benefits of chaos engineering? If model companies like Netflix, Facebook, and Amazon are practicing it, there must be some sort of return, right? Yes. The return is a proactive approach: always being ready. Here are some of the benefits of chaos engineering:

1. Prevents lost revenue. It helps prevent outages that would cause massive disruption in the market and prepares you for disasters.

2. Increased understanding of the system. The teams working on the system gain a better understanding of how it works and what it takes to keep it fully functional and performing at its best.

3. Happy customers. Fewer outages mean customers consistently get what they want, so they stay happy. Customer satisfaction is one of the most important metrics by which organizations measure success.

The future of Chaos Engineering is pretty clear. All organizations should practice it to uncover the faults in their systems proactively. This is of the utmost importance because most of their customers and revenue depend on the availability, reliability, and scalability of their systems.

Now let’s discuss Gremlin. Gremlin is a platform for running chaos engineering experiments. With Gremlin, you can inject attacks into hosts or containers to see how the system reacts, from DNS failures to network disruptions such as added latency. In short, Gremlin can be described as Chaos Engineering as a Service.
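Gremlin injects its attacks into hosts or containers, as described above. To show what an injected fault looks like from the application’s point of view, here is a hypothetical Python decorator that adds latency and raises errors at a configurable rate. It is an illustration only, not Gremlin’s API; `inject_faults`, `lookup_price`, and `price_with_fallback` are made-up names.

```python
import functools
import random
import time


def inject_faults(latency_s=0.0, failure_rate=0.0, seed=None):
    """Hypothetical decorator: delay every call and fail a fraction of them."""
    rng = random.Random(seed)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            time.sleep(latency_s)  # simulate a slow network hop
            if rng.random() < failure_rate:  # random() is in [0, 1)
                raise ConnectionError(f"chaos: injected failure in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_faults(latency_s=0.01, failure_rate=1.0, seed=1)
def lookup_price(sku: str) -> float:
    # Hypothetical downstream dependency, such as a pricing service.
    return 19.99


def price_with_fallback(sku: str, default: float = 0.0) -> tuple[float, bool]:
    """Return (price, degraded): fall back instead of crashing the page."""
    try:
        return lookup_price(sku), False
    except ConnectionError:
        return default, True


print(price_with_fallback("sku-123", default=9.99))  # (9.99, True)
```

Here the pricing call always fails (`failure_rate=1.0`), and the caller degrades to a default price instead of crashing. A chaos experiment is how you find out whether your real code does the same.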

Chaos Engineering has tremendous benefits and is being adopted by some of the world’s leading companies, like Facebook, Amazon, Netflix, and others. This is why I want you to ponder the statement, “Why Chaos Is Sometimes Needed.” If you would like a free demo or want to learn more, feel free to visit us at https://www.royalcyber.com/services/devops/chaos-engineering/ and contact us.

You are probably wondering why you should choose us to do Chaos Engineering in your organization. The answer is simple. We have been in the IT industry since 2002 and have consistently delivered for clients like Capital One, Caesars Entertainment, AAA, and Toyota. Our certified experts deliver projects with a customer-focused, solution-driven approach. We also offer a complimentary assessment at no cost to you. What are you waiting for? Visit the link above to get started. We would love to help you become proactive and reach your business goals.

Thank you for reading
