In my previous blog post I talked about Black Friday and Cyber Monday, and how periods of high traffic are also when your internet-facing assets are most vulnerable to a DDoS attack. In this post, I want to provide some product-agnostic best practices for situations when you expect extraordinary amounts of traffic. This might be during a holiday season, during a sale, or after being featured in the media. At these times, regardless of what security solutions you have implemented, there are a number of things you can do to tilt the scales in your favor. I won’t be covering the business side, such as making sure you’ve got the right amount of stock in your warehouses; I’ll leave that to the logistics experts. The points below focus on IT operations and security tasks.
With that in mind, I’ve assembled the following five things you should do when preparing. Better still, do them on a schedule, twice or more a year.
1. Make sure you understand ALL the components
Most larger websites are complicated systems. There are frontends, backends, connections to databases, payment processors, and third parties. Make sure your organization has the whole chain documented and mapped out. Without a complete picture, it is hard to find the most likely failure points. If your connection to the payment processor runs through a single ISP with no redundancy, that is a single point of failure. It doesn’t matter if your product is protected by the most state-of-the-art NGFW and security services money can buy if your DNS provider is running on a single Commodore 64 on a 10 Mbit connection. Make sure the map is shared across the organization, and build redundancy into your team as well, so that no single person is the only one who knows how everything fits together.
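One lightweight way to keep this map actionable is to store it in a machine-readable form and check it for dependencies that have only one provider. The sketch below assumes a simple, hypothetical format in which each component lists, for every dependency, the interchangeable providers that can serve it; all component and provider names are invented for illustration. It does not model transitive dependencies, so treat it as a starting point rather than a complete analysis.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical component map (illustrative names only): for each component,
# every dependency lists the providers that can serve it interchangeably.
COMPONENT_MAP: Dict[str, Dict[str, List[str]]] = {
    "frontend": {
        "dns": ["dns-provider-a", "dns-provider-b"],
        "uplink": ["isp-1", "isp-2"],
    },
    "payment-gateway": {
        "uplink": ["isp-1"],  # only one ISP: a single point of failure
        "processor": ["payment-processor-x"],
    },
    "database": {
        "replica": ["db-primary", "db-standby"],
    },
}


def single_points_of_failure(
    component_map: Dict[str, Dict[str, List[str]]],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (component, dependency, provider) for each dependency with fewer
    than two providers; provider is None if no provider is listed at all."""
    findings: List[Tuple[str, str, Optional[str]]] = []
    for component, deps in sorted(component_map.items()):
        for dependency, providers in sorted(deps.items()):
            if len(providers) < 2:
                findings.append(
                    (component, dependency, providers[0] if providers else None)
                )
    return findings


if __name__ == "__main__":
    for component, dependency, provider in single_points_of_failure(COMPONENT_MAP):
        print(f"{component}: '{dependency}' depends only on {provider}")
```

On the example map, this flags the payment gateway’s single ISP uplink and its single payment processor, exactly the kind of weak link that is easy to overlook without a written map.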
2. Do REAL testing, end to end
Large-scale testing of systems is always a challenge, and the time, specialised knowledge and tools are often not at your disposal. At the very least, run a security scan with some open source tools. Better still, include this kind of testing in your yearly budget. A number of companies offer this sort of testing. There are also a lot of semi-legitimate ‘booter’ and ‘stress tester’ services. Don’t use those! Here are some companies that members of the Baffin Bay Networks team have used in the past with good results: Red Wolf Security, ZeroBS (site in German) and Load Impact.
There is a temptation here to run these tests in a development or test environment. While this is useful, there is no substitute for a full test against the production environment.
3. Find out what options you already have
A lot of production environments live by the mantra ‘It’s not broken, so don’t touch it!’. While wise, that mantra can also limit your resiliency. The services you use are constantly changing and adding new capabilities. Take the time to speak to your ISP, hosting provider or cloud vendor; check what kind of services they offer and what is included in the “standard” package. Most likely there are already a number of capabilities available to you that aren’t being used. Many valuable features, such as BGP blackholing, exist but are seen as too complicated to implement. It’s worth exploring them. One final note on this point: cloud computing provides infrastructure, but the burden of securing that infrastructure is on you. Make sure your own security measures are up to the task.
4. Assemble your team
Make sure you’ve allocated extra staff to monitor things for at least one day before and one day after the expected traffic spike. For example, start on the Thursday before Black Friday and continue through the Tuesday after Cyber Monday. This includes monitoring servers, databases, network connections, and everything else you found during your mapping. In the majority of incidents there are warning signs that things are going south well before the actual crash. By increasing vigilance, you can hopefully catch problems early enough to prevent downtime.
For your external services (DNS, hosting provider, etc.), make sure you have the proper support level and know whom to contact if something happens. You should also monitor these services in-house around the clock.
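Spotting warning signs early usually comes down to comparing current readings against a recent baseline. The following is a minimal sketch, assuming you already collect a metric such as response time at regular intervals; the function name `is_anomalous`, the three-standard-deviation threshold and the minimum sample count are illustrative choices, not values tuned for any particular system. In practice this logic would typically live in your existing monitoring tool rather than in a standalone script.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(
    history: Sequence[float],
    latest: float,
    k: float = 3.0,
    min_samples: int = 10,
) -> bool:
    """Flag `latest` if it exceeds the mean of `history` by more than
    k sample standard deviations.

    k and min_samples are illustrative defaults, not tuned values.
    """
    if len(history) < min_samples:
        return False  # too few samples to form a meaningful baseline
    baseline = mean(history)
    spread = stdev(history)
    # With a perfectly flat history (spread == 0), any increase is flagged.
    return latest > baseline + k * spread


if __name__ == "__main__":
    # Hypothetical response times in milliseconds from recent checks.
    recent_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
    print(is_anomalous(recent_ms, 104))  # → False (within the normal range)
    print(is_anomalous(recent_ms, 250))  # → True (well above the baseline)
```

Running a check like this for each component on your map, including the external services, gives the extra staff a concrete signal to act on before a slowdown turns into an outage.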
5. Documented processes are your best tool
You might have the latest technology and the sharpest team managing it, but it won’t matter if you don’t tie them together with documented processes. Map out both the most likely and the most damaging scenarios that might happen, and make contingency plans for what to do if they occur. Far too often these plans exist only in the head of a key team member. WRITE THEM DOWN (and print them out!). Make checklists to bring some order to the chaos. You’ll appreciate them on the support call at 3 AM.
I’d also recommend running through these plans in a round-table format with all the relevant members of the team. Doing a few fire drills each year will be worth it when the unexpected happens.
While this list is by no means comprehensive, I hope it provides some food for thought going into the busiest shopping months of the year and helps you protect your web-facing assets.
By: James Tucker, Director of System Engineering