Another week, another massive corporate data leak. Last week, Decathlon disclosed that 123 million records were exposed to the public as a result of a misconfigured data warehouse. The data warehouse was discovered by Noam Rotem and Ran Locar of VPNMentor. The personally identifiable information (PII) of both customers and employees was unencrypted and accessible to anyone online. Exposed information included employee usernames, clear text passwords, social security numbers, full names, addresses, mobile phone numbers and birth dates, as well as customer email and login details.
Decathlon is far from the first company to suffer from such a misconfiguration, and it certainly won’t be the last. In fact, Satori’s research team has discovered that the ingredients for another leak of this scale exist in nearly ten thousand other companies worldwide that use similar internet-accessible databases. Considering the public relations and regulatory fallout from these kinds of data leaks, many find themselves asking what, if anything, is being done to prevent them. Many are also wondering why they’re still happening at all.
As someone who has lived and breathed data security for 15 years, I have a few thoughts. First, let’s do away with a few assumptions floating around. Are companies simply disregarding the importance of keeping their big data repositories secure? I don’t think that’s the case. Are companies failing to understand that their cloud configuration is an important part of their defense? Surely we’re past that point. Do companies struggle to appreciate that cloud development models place their own unique requirements and constraints on security? On this point, I’ve seen the security industry increasingly adopt this ethos, and many vendors have begun to roll out incredible solutions tailored to the specific needs of cloud security.
So what’s missing? It’s become clear to us at Satori that today’s model for data security is completely inadequate for the cloud. For years, data has been couched in layers of security, from network security to application security, end-point security to anomaly detection. This approach ensured that gaps were more or less covered and significantly limited the real threat of a data leak. Unfortunately, this layered security approach has not carried over as companies migrate to the cloud, and nothing else has taken its place.
There’s a saying in aviation to always “fly two mistakes high”. It means that you should never put yourself in a position where one mistake can take you down. This is exactly what layered approaches help security teams achieve and precisely why today’s approach to data lake and data warehouse security on the cloud is doomed to fail. Relying on cloud configuration management alone cannot keep companies safe from data leaks and is many steps short of keeping big data stores safe. It takes only one employee replicating a VM that houses sensitive data into an environment not configured to hold it to bring the whole plane down.
The challenge with cloud configuration management is that it is tied directly to the parameters of your cloud deployment: the services you run, the instances you deploy and your environment. These are all very dynamic and perfectly representative of how engineering teams constantly evolve software and services. While a company should naturally aim to be on top of that, having such a dynamic and volatile last line of defense exposes it to an unacceptable degree of both unintentional and unpredicted risks.
This raises the question: what should the last line of defense for data security look like on the cloud?
- First, it must be isolated from environment changes—if it isn’t, you can’t be reactive to new changes and are dangerously exposed to the risk of mistakes and slow response.
- Second, it must be simple to configure and enforce—otherwise, you end up with the same configuration challenges you have for your cloud configuration and will find yourself back at square one.
- Next, it must be transparent in the environment—without transparency, friction will push people to bypass it in order to get their job done.
- Finally, it must be universal, able to run in any environment, so that it can be deployed across different cloud providers and environments.