Posted by Martin Hentschel
Jul 16, 2015

Security of customer data is critically important at Snowflake. We use sophisticated technologies to ensure the security and safety of data. We decided to share some of these technologies and the experiences we had with them.

AWS CloudHSM

One of the services offered by Amazon Web Services (AWS) is AWS CloudHSM. CloudHSM is a hardware security module (HSM) that allows you to securely store keys and perform cryptographic operations on the device. CloudHSM is an important building block of Snowflake’s security infrastructure, ensuring the security and integrity of customers’ data.

Hardware Security Module (Image: SafeNet HSM Device)

CloudHSM is a hardware security module that plugs into your “cloud”. It’s almost as if someone from Amazon walks over and attaches a physical device to your Amazon Virtual Private Cloud (VPC). An example of an HSM is shown in the image above. CloudHSM is only available if you are using Amazon Virtual Private Cloud, and it is fairly expensive: it currently costs $5,000 upfront and $1.88 per hour.
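
To put those prices in perspective, here is a small back-of-the-envelope sketch. It uses only the two figures quoted above; the class and method names are made up for illustration, and the assumption that a single device runs around the clock for a 365-day year is ours.

```java
public class CloudHsmCostSketch {
    // Prices quoted in the text, expressed in cents to avoid rounding issues.
    static final long UPFRONT_CENTS = 500_000L; // $5,000 upfront
    static final long HOURLY_CENTS = 188L;      // $1.88 per hour

    // Total cost of one device after running for the given number of hours.
    static long costInCents(long hours) {
        return UPFRONT_CENTS + HOURLY_CENTS * hours;
    }

    public static void main(String[] args) {
        long hoursPerYear = 24L * 365L; // assumption: always on, non-leap year
        long cents = costInCents(hoursPerYear);
        System.out.printf("First-year cost for one device: $%d.%02d%n",
                cents / 100, cents % 100);
    }
}
```

Under these assumptions, the first year of a single device comes to roughly $21,500, most of it from the hourly charge rather than the upfront fee.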

CloudHSM has some useful properties:

  • Safe key storage. Keys generated on CloudHSM can never be exported from the HSM. They are safely stored on the device.
  • Cryptographic operations on the device. Stored keys can be used to encrypt and decrypt other data. These cryptographic operations happen on the HSM, without releasing the stored keys.
  • Random number generation. CloudHSM generates strong random numbers.

CloudHSM at Snowflake

CloudHSM sits within Snowflake’s security infrastructure. Without going into too much detail about our security framework, we store the top-most encryption keys of our key hierarchy in CloudHSM and generate lower-level keys using CloudHSM’s random number generation. That means the keys protecting customer data are both well guarded (because CloudHSM safely stores the top-most encryption keys) and hard to guess (because they are created from CloudHSM’s strong random numbers).

At Snowflake, we use CloudHSM for the following tasks:

  • Securely storing keys. At Snowflake, we store the top-most keys of the key hierarchy in CloudHSM. These keys can never leave the HSM and they are used to encrypt and decrypt lower-level keys in the hierarchy.
  • Wrapping / unwrapping keys. Lower-level keys in the key hierarchy are encrypted (wrapped) and decrypted (unwrapped) using their respective upper-level keys. On the very top of the key hierarchy, wrapping and unwrapping are performed on CloudHSM. This allows wrapping and unwrapping keys without ever revealing or releasing the top-most keys.
  • Generating random numbers. All encryption keys at Snowflake are created using strong random numbers generated by CloudHSM.
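
To make the wrap / unwrap idea concrete, here is a minimal sketch that runs entirely in software with the JDK's standard javax.crypto classes. It is not Snowflake's implementation and it does not talk to CloudHSM: the class and method names are hypothetical, the "root" key sits in ordinary memory (on an HSM it would never leave the device), and java.security.SecureRandom stands in for the HSM's random number generator.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

public class KeyHierarchySketch {

    // Stand-in for a top-most key. With an HSM, this key would be generated
    // and kept on the device; here it is an ordinary in-memory AES key.
    static SecretKey newRootKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256, new SecureRandom());
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES key generation failed", e);
        }
    }

    // Create a lower-level key from 32 random bytes.
    static SecretKey newDataKey(SecureRandom random) {
        byte[] material = new byte[32];
        random.nextBytes(material);
        return new SecretKeySpec(material, "AES");
    }

    // Wrap (encrypt) a lower-level key under its upper-level key.
    static byte[] wrap(SecretKey wrappingKey, SecretKey keyToWrap) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, wrappingKey);
            return cipher.wrap(keyToWrap);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("wrapping failed", e);
        }
    }

    // Unwrap (decrypt) a lower-level key using its upper-level key.
    static SecretKey unwrap(SecretKey wrappingKey, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, wrappingKey);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("unwrapping failed", e);
        }
    }

    public static void main(String[] args) {
        SecretKey root = newRootKey();
        SecretKey dataKey = newDataKey(new SecureRandom());

        byte[] wrapped = wrap(root, dataKey);      // safe to store outside the HSM
        SecretKey restored = unwrap(root, wrapped); // needs the upper-level key

        System.out.println("round trip ok: "
                + Arrays.equals(dataKey.getEncoded(), restored.getEncoded()));
    }
}
```

Only the wrapped bytes need to be persisted alongside the data. In the setup described above, the wrap and unwrap calls for the top of the hierarchy would execute on the HSM, so application code never holds the top-most key the way this sketch does.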

War Stories

Adding CloudHSM to your system can be tricky. Here are two war stories from our engineers’ first attempts at interfacing with CloudHSM. They show that, when used incorrectly, an HSM will quickly shut down or lock you out, demonstrating what HSM vendors advertise as tamper resistance.

  • Crashed JVM. While our operations team was still configuring CloudHSM (installing partitions and getting the correct configuration files in place), our engineering team had already started implementing client-side code and was issuing its first commands from Java. Because Java connects to CloudHSM via the Java Native Interface (JNI) and because something on the device was misconfigured, we experienced an abnormal crash of the Java Virtual Machine (JVM). Such a crash originating in native code cannot be handled from Java; you cannot catch it the way you catch a Java exception. Digging into it, we believe we hit a SIGABRT in the underlying C library that JNI connects to. In discussions with the support teams from Amazon and SafeNet, the CloudHSM vendor, we realized that you should not configure CloudHSM while making client calls in parallel. It might crash your JVM.
  • Crashed network interface. We performed various tests such as sending malformed client requests to the CloudHSM. Immediately, CloudHSM stopped responding and ultimately shut down its network interface. While at first this seemed scary, especially when developing client applications, this never happened when sending correctly formatted client requests.

Both of these stories show that (a) CloudHSM can be tricky to deal with, and (b) this is by design. HSMs are designed to be tamper-resistant, so any wrongdoing when interacting with CloudHSM will lock you out, and that’s a good thing. When used correctly, however, CloudHSM is a stable and safe method for ensuring the security and safety of data.