Snowflake: Seriously Serious About Security

How Serious Are We About Security? Extremely.

No self-respecting security team is ever satisfied with the existing security controls it has in place. Some mistake this dissatisfaction for a personality disorder, referring to their security team members as “control freaks” or “over-achievers”. Let’s face it: security professionals tend to be an eccentric group. However, for a truly committed and competent security team, this eccentricity is simply a symptom of the healthy paranoia that comes with being responsible for the protection of vital infrastructure and sensitive data.

Snowflake’s security team has channeled this paranoia into a program we call Seriously Serious Security Testing. There are several components of this program, including the audit of all the usual administrative and technical controls you would expect from a cloud company. However, where Snowflake’s eccentricity truly surfaces is in our embrace of the dreaded Penetration Test. Here are the highlights of Snowflake’s security testing program and the key role that penetration testing plays.

First: What is a Penetration Test?

A penetration test is a controlled attempt to exploit vulnerabilities to determine whether unauthorized access or other malicious activity is possible within the target environment. Penetration testing is a requirement for PCI-DSS compliance, and is also considered best practice for any organization that takes security seriously. Snowflake engages globally recognized experts to perform this activity within specific constraints and guidelines, as described in the Methodology section below.

Frequency

Most companies avoid penetration tests altogether. Others perform them annually at best, which is the minimum frequency required to meet the standards and certifications their auditors tell them they need. What many auditors don’t challenge is whether adequate penetration testing has been performed after every “material change” to the company’s product or infrastructure. It’s unlikely that performing penetration tests annually would be sufficient in a cloud environment where most vendors take pride in the frequent deployment of new features and functionality (Snowflake is no different in this regard, with releases at least several times a month). Because of these frequent changes, it’s important to confirm that your cloud vendors test often enough to catch any new vulnerabilities that have inadvertently been introduced.

Figure: Security penetration test frequency (source: https://www.rapid7.com/globalassets/_pdfs/whitepaperguide/rapid7-research-report-under-the-hoodie.pdf)

Much to the irritation of our Operations and Engineering teams, Snowflake has performed more than 5 penetration tests in the past 6 months.

Why would we do this to ourselves? Because we want to know what our weaknesses are! The frequency with which we perform these tests provides Snowflake with the assurance that changes to the Snowflake service, as well as newly discovered vulnerabilities within other components of our environment, are not putting Snowflake or (more importantly) Snowflake’s customers and their data at risk.

Methodology

Another example of Snowflake Security’s paranoia is the approach we take with our penetration testers. Typical penetration testing engagements at Snowflake are designed to simulate the compromise of an employee’s or customer’s credentials by providing the tester with limited access to a non-production environment. Engagements run a minimum of two weeks and begin with providing the testers not only with the aforementioned credentials, but also with substantial information about the architecture, network design, and, when applicable, our application code. (This method is sometimes referred to as White Box Testing.) If, after a specific period of time, the testers have not been able to find an entry point, Snowflake gradually provides the testers with slightly more access until they are able to uncover vulnerabilities, or until the time is up.

Why would we divulge so much information? We want to know what ALL our weaknesses are! This provides us with visibility into what would happen if, for example, we had an insider attempting to gain unauthorized access to data. How far would they get? How quickly could we detect them? How would we contain them? And so on. The information is invaluable.

Figure: Most common vulnerabilities found by penetration testers (source: https://www.rapid7.com/globalassets/_pdfs/whitepaperguide/rapid7-research-report-under-the-hoodie.pdf)

Transparency

The final example of Snowflake’s Seriously Serious Security Testing program is the highly unusual practice of sharing penetration test reports and remediation documentation with qualified prospects and customers (under NDA, transmitted securely, and with the promise of their first born if there is a compromise). By sharing our reports we are able to solicit additional feedback on ways to improve our testing.

I’ve been on both sides of the audit fence for years, and I’ve yet to find an organization as willing to share as much information about its penetration testing frequency and methodology as Snowflake. However, it comes as no surprise to anyone who has worked with Snowflake. Snowflake’s corporate culture is based on teamwork and collaboration, which spills over into Snowflake’s relationships with customers and vendors. We believe that transparency is the cornerstone of trust, and trust is the cornerstone of a healthy partnership between Snowflake and our customers. Providing the penetration test report and remediation evidence allows customers to see for themselves how seriously we take security, as well as how effective we are at achieving it. This allows our customers and prospects to make an informed decision about the risks they’re taking.

Conclusion

Security is a constantly moving target. Our team will never stop this extreme security testing of our infrastructure because threats are constantly evolving.

So…
Call us control freaks.
Call us over-achievers.
Call us paranoid.

One thing you’ll never call us is complacent…seriously.

For more information, please feel free to reach out to us at info@snowflake.net. We would love to help you on your journey to the cloud, securely. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and happenings here at Snowflake Computing.

 

Is My Data Safe in the Cloud?

This post comes to us from a guest blogger, Robert Lockard. Robert is an Oracle ACE with over 30 years of experience as an Oracle DBA / Designer and Developer. He has spent the past 10 years focusing on database security. Robert frequently speaks at conferences all over the world, teaching DBAs and Developers how to secure their data. You can follow him on Twitter at @YourNavionPilot and read more of his blogs at http://www.oraclewizard.com.

I have four mantras for any system I’m involved in: Stability, Security, Accuracy and Performance.

When it comes to information security in the cloud, there are lots of recommendations for vendors and customers to consider. When I am called into customer sites to ensure their data is properly protected, my first step is making sure the person connecting to the database is authenticated properly and that encryption has been properly set up, along with a host of other related things. Just getting data encrypted, making sure there is no ghost data, and ensuring there is no spillage takes a great deal of work. From my perspective, Snowflake makes end-to-end encryption easy because encryption is turned on by default. Your data is encrypted from day one.

Here is my best advice on how to ensure your critical business data is secure in the cloud, along with my review of how Snowflake addresses these issues.

#1 – Demand Multi-factor Authentication

Snowflake supports DUO multi-factor authentication to verify that the user authenticating to the database really is who they claim to be. Stolen credentials are one of the top ways bad actors get in where they do not belong. The username / password paradigm of authentication has been around for several decades. Username / password harvesting, and many of the resulting data breaches, could be reduced by following best practices, and one such best practice is multi-factor authentication. It combines something you know with something you have. After you authenticate with a username / password, a token displays a code; you enter that code into the system and then you are authenticated. This is something Snowflake has built into their system. Multi-factor authentication is not an add-on option; it comes standard with their product.
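To make the "token displays a code" step concrete, here is a minimal sketch of how a time-based one-time code can be computed and checked. It assumes the token follows the open TOTP scheme (RFC 6238, built on RFC 4226); whether a given DUO token works exactly this way is an assumption, and the function names and shared secret are hypothetical.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset from the last nibble
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the time-step count."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, submitted: str, unix_time: int) -> bool:
    """Server-side check: recompute the code and compare in constant time."""
    return hmac.compare_digest(totp(secret, unix_time), submitted)


# Hypothetical shared secret (this value is the RFC test secret, not a real one).
shared_secret = b"12345678901234567890"
code_on_token = totp(shared_secret, unix_time=59)
```

The token and the server share the secret (something you have), so both can derive the same short-lived code without the code ever being stored.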

In addition, Snowflake’s implementation of DUO also allows a verification request to be sent to a mobile device. When you enter your username / password, a request is sent to your mobile device. If you approve the request, you are connected. The beauty of this is, most of us always have our mobile phones with us. If you get a request that you did not originate, you know someone else is trying to use your credentials to get into the system. When you reject the request, they will be locked out.

#2 – Demand End-to-end Encryption

The customer should require end-to-end encryption from any cloud provider they decide to use. Snowflake delivers AES-256 encryption at no extra cost. The AES standard was adopted in 2001 by the National Institute of Standards and Technology (NIST) and has displaced the older DES standard. AES uses a 128-bit block size with three key sizes: 128, 192 and 256 bits. AES encryption is the US Government standard for strong encryption and has been adopted by financial, government and other commercial entities worldwide.

#2a – Demand Sophisticated Encryption Key Management

Snowflake uses Amazon CloudHSM, a hardware security module (HSM) service, to support sophisticated key management, including key wrapping, key rotation and rekeying.

Key Wrapping (aka Hierarchical Keys)

Key wrapping has become the industry standard. Oracle provides it in Transparent Data Encryption with a master key that is stored outside the database to decrypt the keys that are used to secure the data. Microsoft SQL Server uses a certificate that is stored outside the database to decrypt the key that is used to decrypt the data. Snowflake uses four levels of keys to encrypt customers’ data.

Using this model, each customer’s data is encrypted with a unique set of keys. This way, customer “A”’s data is encrypted with a different set of keys than customer “B”’s. By using multiple customer keys, each customer’s data is segregated, and secured, from the others.
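As a rough illustration of hierarchical key wrapping, the sketch below chains four levels of keys so that each key is stored only in a form encrypted by its parent. The level names (root, account, table, file) are illustrative assumptions, and `toy_cipher` is a deliberately simple stand-in for AES-256 so the example runs with the standard library alone; it is not secure and is not Snowflake's implementation.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for AES-256. NOT secure; demo only."""
    blocks = range(len(data) // 32 + 1)
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest() for i in blocks
    )
    return bytes(a ^ b for a, b in zip(data, stream))  # the same call decrypts


def wrap(parent_key: bytes, child_key: bytes) -> bytes:
    """Encrypt (wrap) a child key under its parent; a fresh nonce is prepended."""
    nonce = secrets.token_bytes(16)
    return nonce + toy_cipher(parent_key, nonce, child_key)


def unwrap(parent_key: bytes, wrapped: bytes) -> bytes:
    """Recover a child key from its wrapped form using the parent key."""
    return toy_cipher(parent_key, wrapped[:16], wrapped[16:])


# Hypothetical four-level hierarchy: root -> account -> table -> file.
root_key = secrets.token_bytes(32)  # in practice this would stay inside the HSM
account_key, table_key, file_key = (secrets.token_bytes(32) for _ in range(3))

# Only wrapped copies of the lower-level keys are persisted.
stored = {
    "account": wrap(root_key, account_key),
    "table": wrap(account_key, table_key),
    "file": wrap(table_key, file_key),
}

# The data itself is encrypted with the lowest-level key.
data_nonce = secrets.token_bytes(16)
ciphertext = toy_cipher(file_key, data_nonce, b"customer A row data")


def read_data(root: bytes) -> bytes:
    """Walk the hierarchy down from the root key, then decrypt the data."""
    key = root
    for level in ("account", "table", "file"):
        key = unwrap(key, stored[level])
    return toy_cipher(key, data_nonce, ciphertext)
```

Because every customer would get its own account-level key in such a scheme, reading one customer's data requires that customer's branch of the hierarchy.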

See the Snowflake Encryption Key Management blog post for more details.

Cryptographic Key Lifetime

The National Institute of Standards and Technology (NIST) recommends having a policy in place so each key has a limited lifetime. The longer a key is in use, the greater the odds it will be compromised (see NIST Encryption Key Management Recommendation for more details). Snowflake uses two methods to control the lifetime of keys: key rotation and rekeying.

1) Cryptographic Key Rotation

One way to minimize the impact of a cryptographic key being compromised is to rotate the keys at a set interval. Snowflake rotates keys at a system-defined interval. This ensures that if a cryptographic key is compromised, the amount of data at risk is minimized.

An easy way to understand key rotation is this: data is encrypted and decrypted using a key and, after a set period of time, another key is added. All new data is encrypted and decrypted with the new key, while old data is still decrypted with the old key. Again, see the Snowflake Encryption Key Management blog post for more details.

The advantage of using key rotation is twofold: a) Because you are adding a key, you don’t have to re-encrypt data that has already been encrypted, which saves processing time. b) Because there are now multiple keys, only a subset of the information is vulnerable if any one key is compromised.
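The rotation behaviour described above can be sketched as a keyring that keeps every key version but encrypts new data only with the newest one. This is a minimal sketch under stated assumptions: the `RotatingKeyring` class and record layout are hypothetical, and `toy_cipher` is an insecure stand-in for AES-256 used only to keep the example runnable with the standard library.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for AES-256. NOT secure; demo only."""
    blocks = range(len(data) // 32 + 1)
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest() for i in blocks
    )
    return bytes(a ^ b for a, b in zip(data, stream))  # the same call decrypts


Record = Tuple[int, bytes, bytes]  # (key version, nonce, ciphertext)


class RotatingKeyring:
    """Keeps every key version; only the newest one encrypts new data."""

    def __init__(self) -> None:
        self.keys: Dict[int, bytes] = {}
        self.active_version = 0
        self.rotate()

    def rotate(self) -> None:
        """Add a new key and make it active; older keys are kept for reads."""
        self.active_version += 1
        self.keys[self.active_version] = secrets.token_bytes(32)

    def encrypt(self, plaintext: bytes) -> Record:
        """Encrypt with the active key and record which version was used."""
        nonce = secrets.token_bytes(16)
        key = self.keys[self.active_version]
        return self.active_version, nonce, toy_cipher(key, nonce, plaintext)

    def decrypt(self, record: Record) -> bytes:
        """Decrypt with whichever key version the record was written under."""
        version, nonce, ciphertext = record
        return toy_cipher(self.keys[version], nonce, ciphertext)


ring = RotatingKeyring()
old_record = ring.encrypt(b"loaded in January")
ring.rotate()  # e.g. triggered at the system-defined interval
new_record = ring.encrypt(b"loaded in February")
```

Note that rotation never touches `old_record`: it stays encrypted under version 1, which is why rotation is cheap and why a single compromised key exposes only the data written during its active period.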

2) Cryptographic Rekeying

As an additional security option, customers can choose to have their data rekeyed annually. By rekeying the data annually, Snowflake limits the amount of time any key is in use, thereby reducing the odds that a cryptographic key will be compromised. Once data is rekeyed, the old keys are destroyed.
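In contrast to rotation, rekeying re-encrypts existing data. The sketch below decrypts each record with its old key, encrypts it again under one fresh key, and returns a key set that no longer contains the old keys. The `rekey` function and record layout are hypothetical, and `toy_cipher` is an insecure stand-in for AES-256 so the example runs with the standard library alone.

```python
import hashlib
import secrets
from typing import Dict, List, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for AES-256. NOT secure; demo only."""
    blocks = range(len(data) // 32 + 1)
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest() for i in blocks
    )
    return bytes(a ^ b for a, b in zip(data, stream))  # the same call decrypts


Record = Tuple[int, bytes, bytes]  # (key version, nonce, ciphertext)


def rekey(
    records: List[Record], keys: Dict[int, bytes]
) -> Tuple[List[Record], Dict[int, bytes]]:
    """Re-encrypt every record under one fresh key and drop all old keys."""
    new_version = max(keys) + 1
    new_key = secrets.token_bytes(32)
    rekeyed = []
    for version, nonce, ciphertext in records:
        plaintext = toy_cipher(keys[version], nonce, ciphertext)
        new_nonce = secrets.token_bytes(16)
        rekeyed.append(
            (new_version, new_nonce, toy_cipher(new_key, new_nonce, plaintext))
        )
    return rekeyed, {new_version: new_key}  # old keys are not carried over


# Two records written under two different, older keys.
keys = {1: secrets.token_bytes(32), 2: secrets.token_bytes(32)}
n1, n2 = secrets.token_bytes(16), secrets.token_bytes(16)
records = [
    (1, n1, toy_cipher(keys[1], n1, b"row written last year")),
    (2, n2, toy_cipher(keys[2], n2, b"row written this year")),
]

records, keys = rekey(records, keys)
```

After the call, only the new key version exists, so a copy of an old key that leaked earlier can no longer decrypt the stored data.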

Network Encryption

Now we need to deal with data that is moving over the network. All network communication, from loading the data to analyzing it, is encrypted using TLS (see Advantages of Using TLS for more details). In addition, for ESD (Enterprise with Sensitive Data) customers, all internal network communication is encrypted using TLS.

By using the TLS standard, Snowflake has implemented the industry best practice.
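From the client's point of view, relying on TLS means refusing to talk to a server whose certificate cannot be verified. Here is a minimal sketch using Python's standard `ssl` module; the TLS 1.2 floor is my own assumption for illustration, not a documented Snowflake setting, and `make_client_context` is a hypothetical helper name.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older protocols
    return context


context = make_client_context()
```

A context like this can be passed to standard-library clients (for example `http.client.HTTPSConnection(host, context=context)`) so that every connection is both encrypted and authenticated.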

What about Performance and Stability?

There is a perception that encryption hurts performance by consuming CPU. The HSM performs a variety of functions: it encrypts and decrypts keys, and handles key rotation and rekeying. Its role is limited, though: the HSM is only used to wrap and unwrap account master keys. All remaining encryption is performed by other services in the background, resulting in no impact on customer workloads.

The HSM is also tamper-resistant. Efforts to tamper with the hardware will shut the HSM down, yet Snowflake has configured the HSM to be stable and reliable.

Conclusion

Based on my evaluation, by using Snowflake, the customer gets three of my four mantras:

Security: You get multi-factor authentication and end-to-end encryption right out of the box.

Stability: Snowflake has configured the HSM to be very stable, plus it lives on AWS which has proven to be a very stable cloud platform.

Performance: Snowflake’s use of the Amazon Elastic Compute Cloud gives performance on demand.

My fourth mantra, Accuracy, is up to you.

Automatic Encryption of Data

Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#2 Automatic Encryption of Data

One of the biggest worries people have about moving to the cloud is security. One key piece of providing enterprise class security is the ability to encrypt the data in your data warehouse environment. With Snowflake, your data is automatically encrypted by default.

No setup, no configuration, no add-on costs for high security features. Data is encrypted during its entire lifecycle. From loading data to storing data at rest, we apply end-to-end encryption, such that only the customer can read the data, and no one else. It is just part of the Snowflake service! That is a huge win for anyone who has ever tried to set up database security of any kind. In addition, this gives Snowflake a significant advantage compared to environments like Hadoop, where encryption and security are almost entirely left up to the customer to implement and maintain.

So what level of encryption?

For all data within Snowflake, we use strong AES 256-bit keys. Your data is encrypted as you load it. That is the default, and you cannot turn it off. In addition, our Snowflake security framework includes additional security best practices such as the use of a hierarchical key model and regular key rotation. All of this is automatic and transparent to our customers. In this way we provide our customers best-in-class data security as a service. For even more details on our approach to end-to-end encryption and how we secure your data, check out these blogs: end-to-end encryption and encryption key management.

Remember the #10 Top feature – persistent result sets? Well those query results are also encrypted with 256-bit encryption keys (all the data, all the time).

So do you have your data at rest encrypted in your data warehouse today? Are your loading tools and staging environments also encrypted?

If not, then put your data into the Snowflake Elastic Data Warehouse and rest easy knowing your data is safe and secure. Snowflake brings enterprise grade security to data warehousing in the cloud, with end-to-end encryption as a major part of our offering.

As a co-writer of this series, I would like to thank Kent Graziano, who has put a lot of effort into bringing the thoughts behind this series to the audience. Without his persistence and vision, this would not have been possible. As always, keep an eye on this blog site, our Snowflake Twitter feed (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

Kent Graziano and Saqib Mustafa

End-to-End Encryption in the Snowflake Data Warehouse

By Martin Hentschel and Peter Povinec.

Protecting customer data is one of the highest priorities for Snowflake. The Snowflake data warehouse encrypts all customer data by default, using the latest security standards, at no additional cost. Snowflake provides best-in-class key management, which is entirely transparent to customers. This makes Snowflake one of the easiest to use and most secure data warehouses on the market.

In previous blog posts, we explained critical components of Snowflake’s security architecture, including:

  • How Snowflake manages encryption keys, including the key hierarchy, automatic key rotation, and automatic re-encryption of data (“rekeying”)
  • How Snowflake uses Amazon CloudHSM to store and use Snowflake’s master keys with the highest protection
  • How Amazon CloudHSM is configured to run in high-availability mode.

In this blog post, we explain Snowflake’s ability to support end-to-end encryption, including:

  • How customers can upload their files to Amazon S3 using client-side encryption
  • How to use Snowflake to import and export client-side encrypted data.

End-to-End Encryption

End-to-end encryption is a form of communication where only the end users, and nobody else, can read the data. For the Snowflake data warehouse service it means that only the customer and runtime components of the Snowflake service can read the data. No third parties, including Amazon AWS and any ISPs, can see data in the clear. This makes end-to-end encryption the most secure way to communicate with the Snowflake data warehouse service.

End-to-end encryption is important because it minimizes the attack surface. In the case of a security breach of any third party (for example of Amazon S3) the data is protected because it is always encrypted, regardless of whether the breach is due to the exposure of access credentials indirectly or the exposure of data files directly, whether by an insider or by an external attacker, whether inadvertent or intentional. The encryption keys are only in custody of the customer and Snowflake. Nobody else can see the data in the clear – encryption works!


Figure 1: End-to-end Encryption in Snowflake

Figure 1 illustrates end-to-end encryption in the Snowflake data warehouse. There are three actors involved: the customer in its corporate network, a staging area, and the Snowflake data warehouse running in a secure virtual private cloud (VPC). Staging areas are either provided by the customer (option A) or by Snowflake (option B). Customer-provided staging areas are buckets or directories on Amazon S3 that the customer owns and manages. Snowflake-provided stages are built into Snowflake and are available to every customer in their account. In both cases, Snowflake supports end-to-end encryption.

The flow of end-to-end encryption in Snowflake is the following (illustrated in Figure 1):

  1. The customer uploads data to the staging area. If the customer uses their own staging area (option A), the customer may choose to encrypt the data files using client-side encryption. If the customer uses Snowflake’s staging area (option B), data files are automatically encrypted by default.
  2. The customer copies the data from the staging area into the Snowflake data warehouse. Within the Snowflake data warehouse, the data is transformed into Snowflake’s proprietary file format and stored on Amazon S3 (“data at rest”). In Snowflake, all data at rest is always encrypted.
  3. Results can be copied back into the staging area. Results are (optionally) encrypted using client-side encryption in the case of customer-provided staging areas or automatically encrypted in the case of Snowflake-provided staging areas.
  4. The customer downloads data from the staging area and decrypts the data on the client side.

In all of these steps, all data files are encrypted. Only the customer and runtime components of Snowflake can read the data. Snowflake’s runtime components decrypt the data in memory for query processing. No third-party service can see data in the clear.

Customer-provided staging areas are an attractive option for customers that already have data stored on Amazon S3, which they want to copy into Snowflake. If customers want extra security, they may use client-side encryption to protect their data. However, client-side encryption of customer-provided stages is optional.

Client-Side Encryption on Amazon S3

Client-side encryption, in general, is the most secure form of managing data on Amazon S3. With client-side encryption, the data is encrypted on the client before it is uploaded. That means Amazon S3 only stores the encrypted version of the data and never sees data in the clear.

Figure 2: Uploading data to Amazon S3 using client-side encryption

Client-side encryption follows a specific protocol defined by Amazon AWS. The AWS SDK and third-party tools such as s3cmd or S3 Browser implement this protocol. Amazon S3’s client-side encryption protocol works as follows (Figure 2):

  1. The customer creates a secret master key, which remains with the customer.
  2. Before uploading a file to Amazon S3, a random encryption key is created and used to encrypt the file. The random encryption key, in turn, is encrypted with the customer’s master key.
  3. Both the encrypted file and the encrypted random key are uploaded to Amazon S3. The encrypted random key is stored with the file’s metadata.

When downloading data, the encrypted file and the encrypted random key are both downloaded. First, the encrypted random key is decrypted using the customer’s master key. Second, the encrypted file is decrypted using the now decrypted random key. All encryption and decryption happens on the client side. Never does Amazon S3 or any other third party (for example an ISP) see the data in the clear. Customers may upload client-side encrypted data using any clients or tools that support client-side encryption (AWS SDK, s3cmd, etc.).
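The upload and download steps above can be sketched as follows. This is only an illustration of the envelope pattern, not the AWS SDK: the `upload` and `download` functions and the metadata field names are hypothetical, and `toy_cipher` is an insecure stand-in for AES so the example runs with the standard library alone.

```python
import hashlib
import secrets
from typing import Dict, Tuple


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for AES. NOT secure; demo only."""
    blocks = range(len(data) // 32 + 1)
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(4, "big")).digest() for i in blocks
    )
    return bytes(a ^ b for a, b in zip(data, stream))  # the same call decrypts


def upload(master_key: bytes, file_bytes: bytes) -> Tuple[bytes, Dict[str, bytes]]:
    """Steps 2 and 3: encrypt the file with a random key, then wrap that key."""
    data_key = secrets.token_bytes(32)  # random per-file encryption key
    data_nonce, key_nonce = secrets.token_bytes(16), secrets.token_bytes(16)
    encrypted_file = toy_cipher(data_key, data_nonce, file_bytes)
    metadata = {
        "wrapped_key": toy_cipher(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
        "data_nonce": data_nonce,
    }
    return encrypted_file, metadata  # both are sent to the object store


def download(
    master_key: bytes, encrypted_file: bytes, metadata: Dict[str, bytes]
) -> bytes:
    """Unwrap the random key with the master key, then decrypt the file."""
    data_key = toy_cipher(master_key, metadata["key_nonce"], metadata["wrapped_key"])
    return toy_cipher(data_key, metadata["data_nonce"], encrypted_file)


master_key = secrets.token_bytes(32)  # step 1: this key never leaves the client
stored_object, stored_metadata = upload(master_key, b"id,name\n1,alice\n")
```

Everything the storage service holds (`stored_object` and `stored_metadata`) is useless without `master_key`, which is the property the protocol depends on.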

Ingesting Client-Side Encrypted Data into Snowflake

Snowflake supports reading and writing to staging areas using Amazon S3’s client-side encryption protocol. In particular, Snowflake supports client-side encryption using a client-side master key.

Figure 3: Ingesting client-side encrypted data into Snowflake

Ingesting client-side encrypted data from a customer-provided staging area into Snowflake (Figure 3) is just as easy as ingesting any other data into Snowflake. To ingest client-side encrypted data, the customer first creates a stage object with an additional master key parameter and then copies data from the stage into their database tables.

As an example, the following SQL snippet creates a stage object in Snowflake that supports client-side encryption:

-- create encrypted stage
create stage encrypted_customer_stage
url='s3://customer-bucket/data/'
credentials=(AWS_KEY_ID='ABCDEFGH' AWS_SECRET_KEY='12345678')
encryption=(MASTER_KEY='aBcDeFgHiJkL7890=');

The master key specified in this SQL command is the Base64-encoded string of the customer’s secret master key. As with all other credentials, this master key is transmitted to Snowflake over TLS (HTTPS) and stored in a secure, encrypted way in Snowflake’s metadata storage. Only the customer and query-processing components of Snowflake know the master key and are therefore able to decrypt data stored in the staging area.
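For readers who need to produce such a value, here is a minimal sketch of generating a random key and Base64-encoding it with Python's standard library. The `generate_master_key` helper is hypothetical, and which key sizes a stage accepts should be checked against the Snowflake documentation.

```python
import base64
import secrets


def generate_master_key(bits: int = 256) -> str:
    """Return a random key of the given size as a Base64 string."""
    if bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    return base64.b64encode(secrets.token_bytes(bits // 8)).decode("ascii")


master_key_b64 = generate_master_key()
```

The resulting string is what would be placed in the MASTER_KEY parameter; the raw key bytes should be kept somewhere safe on the customer side, since losing them makes the staged files unreadable.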

As a side note, stage objects can be granted to other users within a Snowflake account without revealing S3 access credentials and client-side encryption keys to those users. This makes the stage object a useful security feature in its own right, and an advantage of Snowflake over alternatives.

After the customer creates the stage object in Snowflake, the customer may copy data into their database tables. As an example, the following SQL command creates a database table “users” in Snowflake and copies data from the encrypted stage into the “users” table:

-- create table and ingest data from stage
create table users (id bigint, name varchar(500), purchases int);
copy into users from @encrypted_customer_stage/users;

The data is now ready to be analyzed using the Snowflake data warehouse. Of course, data can be offloaded into the staging area as well. As a last example, the following SQL command first creates a table “most_purchases” as the result of a query that finds the top 10 users with the most purchases, and then offloads the table into the staging area:

-- find top 10 users by purchases, unload into stage
create table most_purchases as select * from users order by purchases desc limit 10;
copy into @encrypted_customer_stage/most_purchases from most_purchases;

Snowflake encrypts the data files copied into the customer’s staging area using the master key stored in the stage object. Of course, Snowflake adheres to the client-side encryption protocol of Amazon S3. Therefore, the customer may download the encrypted data files using any clients or tools that support client-side encryption.

Summary

All customer data in the Snowflake warehouse is encrypted in transit and at rest. By supporting encryption for all types of staging areas as well, whether customer-owned staging areas or Snowflake-owned staging areas, Snowflake supports full end-to-end encryption in all cases. With end-to-end encryption, only the customer and runtime components of the Snowflake service can read the data. No third parties in the middle, for example Amazon AWS or any ISPs, see data in the clear. Therefore, end-to-end encryption secures data communicated with the Snowflake data warehouse service.

Protecting customer data at all levels is one of the pillars of the Snowflake data warehouse service. As such, Snowflake takes great care in securing the import and export of data into the Snowflake data warehouse. When importing and exporting data in Snowflake, customers may choose to use customer-provided staging areas or Snowflake-provided staging areas. Using customer-provided staging areas, customers may choose to upload data files using client-side encryption. Using Snowflake-provided staging areas, data files are always encrypted by default. Thus, Snowflake supports end-to-end encryption in both cases.