The Virtual Private Snowflake Story

Snowflake was built as a secure, multi-tenant SaaS data warehouse. We’ve been offering a multi-tenant product designed for high-security needs for years now. But we knew from previous discussions with Capital One and other financial services companies that a dedicated solution would be required to meet their regulatory requirements.

The input we received from the financial services industry gave rise to our idea for a new product: Virtual Private Snowflake (VPS). Those deep discussions with industry customers helped shape and define the fundamental requirements of VPS:

  • Certifiably Secure with PCI support, validated by the customer’s security team.
  • The customer is in complete Control of the data with comprehensive Auditability. Customer-managed keys are a must, as is a complete record of all operations performed by the data warehouse.
  • Resilience to failures with a roadmap to cross-region business continuity. The long-term goal is a 15-minute recovery time from a total regional failure.
  • Isolation that delivers a dedicated instance of Snowflake, running in a separate Virtual Private Cloud.

Snowflake and our financial services customers have worked together since that time to further define the product requirements. Today, I’m proud to announce that our journey to building the most secure cloud data warehouse has reached its first milestone. Virtual Private Snowflake is now commercially available, with Capital One as our first customer.

VPS delivers the full power of the Snowflake service in the form of a fully managed, dedicated pod running in Amazon Web Services (AWS). Financial services, healthcare, and companies from many other industries that handle sensitive data get the best of both worlds with VPS: a dedicated instance of the best cloud data warehouse.

There are many milestones that are still ahead for the most secure, flexible and powerful cloud data warehouse available. But helping customers succeed with their data projects is what we do at Snowflake. That commitment is true for all of our customers.

We thank Capital One and our other financial services customers for their support of VPS. And we look forward to working with all of our customers to help them solve their toughest data challenges.

To learn more about VPS and about Snowflake’s approach to securing sensitive data, click here.

Data Encryption with Customer-Managed Keys

The security of customer data is Snowflake’s first priority. All customer data is encrypted using industry-standard techniques such as AES-256. Encryption keys are organized hierarchically, rooted in a hardware security module (HSM). This allows complete isolation of customer data and greatly reduces the attack vectors.

For customers with the highest security requirements, we are adding another security component: customer-managed keys. With customer-managed keys, the customer manages the encryption key and makes it available to Snowflake. The customer has full control over this key. If the customer disables access to the encryption key, Snowflake can no longer access the customer’s data. Your data. Your encryption keys.

In this blog post, we will explain the benefits of customer-managed keys and their implementation in the Snowflake cloud data warehouse.

Benefits

Customer-managed keys provide the following benefits:

More Control over Data Access: Customer-managed keys make it impossible for Snowflake to comply with requests to access customer data. If data is encrypted using customer-managed keys and the customer disables access to the encryption key, it is technically impossible for Snowflake to decrypt the data. It is therefore the customer’s responsibility to comply with such requests directly.

Stop Data Breaches: If a customer experiences a data breach, they may revoke Snowflake’s access to the customer-managed key. This halts all running queries in Snowflake, including any that may be inspecting or unloading data. Disabling the customer-managed key allows customers to stop ongoing exfiltration of their data.

More Control over Data Lifecycle: Finally, some customers are unwilling to entrust their most sensitive data to any cloud provider on its own. Using customer-managed keys, such sensitive data is ultimately encrypted with the customer’s key, and it is impossible for Snowflake to decrypt it without the customer’s consent. The customer has full control over the data’s lifecycle.

Implementation

Before we explain the implementation of customer-managed keys, some background on Snowflake’s key hierarchy and Amazon’s Key Management Service is helpful.

Background 1: Snowflake’s Key Hierarchy

Snowflake manages encryption keys hierarchically. Within this key hierarchy, a parent key encrypts all of its child keys. When a key encrypts another key, it is called “wrapping”. When the key is decrypted again, it is called “unwrapping”.


Figure 1: Encryption key hierarchy in Snowflake.

Figure 1 shows Snowflake’s hierarchy of encryption keys. The top-most root keys are stored in a hardware security module (Amazon CloudHSM). A root key wraps account master keys, and each account master key corresponds to one customer account in Snowflake. Account master keys, in turn, wrap all data-level keys, including table master keys, stage master keys, and result master keys. In addition, every single data file is encrypted with a separate key. A detailed overview of Snowflake’s encryption key management is provided in this blog post.
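
To make the wrapping idea concrete, here is a minimal sketch in Python using the standard AES Key Wrap construction (RFC 3394). It illustrates only the parent-wraps-child relationship; the key names are hypothetical and this is not Snowflake’s actual implementation.

# Minimal sketch of hierarchical key wrapping (AES Key Wrap, RFC 3394).
import os
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

root_key = os.urandom(32)            # in production, this stays in the HSM
account_master_key = os.urandom(32)  # one per customer account
table_master_key = os.urandom(32)    # one per table

# Wrapping: each parent key encrypts its child keys.
wrapped_amk = aes_key_wrap(root_key, account_master_key)
wrapped_tmk = aes_key_wrap(account_master_key, table_master_key)

# Unwrapping: walk down the hierarchy to recover a leaf key.
amk = aes_key_unwrap(root_key, wrapped_amk)
tmk = aes_key_unwrap(amk, wrapped_tmk)
assert tmk == table_master_key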

Background 2: AWS Key Management Service

Amazon’s AWS Key Management Service (KMS) is a service for storing encryption keys and tightly controlling access to them. Amazon provides an audit log of all operations and interactions with KMS via CloudTrail. This allows customers to manage their own encryption keys and validate their usage through the audit log. KMS also allows customers to disable access to any key at any time. Combining KMS with Snowflake’s encryption key hierarchy allows us to implement customer-managed keys. More details about AWS KMS can be found on the Amazon website.
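
As a rough illustration of the KMS side, the following Python sketch uses standard boto3 KMS calls to wrap and unwrap a 256-bit key (the key ARN is a placeholder). The decrypt call is the operation that appears in the CloudTrail audit log, and it fails as soon as the customer disables the key.

import os
import boto3

kms = boto3.client("kms")
customer_key_arn = "arn:aws:kms:us-west-2:111122223333:key/EXAMPLE"  # placeholder

# Wrap a random 256-bit key under the customer-managed KMS key.
plaintext_key = os.urandom(32)
wrapped = kms.encrypt(KeyId=customer_key_arn,
                      Plaintext=plaintext_key)["CiphertextBlob"]

# Unwrap it again; this call is recorded in CloudTrail and starts failing
# the moment the customer disables the KMS key.
unwrapped = kms.decrypt(CiphertextBlob=wrapped)["Plaintext"]
assert unwrapped == plaintext_key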

Implementation of Customer-Managed Keys

The implementation of customer-managed keys changes the way account master keys (AMKs) are stored within Snowflake’s encryption key hierarchy. Normally, as shown in Figure 1 above, an AMK is wrapped by the root key stored in CloudHSM. For customer-managed keys, this is only partly true. There are two AMKs involved: a first key is wrapped by the root key stored in the CloudHSM and a second key is wrapped by the customer key in KMS. Unwrapping and combining these two keys leads to the composed account master key, which then wraps and unwraps all underlying keys in the hierarchy (table master keys, result master keys, etc.).


Figure 2: Account master key composed of AMK-S and AMK-C. AMK-C is wrapped by KMS.

Figure 2 shows this concept in detail. With customer-managed keys, the AMK is composed of two keys: AMK-S and AMK-C. AMK-S is a random 256-bit key that is wrapped with the root key stored in HSM. AMK-C is a second random 256-bit key that is wrapped with the customer key stored in KMS. AMK-S and AMK-C are completely random and unrelated. Both wrapped keys are stored in Snowflake’s encryption key hierarchy.

Figure 3: Unwrapping and composing of AMK.

When the customer runs a query in Snowflake that requires access to customer data, the composed AMK is produced as follows (see Figure 3). Both wrapped keys, AMK-S and AMK-C, are retrieved from the encryption key hierarchy. AMK-S is unwrapped using the root key in HSM. AMK-C is unwrapped using the customer key in KMS; KMS records this access to the customer key in its audit log. Both unwrapped 256-bit keys are combined using XOR to form the composed AMK. The composed AMK is then used to unwrap the underlying table master keys to access the customer data.
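
The composition step itself is a simple XOR over the two unwrapped halves, as this small Python sketch shows. The random values here stand in for the results of the HSM and KMS unwrap operations described above.

import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Stand-ins for the unwrapped halves: in the real flow, AMK-S is unwrapped
# by the root key in CloudHSM and AMK-C by the customer key in KMS.
amk_s = os.urandom(32)
amk_c = os.urandom(32)

# Neither half alone reveals anything about the composed key.
composed_amk = xor_bytes(amk_s, amk_c)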

The composed AMK is cached within the Snowflake data warehouse for performance reasons. This cache has a timeout period after which the cached AMK is not accessible anymore. The cache is refreshed in the background such that continuous queries are not impacted by any latency to KMS. If access to KMS is revoked, refreshing the cache fails and the AMK is removed from the cache immediately. Any running queries are aborted. New queries fail to start because no AMK can be composed. The customer’s data can no longer be decrypted by the Snowflake service.
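
The caching behavior can be pictured with a small, purely illustrative Python sketch (not Snowflake’s code): a cached key with a timeout, refreshed in the background, and evicted immediately when a refresh fails.

import time

class AmkCache:
    def __init__(self, ttl_seconds, compose_amk):
        self.ttl = ttl_seconds
        self.compose_amk = compose_amk  # callable that unwraps via HSM + KMS
        self.key = None
        self.expires_at = 0.0

    def get(self):
        if self.key is None or time.monotonic() > self.expires_at:
            self.refresh()
        return self.key

    def refresh(self):
        try:
            self.key = self.compose_amk()
            self.expires_at = time.monotonic() + self.ttl
        except Exception:
            # KMS access revoked: drop the cached key at once, so running
            # queries abort and new queries cannot start.
            self.key = None
            raise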

Summary

Customer-managed keys provide an extra level of security for customers with sensitive data. With this feature, the customer manages the encryption key themselves and makes it accessible to Snowflake. If the customer decides to disable access, data can no longer be decrypted. In addition, all running queries are aborted. This has the following benefits for customers: (a) it makes it technically impossible for Snowflake to comply with requests for access to customer data, (b) the customer can actively mitigate data breaches and limit data exfiltration, and (c) it gives the customer full control over data lifecycle.

Availability

Customer-managed keys are a primary component of Tri-Secret Secure, a Snowflake Enterprise Edition for Sensitive Data (ESD) feature. To enable Tri-Secret Secure for your ESD account, you need to first create a key in AWS KMS (in your AWS account) and then contact Snowflake Support.

Acknowledgements

We want to thank Difei Zhang for his contributions to this project.

For more information, please feel free to reach out to us at info@snowflake.net. We would love to help you on your journey to the cloud. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and developments here at Snowflake Computing.

Snowflake: Seriously Serious About Security

How Serious Are We About Security? Extremely.

No self-respecting security team is ever satisfied with the existing security controls it has in place. Some mistake this dissatisfaction as a personality disorder, referring to their security team members as “control-freaks” or “over-achievers”. Let’s face it: security professionals tend to be an eccentric group. However, for a truly committed and competent security team, this eccentricity is simply the symptom of the healthy paranoia that comes with being responsible for the protection of vital infrastructure and sensitive data.

Snowflake’s security team has channeled this paranoia into a program we call Seriously Serious Security Testing. There are several components of this program, including the audit of all the usual administrative and technical controls you would expect from a cloud company. However, where Snowflake’s eccentricity truly surfaces is in our embrace of the dreaded Penetration Test. Here are the highlights of Snowflake’s security testing program and the key role that penetration testing plays.

First: What is a Penetration Test?

A penetration test is a controlled attempt to exploit vulnerabilities to determine whether unauthorized access or other malicious activity is possible within the target environment. This is a requirement for PCI-DSS compliance, and is also considered best practice for any organization that takes security seriously. Snowflake engages globally-recognized experts to perform this activity within specific constraints and guidelines as described in the Methodology section below.

Frequency

Most companies avoid penetration tests altogether. Others perform them annually at best, the minimum frequency required to meet the standards and certifications their auditors tell them they need. What many auditors don’t challenge is whether adequate penetration testing has been performed after every “material change” to the company’s product or infrastructure. Annual penetration tests are unlikely to be sufficient in a cloud environment where most vendors take pride in the frequent deployment of new features and functionality (Snowflake is no different, with releases at least several times a month). Because of these frequent changes, it’s important to confirm that your cloud vendors perform penetration testing often enough to catch any newly introduced vulnerabilities.

Figure: Penetration test frequency statistics (source: Rapid7, “Under the Hoodie” research report, https://www.rapid7.com/globalassets/_pdfs/whitepaperguide/rapid7-research-report-under-the-hoodie.pdf)

Much to the irritation of our Operations and Engineering teams, Snowflake has performed more than 5 penetration tests in the past 6 months.

Why would we do this to ourselves? Because we want to know what our weaknesses are! The frequency with which we perform these tests provides Snowflake with the assurance that changes to the Snowflake service, as well as newly discovered vulnerabilities within other components of our environment, are not putting Snowflake or (more importantly) Snowflake’s customers and their data at risk.

Methodology

Another example of Snowflake Security’s paranoia is the approach we take with our penetration testers. Typical penetration testing engagements at Snowflake are designed to simulate the compromise of an employee’s or customer’s credentials by providing the tester with limited access to a non-production environment. Engagements run a minimum of two weeks and begin with providing the testers not only with the aforementioned credentials, but also with substantial information about the architecture, network design, and, when applicable, our application code. (This method is sometimes referred to as White Box Testing.) If, after a specific period of time, the testers have not been able to find an entry point, Snowflake gradually provides the testers with slightly more access until they are able to uncover vulnerabilities, or until the time is up.

Why would we divulge so much information? We want to know what ALL our weaknesses are! This provides us with visibility into what would happen if, for example, we had an insider attempting to gain unauthorized access to data. How far would they get? How quickly could we detect them? How would we contain them? And so on. The information is invaluable.

Figure: Most common vulnerabilities found by penetration testers (source: Rapid7, “Under the Hoodie” research report, https://www.rapid7.com/globalassets/_pdfs/whitepaperguide/rapid7-research-report-under-the-hoodie.pdf)

Transparency

The final example of Snowflake’s Seriously Serious Security Testing program is the highly unusual practice of sharing penetration test reports and remediation documentation with qualified prospects and customers (under NDA, transmitted securely, and with the promise of their first born if there is a compromise). By sharing our reports we are able to solicit additional feedback on ways to improve our testing.

I’ve been on both sides of the audit fence for years, and I’ve yet to find an organization as willing to share as much information about its penetration testing frequency and methodology as Snowflake. However, this comes as no surprise to anyone who has worked with Snowflake. Snowflake’s corporate culture is based on teamwork and collaboration, which spills over into Snowflake’s relationships with customers and vendors. We believe that transparency is the cornerstone of trust, and trust is the cornerstone of a healthy partnership between Snowflake and our customers. Providing the penetration test report and remediation evidence allows customers to see for themselves how seriously we take security, as well as how effective we are at achieving it. This allows our customers and prospects to make an informed decision about the risks they’re taking.

Conclusion

Security is a constantly moving target. Our team will never stop this extreme security testing of our infrastructure because threats are constantly evolving.

So…
Call us control freaks.
Call us over-achievers.
Call us paranoid.

One thing you’ll never call us is complacent…seriously.

For more information, please feel free to reach out to us at info@snowflake.net. We would love to help you on your journey to the cloud, securely. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and happenings here at Snowflake Computing.


Is My Data Safe in the Cloud?

This post comes to us from guest blogger Robert Lockard. Robert is an Oracle ACE with over 30 years’ experience as an Oracle DBA, designer, and developer. He has spent the past 10 years focusing on database security. Robert frequently speaks at conferences all over the world, teaching DBAs and developers how to secure their data. You can follow him on Twitter at @YourNavionPilot and read more of his blogs at http://www.oraclewizard.com.

I have four mantras for any system I’m involved in: Stability, Security, Accuracy and Performance.

When it comes to information security in the cloud, there are lots of recommendations for vendors and customers to consider. When I am called into customer sites to ensure their data is properly protected, my first step is making sure the person connecting to the database is authenticated properly and that encryption has been set up correctly, along with a host of other related things. Just getting data encrypted, making sure there is no ghost data, and ensuring there is no spillage is a difficult process. From my perspective, Snowflake makes end-to-end encryption easy because encryption is turned on by default. Your data is encrypted from day one.

Here is my best advice on how to ensure your critical business data is secure in the cloud, along with my review of how Snowflake addresses these issues.

#1 – Demand Multi-factor Authentication

Snowflake supports Duo multi-factor authentication to ensure the user authenticating to the database is also authorized. Stolen credentials are one of the top ways bad actors get in where they do not belong. The username/password paradigm has been around for decades, and many breaches that begin with credential harvesting could be prevented by following the best practice of multi-factor authentication: combining something you know with something you have. When you authenticate with a username and password, a token displays a code; you enter that code into the system and then you are authenticated. Snowflake has built this into its system. Multi-factor authentication is not an add-on option; it comes standard with the product.

In addition, Snowflake’s implementation of Duo also allows a verification request to be sent to a mobile device. When you enter your username and password, a request is sent to your mobile device; if you approve the request, you are connected. The beauty of this is that most of us always have our mobile phones with us. If you get a request that you did not originate, you know someone else is trying to use your credentials to get into the system. When you reject the request, they are locked out.

#2 – Demand End-to-end Encryption

The customer should require end-to-end encryption from any cloud provider they decide to use. Snowflake delivers AES-256 encryption at no extra cost. The AES standard was adopted in 2001 by the National Institute of Standards and Technology (NIST) and has displaced the older DES standard. AES uses a 128-bit block size with three key sizes: 128, 192, and 256 bits. AES encryption is the US Government standard for strong encryption and has been adopted by financial, government, and other commercial entities worldwide.

#2a – Demand Sophisticated Encryption Key Management

Snowflake uses Amazon’s hardware security module (CloudHSM) to support sophisticated key management practices such as key wrapping, key rotation, and rekeying.

Key Wrapping (aka Hierarchical Keys)

Key wrapping has become the industry standard. Oracle provides it in Transparent Data Encryption, where a master key stored outside the database decrypts the keys that secure the data. Microsoft SQL Server uses a certificate stored outside the database to decrypt the key that decrypts the data. Snowflake uses four levels of keys to encrypt customers’ data.

Using this model, each customer’s data is encrypted with a unique set of keys, so customer A’s data is encrypted with a different set of keys than customer B’s. By using separate keys for each customer, every customer’s data is segregated, and secured, from the others.

See the Snowflake Encryption Key Management blog post for more details.

Cryptographic Key Lifetime

The National Institute of Standards and Technology (NIST) recommends having a policy in place so that each key has a limited lifetime. The longer a key is in use, the greater the odds it will be compromised (see the NIST encryption key management recommendation for more details). Snowflake uses two methods to control the lifetime of keys: key rotation and rekeying.

1) Cryptographic Key Rotation

One way to minimize the impact of a compromised key is to rotate keys at a set interval. Snowflake rotates keys at a system-defined interval, ensuring that if a key is compromised, the amount of data at risk is minimized.

The easy way to understand key rotation is this: data is encrypted and decrypted using a key and, after a set period of time, another key is added; all new data is encrypted and decrypted with the new key, while old data is still decrypted with the old key. Again, see the Snowflake Encryption Key Management blog post for more details.

The advantage of key rotation is twofold: (a) because a key is only added, data that was already encrypted does not have to be re-keyed, so performance is preserved; and (b) because there are now multiple keys, a compromise of any one key exposes only a subset of the data.
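
A toy Python sketch of this scheme (illustrative only, not Snowflake’s implementation): each ciphertext is tagged with the key version that produced it, new writes use the newest key, and old data stays readable under its original key.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

keys = {1: AESGCM.generate_key(bit_length=256)}   # version -> key
active = 1

def rotate():
    global active
    active += 1
    keys[active] = AESGCM.generate_key(bit_length=256)

def encrypt(plaintext):
    nonce = os.urandom(12)
    ciphertext = AESGCM(keys[active]).encrypt(nonce, plaintext, None)
    return (active, nonce, ciphertext)   # remember which key version was used

def decrypt(record):
    version, nonce, ciphertext = record
    return AESGCM(keys[version]).decrypt(nonce, ciphertext, None)

old = encrypt(b"written before rotation")
rotate()                                 # new writes now use key version 2
assert decrypt(old) == b"written before rotation"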

2) Cryptographic Rekeying

As an additional security option, customers can choose to have their data rekeyed annually. By rekeying the data annually, Snowflake minimizes the amount of time a key is in use, thereby minimizing the odds a cryptologic key will be compromised. Once data is rekeyed, the old keys are destroyed.

Network Encryption

Now we need to deal with data moving over the network. All network communication, from loading the data to analyzing it, is encrypted using TLS (see Advantages of Using TLS for more details). In addition, for ESD (Enterprise with Sensitive Data) customers, all internal network communication is also encrypted using TLS.

By using the TLS standard, Snowflake has implemented the industry best practice.

What about Performance and Stability?

There is a perception that encryption imposes a CPU performance penalty. The HSM performs a variety of functions: it wraps and unwraps keys and handles key rotation and rekeying. However, the HSM is only used to wrap and unwrap account master keys; all remaining encryption is performed by other services in the background, resulting in no impact on customer workloads.

The hardware security module is also tamper-resistant: attempts to tamper with it will shut the HSM down. At the same time, Snowflake has configured the HSM to be stable and reliable.

Conclusion

Based on my evaluation, by using Snowflake, the customer gets three of my four mantras:

Security: You get multi-factor authentication and end-to-end encryption right out of the box.

Stability: Snowflake has configured the HSM to be very stable, plus it lives on AWS which has proven to be a very stable cloud platform.

Performance: Snowflake’s use of the Amazon Elastic Compute Cloud gives performance on demand.

My fourth mantra, Accuracy, is up to you.

Encryption Everywhere

Data encryption, while vital, has not traditionally been within reach of all organizations primarily due to budget constraints and implementation complexity. By providing a centralized cloud solution for managing an enterprise’s data sources, Snowflake now makes it possible for all companies to achieve best practice in protecting their data by encrypting everything. To highlight this new paradigm for their CTO customers, the Silicon Valley Software Group (SV/SG) asked our Director of Security Mario Duarte to discuss this concept of “data encryption everywhere” on their blog.

Automatic Encryption of Data

Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#2 Automatic Encryption of Data

One of the biggest worries people have about moving to the cloud is security. One key piece of providing enterprise class security is the ability to encrypt the data in your data warehouse environment. With Snowflake, your data is automatically encrypted by default.

No setup, no configuration, no add-on costs for high security features. Data is encrypted during its entire lifecycle. From loading data to storing data at rest, we apply end-to-end encryption, such that only the customer can read the data, and no one else. It is just part of the Snowflake service! That is a huge win for anyone who has ever tried to set up database security of any kind. In addition, this gives Snowflake a significant advantage compared to environments like Hadoop, where encryption and security are almost entirely left up to the customer to implement and maintain.

So what level of encryption?

For all data within Snowflake, we use strong AES 256-bit keys. Your data is encrypted as you load it. That is the default, and you cannot turn it off. In addition, our Snowflake security framework includes additional security best practices such as the use of a hierarchical key model and regular key rotation. All of this is automatic and transparent to our customers. In this way we provide our customers best-in-class data security as a service. For even more details on our approach to end-to-end encryption and how we secure your data, check out these blog posts: end-to-end encryption and encryption key management.

Remember Top 10 feature #10, persistent result sets? Those query results are also encrypted with 256-bit encryption keys (all the data, all the time).

So, is the data at rest in your data warehouse encrypted today? Are your loading tools and staging environments also encrypted?

If not, then put your data into the Snowflake Elastic Data Warehouse and rest easy knowing your data is safe and secure. Snowflake brings enterprise grade security to data warehousing in the cloud, with end-to-end encryption as a major part of our offering.

As a co-writer of this series, I would like to thank Kent Graziano, who put a lot of effort into bringing the thoughts behind this series to the audience. Without his persistence and vision, this would not have been possible. As always, keep an eye on this blog site, our Snowflake Twitter feed (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

Kent Graziano and Saqib Mustafa

End-to-End Encryption in the Snowflake Data Warehouse

By Martin Hentschel and Peter Povinec.

Protecting customer data is one of the highest priorities for Snowflake. The Snowflake data warehouse encrypts all customer data by default, using the latest security standards, at no additional cost. Snowflake provides best-in-class key management, which is entirely transparent to customers. This makes Snowflake one of the easiest to use and most secure data warehouses on the market.

In previous blog posts, we explained critical components of Snowflake’s security architecture, including:

  • How Snowflake manages encryption keys, including the key hierarchy, automatic key rotation, and automatic re-encryption of data (“rekeying”)
  • How Snowflake uses Amazon CloudHSM to store and use Snowflake’s master keys with the highest protection
  • How Amazon CloudHSM is configured to run in high-availability mode.

In this blog post, we explain Snowflake’s ability to support end-to-end encryption, including:

  • How customers can upload their files to Amazon S3 using client-side encryption
  • How to use Snowflake to import and export client-side encrypted data.

End-to-End Encryption

End-to-end encryption is a form of communication in which only the end users, and nobody else, can read the data. For the Snowflake data warehouse service it means that only the customer and runtime components of the Snowflake service can read the data. No third parties, including Amazon AWS and any ISPs, can see data in the clear. This makes end-to-end encryption the most secure way to communicate with the Snowflake data warehouse service.

End-to-end encryption is important because it minimizes the attack surface. In the case of a security breach of any third party (for example of Amazon S3), the data is protected because it is always encrypted, regardless of whether the breach is due to the exposure of access credentials indirectly or the exposure of data files directly, whether by an insider or by an external attacker, whether inadvertent or intentional. The encryption keys are only in custody of the customer and Snowflake. Nobody else can see the data in the clear – encryption works!


Figure 1: End-to-end Encryption in Snowflake

Figure 1 illustrates end-to-end encryption in the Snowflake data warehouse. There are three actors involved: the customer in its corporate network, a staging area, and the Snowflake data warehouse running in a secure virtual private cloud (VPC). Staging areas are either provided by the customer (option A) or by Snowflake (option B). Customer-provided staging areas are buckets or directories on Amazon S3 that the customer owns and manages. Snowflake-provided stages are built into Snowflake and are available to every customer in their account. In both cases, Snowflake supports end-to-end encryption.

The flow of end-to-end encryption in Snowflake is the following (illustrated in Figure 1):

  1. The customer uploads data to the staging area. If the customer uses their own staging area (option A), the customer may choose to encrypt the data files using client-side encryption. If the customer uses Snowflake’s staging area (option B), data files are automatically encrypted by default.
  2. The customer copies the data from the staging area into the Snowflake data warehouse. Within the Snowflake data warehouse, the data is transformed into Snowflake’s proprietary file format and stored on Amazon S3 (“data at rest”). In Snowflake, all data at rest is always encrypted.
  3. Results can be copied back into the staging area. Results are (optionally) encrypted using client-side encryption in the case of customer-provided staging areas or automatically encrypted in the case of Snowflake-provided staging areas.
  4. The customer downloads data from the staging area and decrypts the data on the client side.

In all of these steps, all data files are encrypted. Only the customer and runtime components of Snowflake can read the data. Snowflake’s runtime components decrypt the data in memory for query processing. No third-party service can see data in the clear.

Customer-provided staging areas are an attractive option for customers that already have data stored on Amazon S3, which they want to copy into Snowflake. If customers want extra security, they may use client-side encryption to protect their data. However, client-side encryption of customer-provided stages is optional.

Client-Side Encryption on Amazon S3

Client-side encryption is, in general, the most secure form of managing data on Amazon S3. With client-side encryption, the data is encrypted on the client before it is uploaded. That means Amazon S3 stores only the encrypted version of the data and never sees it in the clear.


Figure 2: Uploading data to Amazon S3 using client-side encryption

Client-side encryption follows a specific protocol defined by Amazon AWS. The AWS SDK and third-party tools such as s3cmd or S3 Browser implement this protocol. Amazon S3’s client-side encryption protocol works as follows (Figure 2):

  1. The customer creates a secret master key, which remains with the customer.
  2. Before uploading a file to Amazon S3, a random encryption key is created and used to encrypt the file. The random encryption key, in turn, is encrypted with the customer’s master key.
  3. Both the encrypted file and the encrypted random key are uploaded to Amazon S3. The encrypted random key is stored with the file’s metadata.

When downloading data, the encrypted file and the encrypted random key are both downloaded. First, the encrypted random key is decrypted using the customer’s master key. Second, the encrypted file is decrypted using the now decrypted random key. All encryption and decryption happens on the client side. Never does Amazon S3 or any other third party (for example an ISP) see the data in the clear. Customers may upload client-side encrypted data using any clients or tools that support client-side encryption (AWS SDK, s3cmd, etc.).
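
The envelope pattern can be sketched in a few lines of Python. This is a simplified model that captures the shape of the protocol; the actual AWS implementation differs in its wrapping algorithm and metadata handling.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

master_key = os.urandom(32)                  # stays with the customer

# Upload path: encrypt the file with a one-off random key, then wrap that key.
file_key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)
ciphertext = AESGCM(file_key).encrypt(nonce, b"file contents", None)
wrapped_file_key = aes_key_wrap(master_key, file_key)  # stored as S3 metadata

# Download path: unwrap the file key, then decrypt, all on the client side.
recovered_key = aes_key_unwrap(master_key, wrapped_file_key)
plaintext = AESGCM(recovered_key).decrypt(nonce, ciphertext, None)
assert plaintext == b"file contents"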

Ingesting Client-Side Encrypted Data into Snowflake

Snowflake supports reading and writing to staging areas using Amazon S3’s client-side encryption protocol. In particular, Snowflake supports client-side encryption using a client-side master key.

Figure 3: Ingesting client-side encrypted data into Snowflake

Ingesting client-side encrypted data from a customer-provided staging area into Snowflake (Figure 3) is just as easy as ingesting any other data into Snowflake. To ingest client-side encrypted data, the customer first creates a stage object with an additional master key parameter and then copies data from the stage into their database tables.

As an example, the following SQL snippet creates a stage object in Snowflake that supports client-side encryption:

-- create encrypted stage
create stage encrypted_customer_stage
url='s3://customer-bucket/data/'
credentials=(AWS_KEY_ID='ABCDEFGH' AWS_SECRET_KEY='12345678')
encryption=(MASTER_KEY='aBcDeFgHiJkL7890=');

The master key specified in this SQL command is the Base64-encoded string of the customer’s secret master key. As with all other credentials, this master key is transmitted to Snowflake over TLS (HTTPS) and stored in a secure, encrypted way in Snowflake’s metadata storage. Only the customer and query-processing components of Snowflake know the master key and are therefore able to decrypt data stored in the staging area.

As a side note, stage objects can be granted to other users within a Snowflake account without revealing S3 access credentials and client-side encryption keys to those users. This makes a stage object an interesting security feature in itself that is a Snowflake advantage over alternatives.

After the customer creates the stage object in Snowflake, the customer may copy data into their database tables. As an example, the following SQL command creates a database table “users” in Snowflake and copies data from the encrypted stage into the “users” table:

-- create table and ingest data from stage
create table users (id bigint, name varchar(500), purchases int);
copy into users from @encrypted_customer_stage/users;

The data is now ready to be analyzed using the Snowflake data warehouse. Of course, data can be offloaded into the staging area as well. As a last example, the following SQL command first creates a table “most_purchases” as the result of a query that finds the top 10 users with the most purchases, and then offloads the table into the staging area:

-- find top 10 users by purchases, unload into stage
create table most_purchases as select * from users order by purchases desc limit 10;
copy into @encrypted_customer_stage/most_purchases from most_purchases;

Snowflake encrypts the data files copied into the customer’s staging area using the master key stored in the stage object. Of course, Snowflake adheres to the client-side encryption protocol of Amazon S3. Therefore, the customer may download the encrypted data files using any clients or tools that support client-side encryption.

Summary

All customer data in the Snowflake warehouse is encrypted in transit and at rest. By supporting encryption for all types of staging areas as well, whether customer-owned or Snowflake-owned, Snowflake supports full end-to-end encryption in all cases. With end-to-end encryption, only the customer and runtime components of the Snowflake service can read the data. No third parties in the middle, for example Amazon AWS or any ISPs, see data in the clear. Therefore, end-to-end encryption secures data communicated with the Snowflake data warehouse service.

Protecting customer data at all levels is one of the pillars of the Snowflake data warehouse service. As such, Snowflake takes great care in securing the import and export of data into the Snowflake data warehouse. When importing and exporting data in Snowflake, customers may choose to use customer-provided staging areas or Snowflake-provided staging areas. With customer-provided staging areas, customers may choose to upload data files using client-side encryption. With Snowflake-provided staging, data files are always encrypted by default. Thus, Snowflake supports end-to-end encryption in both cases.

Making Data Warehousing Easy

Legacy Problems

Organizations with legacy on-premises data warehouses spend a lot of time and money managing their environments and keeping up with business demands. Because of the size of the investments, organizations often run their data warehouses close to full utilization. While this may meet their current needs, the inherent lack of scalability can mean compromised performance or missed SLAs when more workloads, data sources, and users need to be added. Then the journey begins to add more capacity. Organizations often need to acquire specialized resources, or become reliant on legacy vendors, to manage and maintain the environment. All of this means that if there is a spike in demand, these organizations cannot accommodate the growth without impacting performance, or they must absorb the additional cost of a larger footprint that sits unused until that brief spike arrives.

Dealing with Growth

With performance concerns come the typical headaches of any data warehouse environment. These include, but are not limited to, growing the environment, finding qualified resources for performance tuning, optimizing queries, and dealing with concurrency and user growth. At the same time, businesses face stiffer competition, and end users are clamoring for faster answers to their business questions. In the past, data warehousing was limited to a set of users typically in marketing or finance; now even field sales reps want access to up-to-date data, creating more load on the data warehouse. And the more data you have, the more important security becomes and the higher the cost of performance. So organizations are not only keeping the lights on, but also spending more to get performance and to secure the environment. In short, data warehouses have become more difficult to maintain and run!

An example of an organization facing this challenge of scalability is CapSpecialty, a leading provider of specialty insurance for small to mid-sized businesses. CapSpecialty used a legacy data warehouse to support analytics needed by their actuarial users to understand how to price and package products in various geographies. With the increased demand for access to this data, this legacy environment required a significant upgrade. Performance impacts led to users having to start their queries before leaving the office for the weekend, hoping they would be completed when they returned to work on Monday. The legacy environment also limited their ability to report on important KPIs that were critical to running the business in a timely manner. As with any financial organization, the environment also needed to be very secure to store the crown jewels: customers’ risk profiles and related financial data. Unfortunately, upgrading their environment to meet this increased demand was going to cost $500K in licensing alone, and that would only deliver a 2x increase in performance. This does not even include the costs of deployment, management, and hosting for the new environment.

Making Data Warehousing Easy

The need for a scalable, more cost-effective solution led them to Snowflake. After evaluating a number of data warehouse options, CapSpecialty decided to implement the Snowflake cloud-based Elastic Data Warehouse. Besides offering an attractive cost structure, Snowflake’s true cloud solution delivered ease of migration and scalability. With Snowflake, CapSpecialty was up and running in less than a week. In addition to achieving a 200x increase in query performance, they leveraged existing infrastructure and were set up to scale for future growth. Snowflake also provided end-to-end, enterprise-level security to protect their sensitive financial data in the cloud.

CapSpecialty underwriters are now able to analyze 10 years’ worth of governed data in 15 minutes. The stage has also been set for CapSpecialty executives to view dashboards that display real-time profitability and KPIs. Using Snowflake, CapSpecialty can also bring semi-structured data to the environment, and serve the analytics to their field agents to effectively market their products in various geographies.

To learn the details of how Snowflake made data warehousing easy for CapSpecialty, we encourage you to read more in the case study. You can also attend our webinar on April 27th, 2016, 10:00 AM PST / 1:00 PM EST, to find out how Snowflake and MicroStrategy enable CapSpecialty analysts to understand data in real time.

Are Data Security Breaches Accelerating the Shift to the Cloud?

There is an old saying that there are two things certain in life: death and taxes. I would like to add a third: data security breaches. The Identity Theft Resource Center (ITRC) defines a data security breach as “an incident in which an individual name plus a Social Security, driver’s license number, medical record or financial records (credit/debit cards included) is potentially put at risk because of exposure.” The ITRC reports that 717 data breaches have occurred this year, exposing over 176 million records.

On the surface, finding a pattern across all such breaches may appear daunting considering how varied the targeted companies are. However, the ITRC argues that the impacted organizations are similar in that all of the data security breaches contained “personally identifiable information (PII) in a format easily read by thieves, in other words, not encrypted.” Based on my experience, I’d expect that a significant portion of the data breaches compromised data in on-premises systems. Being forced to realize the vulnerability of on-premises systems, organizations are beginning to rethink their cloud strategy.

For example, Tara Seals declares in her recent Infosecurity Magazine article that “despite cloud security fears, the ongoing epidemic of data breaches is likely to simply push more enterprises towards the cloud.” Is the move to the cloud simply a temporary, knee-jerk reaction to the growing trend in security breaches, or are we witnessing a permanent shift towards the cloud? Some industry experts conclude that a permanent shift is happening. Tim Jennings of Ovum, for example, believes that a driving force behind enterprises’ move to the cloud is that they lack the in-house security expertise to deal with today’s threats and highly motivated bad actors. Perhaps the headline from the Onion declaring “China Unable To Recruit Hackers Fast Enough To Keep Up With Vulnerabilities In U.S. Security Systems” is not so funny after all.

But are the cloud and cloud offerings more secure than their on-premises counterparts? Tara Seals appears to suggest that they can be when she writes that “modern cloud providers have invested large sums of money into end-to-end security” by providing sophisticated security intelligence. Let’s consider data encryption as an illustration of her point.

The principle behind safeguarding information by leveraging encryption is as old as the Roman Empire, with most organizations agreeing that it is an effective way to minimize the impact of a security breach. But if that is true, what is behind ITRC’s observation that PII was not encrypted by the impacted organizations?

The truth of the matter is that encryption is hard. Take the example of storing encryption keys using hardware security modules (HSMs). In general, using an HSM is a good security practice for safeguarding encryption keys and for meeting government standards and compliance requirements. However, without the proper security and operational controls around it, an HSM is about as useful as an unlocked safe. To that end, organizations moving to the cloud need to understand their cloud provider’s encryption framework in order to measure its effectiveness in thwarting an intruder’s attack. Things to consider when assessing a cloud provider’s encryption solution include:

  1. Encryption key wrapping strategies
  2. Encryption key rotation frequency
  3. Methods for rekeying encryption keys
  4. Ability to monitor, log, and alert when suspicious activities are performed against the HSM

Tim Jennings and Tara Seals present compelling arguments for the possible security advantage of cloud providers over their on-premises counterparts. However, I feel that there are other equally or possibly more compelling reasons than just that cloud providers have more talented security experts.

The systems that organizations use to store and analyze data are often critical to the business. As a result, any planned or unplanned outage can significantly impact productivity and may even result in lost revenue. Now imagine the position a CISO may find herself in when requesting that an emergency security patch be deployed under these conditions. Even in the best case, coordinating and deploying a security update may take weeks if not months, which leaves the system vulnerable to a bad actor in the meantime. That’s where a cloud solution can outperform its on-premises counterpart: an effective cloud solution allows one to deploy security updates almost instantly without impacting consumers of its services, thus reducing the time that the system is vulnerable.

Alas, PII is such a financially attractive target, whether the data is located on-premises or in the cloud, that one should expect more and more attempts, some of which will succeed, to breach systems in the cloud as organizations continue to adopt more cloud services. It is therefore imperative that organizations perform their due diligence when selecting the right security-focused cloud service partners.

Unsafe Harbor? Data Privacy and Security Post Safe Harbor

On October 6th, the European Court of Justice declared the Safe Harbor structure to be invalid. Since its establishment in 2000, Safe Harbor has provided a simple, legal mechanism to transfer Personal Data outside of Europe to data centers located in the United States. Safe Harbor as we’ve known it is gone forever and European regulators have indicated that they plan to begin strict enforcement of this at the end of January. This news is leaving many American companies reeling.

There is a scramble within political bodies to find a solution for this and to establish new regulation, but given the differences in U.S. and European privacy laws it is highly unlikely that a Safe Harbor replacement will be created before the January deadline. Perhaps the deadline will be extended, but given the definitive nature of the ruling it seems unwise to count on that.

Some companies are looking at an alternative structure called Model Clauses to replace Safe Harbor. Our analysis of this structure is that it is cumbersome at best, and in the long-term Model Clauses may be overturned by a future European court ruling as they do not address the fundamental differences in privacy policy between Europe and the U.S.

Unlike Safe Harbor, Model Clauses are a contract between organizations that have access to Personal Data. The challenge is that these clauses require agreement by all parties involved – the company providing the service, the recipient of the service, and all subcontractors that process the Personal Data. Although Model Clauses are generally standardized, they need to be individually negotiated. In some cases, Model Clauses require individual registration within the jurisdiction where the Personal Data is collected, often the home country of the relevant European citizen. Beyond these complexities, Model Clauses could be challenged in court so this messy solution may be temporary at best.

So what is the answer if your business relies on processing European Personal Data in the United States? Snowflake’s recommendation is to leave the Personal Data in Europe. If it is feasible to perform all data processing in Europe, that will solve your European Personal Data problem. But duplicating your entire application stack in Europe is impractical for some companies. The alternative is a hybrid solution. If it is preferable to perform some processing in the U.S., your best option is to create a mapping within your European data center between the Personal Data and a unique ID. As long as that unique ID is anonymized and does not contain any descriptive information about the user, it can be sent safely over to the U.S. If further actions that require Personal Data need to be performed (for example, sending an email to the user), the IDs must be sent back to Europe and those steps must be performed on European soil.
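
A hypothetical Python sketch of that mapping (the names and in-memory store are illustrative only):

import uuid

eu_pii_store = {}                      # the mapping never leaves Europe

def pseudonymize(email):
    anon_id = str(uuid.uuid4())        # random; carries no descriptive info
    eu_pii_store[anon_id] = email
    return anon_id                     # safe to process in the U.S.

def resolve(anon_id):
    # any action that needs the Personal Data (e.g., sending the email)
    # must run on European soil
    return eu_pii_store[anon_id]

us_record = pseudonymize("user@example.eu")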

While an anonymized ID can be safely sent to the U.S., descriptive information about that ID may render it identifiable and, as such, it would be considered Personal Data under European law. These rules vary by European country: for example, associating an ID with gender, location, and age would be considered identifiable under French law, and thus this information cannot be sent to the U.S. While the approach of sending only anonymized IDs to the U.S. is generally safe, the details matter, so you need to validate your solution with legal counsel experienced in European privacy law.

Of course, keeping Personal Data outside the U.S. means establishing some form of data center footprint within Europe. Fortunately, this is much easier because of the prevalence of cloud services. And while the approach of leaving Personal Data within Europe adds complexity to a solution, this complexity appears to be more manageable than the legal quagmire of Safe Harbor alternatives.

There is a reasonable argument that storing data within a sovereign nation is no longer a sound basis for privacy protection. In today’s connected world, every time a European citizen travels abroad, their Personal Data moves with them and thus is potentially vulnerable. Snowflake believes in pervasive use of strong data encryption to protect data. In the long-term, we feel that thoughtful, encryption-based privacy laws can replace today’s antiquated policies which rely on the location of data storage.

However, the reality is that government policy moves much slower than technology, and it will be quite some time before these laws are updated to reflect the reality of today’s world. Until then, the only thing we can do is abide by the laws of individual countries. For now, the only truly safe harbor for European Personal Data is to leave that data in Europe.

Bob Muglia
CEO, Snowflake Computing