Snowflake’s Remediation Plans for the Meltdown and Spectre Vulnerabilities

Meltdown

Meltdown is a hardware vulnerability that primarily affects Intel x86 processors. An attacker must have local access to the target system and must be able to run their rogue code to successfully exploit the Meltdown vulnerability. Moreover, security researchers have determined that Meltdown poses a clear risk to virtualized environments.

In light of that, we commend the major cloud IaaS providers such as AWS for recognizing the threat, rolling up their sleeves and quickly deploying a remediating security update. All indications from AWS suggest they have successfully remediated this vulnerability. Since Snowflake security tightly controls the code that can be run on our production servers, the main threat for data exposure is cross-VM attacks, which AWS has remediated with its hypervisor patch.

There has also been a lot of concern about performance degradation after AWS deployed the security update. Our current internal performance results fall well within the noise range. In other words, we have not detected any significant impact to performance.

In addition, AWS has published an AWS kernel update that customers can deploy to their respective VMs. However, Snowflake’s defense-in-depth approach adequately addresses the impact of Meltdown on the Snowflake service because we tightly control who can access our production environment. We limit this access to only those who need to perform administrative and security support. We also enforce several forms of multi-factor authentication before anyone can access the production VPC, and we monitor all system changes on our servers to ensure those changes are authorized and secure. Although our security architecture does not require the AWS kernel patch for security reasons, we are evaluating the performance impact of this patch and will install it in all situations that do not materially impact the experience of our customers. Moreover, we have updated all of our Snowflake endpoints, such as our company laptops.

Spectre

Snowflake currently considers the Spectre Variant 1 vulnerability (CVE-2017-5753) the riskiest of the three new classes of speculative execution attacks (Spectre Variant 1, Spectre Variant 2, and Meltdown) because it can be exploited in browsers via JavaScript. Therefore, we have deployed all available browser Spectre patches to all of our Snowflake endpoints, and we will continue to quickly deploy new browser Spectre patches as they become publicly available.

Outside of the browser attack surface, we will continue to remediate this vulnerability across our environment as vendors take proactive measures by releasing security updates. For example, we have deployed a vendor’s Spectre security update in our test environment and we are currently running regression and performance tests. We expect to deploy such a patch to the production environment shortly.

In the interim, we are monitoring our environment and continuing to research potential exploits with our security partners.

Customers

It is also critical that our customers update their own systems, especially systems that may execute untrusted code and could therefore be vulnerable to Meltdown or Spectre. This includes updating user web browsers with vendor-provided updates as soon as possible. We also recommend that customers leverage two-factor authentication whenever possible. As such, Snowflake generally recommends that customers use our MFA service and our IP whitelist feature for interactive logins to their Snowflake account for defense-in-depth.
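
As one concrete, hypothetical illustration, the sketch below shows how an administrator might apply such an IP whitelist with a Snowflake network policy, issued through the Snowflake Python connector. The account, credentials, policy name, and address range are placeholders, and the exact privileges required may differ in your account.

# Hypothetical sketch: enforcing an IP whitelist on a Snowflake account
# with a network policy, executed via the Snowflake Python connector.
# All identifiers and values below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="SECURITY_ADMIN_USER",
    password="********",
    account="example_account",
    role="SECURITYADMIN",
)
cur = conn.cursor()
try:
    # Only allow logins that originate from the corporate network range.
    cur.execute(
        "CREATE NETWORK POLICY corp_only_policy "
        "ALLOWED_IP_LIST = ('203.0.113.0/24')"
    )
    # Apply the policy account-wide (requires appropriate privileges).
    cur.execute("ALTER ACCOUNT SET NETWORK_POLICY = corp_only_policy")
finally:
    cur.close()
    conn.close()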

Conclusion

We will continue to send customer updates as we reach patch deployment milestones or if we detect significant system performance issues with the mitigations associated with these vulnerabilities.

Try Snowflake for free. Sign up and receive $400 worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.

PrivateLink for Snowflake: No Internet Required

Improve Security and Simplify Connectivity with PrivateLink for Snowflake

AWS recently announced PrivateLink, the newest generation of VPC Endpoints, which allows direct and secure connectivity between AWS VPCs without traversing the public Internet. We’ve been working closely with the AWS product team to integrate PrivateLink with Snowflake and we’re excited to be among the first launch partners. By integrating with PrivateLink, we allow customers with strict security policies to connect to Snowflake without exposing their data to the Internet. In this blog post, we’ll highlight how PrivateLink enhances our existing security capabilities, and how customers can easily set up PrivateLink with Snowflake.

Snowflake is an enterprise-grade, cloud data warehouse with a unique, multi-cluster, shared data architecture purpose-built for the cloud. From day one, security has been a central pillar of Snowflake’s architecture, with advanced security features baked into the solution. Customers get varying levels of security from Snowflake’s five different product editions: Standard, Premier, Enterprise, Enterprise for Sensitive Data (ESD) and Virtual Private Snowflake (VPS).

Across all editions, Snowflake provides a secure environment for customer data, protecting it in-transit and at rest. All customer data is encrypted by default using the latest security standards and best practices, and validated by compliance with industry-standard security protocols. In addition, customers have access to a host of security features and data protection enhancements such as IP whitelisting, role-based access control, and multi-factor authentication.

As shown in figure 1 below, Snowflake’s multi-tenant service runs inside a Virtual Private Cloud (VPC), isolating and limiting access to its internal components. Incoming traffic from customer VPCs is routed through an Elastic Load Balancer (ELB) to the Snowflake VPC.

For customers working with highly sensitive data or with specific compliance requirements, such as HIPAA and PCI, Snowflake offers Enterprise for Sensitive Data (ESD). With ESD edition, customer data is encrypted in transit across all networks including within Snowflake’s own VPC. ESD customers also benefit from additional security features such as Tri-Secret Secure, giving them full control over access to their data. See figure 2 below.

Earlier this year, we also introduced a private, single-tenant version of the Snowflake service – Virtual Private Snowflake (VPS). VPS, the most advanced and secure edition of Snowflake, includes all features of ESD and addresses the specific needs of regulated companies such as those in the financial industry. With VPS, customers get a dedicated, managed instance of Snowflake within a separate, dedicated VPC. Additionally, VPS customers can use secure proxies for egress traffic control to minimize risks associated with their internal users and systems communicating with unauthorized external hosts, as shown in figure 3 below:

But we recognize that a key area of concern for some customers has been how data is sent from their private subnet to Snowflake. These customers need to enforce restrictive firewall rules on egress traffic. Others have restrictive policies about their resources accessing the Internet at all. So, how do you send data without allowing unrestricted outbound access to the public Internet and without violating existing security compliance requirements?

Enter AWS PrivateLink: a purpose-built technology that enables direct, secure connectivity among VPCs while keeping network traffic within the AWS network. Using PrivateLink, customers can connect to Snowflake without going over the public Internet, and without requiring proxies to be set up between Snowflake and their network as a stand-in solution for egress traffic control. Instead, all communication between the customer VPC and Snowflake is performed within the AWS private network backbone.

Snowflake leverages PrivateLink by running its service behind a Network Load Balancer (NLB) and sharing the endpoint with customers’ VPCs. The Snowflake endpoint appears in the customer VPC, enabling direct connectivity to Snowflake via private IP addresses. Customers can then accept the endpoint and choose which of their VPCs and subnets should have access to Snowflake. This effectively allows Snowflake to function like a service hosted directly on the customer’s private network. Figures 4 and 5 show PrivateLink connectivity from customer VPCs to Snowflake in both multi-tenant (ESD) and single-tenant (VPS) scenarios.
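
For readers who manage their own AWS networking, the hedged sketch below shows roughly what this looks like from the customer side: an interface VPC endpoint is created in the customer VPC against the endpoint service that Snowflake shares. All identifiers are placeholders; the actual service name and procedure come from the Snowflake user guide.

# Hypothetical sketch: creating an interface VPC endpoint in a customer VPC
# that points at the endpoint service shared by Snowflake via PrivateLink.
# The service name, VPC, subnet, and security group IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

response = ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    # Endpoint service name supplied by Snowflake for the customer's region.
    ServiceName="com.amazonaws.vpce.us-west-2.vpce-svc-EXAMPLE",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
)
print(response["VpcEndpoint"]["VpcEndpointId"])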

Additionally, customers can access PrivateLink endpoints from their on-premises network via AWS Direct Connect, allowing them to connect all their virtual and physical environments in a single, private network. As such, Direct Connect can be used in conjunction with PrivateLink to connect a customer’s data center to Snowflake. See figure 6 below.

Snowflake already delivers the world’s most secure data warehouse built for the cloud. Our ESD and VPS product editions are designed to address the highest security needs and compliance requirements of organizations large and small. With PrivateLink, we’re taking that a step further by allowing our customers to establish direct and private connectivity to Snowflake, without ever exposing their data to the public Internet.

PrivateLink is available to all Snowflake customers with ESD and VPS product editions. You can visit our user guide for instructions on how to get started with PrivateLink.

You can also try Snowflake for free. Sign up and receive $400 worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.

The Virtual Private Snowflake Story

Snowflake was built as a secure, multi-tenant SaaS data warehouse. We’ve been offering a multi-tenant product designed for high-security needs for years now. But we knew from previous discussions with Capital One and other financial services companies that a dedicated solution would be required to meet their regulatory requirements.

The input we received from the financial services industry gave rise to our idea for a new product: Virtual Private Snowflake (VPS). Those deep discussions with industry customers helped shape and define the fundamental requirements of VPS:

  • Certifiably Secure with PCI support, validated by the customer’s security team.
  • The customer is in complete Control of the data with comprehensive Auditability. Customer-managed keys are a must, as is a complete record of all operations performed by the data warehouse.
  • Resilience to failures with a roadmap to cross-region business continuity. The long-term goal is a 15-minute recovery time from a total regional failure.
  • Isolation that delivers a dedicated instance of Snowflake, running in a separate Virtual Private Cloud.

Snowflake and our financial services customers have worked together since that time to further define the product requirements. Today, I’m proud to announce that our journey to building the most secure cloud data warehouse has reached its first milestone. Virtual Private Snowflake is now commercially available, with Capital One as our first customer.

VPS delivers the full power of the Snowflake service in the form of a fully managed, dedicated pod running in Amazon Web Services (AWS). Financial services, healthcare and companies from many other industries that handle sensitive data get the best of all worlds with VPS: a dedicated instance of the best cloud data warehouse.

There are many milestones that are still ahead for the most secure, flexible and powerful cloud data warehouse available. But helping customers succeed with their data projects is what we do at Snowflake. That commitment is true for all of our customers.

We thank Capital One and our other financial services customers for their support of VPS. And we look forward to working with all of our customers to help them solve their toughest data challenges.

To learn more about VPS and about Snowflake’s approach to securing sensitive data, click here.

Data Encryption with Customer-Managed Keys

The security of customer data is Snowflake’s first priority. All customer data is encrypted using industry-standard techniques such as AES-256. Encryption keys are organized hierarchically, rooted in a hardware security module (HSM). This allows complete isolation of customer data and greatly reduces potential attack vectors.

For customers with the highest security requirements, we are adding another security component: customer-managed keys. With customer-managed keys, the customer manages the encryption key and makes it available to Snowflake. The customer has full control over this key. If the customer disables access to the encryption key, Snowflake can no longer access the customer’s data. Your data. Your encryption keys.

In this blog post, we will explain the benefits of customer-managed keys and their implementation in the Snowflake cloud data warehouse.

Benefits

Customer-managed keys provide the following benefits:

More Control over Data Access: Customer-managed keys make it impossible for Snowflake to comply with requests to access customer data. If data is encrypted using customer-managed keys and the customer disables access to the encryption key, it is technically impossible for Snowflake to decrypt the data. It is therefore the customer’s responsibility to comply with such requests directly.

Stop Data Breaches: If a customer experiences a data breach, they may disable Snowflake’s access to their customer-managed key. This will halt all running queries in Snowflake, including queries that may inspect or unload data. Disabling the customer-managed key allows customers to stop ongoing exfiltration of their data.

More Control over Data Lifecycle: Finally, some customers do not want to place full trust in any cloud provider. They may have sensitive data that they are unwilling to let Snowflake manage on its own. Using customer-managed keys, such sensitive data is ultimately encrypted with the customer’s key. It is impossible for Snowflake to decrypt this data without the customer’s consent. The customer has full control over the data’s lifecycle.

Implementation

Before we explain the implementation of customer-managed keys, we should first provide some background on Snowflake’s key hierarchy and Amazon’s key management service.

Background 1: Snowflake’s Key Hierarchy

Snowflake manages encryption keys hierarchically. Within this key hierarchy, a parent key encrypts all of its child keys. When a key encrypts another key, it is called “wrapping”. When the key is decrypted again, it is called “unwrapping”.

Figure 1: Encryption key hierarchy in Snowflake.

Figure 1 shows Snowflake’s hierarchy of encryption keys. The top-most root keys are stored in a hardware security module (CloudHSM). A root key wraps account master keys. Each account master key corresponds to one customer account in Snowflake. Account master keys, in turn, wrap all data-level keys, including table master keys, stage master keys, and result master keys. In addition, every single data file is encrypted with a separate key. A detailed overview of Snowflake’s encryption key management is provided in this blog post.
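
To make the wrapping relationship concrete, here is a minimal Python sketch of a two-level hierarchy built with the standard AES key wrap construction (using the cryptography package). It is purely illustrative and not Snowflake’s actual implementation; the in-memory root key stands in for the HSM-protected root key.

# Minimal sketch of a key hierarchy using AES key wrap (illustrative only).
import os
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

root_key = os.urandom(32)            # stand-in for the HSM-protected root key
account_master_key = os.urandom(32)  # per-account key
table_master_key = os.urandom(32)    # per-table key

# Each parent "wraps" (encrypts) its child keys; only wrapped keys are stored.
wrapped_amk = aes_key_wrap(root_key, account_master_key)
wrapped_tmk = aes_key_wrap(account_master_key, table_master_key)

# To reach a data-level key, the chain is "unwrapped" from the top down.
amk = aes_key_unwrap(root_key, wrapped_amk)
tmk = aes_key_unwrap(amk, wrapped_tmk)
assert tmk == table_master_key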

Background 2: AWS Key Management Service

Amazon’s AWS Key Management Service (KMS) is a service to store encryption keys and tightly control access to them. Amazon provides an audit log of all operations and interactions with KMS by using CloudTrail. This allows customers to manage their own encryption keys and validate their usage via the audit log. KMS also allows customers to disable access to any keys at any time. Combining KMS with Snowflake’s encryption key hierarchy allows us to implement customer-managed keys. More details about AWS KMS can be found on the Amazon website.
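
For illustration, the hedged sketch below shows the basic KMS operations involved, using boto3. The key ARN is a placeholder, and this is generic KMS usage rather than Snowflake’s internal code; every call of this kind is recorded in CloudTrail, which is what gives the customer an audit trail of key usage.

# Illustrative boto3 KMS usage: generate a data key under a customer-managed
# key, then decrypt (unwrap) it later. The key ARN is a placeholder.
import boto3

kms = boto3.client("kms", region_name="us-west-2")
key_id = "arn:aws:kms:us-west-2:111122223333:key/EXAMPLE-KEY-ID"

# Generate a 256-bit data key: KMS returns the plaintext key and a wrapped copy.
data_key = kms.generate_data_key(KeyId=key_id, KeySpec="AES_256")
plaintext_key = data_key["Plaintext"]      # used for encryption, kept in memory only
wrapped_key = data_key["CiphertextBlob"]   # safe to store alongside the data

# Later, unwrap the stored key; this call fails if the customer disables the key.
restored = kms.decrypt(CiphertextBlob=wrapped_key)["Plaintext"]
assert restored == plaintext_key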

Implementation of Customer-Managed Keys

The implementation of customer-managed keys changes the way account master keys (AMKs) are stored within Snowflake’s encryption key hierarchy. Normally, as shown in Figure 1 above, an AMK is wrapped by the root key stored in CloudHSM. For customer-managed keys, this is only partly true. There are two AMKs involved: a first key is wrapped by the root key stored in the CloudHSM and a second key is wrapped by the customer key in KMS. Unwrapping and combining these two keys leads to the composed account master key, which then wraps and unwraps all underlying keys in the hierarchy (table master keys, result master keys, etc.).

Figure 2: Account master key composed of AMK-S and AMK-C. AMK-C is wrapped by KMS.

Figure 2 shows this concept in detail. With customer-managed keys, the AMK is composed of two keys: AMK-S and AMK-C. AMK-S is a random 256-bit key that is wrapped with the root key stored in HSM. AMK-C is a second random 256-bit key that is wrapped with the customer key stored in KMS. AMK-S and AMK-C are completely random and unrelated. Both wrapped keys are stored in Snowflake’s encryption key hierarchy.

Figure 3: Unwrapping and composing of AMK.

When the customer runs a query in Snowflake that requires access to customer data, the composed AMK is produced as follows (see Figure 3). Both wrapped keys, AMK-S and AMK-C, are retrieved from the encryption key hierarchy. AMK-S is unwrapped using the root key in the HSM. AMK-C is unwrapped using the customer key in KMS; the KMS audit log records an access event for the customer key. Both unwrapped 256-bit keys are then combined using XOR to form the composed AMK, which is used to unwrap the underlying table master keys and access the customer data.
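
The composition step itself is simple. The following minimal Python sketch shows the XOR of the two 256-bit halves; the unwrapping that would normally happen in CloudHSM and KMS is replaced here by random stand-in values, purely for illustration.

# Sketch of composing the AMK from AMK-S and AMK-C (illustrative only).
import os

amk_s = os.urandom(32)  # would be unwrapped with the root key in CloudHSM
amk_c = os.urandom(32)  # would be unwrapped with the customer's key in AWS KMS

# XOR the two 256-bit halves to form the composed AMK. Neither half on its own
# reveals anything about the composed key, so both Snowflake's root key and the
# customer's KMS key are needed to reach the data-level keys beneath it.
composed_amk = bytes(a ^ b for a, b in zip(amk_s, amk_c))
assert len(composed_amk) == 32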

The composed AMK is cached within the Snowflake data warehouse for performance reasons. This cache has a timeout period after which the cached AMK is not accessible anymore. The cache is refreshed in the background such that continuous queries are not impacted by any latency to KMS. If access to KMS is revoked, refreshing the cache fails and the AMK is removed from the cache immediately. Any running queries are aborted. New queries fail to start because no AMK can be composed. The customer’s data can no longer be decrypted by the Snowflake service.
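
A rough, hypothetical sketch of this caching behavior is shown below. It is greatly simplified: compose_amk stands in for the unwrap-and-XOR step above, and the timeout and refresh policy are illustrative only, not Snowflake’s actual code.

# Hypothetical sketch of a timed AMK cache that drops the key when refresh fails.
import time

class AmkCache:
    def __init__(self, compose_amk, ttl_seconds: float = 600.0):
        self._compose = compose_amk      # callable standing in for HSM/KMS unwrap + XOR
        self._ttl = ttl_seconds
        self._amk = None
        self._expires_at = 0.0

    def refresh(self) -> None:
        try:
            self._amk = self._compose()
            self._expires_at = time.monotonic() + self._ttl
        except Exception:
            # KMS (or HSM) refused: drop the cached key immediately so no new
            # queries can decrypt customer data.
            self._amk = None
            self._expires_at = 0.0
            raise

    def get(self):
        if self._amk is None or time.monotonic() >= self._expires_at:
            self.refresh()
        return self._amk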

Summary

Customer-managed keys provide an extra level of security for customers with sensitive data. With this feature, the customer manages the encryption key themselves and makes it accessible to Snowflake. If the customer decides to disable access, data can no longer be decrypted. In addition, all running queries are aborted. This has the following benefits for customers: (a) it makes it technically impossible for Snowflake to comply with requests for access to customer data, (b) the customer can actively mitigate data breaches and limit data exfiltration, and (c) it gives the customer full control over data lifecycle.

Availability

Customer-managed keys are a primary component of Tri-Secret Secure, a Snowflake Enterprise Edition for Sensitive Data (ESD) feature. To enable Tri-Secret Secure for your ESD account, you need to first create a key in AWS KMS (in your AWS account) and then contact Snowflake Support.

Acknowledgements

We want to thank Difei Zhang for his contributions to this project.

For more information, please feel free to reach out to us at info@snowflake.net. We would love to help you on your journey to the cloud. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and developments here at Snowflake Computing.

Snowflake: Seriously Serious About Security

How Serious Are We About Security? Extremely.

No self-respecting security team is ever satisfied with the existing security controls it has in place. Some mistake this dissatisfaction for a personality disorder, referring to their security team members as “control freaks” or “over-achievers”. Let’s face it: security professionals tend to be an eccentric group. However, for a truly committed and competent security team, this eccentricity is simply a symptom of the healthy paranoia that comes with being responsible for the protection of vital infrastructure and sensitive data.

Snowflake’s security team has channeled this paranoia into a program we call Seriously Serious Security Testing. There are several components of this program, including the audit of all the usual administrative and technical controls you would expect from a cloud company. However, where Snowflake’s eccentricity truly surfaces is in our embrace of the dreaded Penetration Test. Here are the highlights of Snowflake’s security testing program and the key role that penetration testing plays.

First: What is a Penetration Test?

A penetration test is a controlled attempt to exploit vulnerabilities to determine whether unauthorized access or other malicious activity is possible within the target environment. This is a requirement for PCI-DSS compliance, and is also considered best practice for any organization that takes security seriously. Snowflake engages globally-recognized experts to perform this activity within specific constraints and guidelines as described in the Methodology section below.

Frequency

Most companies avoid penetration tests altogether. Others perform them annually at best, which is the minimum frequency required to meet the standards and certifications their auditors tell them they need. What many auditors don’t challenge is whether or not adequate penetration testing has been performed after every “material change” to the company’s product or infrastructure. It’s unlikely that performing penetration tests annually would be sufficient in a cloud environment where most vendors take pride in the frequent deployment of new features and functionality (Snowflake is no different in this regard with releases several times a month, at least). Because of these frequent changes, it’s important to ensure your cloud vendors are performing frequent penetration testing to ensure no new vulnerabilities have inadvertently been introduced.

Figure: How often organizations perform penetration tests (source: https://www.rapid7.com/globalassets/_pdfs/whitepaperguide/rapid7-research-report-under-the-hoodie.pdf)

Much to the irritation of our Operations and Engineering teams, Snowflake has performed more than 5 penetration tests in the past 6 months.

Why would we do this to ourselves? Because we want to know what our weaknesses are! The frequency with which we perform these tests provides Snowflake with the assurance that changes to the Snowflake service, as well as newly discovered vulnerabilities within other components of our environment, are not putting Snowflake or (more importantly) Snowflake’s customers and their data at risk.

Methodology

Another example of Snowflake Security’s paranoia is the approach we take with our penetration testers. Typical penetration testing engagements at Snowflake are designed to simulate the compromise of an employee’s or customer’s credentials by providing the tester with limited access to a non-production environment. Engagements run a minimum of two weeks and begin with providing the testers not only with the aforementioned credentials, but also with substantial information about the architecture, network design, and, when applicable, our application code. (This method is sometimes referred to as White Box Testing.) If, after a specific period of time, the testers have not been able to find an entry point, Snowflake gradually provides the testers with slightly more access until they are able to uncover vulnerabilities, or until the time is up.

Why would we divulge so much information? We want to know what ALL our weaknesses are! This provides us with visibility into what would happen if, for example, we had an insider attempting to gain unauthorized access to data. How far would they get? How quickly could we detect them? How would we contain them? And so on. The information is invaluable.

Figure: Most common vulnerabilities found by penetration testers (source: https://www.rapid7.com/globalassets/_pdfs/whitepaperguide/rapid7-research-report-under-the-hoodie.pdf)

Transparency

The final example of Snowflake’s Seriously Serious Security Testing program is the highly unusual practice of sharing penetration test reports and remediation documentation with qualified prospects and customers (under NDA, transmitted securely, and with the promise of their first born if there is a compromise). By sharing our reports we are able to solicit additional feedback on ways to improve our testing.

I’ve been on both sides of the audit fence for years, and I’ve yet to find an organization as willing to share as much information about its penetration testing frequency and methodology as Snowflake. However, this comes as no surprise to anyone who has worked with Snowflake. Snowflake’s corporate culture is based on teamwork and collaboration, which spills over into Snowflake’s relationships with customers and vendors. We believe that transparency is the cornerstone of trust, and trust is the cornerstone of a healthy partnership between Snowflake and our customers. Providing the penetration test report and remediation evidence allows customers to see for themselves how seriously we take security, as well as how effective we are at achieving it. This allows our customers and prospects to make an informed decision about the risks they’re taking.

Conclusion

Security is a constantly moving target. Our team will never stop this extreme security testing of our infrastructure because threats are constantly evolving.

So…
Call us control freaks.
Call us over-achievers.
Call us paranoid.

One thing you’ll never call us is complacent…seriously.

For more information, please feel free to reach out to us at info@snowflake.net. We would love to help you on your journey to the cloud, securely. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and happenings here at Snowflake Computing.

 

Is My Data Safe in the Cloud?

This post comes to us from a guest blogger, Robert Lockard. Robert is an Oracle ACE with over 30 years of experience as an Oracle DBA, designer and developer. He has spent the past 10 years focusing on database security. Robert frequently speaks at conferences all over the world, teaching DBAs and developers how to secure their data. You can follow him on Twitter at @YourNavionPilot and read more of his blogs at http://www.oraclewizard.com.

I have four mantras for any system I’m involved in: Stability, Security, Accuracy and Performance.

When it comes to information security in the cloud, there are lots of recommendations for vendors and customers to consider. When I am called into customer sites to ensure their data is properly protected, my first step is making sure the person connecting to the database is authenticated properly and that encryption has been properly set up, along with a host of other related things. Just getting data encrypted, making sure there is no ghost data, and ensuring there is no spillage is a difficult, labor-intensive process. From my perspective, Snowflake makes end-to-end encryption easy because encryption is turned on by default. Your data is encrypted from day one.

Here is my best advice on how to ensure your critical business data is secure in the cloud, along with my review of how Snowflake addresses these issues.

#1 – Demand Multi-factor Authentication

Snowflake supports Duo multi-factor authentication to ensure the user authenticating to the database is really who they claim to be. Stolen credentials are one of the top ways bad actors get in where they do not belong. The username/password paradigm of authentication has been around for several decades. Credential harvesting, and many of the data breaches it enables, could be reduced by following best practices, and one key best practice is multi-factor authentication: something you know plus something you have. When you authenticate with a username and password, a token displays a code; you enter that code into the system and then you are authenticated. This is something Snowflake has built into its service. Multi-factor authentication is not an add-on option; it comes standard with the product.

In addition, Snowflake’s implementation of Duo also allows a verification request to be sent to a mobile device. When you enter your username and password, a request is sent to your mobile device. If you approve the request, you are connected. The beauty of this is that most of us always have our mobile phones with us. If you get a request that you did not originate, you know someone else is trying to use your credentials to get into the system, and when you reject the request, they are locked out.
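
For developers connecting programmatically, the hedged sketch below shows roughly what an MFA login can look like from the Snowflake Python connector. The account and credentials are placeholders, and whether a one-time passcode or a push approval is used depends on how the user enrolled with Duo.

# Hedged sketch: connecting to Snowflake with Duo MFA via the Python connector.
# Supplying a passcode sends the Duo code with the login; omitting it typically
# triggers a push approval on the enrolled device instead. Values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ANALYST_USER",
    password="********",
    account="example_account",
    passcode="123456",   # one-time code from the Duo app; optional if using push
)
print(conn.cursor().execute("SELECT CURRENT_USER()").fetchone())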

#2 – Demand End-to-end Encryption

The customer should require end-to-end encryption from any cloud provider they decide to use. Snowflake delivers AES-256 encryption at no extra cost. The AES standard was adopted in 2001 by the National Institute of Standards and Technology (NIST) and has displaced the older DES standard. AES uses a 128-bit block size with three key sizes: 128, 192 and 256 bits. AES is the US Government standard for strong encryption and has been adopted by financial, government and other commercial entities worldwide.
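
As a quick illustration of what AES-256 usage looks like in practice, here is a generic Python example using the cryptography package’s AES-GCM construction; it is not Snowflake’s internal code.

# Generic AES-256 authenticated encryption example (illustrative only).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # 256-bit key, 128-bit block cipher
nonce = os.urandom(12)                      # must be unique per message

ciphertext = AESGCM(key).encrypt(nonce, b"customer risk profile", None)
plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
assert plaintext == b"customer risk profile"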

#2a – Demand Sophisticated Encryption Key Management

Snowflake uses AWS CloudHSM, Amazon’s hardware security module (HSM) service, to support sophisticated key management practices such as key wrapping, key rotation and rekeying.

Key Wrapping (aka Hierarchical Keys)

Key wrapping has become the industry standard. Oracle provides it in Transparent Data Encryption with a master key that is stored outside the database to decrypt the keys that are used to secure the data. Microsoft SQL Server uses a certificate that is stored outside the database to decrypt the key that is used to decrypt the data. Snowflake uses four levels of keys to encrypt customers’ data.

Using this model, each customer’s data is encrypted with a unique set of keys. This way, customer A’s data is encrypted with a different set of keys than customer B’s. By using multiple customer keys, each customer’s data is segregated, and secured, from the others.

See the Snowflake Encryption Key Management blog post for more details.

Cryptographic Key Lifetime

NIST recommends having a policy in place so each key has a limited lifetime. The longer a key is in use, the greater the odds it will be compromised (see the NIST encryption key management recommendations for more details). Snowflake uses two methods to control the lifetime of keys: key rotation and rekeying.

1) Cryptographic Key Rotation

One way to minimize the impact of a cryptographic key being compromised is to rotate keys at a set interval. Snowflake rotates keys at a system-defined interval, which ensures that if a key is compromised, the amount of data at risk is minimized.

The easy way to understand key rotation is this: data is encrypted and decrypted using a key; after a set period of time, another key is added; all new data is encrypted and decrypted with the new key, while old data is still decrypted with the old key. Again, see the Snowflake Encryption Key Management blog post for more details.

The advantage of using key rotation is twofold: a) because you are only adding a key, you do not have to re-key the data that was already encrypted, so performance is preserved; b) because there are now multiple keys, if any one key is compromised, only a subset of the information is vulnerable.
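
The purely illustrative sketch below captures the idea in a few lines of Python: a key ring keeps every key version, new data is encrypted with the newest version, and older versions remain available only for decrypting data written while they were active.

# Simplified sketch of versioned keys under rotation (illustrative only).
import os

class KeyRing:
    def __init__(self):
        self.versions = {1: os.urandom(32)}
        self.current = 1

    def rotate(self):
        # Add a new key version; old versions stay available for decryption only.
        self.current += 1
        self.versions[self.current] = os.urandom(32)

    def key_for_new_data(self):
        # New data is always encrypted with the newest key version.
        return self.current, self.versions[self.current]

    def key_for_stored_data(self, version):
        # Old data is decrypted with the version that was active when it was written.
        return self.versions[version]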

2) Cryptographic Rekeying

As an additional security option, customers can choose to have their data rekeyed annually. By rekeying the data annually, Snowflake minimizes the amount of time a key is in use, thereby minimizing the odds that a cryptographic key will be compromised. Once data is rekeyed, the old keys are destroyed.

Network Encryption

Now we need to deal with data that is moving over the network. All network communication between loading the data and analyzing the data is encrypted using TLS (see Advantages of Using TLS for more details). In addition, for ESD (Enterprise for Sensitive Data) customers, all internal network communication is encrypted using TLS.

By using the TLS standard, Snowflake has implemented the industry best practice.

What about Performance and Stability?

There is a perception that encryption imposes a significant CPU performance penalty. In Snowflake, the HSM’s role is narrow: it manages the root of the key hierarchy, supporting key rotation and rekeying, and is used only to wrap and unwrap account master keys. All remaining encryption is performed by other services in the background, resulting in no impact on customer workloads.

The HSM is also tamper-resistant: attempts to tamper with it will shut it down. Even so, Snowflake has configured the HSM to be stable and reliable.

Conclusion

Based on my evaluation, by using Snowflake, the customer gets three of my four mantras:

Security: You get multi-factor authentication and end-to-end encryption right out of the box.

Stability: Snowflake has configured the HSM to be very stable, plus it lives on AWS which has proven to be a very stable cloud platform.

Performance: Snowflake’s use of the Amazon Elastic Compute Cloud gives performance on demand.

My fourth mantra, Accuracy, is up to you.

Encryption Everywhere

Data encryption, while vital, has not traditionally been within reach of all organizations primarily due to budget constraints and implementation complexity. By providing a centralized cloud solution for managing an enterprise’s data sources, Snowflake now makes it possible for all companies to achieve best practice in protecting their data by encrypting everything. To highlight this new paradigm for their CTO customers, the Silicon Valley Software Group (SV/SG) asked our Director of Security Mario Duarte to discuss this concept of “data encryption everywhere” on their blog.

Automatic Encryption of Data

Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#2 Automatic Encryption of Data

One of the biggest worries people have about moving to the cloud is security. One key piece of providing enterprise class security is the ability to encrypt the data in your data warehouse environment. With Snowflake, your data is automatically encrypted by default.

No setup, no configuration, no add-on costs for high security features. Data is encrypted during its entire lifecycle. From loading data to storing data at rest, we apply end-to-end encryption, such that only the customer can read the data, and no one else. It is just part of the Snowflake service! That is a huge win for anyone who has ever tried to set up database security of any kind. In addition, this gives Snowflake a significant advantage compared to environments like Hadoop, where encryption and security are almost entirely left up to the customer to implement and maintain.

So what level of encryption?

For all data within Snowflake, we use strong AES 256-bit keys. Your data is encrypted as you load it. That is the default, and you cannot turn it off. In addition, our Snowflake security framework includes additional security best practices such as the use of a hierarchical key model and regular key rotation. All of this is automatic and transparent to our customers. In this way we provide our customers best-in-class data security as a service. For even more details on our approach to end-to-end encryption and how we secure your data, check out these blogs: end-to-end encryption and encryption key management.

Remember the #10 Top feature – persistent result sets? Well those query results are also encrypted with 256-bit encryption keys (all the data, all the time).

So do you have your data at rest encrypted in your data warehouse today? Are your loading tools, and staging environments also encrypted?

If not, then put your data into the Snowflake Elastic Data Warehouse and rest easy knowing your data is safe and secure. Snowflake brings enterprise grade security to data warehousing in the cloud, with end-to-end encryption as a major part of our offering.

As a co-writer of this series, I would like to thank Kent Graziano, who put a lot of effort into bringing the thoughts behind this series to the audience. Without his persistence and vision, this would not have been possible. As always, keep an eye on this blog site and our Snowflake Twitter feeds (@SnowflakeDB, @kentgraziano and @cloudsommelier) for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

Kent Graziano and Saqib Mustafa

End-to-End Encryption in the Snowflake Data Warehouse

By Martin Hentschel and Peter Povinec.

Protecting customer data is one of the highest priorities for Snowflake. The Snowflake data warehouse encrypts all customer data by default, using the latest security standards, at no additional cost. Snowflake provides best-in-class key management, which is entirely transparent to customers. This makes Snowflake one of the easiest to use and most secure data warehouses on the market.

In previous blog posts, we explained critical components of Snowflake’s security architecture, including:

  • How Snowflake manages encryption keys, including the key hierarchy, automatic key rotation, and automatic re-encryption of data (“rekeying”)
  • How Snowflake uses Amazon CloudHSM to store and use Snowflake’s master keys with the highest protection
  • How Amazon CloudHSM is configured to run in high-availability mode.

In this blog post, we explain Snowflake’s ability to support end-to-end encryption, including:

  • How customers can upload their files to Amazon S3 using client-side encryption
  • How to use Snowflake to import and export client-side encrypted data.

End-to-End Encryption

End-to-end encryption is a form of communication in which only the end users, and nobody else, can read the data. For the Snowflake data warehouse service, this means that only the customer and the runtime components of the Snowflake service can read the data. No third parties, including Amazon AWS and any ISPs, can see data in the clear. This makes end-to-end encryption the most secure way to communicate with the Snowflake data warehouse service.

End-to-end encryption is important because it minimizes the attack surface. In the case of a security breach of any third party (for example, of Amazon S3), the data is protected because it is always encrypted, regardless of whether the breach is due to the exposure of access credentials indirectly or the exposure of data files directly, whether by an insider or by an external attacker, and whether inadvertent or intentional. The encryption keys are in the custody of only the customer and Snowflake. Nobody else can see the data in the clear – encryption works!


Figure 1: End-to-end Encryption in Snowflake

Figure 1 illustrates end-to-end encryption in the Snowflake data warehouse. There are three actors involved: the customer in its corporate network, a staging area, and the Snowflake data warehouse running in a secure virtual private cloud (VPC). Staging areas are either provided by the customer (option A) or by Snowflake (option B). Customer-provided staging areas are buckets or directories on Amazon S3 that the customer owns and manages. Snowflake-provided stages are built into Snowflake and are available to every customer in their account. In both cases, Snowflake supports end-to-end encryption.

The flow of end-to-end encryption in Snowflake is the following (illustrated in Figure 1):

  1. The customer uploads data to the staging area. If the customer uses their own staging area (option A), the customer may choose to encrypt the data files using client-side encryption. If the customer uses Snowflake’s staging area (option B), data files are automatically encrypted by default.
  2. The customer copies the data from the staging area into the Snowflake data warehouse. Within the Snowflake data warehouse, the data is transformed into Snowflake’s proprietary file format and stored on Amazon S3 (“data at rest”). In Snowflake, all data at rest is always encrypted.
  3. Results can be copied back into the staging area. Results are (optionally) encrypted using client-side encryption in the case of customer-provided staging areas or automatically encrypted in the case of Snowflake-provided staging areas.
  4. The customer downloads data from the staging area and decrypts the data on the client side.

In all of these steps, all data files are encrypted. Only the customer and runtime components of Snowflake can read the data. Snowflake’s runtime components decrypt the data in memory for query processing. No third-party service can see data in the clear.

Customer-provided staging areas are an attractive option for customers that already have data stored on Amazon S3, which they want to copy into Snowflake. If customers want extra security, they may use client-side encryption to protect their data. However, client-side encryption of customer-provided stages is optional.

Client-Side Encryption on Amazon S3

Client-side encryption, in general, is the most secure form of managing data on Amazon S3. With client-side encryption, the data is encrypted on the client before it is uploaded. That means Amazon S3 only stores the encrypted version of the data and never sees data in the clear.

Figure 2: Uploading data to Amazon S3 using client-side encryption

Client-side encryption follows a specific protocol defined by Amazon AWS. The AWS SDK and third-party tools such as s3cmd or S3 Browser implement this protocol. Amazon S3’s client-side encryption protocol works as follows (Figure 2):

  1. The customer creates a secret master key, which remains with the customer.
  2. Before uploading a file to Amazon S3, a random encryption key is created and used to encrypt the file. The random encryption key, in turn, is encrypted with the customer’s master key.
  3. Both the encrypted file and the encrypted random key are uploaded to Amazon S3. The encrypted random key is stored with the file’s metadata.

When downloading data, the encrypted file and the encrypted random key are both downloaded. First, the encrypted random key is decrypted using the customer’s master key. Second, the encrypted file is decrypted using the now decrypted random key. All encryption and decryption happens on the client side. Never does Amazon S3 or any other third party (for example an ISP) see the data in the clear. Customers may upload client-side encrypted data using any clients or tools that support client-side encryption (AWS SDK, s3cmd, etc.).
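
The sketch below walks through the same envelope pattern in Python using the cryptography package. It is a stripped-down illustration only; the AWS SDK encryption clients and tools such as s3cmd implement a compatible protocol with their own metadata format, and the actual upload and download calls are omitted.

# Minimal sketch of the client-side (envelope) encryption pattern (illustrative).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_wrap, aes_key_unwrap

customer_master_key = os.urandom(32)   # never leaves the customer

# 1. Create a fresh random data key and encrypt the file with it.
data_key = os.urandom(32)
nonce = os.urandom(12)
encrypted_file = AESGCM(data_key).encrypt(nonce, b"file contents", None)

# 2. Encrypt (wrap) the data key with the customer's master key.
wrapped_data_key = aes_key_wrap(customer_master_key, data_key)

# 3. Upload encrypted_file plus wrapped_data_key and nonce as object metadata;
#    S3 only ever sees ciphertext. (Upload call omitted here.)

# Download path: unwrap the data key with the master key, then decrypt the file.
restored_key = aes_key_unwrap(customer_master_key, wrapped_data_key)
assert AESGCM(restored_key).decrypt(nonce, encrypted_file, None) == b"file contents"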

Ingesting Client-Side Encrypted Data into Snowflake

Snowflake supports reading and writing to staging areas using Amazon S3’s client-side encryption protocol. In particular, Snowflake supports client-side encryption using a client-side master key.

Figure 3: Ingesting client-side encrypted data into Snowflake

Ingesting client-side encrypted data from a customer-provided staging area into Snowflake (Figure 3) is just as easy as ingesting any other data into Snowflake. To ingest client-side encrypted data, the customer first creates a stage object with an additional master key parameter and then copies data from the stage into their database tables.

As an example, the following SQL snippet creates a stage object in Snowflake that supports client-side encryption:

-- create encrypted stage
create stage encrypted_customer_stage
url='s3://customer-bucket/data/'
credentials=(AWS_KEY_ID='ABCDEFGH' AWS_SECRET_KEY='12345678')
encryption=(MASTER_KEY='aBcDeFgHiJkL7890=');

The master key specified in this SQL command is the Base64-encoded string of the customer’s secret master key. As with all other credentials, this master key is transmitted to Snowflake over TLS (HTTPS) and stored in a secure, encrypted way in Snowflake’s metadata storage. Only the customer and query-processing components of Snowflake know the master key and are therefore able to decrypt data stored in the staging area.

As a side note, stage objects can be granted to other users within a Snowflake account without revealing S3 access credentials and client-side encryption keys to those users. This makes a stage object an interesting security feature in itself that is a Snowflake advantage over alternatives.

After the customer creates the stage object in Snowflake, the customer may copy data into their database tables. As an example, the following SQL command creates a database table “users” in Snowflake and copies data from the encrypted stage into the “users” table:

-- create table and ingest data from stage
create table users (id bigint, name varchar(500), purchases int);
copy into users from @encrypted_customer_stage/users;

The data is now ready to be analyzed using the Snowflake data warehouse. Of course, data can be offloaded into the staging area as well. As a last example, the following SQL command first creates a table “most_purchases” as the result of a query that finds the top 10 users with the most purchases, and then offloads the table into the staging area:

-- find top 10 users by purchases, unload into stage
create table most_purchases as select * from users order by purchases desc limit 10;
copy into @encrypted_customer_stage/most_purchases from most_purchases;

Snowflake encrypts the data files copied into the customer’s staging area using the master key stored in the stage object. Of course, Snowflake adheres to the client-side encryption protocol of Amazon S3. Therefore, the customer may download the encrypted data files using any clients or tools that support client-side encryption.

Summary

All customer data in the Snowflake warehouse is encrypted in transit and at rest. By supporting encryption for all types of staging areas as well, whether customer-owned or Snowflake-owned, Snowflake supports full end-to-end encryption in all cases. With end-to-end encryption, only the customer and runtime components of the Snowflake service can read the data. No third parties in the middle, for example Amazon AWS or any ISPs, see data in the clear. Therefore, end-to-end encryption secures data communicated with the Snowflake data warehouse service.

Protecting customer data at all levels is one of the pillars of the Snowflake data warehouse service. As such, Snowflake takes great care in securing the import and export of data into and out of the Snowflake data warehouse. When importing and exporting data, customers may choose to use customer-provided staging areas or Snowflake-provided staging areas. With customer-provided staging areas, customers may choose to upload data files using client-side encryption. With Snowflake-provided staging, data files are always encrypted by default. Thus, Snowflake supports end-to-end encryption in both cases.

Making Data Warehousing Easy

Legacy Problems

Organizations with legacy on-premises data warehouses spend a lot of time and money managing their environments and keeping up with business demands. Because of the size of the investments, organizations often run their data warehouses close to full utilization. While this may meet their current needs, the inherent lack of scalability could mean compromises on performance or failure to meet SLAs when more workloads, data sources and users need to be added. Then the journey begins to add more capacity. Organizations often need to acquire specialized resources, or become reliant on legacy vendors to manage and maintain the environment. All of this means that if there is a spike in demand for the environment, these organizations cannot accommodate the growth without impacting performance, or must absorb additional costs for a greater footprint that sits unused until needed for that brief spike in the future.

Dealing with Growth

With performance concerns come the typical headaches of any data warehouse environment. These include, but are not limited to, growing the environment, finding qualified resources for performance tuning, optimizing queries, and dealing with concurrency and user growth. At the same time, businesses are facing stiffer competition, and end users are clamoring for faster answers to their business questions. In the past, data warehousing was limited to a set of users typically in marketing or finance. Now even field sales reps want access to up-to-date data, creating more load on the data warehouse. Plus, the more data you have, the more important security becomes and the higher the cost of performance climbs. So now organizations are not only keeping the lights on, but also increasing spending to get performance and to secure the environment. In short, data warehouses have become more difficult to maintain and run!

An example of an organization facing this challenge of scalability is CapSpecialty, a leading provider of specialty insurance for small to mid-sized businesses. CapSpecialty used a legacy data warehouse to support the analytics its actuarial users needed to understand how to price and package products in various geographies. With the increased demand for access to this data, this legacy environment required a significant upgrade. Performance impacts led to users having to start their queries before leaving the office for the weekend, hoping they would be completed when they returned to work on Monday. The legacy environment also limited their ability to report on important KPIs that were critical to running the business in a timely manner. As with any financial organization, the environment also needed to be highly secure to store the crown jewels: customers’ risk profiles and related financial data. Unfortunately, upgrading their environment to meet this increased demand was going to cost $500K for licensing alone, and that would only give them a 2x increase in performance. This does not even include the costs for deployment, management and hosting of the new environment.

Making Data Warehousing Easy

The need for a scalable, more cost-effective solution led them to Snowflake. After evaluating a number of data warehouse options, CapSpecialty decided to implement the Snowflake cloud-based Elastic Data Warehouse. Besides offering an attractive cost structure, Snowflake’s true cloud solution delivered ease of migration and scalability. With Snowflake, CapSpecialty was up and running in less than a week. In addition to achieving a 200x increase in query performance, they leveraged existing infrastructure and were set up to scale for future growth. Snowflake also provided end-to-end, enterprise-level security to protect their sensitive financial data in the cloud.

CapSpecialty underwriters are now able to analyze 10 years’ worth of governed data in 15 minutes. The stage has also been set for CapSpecialty executives to view dashboards that display real-time profitability and KPIs. Using Snowflake, CapSpecialty can also bring semi-structured data to the environment, and serve the analytics to their field agents to effectively market their products in various geographies.

To learn the details of how Snowflake made data warehousing easy for CapSpecialty, we encourage you to read more in the case study. You can also attend our webinar on April 27th, 2016, at 10:00 AM PST/1:00 PM EST, to find out how Snowflake and MicroStrategy enable CapSpecialty analysts to understand data in real time.