Data Encryption with Customer-Managed Keys

The security of customer data is Snowflake’s first priority. All customer data is encrypted using industry-standard techniques such as AES-256. Encryption keys are organized hierarchically, rooted in a hardware security module (HSM). This allows complete isolation of customer data and greatly reduces the number of attack vectors.

For customers with the highest security requirements, we are adding another security component: customer-managed keys. With customer-managed keys, the customer manages the encryption key and makes it available to Snowflake. The customer has full control over this key. If the customer disables access to the encryption key, Snowflake can no longer access the customer’s data. Your data. Your encryption keys.

In this blog post, we will explain the benefits of customer-managed keys and their implementation in the Snowflake cloud data warehouse.

Benefits

Customer-managed keys provide the following benefits:

More Control over Data Access: Customer-managed keys make it impossible for Snowflake to comply with requests to access customer data. If data is encrypted using customer-managed keys and the customer disables access to the encryption key, it is technically impossible for Snowflake to decrypt the data. It is therefore the customer’s responsibility to comply with such requests directly.

Stop Data Breaches: If a customer experiences a data breach, they may disable Snowflake’s access to the customer-managed keys. This will halt all running queries in Snowflake, including queries that may inspect data or unload data. Disabling customer-managed keys allows customers to stop ongoing exfiltration of their data.

More Control over Data Lifecycle: The last reason why customers require this feature is a lack of trust in any cloud provider. Customers may have sensitive data that they do not trust Snowflake to manage on its own. Using customer-managed keys, such sensitive data is ultimately encrypted with the customer’s key. It is impossible for Snowflake to decrypt this data without the customer’s consent. The customer has full control over the data’s lifecycle.

Implementation

Before we explain the implementation of customer-managed keys, we should first give some background on Snowflake’s key hierarchy and Amazon’s key management service.

Background 1: Snowflake’s Key Hierarchy

Snowflake manages encryption keys hierarchically. Within this key hierarchy, a parent key encrypts all of its child keys. When a key encrypts another key, it is called “wrapping”. When the key is decrypted again, it is called “unwrapping”.

Figure 1: Encryption key hierarchy in Snowflake.

Figure 1 shows Snowflake’s hierarchy of encryption keys. The top-most root keys are stored in a hardware security module (or CloudHSM). A root key wraps account master keys. Each account master key corresponds to one customer account in Snowflake. Account master keys, in turn, wrap all data-level keys, including table master keys, stage master keys, and result master keys. In addition, every single data file is encrypted with a separate key. A detailed overview of Snowflake’s encryption key management is provided in an earlier blog post.

Background 2: AWS Key Management Service

AWS Key Management Service (KMS) is an Amazon service for storing encryption keys and tightly controlling access to them. Amazon provides an audit log of all operations and interactions with KMS through CloudTrail. This allows customers to manage their own encryption keys and validate their usage via the audit log. KMS also allows customers to disable access to any keys at any time. Combining KMS with Snowflake’s encryption key hierarchy allows us to implement customer-managed keys. More details about AWS KMS can be found on the Amazon website.

Implementation of Customer-Managed Keys

The implementation of customer-managed keys changes the way account master keys (AMKs) are stored within Snowflake’s encryption key hierarchy. Normally, as shown in Figure 1 above, an AMK is wrapped by the root key stored in CloudHSM. For customer-managed keys, this is only partly true. There are two AMKs involved: a first key is wrapped by the root key stored in the CloudHSM and a second key is wrapped by the customer key in KMS. Unwrapping and combining these two keys leads to the composed account master key, which then wraps and unwraps all underlying keys in the hierarchy (table master keys, result master keys, etc.).

Figure 2: Account master key composed of AMK-S and AMK-C. AMK-C is wrapped by KMS.

Figure 2 shows this concept in detail. With customer-managed keys, the AMK is composed of two keys: AMK-S and AMK-C. AMK-S is a random 256-bit key that is wrapped with the root key stored in HSM. AMK-C is a second random 256-bit key that is wrapped with the customer key stored in KMS. AMK-S and AMK-C are completely random and unrelated. Both wrapped keys are stored in Snowflake’s encryption key hierarchy.

Figure 3: Unwrapping and composing of AMK.

When the customer runs a query in Snowflake that requires access to customer data, the composed AMK is produced as follows (see Figure 3). Both wrapped keys, AMK-S and AMK-C, are retrieved from the encryption key hierarchy. AMK-S is unwrapped using the root key in HSM. AMK-C is unwrapped using the customer key in KMS. The KMS audit log records an access event for the customer key. Both unwrapped 256-bit keys are combined using XOR to form the composed AMK. The composed AMK is then used to unwrap the underlying table master keys to access the customer data.
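
The XOR composition step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both keys are taken to be already unwrapped, the helper name `compose_amk` is hypothetical, and random bytes stand in for the real AMK-S and AMK-C.

```python
import secrets

KEY_BYTES = 32  # 256 bits, the key size described in the text


def compose_amk(amk_s: bytes, amk_c: bytes) -> bytes:
    """XOR two equal-length unwrapped keys (hypothetical helper, not Snowflake code)."""
    if len(amk_s) != KEY_BYTES or len(amk_c) != KEY_BYTES:
        raise ValueError("both keys must be 256 bits long")
    return bytes(s ^ c for s, c in zip(amk_s, amk_c))


# Stand-ins for the unwrapped keys; in the described system they would be
# unwrapped by the HSM root key and the customer key in KMS respectively.
amk_s = secrets.token_bytes(KEY_BYTES)
amk_c = secrets.token_bytes(KEY_BYTES)
composed = compose_amk(amk_s, amk_c)
```

Because XOR needs both inputs, holding only one of the two keys reveals nothing about the composed key, which is why losing access to the customer key is enough to block decryption.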

The composed AMK is cached within the Snowflake data warehouse for performance reasons. This cache has a timeout period after which the cached AMK is not accessible anymore. The cache is refreshed in the background such that continuous queries are not impacted by any latency to KMS. If access to KMS is revoked, refreshing the cache fails and the AMK is removed from the cache immediately. Any running queries are aborted. New queries fail to start because no AMK can be composed. The customer’s data can no longer be decrypted by the Snowflake service.
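
The caching behavior described above might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the class name `AmkCache`, the use of `PermissionError` to signal revoked KMS access, and the injected clock are all hypothetical and do not describe Snowflake's actual code.

```python
import time
from typing import Callable, Optional


class AmkCache:
    """Toy cache for a composed AMK with a timeout (all names hypothetical)."""

    def __init__(self, compose: Callable[[], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compose = compose      # unwraps both keys and composes the AMK
        self._ttl = ttl_seconds
        self._clock = clock
        self._key: Optional[bytes] = None
        self._expires_at = 0.0

    def refresh(self) -> bool:
        """Recompose the AMK; if access was revoked, evict the key at once."""
        try:
            key = self._compose()
        except PermissionError:      # assumed signal for revoked KMS access
            self._key = None
            return False
        self._key = key
        self._expires_at = self._clock() + self._ttl
        return True

    def get(self) -> bytes:
        """Return the cached AMK, or raise if it is missing or timed out."""
        if self._key is None or self._clock() >= self._expires_at:
            self._key = None
            raise LookupError("no usable AMK; queries cannot proceed")
        return self._key
```

A background task would call `refresh()` well before the timeout so that running queries never wait on KMS; once `refresh()` fails, every later `get()` raises, which models aborted and rejected queries.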

Summary

Customer-managed keys provide an extra level of security for customers with sensitive data. With this feature, the customer manages the encryption key themselves and makes it accessible to Snowflake. If the customer decides to disable access, data can no longer be decrypted. In addition, all running queries are aborted. This has the following benefits for customers: (a) it makes it technically impossible for Snowflake to comply with requests for access to customer data, (b) the customer can actively mitigate data breaches and limit data exfiltration, and (c) it gives the customer full control over data lifecycle.

Availability

Customer-managed keys are a primary component of Tri-Secret Secure, a Snowflake Enterprise Edition for Sensitive Data (ESD) feature. To enable Tri-Secret Secure for your ESD account, you need to first create a key in AWS KMS (in your AWS account) and then contact Snowflake Support.

Acknowledgements

We want to thank Difei Zhang for his contributions to this project.

For more information, please feel free to reach out to us at info@snowflake.net. We would love to help you on your journey to the cloud. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and developments here at Snowflake Computing.

Is My Data Safe in the Cloud?

This post comes to us from a guest blogger, Robert Lockard. Robert is an Oracle ACE with over 30 years of experience as an Oracle DBA / Designer and Developer. He has spent the past 10 years focusing on database security. Robert frequently speaks at conferences all over the world, teaching DBAs and developers how to secure their data. You can follow him on Twitter at @YourNavionPilot and read more of his blogs at http://www.oraclewizard.com.

I have four mantras for any system I’m involved in: Stability, Security, Accuracy and Performance.

When it comes to information security in the cloud, there are lots of recommendations for vendors and customers to consider. When I am called into customer sites to ensure their data is properly protected, my first step is making sure the person connecting to the database is authenticated properly and that encryption has been properly set up, along with a host of other related things. Getting data encrypted, making sure there is no ghost data, and ensuring there is no spillage is a difficult process. From my perspective, Snowflake makes end-to-end encryption easy because encryption is turned on by default. Your data is being encrypted from day one.

Here is my best advice on how to ensure your critical business data is secure in the cloud, along with my review of how Snowflake addresses these issues.

#1 – Demand Multi-factor Authentication

Snowflake supports using DUO multi-factor authentication to ensure the user who is authenticated to the database is also authorized. Stolen credentials are one of the top ways bad actors get in where they do not belong. The username / password paradigm of authentication has been around for several decades. Username / password harvesting, and many data breaches, could be reduced by following best practices, and one best practice is to use multi-factor authentication. This combines something you know with something you have. When you authenticate with a username / password, a token will display a code; you enter that code into the system and then you are authenticated. This is something Snowflake has built into their system. Multi-factor authentication is not an add-on option; it comes standard with their product.

In addition, Snowflake’s implementation of DUO also allows a verification request to be sent to a mobile device. When you enter your username / password, a request is sent to your mobile device. If you approve the request, you are connected. The beauty of this is, most of us always have our mobile phones with us. If you get a request that you did not originate, you know someone else is trying to use your credentials to get into the system. When you reject the request, they will be locked out.

#2 – Demand End-to-end Encryption

The customer should require end-to-end encryption from any cloud provider they decide to use. Snowflake delivers AES-256 encryption at no extra cost. AES was adopted as a standard in 2001 by the National Institute of Standards and Technology (NIST) and has displaced the older DES standard. AES uses a 128-bit block size with three key sizes: 128, 192 and 256 bits. AES encryption is the US Government standard for strong encryption and has been adopted by financial, government and other commercial entities worldwide.

#2a – Demand Sophisticated Encryption Key Management

Snowflake uses the Amazon hardware security module (HSM) service to support sophisticated key management techniques such as key wrapping, key rotation and rekeying.

Key Wrapping (aka Hierarchical Keys)

Key wrapping has become the industry standard. Oracle provides it in Transparent Data Encryption with a master key that is stored outside the database to decrypt the keys that are used to secure the data. Microsoft SQL Server uses a certificate that is stored outside the database to decrypt the key that is used to decrypt the data. Snowflake uses four levels of keys to encrypt customers’ data.

Using this model, each customer’s data is encrypted with a unique set of keys. This way, customer A’s data is encrypted with a different set of keys than customer B’s. By using separate keys for each customer, each customer’s data is segregated, and secured, from the others.

See the Snowflake Encryption Key Management blog post for more details.

Cryptologic Key Life Time

The National Institute of Standards and Technology (NIST) recommends having a policy in place so each key has a limited lifetime. The longer a key is in use, the greater the odds it will be compromised (see NIST Encryption Key Management Recommendation for more details). Snowflake uses two methods to control the lifetime of keys: key rotation and rekeying.

1) Cryptologic Key Rotation

One way to minimize the impact of a cryptologic key being compromised is to rotate the keys at a set interval. Snowflake rotates keys at a system-defined interval. This ensures that if a cryptologic key is compromised, the amount of data at risk is minimized.

The easy way to understand key rotation is this. Data is encrypted and decrypted using a key and, after a set period of time, another key is added; all new data is encrypted and decrypted with this new key and old data is decrypted with the old key. Again, see the Snowflake Encryption Key Management blog post for more details.

The advantage of using key rotation is twofold: a) because you are adding a key, you don’t have to rekey the data that has already been encrypted, so you get performance; b) because there are now multiple keys, if any key is compromised, only a subset of information is vulnerable.
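
A toy Python model of key rotation may help make this concrete. It is a sketch only: the `KeyRing` class and its method names are hypothetical, it tracks which key applies to which data rather than performing real encryption, and it says nothing about how Snowflake stores or schedules its keys.

```python
import secrets
from typing import Dict, Tuple


class KeyRing:
    """Toy key rotation: new data uses the newest key, old keys stay readable."""

    def __init__(self) -> None:
        self._keys: Dict[int, bytes] = {}
        self._active_id = 0
        self.rotate()                # start with one active key

    def rotate(self) -> int:
        """Add a fresh key and make it active; earlier keys are retained."""
        self._active_id += 1
        self._keys[self._active_id] = secrets.token_bytes(32)
        return self._active_id

    def key_for_new_data(self) -> Tuple[int, bytes]:
        """Key (and its id) used to encrypt anything written from now on."""
        return self._active_id, self._keys[self._active_id]

    def key_for_existing_data(self, key_id: int) -> bytes:
        """Key used to decrypt data that was written under an earlier key id."""
        return self._keys[key_id]
```

Each stored file would record the id of the key it was written with, so rotation never touches existing data, and a leaked key exposes only the files written while that key was active.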

2) Cryptologic Rekeying

As an additional security option, customers can choose to have their data rekeyed annually. By rekeying the data annually, Snowflake minimizes the amount of time a key is in use, thereby minimizing the odds a cryptologic key will be compromised. Once data is rekeyed, the old keys are destroyed.

Network Encryption

Now we need to deal with data that is moving over the network. All network communication, from loading the data to analyzing the data, is encrypted using TLS (see Advantages of Using TLS for more details). In addition, for ESD (Enterprise Edition for Sensitive Data) customers, all internal network communication is encrypted using TLS.

By using the TLS standard, Snowflake has implemented the industry best practice.

What about Performance and Stability?

There is a perception that encryption has a negative impact on CPU performance. The HSM performs a variety of functions: it encrypts and decrypts keys and handles key rotation and rekeying. However, the HSM is only used to wrap and unwrap account master keys. All remaining encryption is performed by other services in the background, resulting in no impact on customer workloads.

The hardware security module is also tamper-resistant. Efforts to tamper with it will shut the HSM down, yet Snowflake has configured the HSM to be stable and reliable.

Conclusion

Based on my evaluation, by using Snowflake, the customer gets three of my four mantras:

Security: You get multi-factor authentication and end-to-end encryption right out of the box.

Stability: Snowflake has configured the HSM to be very stable, plus it lives on AWS which has proven to be a very stable cloud platform.

Performance: Snowflake’s use of the Amazon Elastic Compute Cloud gives performance on demand.

My fourth mantra, Accuracy, is up to you.

End-to-End Encryption in the Snowflake Data Warehouse

By Martin Hentschel and Peter Povinec.

Protecting customer data is one of the highest priorities for Snowflake. The Snowflake data warehouse encrypts all customer data by default, using the latest security standards, at no additional cost. Snowflake provides best-in-class key management, which is entirely transparent to customers. This makes Snowflake one of the easiest to use and most secure data warehouses on the market.

In previous blog posts, we explained critical components of Snowflake’s security architecture, including:

  • How Snowflake manages encryption keys, including the key hierarchy, automatic key rotation, and automatic re-encryption of data (“rekeying”)
  • How Snowflake uses Amazon CloudHSM to store and use Snowflake’s master keys with the highest protection
  • How Amazon CloudHSM is configured to run in high-availability mode.

In this blog post, we explain Snowflake’s ability to support end-to-end encryption, including:

  • How customers can upload their files to Amazon S3 using client-side encryption
  • How to use Snowflake to import and export client-side encrypted data.

End-to-End Encryption

End-to-end encryption is a form of communication where only the end users, and nobody else, can read the data. For the Snowflake data warehouse service, it means that only the customer and runtime components of the Snowflake service can read the data. No third parties, including Amazon AWS and any ISPs, can see data in the clear. This makes end-to-end encryption the most secure way to communicate with the Snowflake data warehouse service.

End-to-end encryption is important because it minimizes the attack surface. In the case of a security breach of any third party (for example, of Amazon S3), the data is protected because it is always encrypted, regardless of whether the breach is due to the indirect exposure of access credentials or the direct exposure of data files, whether by an insider or by an external attacker, and whether inadvertent or intentional. The encryption keys are in the custody of only the customer and Snowflake. Nobody else can see the data in the clear – encryption works!


Figure 1: End-to-end Encryption in Snowflake

Figure 1 illustrates end-to-end encryption in the Snowflake data warehouse. There are three actors involved: the customer in its corporate network, a staging area, and the Snowflake data warehouse running in a secure virtual private cloud (VPC). Staging areas are either provided by the customer (option A) or by Snowflake (option B). Customer-provided staging areas are buckets or directories on Amazon S3 that the customer owns and manages. Snowflake-provided stages are built into Snowflake and are available to every customer in their account. In both cases, Snowflake supports end-to-end encryption.

The flow of end-to-end encryption in Snowflake is the following (illustrated in Figure 1):

  1. The customer uploads data to the staging area. If the customer uses their own staging area (option A), the customer may choose to encrypt the data files using client-side encryption. If the customer uses Snowflake’s staging area (option B), data files are automatically encrypted by default.
  2. The customer copies the data from the staging area into the Snowflake data warehouse. Within the Snowflake data warehouse, the data is transformed into Snowflake’s proprietary file format and stored on Amazon S3 (“data at rest”). In Snowflake, all data at rest is always encrypted.
  3. Results can be copied back into the staging area. Results are (optionally) encrypted using client-side encryption in the case of customer-provided staging areas or automatically encrypted in the case of Snowflake-provided staging areas.
  4. The customer downloads data from the staging area and decrypts the data on the client side.

In all of these steps, all data files are encrypted. Only the customer and runtime components of Snowflake can read the data. Snowflake’s runtime components decrypt the data in memory for query processing. No third-party service can see data in the clear.

Customer-provided staging areas are an attractive option for customers that already have data stored on Amazon S3, which they want to copy into Snowflake. If customers want extra security, they may use client-side encryption to protect their data. However, client-side encryption of customer-provided stages is optional.

Client-Side Encryption on Amazon S3

Client-side encryption, in general, is the most secure form of managing data on Amazon S3. With client-side encryption, the data is encrypted on the client before it is uploaded. That means, Amazon S3 only stores the encrypted version of the data and never sees data in the clear.

Figure 2: Uploading data to Amazon S3 using client-side encryption

Client-side encryption follows a specific protocol defined by Amazon AWS. The AWS SDK and third-party tools such as s3cmd or S3 Browser implement this protocol. Amazon S3’s client-side encryption protocol works as follows (Figure 2):

  1. The customer creates a secret master key, which remains with the customer.
  2. Before uploading a file to Amazon S3, a random encryption key is created and used to encrypt the file. The random encryption key, in turn, is encrypted with the customer’s master key.
  3. Both the encrypted file and the encrypted random key are uploaded to Amazon S3. The encrypted random key is stored with the file’s metadata.

When downloading data, the encrypted file and the encrypted random key are both downloaded. First, the encrypted random key is decrypted using the customer’s master key. Second, the encrypted file is decrypted using the now decrypted random key. All encryption and decryption happens on the client side. Never does Amazon S3 or any other third party (for example an ISP) see the data in the clear. Customers may upload client-side encrypted data using any clients or tools that support client-side encryption (AWS SDK, s3cmd, etc.).
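
The upload and download steps above follow a pattern often called envelope encryption, which can be sketched in Python as follows. This is a structural illustration only: `toy_cipher` is a placeholder built from SHA-256 so the example runs with the standard library alone, and it is not secure and is not the AES-based scheme that Amazon's protocol actually uses. All function names are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream.

    PLACEHOLDER ONLY: stands in for a real cipher such as AES and is NOT secure.
    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def encrypt_for_upload(master_key: bytes, plaintext: bytes) -> Tuple[bytes, bytes]:
    """Encrypt the file with a fresh random key, then encrypt that key."""
    file_key = secrets.token_bytes(32)
    encrypted_file = toy_cipher(file_key, plaintext)
    encrypted_key = toy_cipher(master_key, file_key)
    return encrypted_file, encrypted_key   # both are uploaded; master key is not


def decrypt_after_download(master_key: bytes, encrypted_file: bytes,
                           encrypted_key: bytes) -> bytes:
    """Recover the random key with the master key, then decrypt the file."""
    file_key = toy_cipher(master_key, encrypted_key)
    return toy_cipher(file_key, encrypted_file)
```

The point of the design is that the storage service only ever holds the encrypted file and the encrypted random key, while the master key never leaves the client.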

Ingesting Client-Side Encrypted Data into Snowflake

Snowflake supports reading and writing to staging areas using Amazon S3’s client-side encryption protocol. In particular, Snowflake supports client-side encryption using a client-side master key.

Figure 3: Ingesting client-side encrypted data into Snowflake

Ingesting client-side encrypted data from a customer-provided staging area into Snowflake (Figure 3) is just as easy as ingesting any other data into Snowflake. To ingest client-side encrypted data, the customer first creates a stage object with an additional master key parameter and then copies data from the stage into their database tables.

As an example, the following SQL snippet creates a stage object in Snowflake that supports client-side encryption:

-- create encrypted stage
create stage encrypted_customer_stage
url='s3://customer-bucket/data/'
credentials=(AWS_KEY_ID='ABCDEFGH' AWS_SECRET_KEY='12345678')
encryption=(MASTER_KEY='aBcDeFgHiJkL7890=');

The master key specified in this SQL command is the Base64-encoded string of the customer’s secret master key. As with all other credentials, this master key is transmitted to Snowflake over TLS (HTTPS) and stored in a secure, encrypted way in Snowflake’s metadata storage. Only the customer and query-processing components of Snowflake know the master key and are therefore able to decrypt data stored in the staging area.
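
For readers who need to produce such a Base64-encoded key, the following Python sketch shows one way to do it. The helper name `new_master_key_b64` is hypothetical, and the choice of 128- or 256-bit keys is an assumption of this sketch; check the Snowflake documentation for the key sizes it accepts.

```python
import base64
import secrets


def new_master_key_b64(bits: int = 256) -> str:
    """Return a random key of the given size as a Base64 string (hypothetical helper)."""
    if bits not in (128, 256):
        raise ValueError("this sketch only handles 128- or 256-bit keys")
    raw_key = secrets.token_bytes(bits // 8)   # cryptographically strong random bytes
    return base64.b64encode(raw_key).decode("ascii")


master_key_b64 = new_master_key_b64()
```

The resulting string is what would be pasted into the MASTER_KEY parameter; the same raw key must also be configured in whatever client-side tool encrypts the files.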

As a side note, stage objects can be granted to other users within a Snowflake account without revealing S3 access credentials and client-side encryption keys to those users. This makes the stage object an interesting security feature in itself and an advantage of Snowflake over alternatives.

After the customer creates the stage object in Snowflake, the customer may copy data into their database tables. As an example, the following SQL command creates a database table “users” in Snowflake and copies data from the encrypted stage into the “users” table:

-- create table and ingest data from stage
create table users (id bigint, name varchar(500), purchases int);
copy into users from @encrypted_customer_stage/users;

The data is now ready to be analyzed using the Snowflake data warehouse. Of course, data can be unloaded into the staging area as well. As a last example, the following SQL command first creates a table “most_purchases” as the result of a query that finds the top 10 users with the most purchases, and then unloads the table into the staging area:

-- find top 10 users by purchases, unload into stage
create table most_purchases as select * from users order by purchases desc limit 10;
copy into @encrypted_customer_stage/most_purchases from most_purchases;

Snowflake encrypts the data files copied into the customer’s staging area using the master key stored in the stage object. Of course, Snowflake adheres to the client-side encryption protocol of Amazon S3. Therefore, the customer may download the encrypted data files using any clients or tools that support client-side encryption.

Summary

All customer data in the Snowflake warehouse is encrypted in transit and at rest. By supporting encryption for all types of staging areas as well, whether customer-owned staging areas or Snowflake-owned staging areas, Snowflake supports full end-to-end encryption in all cases. With end-to-end encryption, only the customer and runtime components of the Snowflake service can read the data. No third parties in the middle, for example Amazon AWS or any ISPs, see data in the clear. Therefore, end-to-end encryption secures data communicated with the Snowflake data warehouse service.

Protecting customer data at all levels is one of the pillars of the Snowflake data warehouse service. As such, Snowflake takes great care in securing the import and export of data into the Snowflake data warehouse. When importing and exporting data in Snowflake, customers may choose to use customer-provided staging areas or Snowflake-provided staging areas. Using customer-provided staging areas, customers may choose to upload data files using client-side encryption. Using Snowflake-provided staging areas, data files are always encrypted by default. Thus, Snowflake supports end-to-end encryption in both cases.

Are Data Security Breaches Accelerating the Shift to the Cloud?

There is an old saying that there are two things certain in life: death and taxes. I would like to add a third one–data security breaches. The Identity Theft Resource Center (ITRC) defines a data security breach as “an incident in which an individual name plus a Social Security, driver’s license number, medical record or financial records (credit/debit cards included) is potentially put at risk because of exposure.” The ITRC reports that 717 data breaches have occurred this year exposing over 176 million records.

On the surface, finding a pattern across all such breaches may appear daunting considering how varied the targeted companies are. However, the ITRC argues that the impacted organizations are similar in that all of the data security breaches contained “personally identifiable information (PII) in a format easily read by thieves, in other words, not encrypted.” Based on my experience, I’d expect that a significant portion of the data breaches compromised data in on-premises systems. Being forced to realize the vulnerability of on-premises systems, organizations are beginning to rethink their cloud strategy.

For example, Tara Seals declares in her recent Infosecurity Magazine article that “despite cloud security fears, the ongoing epidemic data breaches is likely to simply push more enterprises towards the cloud.” Is the move to the cloud simply a temporary, knee-jerk reaction to the growing trend in security breaches, or are we witnessing a permanent shift towards the cloud? Some industry experts conclude that a permanent shift is happening. Tim Jennings from Ovum, for example, believes that a driving force behind enterprises’ move to the cloud is that they lack the in-house security expertise to deal with today’s threats and highly motivated bad actors. Perhaps the headline from the Onion, which declares “China Unable To Recruit Hackers Fast Enough To Keep Up With Vulnerabilities In U.S. Security Systems,” is not so funny after all.

But are the cloud and cloud offerings more secure than their on-premises counterparts? Tara Seals appears to suggest that they can be when she writes that “Modern cloud providers have invested large sums of money into end-to-end security” by providing sophisticated security intelligence. Let’s consider data encryption as an illustration of her point.

The principle behind safeguarding information by leveraging encryption is as old as the Roman Empire, with most organizations agreeing that it is an effective way to minimize the impact of a security breach. But if that is true, what is behind ITRC’s observation that PII was not encrypted by the impacted organizations?

The truth of the matter is that encryption is hard. Take the example of storing encryption keys using Hardware Security Modules (HSMs). In general, using an HSM is a good security practice for safeguarding encryption keys and for meeting government standards and compliance requirements. However, without the proper security and operational controls to protect it, an HSM is about as useful as an unlocked safe. To that end, organizations moving to the cloud need to understand their cloud provider’s encryption framework to measure its effectiveness in thwarting an intruder’s attack. Things to consider when assessing a cloud provider’s encryption solution include:

  1. Encryption key wrapping strategies
  2. Encryption key rotation frequency
  3. Methods for rekeying encryption keys
  4. Ability to monitor, log, and alert when suspicious activities are performed against the HSM

Tim Jennings and Tara Seals present compelling arguments for the possible security advantage of cloud providers over their on-premises counterparts. However, I feel that there are other equally or possibly more compelling reasons than just that cloud providers have more talented security experts.

The systems that organizations use to store and analyze data are often critical to the business. As a result, any planned or unplanned outage can significantly impact productivity and may even result in lost revenue. Now imagine the position that a CISO may find herself in when requesting that an emergency security patch be deployed in such a situation. Even under the best conditions, coordinating and deploying a security update may take weeks if not months, which ultimately leaves the system vulnerable to a bad actor. That’s where a cloud solution can outperform its on-premises counterpart. An effective cloud solution allows one to almost instantly deploy security updates without impacting consumers of its services, thus reducing the time that the system is vulnerable.

Alas, PII data is such a financially attractive target, whether the data is located on-premises or in the cloud, that one should expect more and more attempts (some of which will succeed) to breach systems in the cloud as organizations continue to leverage more cloud services. It is therefore imperative that organizations perform their due diligence when selecting the right security-focused cloud services partners.

Top 10 Cool Things I Like About Snowflake

I have now been with Snowflake Computing for a little over two months (my how time flies). In that time, I have run the demo, spoken at several trade shows, and written a few blog posts. I have learned a ton about the product and what it means to be an Elastic Data Warehouse in the Cloud.

So for this post I am going to do a quick rundown of some of the coolest features I have learned about so far. 

#10 Persistent result sets available via History

Once you execute a query, the result set will persist for 24 hours (so you can go back and check your work). It may seem minor to some, but it sure is convenient to be able to pull up the results from a previous query without having to execute the query a second time. It saves time and processing.

#9 Ability to connect with JDBC

Again, this seems like a no-brainer, but it is very important. I had no clear concept of how I would connect to a data warehouse in the cloud, so this was good news. After getting my favorite data modeling tool, Oracle SQL Developer Data Modeler (SDDM), installed on my new Mac, I was able to configure it to connect to my Snowflake demo schema using JDBC and reverse engineer the design.

So why is this cool? It means that whatever BI or ETL tool you use today, if it can talk over JDBC, you can connect it to Snowflake.

#8 UNDROP

With UNDROP in Snowflake you can recover a table instantaneously with a single command:

UNDROP TABLE <tablename>

No need to reload last night’s backup to do the restore. No need to wait while all that data is pulled back in. It just happens!

Now that is a huge time (and life) saver.
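
To make the flow concrete, here is a minimal sketch of dropping a table by accident and restoring it. The table name and columns are hypothetical, and the example assumes the drop is still within the account's data retention window:

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE sales_staging (id INTEGER, amount NUMBER(10,2));
INSERT INTO sales_staging VALUES (1, 19.99), (2, 5.00);

DROP TABLE sales_staging;     -- the "oops" moment
UNDROP TABLE sales_staging;   -- restores the table together with its rows

-- The rows inserted before the drop should be queryable again.
SELECT COUNT(*) FROM sales_staging;
```

The same idea extends to larger objects with UNDROP SCHEMA and UNDROP DATABASE.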

#7 Fast Clone

Even cooler than UNDROP is the fast clone feature.

The Snowflake CLONE command can create a clone of a table, a schema, or an entire database almost instantly. It took me barely a minute to create a clone of a 2TB database without using additional storage! And I am not a DBA, let alone a “cloud” DBA.

This means you can create multiple copies of production data without incurring additional storage costs. No need to have separate test/dev data sets.

That is why I think it is way cool!
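
As a short sketch of what cloning looks like at each of the three levels mentioned above (all object names here are hypothetical):

```sql
-- Clone a single table.
CREATE TABLE orders_dev CLONE orders;

-- Clone a schema and every object in it.
CREATE SCHEMA analytics_test CLONE analytics;

-- Clone an entire database for a dev/test environment.
CREATE DATABASE prod_db_dev CLONE prod_db;
```

A clone initially shares the underlying storage with its source. Additional storage is consumed only as data in the clone (or the source) is changed afterward, which is worth keeping in mind for long-lived dev/test copies.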

#6 JSON Support with SQL

During the first demo of Snowflake I attended (before I even applied for a job here), this one got my attention.

Using the knowledge and skills I already had with SQL, I could quickly learn to query JSON data, and join it to traditional tabular data in relational tables.

Wow – this looked like a great stepping stone into the world of “Big Data” without having to learn complex technologies like Hadoop, MapReduce, or Hive!

Yes, I call that a very cool feature. Even better, the JSON documents are stored in a table and optimized automatically in the background for MPP and columnar access. This gives you the ability to combine semi-structured and structured data in one location. For further details, check out my detailed two-part blog series.
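
Here is a minimal sketch of what joining JSON to relational data can look like. The tables, columns, and JSON keys are all hypothetical; the example assumes the JSON is stored in a VARIANT column and queried with Snowflake's path notation:

```sql
-- Hypothetical tables: "customers" is relational, "events" holds raw JSON.
CREATE TABLE customers (id INTEGER, name STRING);
CREATE TABLE events (payload VARIANT);

INSERT INTO customers VALUES (42, 'Acme Corp');

-- PARSE_JSON converts a JSON string into a VARIANT value.
INSERT INTO events
SELECT PARSE_JSON('{"customer_id": 42, "action": "login", "device": {"os": "macOS"}}');

-- Use column:key paths to reach into the JSON, ::TYPE to cast,
-- and an ordinary JOIN to combine it with the relational table.
SELECT c.name,
       e.payload:action::STRING    AS action,
       e.payload:device.os::STRING AS device_os
FROM events e
JOIN customers c
  ON c.id = e.payload:customer_id::INTEGER;
```

Everything beyond the path and cast syntax is plain SQL, which is the point: existing SQL skills carry over directly.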

#5 ANSI compliant SQL with Analytic Functions

Another key feature in Snowflake, and one required of anything called a relational data warehouse, is of course the ability to write standard SQL. Even more important for data warehousing is access to sophisticated analytic and windowing functions (e.g., lead, lag, rank, stddev).

Well, Snowflake definitely has these. In fact, we support everything you would expect, including aggregation functions, nested virtual tables, subqueries, order by, and group by. This means it is fairly simple for your team to migrate from your existing data warehouse technologies to Snowflake.
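
As an illustration of the windowing functions named above, here is a small sketch against a hypothetical daily_sales table with region, sale_date, and amount columns:

```sql
-- Hypothetical table: daily_sales(region, sale_date, amount).
SELECT region,
       sale_date,
       amount,
       -- Previous day's amount within the same region.
       LAG(amount)    OVER (PARTITION BY region ORDER BY sale_date)   AS prev_amount,
       -- Rank of each day by amount within its region, highest first.
       RANK()         OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
       -- Standard deviation of amounts across the whole region.
       STDDEV(amount) OVER (PARTITION BY region)                      AS region_stddev
FROM daily_sales
ORDER BY region, sale_date;
```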

#4 Separation of Storage and Compute

The innovative, patent-pending Multi-Cluster, Shared Data Architecture in Snowflake is beyond cool. The architecture consists of three layers: storage, compute, and cloud services. Each layer is decoupled from the others and is independently scalable. This enables customers to scale resources as they are required, rather than pre-allocating resources for peak consumption. In my 30+ years working in IT, I have not seen anything like it. It is truly one of the advantages that comes from engineering the product, from the ground up, to take full advantage of the elasticity of the cloud.
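
As a rough sketch of what scaling compute independently of storage can look like in practice, the statements below resize a virtual warehouse without touching any stored data. The warehouse name and the sizes chosen are hypothetical:

```sql
-- Warehouse name is hypothetical.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300      -- seconds of inactivity before suspending
  AUTO_RESUME = TRUE;     -- wake up automatically when a query arrives

-- Scale compute up for a heavy period...
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE';

-- ...and back down afterward. The data in storage is unaffected.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'SMALL';
```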

#3 Support for Multiple Workloads

With this unique architecture, Snowflake can easily support multiple disparate workloads. Because of the separation of compute and storage, you can easily spin up separate Virtual Warehouses of different sizes to run your ELT processes and to support BI report users, data scientists, and data miners. And it makes total sense to be able to keep disparate workloads separate, to avoid resource contention, rather than just saying we support “mixed” workloads.

And even better – no special skills or secret configuration settings are required to make this work. It is the way Snowflake is built by design. Nice!
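
A minimal sketch of keeping workloads apart might look like the following. The warehouse names and sizes are hypothetical choices for illustration, not recommendations:

```sql
-- One warehouse per workload, each sized for its own needs.
CREATE WAREHOUSE IF NOT EXISTS elt_wh
  WAREHOUSE_SIZE = 'LARGE'  AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS data_science_wh
  WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;

-- Each session picks the warehouse that matches its workload.
USE WAREHOUSE bi_wh;
```

Because every warehouse reads the same stored data but uses its own compute, a long-running ELT job on elt_wh does not compete for resources with report queries on bi_wh.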

#2 Automatic Encryption of Data

Security is a major concern for moving to the cloud. With Snowflake, your data is automatically encrypted by default. No setup, no configuration, no add-on costs for high security features.

It is just part of the service! To me that is a huge win.

#1 Automatic Query Optimization. No Tuning!

As a long-time data architect, and not a DBA, this is my favorite part of Snowflake. I do not have to worry about my query performance at all. It is all handled “auto-magically” via metadata and an optimization engine in our cloud services layer. I just model, load, and query the data.

So, no indexes, no need to figure out partitions and partition keys, no need to pre-shard any data for distribution, and no need to remember to update statistics.

This feature, to me, is one of the most important when it comes to making Snowflake a zero-management Data Warehouse as a Service offering.

Well, that is the short list of my top 10 favorite features in Snowflake. Keep a lookout for future posts in the coming weeks with details on these and other key features of the Snowflake Elastic Data Warehouse.

If you want to learn more about Snowflake, sign up for one of our frequent webinars, or just drop me a line at kent.graziano@snowflake.net and I will hook you up!

P.S. Keep an eye on my Twitter feed (@kentgraziano) and the Snowflake feed (@SnowflakeDB) for updates on all the action and activities here at Snowflake Computing. Watch for #BuiltForTheCloud and #DWaaS.