Support for Multiple Workloads

Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#3 Support for Multiple Workloads

With our unique Multi-cluster, shared data architecture, Snowflake can easily support multiple, disparate workloads. Resource contention is a common problem in traditional data warehouses, so it makes total sense to keep disparate workloads truly separate and avoid contention altogether, rather than just claiming support for “mixed” workloads.

In legacy data warehouse environments, we often found ourselves constrained by what we could run and when we could run it, for fear of resource contention, especially with the CPUs. In many cases it was impossible to refresh the data during the day because the highly parallelized batch ETL process, while tuned for maximum throughput, usually hogged all the CPUs while it ran. That meant virtually no reporting queries could get resources, so they would just hang. Likewise, a complex report with calculations and massive aggregations would cause normally fast, simple reports to languish. And there was no way you could let business users into the system to run exploratory queries, as those might also cause everything else to hang.

Because of the separation of compute and storage native to Snowflake’s architecture, as previously highlighted, you can easily spin up a set of compute nodes (we call them Virtual Warehouses) to run your ELT processes, and another set to support your BI report users, and a third set to support data scientists and data miners. In fact you can spin up (or down!) as many virtual warehouses as you need to execute all the workloads you have.
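For instance, standing up a dedicated warehouse per workload is just a few statements. Here is a minimal sketch (the warehouse names, sizes, and settings are purely illustrative):

CREATE WAREHOUSE elt_wh WITH WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
CREATE WAREHOUSE bi_wh WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
CREATE WAREHOUSE science_wh WITH WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 600 AUTO_RESUME = TRUE;

Each warehouse draws on its own compute, so the ELT cluster never competes with the BI cluster for CPU.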

[Image: Virtual Warehouse]

So not only does each virtual warehouse share the same data (ensuring consistent results), each can also operate without being affected by operations launched in other virtual warehouses, because each uses completely separate resources. Hence there is no more resource contention!

With the Snowflake Elastic Data Warehouse, there is no more need to run the data loads at night just to avoid slowing down the reports. No more worry that one runaway query will impact the loads or other users. You can now run loads (e.g., real-time, micro-batch) at any time and thus provide your analysts and users with current data on a more frequent basis.

And even better – no special skills or secret configuration settings are required to make this work. It is the way Snowflake’s Data Warehouse as a Service (DWaaS) is built by design.

Nice!

For a quick look at how this works, check out this video.

Thanks to Saqib Mustafa for his help and suggestions on this post.

As always, keep an eye on this blog site, our Snowflake Twitter feeds (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

 

Snowflake Fast Clone

Hopefully you had a chance to read our previous posts: Query Result Sets Available in History, Ability to Connect with JDBC, and Undrop a Table, Schema, or Database. As promised in the original Top 10 post, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#7 Fast Clone

Even cooler than UNDROP is our fast clone feature.

Have you ever had to wait hours, days, maybe weeks to spin up a copy of your production data warehouse so you could have a test or development environment? And then had to pay extra for that environment to be able to hold all the replicated data? Or maybe you have some new data scientists who just want to play around with the data, but they really need their own copy?

With the simple Snowflake CLONE command, our customers can create multiple copies of tables, schemas, and databases without replicating the data itself. This gives our customers the ability to make the data available to multiple user groups almost instantly, without the additional cost (or time) of actually replicating the data.

Magic??

Almost –

Magic with metadata.

Thanks to our unique architecture, Snowflake users are able to clone their tables, schemas, or databases without creating additional copies of the data. Snowflake stores data in files that are immutable and encrypted. Our cloud services layer, with its metadata repository, records which files are stored, where they are located, and a reference to a particular version of the data; this metadata is also kept encrypted. When any data changes, the metadata repository is automatically updated to point to the changed data. All of this is performed in the background by the software, without any involvement from the user, and the metadata repository retains the record for every version of the data set.

How Cloning Works

Because the metadata store tracks all of this information, the user can create a clone of a table quickly: all the user has to do is submit the CLONE command, and it can be submitted at any time.

CREATE OR REPLACE TABLE MyTable_V2 CLONE MyTable;

As a result of the CLONE command, the system simply creates a new entry in the metadata store to keep track of the new clone.
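The same one-statement approach works at the schema and database level. A sketch, with hypothetical object names:

CREATE SCHEMA myschema_dev CLONE myschema;
CREATE DATABASE mydb_dev CLONE mydb;

In every case, no data files are copied; only new metadata entries are written.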

Time Traveling Clones! Oh My!

In addition to simple cloning of objects, you can blend cloning with Snowflake Time Travel to clone tables, schemas, or even databases as they existed at a point in the past, AT or BEFORE a specific timestamp. Here is an example:

CREATE SCHEMA mytestschema_clone_restore CLONE testschema BEFORE (TIMESTAMP => TO_TIMESTAMP(40*365*86400));

Once executed, this command creates a clone of the entire schema (tables, views, etc.) as it existed before the specified timestamp.

Hmm…time traveling clones…sounds like an episode of the X-Files (also cool).

Because Snowflake maintains the history of queries performed and identifies each by a unique ID, we can also create a clone AT or BEFORE the point where a specific statement ran, using that query’s unique ID. This would allow you to, say, run a revised set of scripts against an older data set and then compare the results to the current data set.
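Here is a sketch of what that looks like (the query ID below is a made-up placeholder; you would copy a real one from your query history):

CREATE TABLE mytable_restored CLONE mytable BEFORE (STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');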

In addition, once created, clones are independent of each other: updates to one are not visible in the others. Despite that independence, there are no additional storage requirements, and thus no additional charge (unless you add or modify records), since clones share the same underlying files. So Snowflake allows its customers to clone at multiple levels: table, schema (including file formats, views, sequences, and stages), and database, and at points in time.

Very helpful for experimentation and data exploration!

Cloning Example

As an example of what it looks like in the Snowflake UI, here is a snapshot of one of my demo databases with Twitter data. It is about 2 TB of data.

[Screenshot: Twitter db]

Now here is me launching the Clone via our Web UI at 3:02:49 PM:

[Screenshot: Start Clone]

And here is the result showing the cloned db created at 3:03:55 PM. Barely a minute to create a clone of a 2TB database with 10 tables!

[Screenshot: Clone Completed]

And I am not a DBA let alone a “cloud” DBA. This is all part of Snowflake being a Data Warehouse as a Service (DWaaS). Another reason why I think it is way cool!

Want to see cloning in action? Check out this video:

Why is this hard to do in traditional systems?

In traditional RDBMS environments, if you want to clone an existing data warehouse and give a different set of users access to it, you typically have to stand up a whole new, separate deployment of the environment. In Snowflake, as I have just shown, you can do this with ease in just a few commands (one of the many benefits of having written Snowflake from scratch for the cloud). Moreover, while most traditional data warehouse systems let you create snapshots of the data, doing so generally consumes more storage (which of course costs more $$).

All these reasons are why Fast Cloning made my Top 10 list of really cool features in the Snowflake Elastic Data Warehouse.

As always, keep an eye on this blog site, our Snowflake Twitter feeds (@SnowflakeDB), (@kentgraziano), (@cloudsommelier), and #ElasticDW for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

Special thanks to Ashish Motivala from Snowflake Engineering for helping us with some of the more technical details on how this works so well.

Kent Graziano and Saqib Mustafa

Top 10 Cool Things I Like About Snowflake

I have now been with Snowflake Computing for a little over two months (my, how time flies). In that time, I have run the demo, spoken at several trade shows, and written a few blog posts. I have learned a ton about the product and what it means to be an Elastic Data Warehouse in the Cloud.

So for this post I am going to do a quick rundown of some of the coolest features I have learned about so far. 

#10 Persistent result sets available via History

Once you execute a query, the result set will persist for 24 hours (so you can go back and check your work). It may seem minor to some, but it sure is convenient to be able to pull up the results from a previous query without having to execute the query a second time. Saves on time and processing. Read more
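Even better, assuming the RESULT_SCAN table function is available in your account, you can query that cached result set like a table. A quick sketch:

SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));  -- re-examine the last query's results without re-running it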

#9 Ability to connect with JDBC

Again, this seems like a no-brainer, but it is very important. I had no clear concept of how I would connect to a data warehouse in the cloud, so this was good news. After getting my favorite data modeling tool, Oracle SQL Developer Data Modeler (SDDM), installed on my new Mac, I was able to configure it to connect to my Snowflake demo schema using JDBC and reverse engineer the design.
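For reference, a Snowflake JDBC connection URL generally takes a form like this (the bracketed values are placeholders, not real settings):

jdbc:snowflake://<account>.snowflakecomputing.com/?warehouse=<warehouse>&db=<database>&schema=<schema>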

So why is this cool? It means that whatever BI or ETL tool you use today, if it can talk over JDBC, you can connect it to Snowflake. Read more

#8 UNDROP

With UNDROP in Snowflake you can recover a table instantaneously with a single command:

UNDROP TABLE <tablename>;

No need to reload last night’s backup to do the restore. No need to wait while all that data is pulled back in. It just happens!
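The whole recovery looks something like this (table name hypothetical):

DROP TABLE important_table;   -- oops!
UNDROP TABLE important_table; -- restored instantly from retained metadata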

Now that is a huge time (and life) saver. Read more

#7 Fast Clone

Even cooler than UNDROP is the fast clone feature.

The Snowflake CLONE command can create a clone of a table, a schema, or an entire database almost instantly. It took me barely a minute to create a clone of a 2TB database without using additional storage! And I am not a DBA, let alone a “cloud” DBA.

This means you can create multiple copies of production data without incurring additional storage costs. No need to have separate test/dev data sets.

Hence why I think it is way cool! Read more

#6 JSON Support with SQL

During the first demo of Snowflake I attended (before I even applied for a job here), this one got my attention.

Using the knowledge and skills I already had with SQL, I could quickly learn to query JSON data, and join it to traditional tabular data in relational tables.

Wow – this looked like a great stepping stone into the world of “Big Data” without having to learn complex technologies like Hadoop, MapReduce, or Hive! Read more

Yes, I call that a very cool feature. Add to that the fact that the JSON documents are stored in a table and automatically optimized in the background for MPP and columnar access, and you have the ability to combine semi-structured and structured data in one location. For further details, check out my detailed 2-part blog here and here.
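To give a flavor of it, here is a minimal sketch; the table and column names (raw_tweets, app_users, etc.) are hypothetical, and v is a VARIANT column holding raw JSON:

CREATE TABLE raw_tweets (v VARIANT);
-- Path notation plus a cast pulls fields straight out of the JSON,
-- and the result joins to an ordinary relational table like any other column:
SELECT u.department,
       t.v:user.screen_name::STRING AS screen_name,
       t.v:text::STRING AS tweet_text
FROM raw_tweets t
JOIN app_users u ON u.screen_name = t.v:user.screen_name::STRING;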

#5 ANSI compliant SQL with Analytic Functions

Another key feature of Snowflake, one required for it to be called a relational data warehouse, is of course the ability to write standard SQL. Even more important for data warehousing is access to sophisticated analytic and windowing functions (e.g., lead, lag, rank, stddev).

Well, Snowflake definitely has these. In fact, we support everything you would expect, including aggregation functions, nested virtual tables, subqueries, order by, and group by. This means it is fairly simple for your team to migrate from your existing data warehouse technologies to Snowflake. Read more
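For example, a standard windowing query like this one (table and columns hypothetical) runs as-is:

SELECT region,
       sale_date,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank,
       LAG(amount) OVER (PARTITION BY region ORDER BY sale_date) AS prior_amount
FROM sales;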

#4 Separation of Storage and Compute

The innovative, patent-pending, Multi-Cluster, Shared Data Architecture in Snowflake is beyond cool. The architecture consists of three layers: storage, compute, and cloud services. Each layer is decoupled from the others and independently scalable. This enables customers to scale resources as they are required, rather than pre-allocating resources for peak consumption. In my 30+ years working in IT, I have not seen anything like it. It is truly one of the advantages that comes from engineering the product, from the ground up, to take full advantage of the elasticity of the cloud. Read more
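To make that concrete, compute can be resized or switched off at any time with no effect on the stored data. A sketch with a hypothetical warehouse:

ALTER WAREHOUSE report_wh SET WAREHOUSE_SIZE = 'XLARGE'; -- scale up for the month-end crunch
ALTER WAREHOUSE report_wh SUSPEND;                       -- stop consuming compute entirely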

#3 Support for Multiple Workloads

With this unique architecture, Snowflake can easily support multiple disparate workloads. Because of the separation of compute and storage, you can easily spin up separate Virtual Warehouses of different sizes to run your ELT processes, to support your BI report users, and to serve your data scientists and data miners. And it makes total sense to keep disparate workloads separate, to avoid resource contention, rather than just saying we support “mixed” workloads.

And even better – no special skills or secret configuration settings are required to make this work. It is the way Snowflake is built by design. Nice! Read more

#2 Automatic Encryption of Data

Security is a major concern for moving to the cloud. With Snowflake, your data is automatically encrypted by default. No setup, no configuration, no add-on costs for high security features.

It is just part of the service! To me that is a huge win. Read more

#1 Automatic Query Optimization. No Tuning!

As a longtime data architect, and not a DBA, this is my favorite part of Snowflake. I do not have to worry about my query performance at all. It is all handled “auto-magically” via metadata and an optimization engine in our cloud services layer. I just model, load, and query the data.

So, no indexes, no need to figure out partitions and partition keys, no need to pre-shard any data for distribution, and no need to remember to update statistics.

This feature, to me, is one of the most important when it comes to making Snowflake a zero management Data Warehouse as a Service offering. Read more

Well, that is the short list of my top 10 favorite features in Snowflake. Keep an eye out for future posts in the coming weeks that will provide details on these and other key features of the Snowflake Elastic Data Warehouse.

Now check out this short intro video to Snowflake!

If you want to learn more about Snowflake, sign up for one of our frequent webinars, or just drop me a line at kent.graziano@snowflake.net and I will hook you up!

P.S. Keep an eye on my Twitter feed (@kentgraziano) and the Snowflake feed (@SnowflakeDB) for updates on all the action and activities here at Snowflake Computing. Watch for #BuiltForTheCloud and #DWaaS.