Accelerate Your Way to a Modern Cloud Analytics Platform

Many times, I see clients who have no current cloud footprint use the power of analytics to pull the rest of the organization toward the cloud. These new adopters leverage many powerful tools without making a massive, upfront investment. This can seriously accelerate their analytics abilities and their cloud maturity.

The depth of maturity on the curve determines the richness of your cloud analytics capabilities.


The Cloud and analytics maturity curve explained

The graph shown above illustrates a typical analytics maturity curve. As you can see, an analytics department that’s on the lower end of the maturity curve is typically relying upon “descriptive” analytics that can provide the business with information about the past. An analytics department that’s on the mid-range of the maturity curve can provide a  more “predictive” (insights-driven) view of the enterprise that offers a glimpse into what the future may look like for the business. An analytics department that’s on the mature end of the curve, however, is positioned to provide “prescriptive” analytics that offer deeper, actionable insights that enable a business to operate with greater agility and plan for the future with a higher degree of accuracy.

You can take the same concept of the analytics maturity curve and apply it directly to the cloud. For example, an enterprise with an immature cloud analytics platform can only source and provide descriptive analytics. An enterprise working with a cloud analytics platform in the mid-range of maturity, on the other hand, has typically migrated a critical mass of data but is only just starting to dive into all of the analytical capabilities the cloud can provide. The enterprise with a high degree of maturity on a cloud platform, however, is typically operating with proven storage and processing capabilities and is now starting to look towards cloud-native tools (e.g. Amazon Web Services Sagemaker, Azure Machine Learning, or Google’s Cloud AI) that can significantly accelerate an analytics platform. It’s only when a platform is mature enough that an organization can begin to fully leverage groundbreaking analytics concepts, such as Machine Learning (ML) and artificial intelligence (AI), to which every enterprise aspires.

Accelerate your journey along the maturity curve

For many organizations, the journey along both the analytics and cloud maturity curve can seem incredibly daunting, expensive and unfeasible. When a business finally reaches a point where they must start along the journey (or risk falling behind the competition) they generally experience trouble getting past the initial hurdle, which is resistance to change. Or, the business doesn’t want to provide any resources to help launch journey. It then becomes purely an a IT initiative, which often lacks the support to match the velocity needed.

So, how can an organization accelerate themselves along the cloud maturity curve and get to advanced analytics faster? By leveraging strong, cloud-native tools like Snowflake. What is essential, however, is a plan.

Too many times, a cloud analytics platform falls down because there is no concrete plan to constantly innovate. I’ve seen enterprises invest in a single solution, such as an on-premises data warehouse that’s been shifted to the cloud, because they think it will solve all their problems. They soon realize, however, that it’s not that simple. The problem with a “cloudified” version of a data warehouse solution is that it flies directly in the face of what a modern data architecture is. In fact, one of the key components of a modern data architecture is that it’s decoupled. That means you should be able to replace any one piece of your architecture, such as your ETL solution, with the newest, best-of-breed tool and do so with minimal impact to the rest of your architecture.

This is where a data warehouse built for the cloud like Snowflake becomes such a great accelerator. Snowflake’s multi-cluster, shared data architecture separates storage and compute, making it possible to scale up and down on-the-fly, without downtime or disruption. By providing a central storage repository that’s separate from your computing resources, only then can you move quickly through to the higher levels of the maturity curve. When you’re moving towards the prescriptive and predictive phases, Snowflake provides a foundation that allows you to build out a truly modern data architecture, leveraging all of the benefits of the cloud with tremendous benefits to the enterprise.

No matter where you are on the cloud maturity curve, data storage and warehousing will always be a central component of your architecture. By leveraging cloud-based solutions like Snowflake, you remove much of the developmental overhead that you would have with a solution that had originally been built to be deployed on-premises. And, by working with implementation partners like Slalom, you can get the most out of your cloud data warehouse, accelerate your journey to the top of the cloud maturity curve, then begin using truly revolutionary analytics, such as ML and AI, to which every enterprise aims.

Author Bio: James Anderson is a Solution Architect at Slalom, a Snowflake Implementation Partner headquartered in Seattle. James specializes in analytical data platforms built in the cloud, helping to enable enterprises get the most out of their data, and unlocking new and exciting technologies for them. James is based out of Slalom Boston, and is a frequent contributor to the Slalom technology blog.

Deliveroo Delivers with Real-time Data

In a field of struggling food delivery startups, one notable success story has emerged from the fray. Termed “the European unicorn” by TechCrunch, Deliveroo is a British startup that offers fast and reliable food delivery service from a premium network of restaurants.

Deliveroo recently raised a $385 million funding round, boasts an estimated $2 billion valuation and is credited with transforming the way people think about food delivery. What is this unicorn doing differently? How has it found success where so many others have failed?

“Data is baked into every aspect of the organization,” Deliveroo’s head of business intelligence, Henry Crawford said. “Having instant access to data reveals which geographic areas are experiencing a shortage of restaurants and a shortage of particular cuisines so we can create these hubs right at the consumer’s doorstep.”

Deliveroo analyzes customer behavior, gains insights into market trends and responds with swift decisions and rapid execution by using data-driven insights. Snowflake makes all of this possible.

“With data coming from a variety of sources, including web traffic, transactions and customer behavior, having a data warehouse built for the cloud provides one repository for a single source of truth,” Henry explains.“The shift to Snowflake’s cloud data warehouse has enabled us to make good on our promise that got Deliveroo started: To connect consumers with great food from great restaurants, wherever you are, and whatever it takes.“

Snowflake also accommodates Deliveroo’s 650% growth in 2016. Such rapid momentum prompted Deliveroo to expand its business intelligence team from two employees to 14. Additional team members triggered the need for more access to the same data but without impacting performance.

Since Snowflake is built for the cloud, an unlimited number of users can access all of an organization’s data from a single repository, which is critical to Deliveroo’s success. There’s no replicating data, shifting queries and other workloads to non-business hours, or queueing users to preserve performance. Instead, Snowflake’s true cloud elasticity means Deliveroo can automatically scale up, down and out (concurrency) to load and analyze data without disruption.

“None of these future plans would be possible without real-time, concurrent access to massive volumes of data,” Henry said.

What’s next for Deliveroo? Using real-time logistics algorithms to increase the number and the speed of deliveries. Deliveroo’s expansion plans also include an “Editions” program—delivery-only kitchens so partner restaurants can expand their footprint without opening brick-and-mortar locations.

Learn more about how Snowflake can accelerate your data storage and analytics initiatives.

Snowflake Vision Emerges as Industry Benchmark

Technology research and analysis firm Gigaom has ranked Snowflake as the #1 cloud data warehouse in a recent study. We surpassed enterprise data warehouse products including, Google BigQuery, Teradata, IBM dashDB, HPE Vertica, Microsoft Azure SQL, SAP HANA and Oracle Exadata. Snowflake emerged with a top score of 4.85 out of a possible 5.0. The competition averaged a score of 3.5. The six “disruption vectors” Gigaom used as its key scoring criteria are congruent with what we wanted to achieve back in the summer of 2012, when we started Snowflake.

But long before we wrote the first line of Snowflake code, we asked one another: “What should a data warehouse deliver that no other product has before? How can we enable organizations to make the best, data-driven decisions? And how will the world’s most powerful data warehouse help organizations achieve their existing goals and help reveal their future goals?” We then set out to answer those questions.

We wanted to enable organizations to easily and affordably store all of their data in one location, and make that data accessible to all concurrent users without degrading performance. We also wanted Snowflake to scale infinitely, with ease, and cost effectively so organizations would only pay for the compute and storage they used. And the product had to work with the tools that users already knew and loved. Finally, we wanted a data warehouse that required zero management by our customers – nothing to tweak, no tuning required. These defining qualities aligned with the new world of cloud services, and they are what formed the foundation of Snowflake.

What’s happened since the early days of Snowflake? We got to work, and we stuck to hiring the best engineers the world has to offer. We built Snowflake from the ground up, for the cloud, and incorporated all of these elements as the core of the product. In early 2015, we offered the first commercial version of Snowflake – the one and only data warehouse built for the cloud. Since then, our engineering team has added more and more industry-leading capabilities to Snowflake, leapfrogging the traditional data warehouse vendors.

Along the way, we’ve hired high-calibre teams to execute the sales, marketing and finance functions of the company so our customers and partners get the highest value from working with Snowflake. We also built a great customer support organization, providing the level of service our users love. In more recent times, we’ve expanded operations outside of North America to Europe, with Asia-Pacific and other regions coming online soon. We’ve also added Snowflake On Demand™ – the easiest way to get started with Snowflake by simply signing up on our website with just a credit card. All of these efforts over the past four years have led to Snowflake’s most recent inflection point – being chosen as the number one cloud data warehouse.

What does all this mean? Snowflake’s current and future customers have every opportunity to explore all of their data in ways they never thought possible. They can gain the insight, solve the problems and create the opportunities they simply couldn’t with their previous data platforms. We committed to building the world’s best data warehouse – the only data warehouse built for the cloud. Our customers, our partners and now the industry have indicated we’ve likely achieved what we set out to do back in the summer of 2012. Going forward, we’ll continue to serve our customers and partners with the best technology, the best solutions and the best services available.

Read the full report >

Migrating to the Cloud? Why you should start with your EDW

Many organizations we engage with are seriously considering transforming their business and moving some (or all) of their IT operations into the cloud. A lot of executives I have encountered are struggling with the same question: “How do I get started?” There is a strong case to be made that starting with your Enterprise Data Warehouse (EDW), or at least a data mart, is the fastest, and most risk-free path, with added upside potential to increase revenue and set you up for future growth. As operational data volumes continue to grow at exponential rates, it’s not a matter of if you go to the cloud to manage your enterprise data, but when.

Before going too far on your cloud journey, I would recommend an exercise in segmenting your business from an IT perspective in a very simple way. To get you started, let me suggest five possible categories, along with some risks to consider for each:

  • Customer-facing Applications – This is the heart and soul of your business. If something goes wrong, you lose business and revenue, and people potentially get fired. Risk: HIGH
  • Internal Applications – Mail, Payroll, General Ledger, AP, AR, things like that. Every person inside the organization relies on at least one of these services, and a lot of analysis needs to take place to figure out all the integration points to ensure nothing gets missed during a migration to the cloud. Risk: HIGH
  • Desktop/Laptop OS and Applications – There are whole books and schools of thought about how to migrate these, which means it’s a big decision and a big deal. Impacting everyone in the company on your first cloud initiative? Risk: HIGH
  • Operations Monitoring and Alerting – Got a Network Operation Center (NOC)? These guys are integrated with every system that is important, so moving them to the cloud could be a large undertaking. Risk: HIGH
  • Reporting and Analytics – Hmmm….if my constituents don’t get their weekly or monthly reports on time, is that a disaster? Can they get by with a small outage during the migration? Risk: LOW

Starting with the Data

Let’s take a closer look at why starting your cloud journey with your EDW could be a viable option, and even have some benefits that could help sell the idea (of the cloud) internally. In no particular order, I would highlight these points:

  • Doesn’t disrupt the business – Many EDW implementations are not mission critical today (as compared to enterprise applications). As more data becomes available through social media or Internet of Things (IOT) applications, businesses need access to much larger volumes of data and they will want access to it earlier in the data pipeline. Traditional DWs contain aggregations and are used for doing trend analysis, analyzing data over a period of time to make strategic, rather than tactical decisions. They are not architected to handle this new influx of raw data in a cost-effective manner. By starting your cloud journey with the EDW, you reduce risk (by going to a more flexible architecture) while getting your team early exposure to working with cloud services.
  • Doesn’t disrupt internal users – When moving to the cloud, you want to show incremental success and don’t want to add a lot of unnecessary risk. It’s simple to keep running your existing EDW in parallel with your new cloud DW, giving you a built-in fall-back plan for the early stages. Or you may decide to start with a small data mart as a pilot project.
  • Start-up costs are a fraction of on-premises, appliance solutions – Some of our customers invested as much as $10 million (or more) years ago on a data warehouse appliance that is now outdated technologically. And the renewal costs to keep that tech going are coming due. If they re-invest another huge sum of money, this will delay them getting to the cloud by another 4-5 years, putting them behind their competition. Rather than outlaying a large capital expenditure to extend the life of the older technology, it may make better sense to move to the cloud. The cloud offers a utility-based model, allowing you to pay for what you use and when you use it, as opposed to what you think you are going to need 2-3 years in the future. As a result, not only is the cost of entry lower, but you are not risking a huge sum of money to make the move.
  • Data is growing at an exponential rate – Will you ever have less data to worry about in your business? If you plan on being successful, I don’t think so. Many organizations are looking at new and different ways to manage and analyze ever-increasing volumes of data coming in various formats from multiple sources (such as semi-structured web logs). Your current on-premises EDW was not designed for this kind of workload or data.  If you are considering changing infrastructure platforms to accommodate it, why not select tools that were built for today’s modern data challenges instead of legacy-based architectures? Moving to the cloud also gives you the opportunity to consolidate operations and streamline business processes.
  • Enable new capability – There are some new analytic paradigms happening in the cloud (such as machine learning). Cloud-based platforms allow you to work with both detailed and aggregated data at scales never imaged (see the case study about DoubleDown as an example). Need to run a complex analytic job with a 256-node Massively Parallel Processing (MPP) cluster for an hour, and then shut it down? No problem. Can your platform support a thousand users without concurrency issues?  How would that change your business if it could dynamically adjust to handle those new demands?

As with any infrastructure move, the benefits have to be clear enough that the status quo mentality can be overcome and analysis paralysis doesn’t push out your journey to the cloud for months or even years. The beauty of the cloud model is that it is easy to start small and scale without risking a huge investment up front. Every business needs some proof before committing time and resources to move anything to the cloud and your EDW is a perfect candidate. Snowflake is the first and only EDW built for the cloud to be truly elastic for all of your analytic and big data needs.

Please feel free to reach out to us at We would love to help you on your journey to the cloud. And keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and happenings here at Snowflake Computing.

Elasticity & Separation of Compute and Storage

Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#4 Elasticity & Separation of Storage and Compute

The innovative, patent-pending, Multi-Cluster, Shared Data Architecture in Snowflake takes full advantage of the elasticity of the cloud. With the advantage of building a data warehouse from the ground up, Snowflake has built an environment that really allows the customer to scale their data warehouse as they want. This unique environment gives the users the ability to add more compute power on demand, even in the middle of a process.

How it works

  1. Data is stored, once, on elastic disk storage in the cloud (for example it is currently stored AWS S3)
  2. Compute clusters (for example using AWS EC2 nodes) are allocated dynamically as required to do work (queries or loads)
  3. The compute clusters are connected to the data storage. The access to the data is based on the approved security credentials of the user that is running the query

The important point is that storage and warehouse are not tightly coupled together. This means you can grow and reduce both independently of each other.

More Power at the Push of a Button

Say you are running a series of complex SQL queries on a Small (2 node) warehouse in Snowflake. Normally you have until the close of business to run the reports and prepare the data. But today, at 8 AM, your management requests the results of that analysis for a board meeting at noon. So now you have half the time to get the reports to run. Yikes!

In a traditional data warehouse you would pretty much have only one option: kick everyone off the system and hope things run faster.

If however, you are running the reports on the Snowflake Elastic Data Warehouse, all you have to do is increase the size of the warehouse to a Medium which doubles the horsepower of the data warehouse (i.e. it doubles the number of nodes). And you can perform this change with a few clicks in our easy to use web-based interface, even if the process has already started.

Warehouse Resize 1

Warehouse Resize 2

Once you bump up the size of the warehouse, the next statement that starts will immediately use the additional resources. Notice that the first statement running with the Medium warehouse runs in about half the time of the prior statement, with about double the data!

Warehouse Resize 3

You don’t need to interrupt the process, start over, or restart the service. It just takes effect automatically, at the push of a button. And once the process is done, you can reduce the warehouse back to a Small or even just turn it off.

And in this way, the unique cloud-based architecture of Snowflake allows you to truly take advantage of the elasticity of the cloud for your data warehouse.


In many other architectures, cloud and on-premises appliances, you have a certain amount of disk space associated with the compute nodes. If you need more storage; you must add more nodes. If you need more compute power, likewise you add more nodes. Plus other architectures may not allow you to scale down. Also in most of these cases, when adding more nodes, you must also shut down the environment or make it read only for a time while you redistribute the data manually across the new nodes.

Thus in the legacy data warehouses, elasticity is limited. And this forces you to plan for, acquire and manage the resources for your expected, future peak capacity and peak data demand. Otherwise you risk having your queries or your hardware fail (because you run out of disk capacity or CPU power, or both). In addition, trying to scale beyond existing capacity may lead to disruptions to the service (as you rebuild/extend the hardware), or some workloads being rejected in favor of others. And if you over-provision, and buy too much capacity, there may not be a way to scale back after the fact.

The Snowflake Advantage

For Snowflake, this is not an issue. You can grow and shrink the environment dynamically. The data storage grows and shrinks as you add or remove data, while the compute nodes can be ramped up or down, or turned off, as you require. You are not forced to pay for capacity up front, or kick other workloads off, or plan downtimes when ramping up your data warehouse capacity. That is the promise of cloud. This is why Snowflake is the real elastic Data Warehouse as a Service.

Very cool indeed.

As always, keep an eye on this blog site, our Snowflake Twitter feed (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

Kent Graziano and Saqib Mustafa

JSON Support with Snowflake

Hopefully you had a chance to read our previous top 10 posts. As promised, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake. 

#6 JSON support with Snowflake

One of the things that got people at #Strata Hadoop excited this week was our support for JSON and other semi-structured data types. For traditional data warehouse users, the world of the big data can be challenging. We are used to using SQL to query data, used to having a well defined data model, and knowing what the source and target schemas look like. We needed a simpler way to easily handle the flexible schemas that come with using semi-structured data like JSON documents. As much as they try, legacy data warehouse systems do not provide extensive support for JSON data, and big data systems require learning new extensive programming skills.

When our founders started out from scratch to build a data warehouse for the cloud, they wanted a solution that could combine all your data in one place without the need to resort to using multiple platforms or programming paradigms. As a result, combining structured and semi-structured in one place and making it available through standard ANSI SQL is a strong feature of the Snowflake service and extensively used by our customers.

Snowflake was built with features to simplify access to JSON data and provide the ability to combine it with structured data! Using Snowflake, you can learn to query JSON data using SQL, and join it to traditional tabular data in relational tables easily. Our innovative approach allows the user to store the JSON documents in a relational table using a new data type (VARIANT) that is optimized automatically in the background for MPP and columnar access.

This is a great way to eliminate  the gap between the big data world and the relational world and simplify access for users. Most legacy databases (with their legacy code base) can not do this efficiently.  Some legacy data warehouse providers cannot support JSON data at all, and you may have to acquire and manage a separate big data system.  Others may require some sort of pre-processing of the data such as conversion into simplified CSV type data. This may make it easier to ingest the data, but requires time and resources. And this also limits the ability to accommodate potential changes that come with a flexible schema data type into the relational data model easily. In addition, the JSON data may be stored in a text field, rather than an optimized data type, which has its cost in speed of query execution and data storage.

Snowflake makes semi-structured data available inside the data warehouse service seamlessly. The data can be ingested directly into a table in Snowflake and can then be queried easily. And any changes to the schema of the inbound JSON are accommodated automatically without impact to existing queries.

Example Code

In this scenario, we are going to use Snowflake SQL extensions for querying semi-structured data, and our innovative data type (VARIANT) to join data to other purely relational tables. We are going to combine Twitter data (JSON data) with  product data in relational tables.

The main table which stores the Twitter JSON data,, has two columns: tweet and created_at.  The column Tweet is defined as a VARIANT type and holds the JSON from a Twitter feed, while created_at is a relational column with a data type of TIMESTAMP_NTZ (NTZ = no time zone).

Tweet table

Here is an example showing a fairly simple SQL query with the JSON extensions. In this query we are joining some Twitter data to product data in relational tables in order to get a count of Tweets that contain hashtags related to a particular product:

select extract('day',created_at) janday,count(*) cnt
  from t,

     -- unnest a tweet on the hashtags of each entities
     lateral flatten (input=> t.tweet,'entities.hashtags')tags,
     (select distinct ph_hashtag
        where p_name ='Blue Sky'
        and   p_productkey = ph_productkey) p

     where tags.value:text::string = p.ph_hashtag
     and   created_at >= '2014-01-01 00:00:00'
     and   created_at >= '2014-02-01 00:00:00'

    group by 1
    order by 1

The following  section of the code pivots the elements in the JSON string into a set of rows so we can do traditional joins:

     -- unnest a tweet on the hashtags of each entities
     lateral flatten (input=> t.tweet,'entities.hashtags')tags,

Specifically it is pulling out a nested array of hastags within the entities element. Then the predicate is where we join those hashtag values in the Tweet string to the hashtag column within the Product table (aliased as “p”):

     where tags.value:text::string = p.ph_hashtag

In this case “tags” equal the virtual table alias created by the FLATTEN function and the keyword “value” indicates we want the content of that row. The rest of the specification indicates it is text data that we want cast as a STRING so it matches the data type of the column p.ph_hastag. 

Then the last part of the predicate is a normal filter for a date range using the date column in the the TWEETS table:

     and   created_at >= '2014-01-01 00:00:00'
     and   created_at >= '2014-02-01 00:00:00'

So there you have it, using SQL to combine semi-structured data with traditional structured data in a relational data warehouse in the cloud. No big data system required.

Not bad!

But that is just a flavor of how you can utilize Snowflake to easily get value from your JSON data. There’s way more than we can cover in a simple blog post (like building relational views on the JSON for example).

Want to find out more? Ask us for a demo or Check out the presentation by Grega Kaspret (@gregakespret) from Celtra Mobile at Strata Hadoop World (San Jose) this week, talking about simplifying a JSON data pipeline using Snowflake. And follow our Twitter feeds:  (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake and  updates on all the action at Snowflake Computing.

Kent Graziano and Saqib Mustafa

Snowflake Fast Clone

Hopefully you had a chance to read our previous posts: Query Results Sets available in history,   Ability to connect with JDBC, and  Undrop a table, schema or a database. As promised in the original Top 10 post, we continue the series with a deeper dive into another of the Top 10 Cool Features from Snowflake:

#7 Fast Clone

Even cooler than UNDROP is our fast clone feature.

Have you ever had to wait hours, days, maybe weeks to spin up a copy of your production data warehouse so you could have a test or a development environment? And have to pay extra for the test or development environment to able to hold all the replicated data? Or maybe you have some new data scientists that just want to play around with the data but they really need their own copy?

With the simple Snowflake CLONE command, our customers can create multiple copies of the data tables, schemas, and databases, without replicating the data itself. This gives our customers the ability to almost instantly make the data available to use for multiple user groups, without the additional cost (or time) of actually replicating the data.


Almost –

Magic with Meta Data.

Based on our unique solution, Snowflake users are able to clone their tables, schema or databases without creating additional copies. Snowflake stores data in files that are immutable, and encrypted, as part of our architecture. Our cloud services layer, with the metadata repository, records the information regarding the files being stored, the file locations, and a reference to a certain version of the data. This is also kept encrypted.  In addition, when any data changes, the Metadata repository is automatically updated to provide a pointer to the changed data.  All of this is performed in the background by the software without any involvement from the user. The metadata repository still retains the record for all versions of the data set.

How Cloning works

Because of the data in the metadata store, the user can quickly create a clone of the table. All the user has to do is to submit the clone command. This command can be submitted at any time.


As a result of the CLONE command, the system simply creates a new entry in the metadata store to keep track of the new clone.

Time Traveling Clones! Oh My!

In addition to simple cloning of objects, you can blend cloning with Snowflake time travel to clone tables, schemas, or even databases at a point in time in the past AT or BEFORE a specific timestamp. Here is an example:

CREATE SCHEMA mytestschema_clone_restore CLONE testschema BEFORE (TIMESTAMP => TO_TIMESTAMP(40*365*86400));

This command once executed will create a clone of the entire schema (tables, views, etc.) as it existed before the specified timestamp.

Hmm…time traveling clones…sounds like an episode of the X-Files (also cool).

Because Snowflake maintains the history of queries performed and identifies them by unique ID, we can also submit a request to create a clone using the unique ID of the query, BEFORE or AT a certain time stamp. This would allow you to perhaps run a revised set of scripts against an older data set then compare the results to the current data set.

In addition, once cloned, cloned objects are independent of each other. Despite being independent, there are no additional storage requirements and thus no additional charge (unless you add or modify records), since these clones share files. Thus Snowflake allows its customers to clone at multiple levels: table, schema (file format, views, sequences, stages) and databases and over time. And because they are independent, updates to one are not visible in the others.

Very helpful for experimentation and data exploration!

Cloning Example

As an example of what it looks like in the Snowflake UI, here is a snapshot of one of my demo databases with Twitter data. It is about 2 TB of data.

Twitter db

Now here is me launching the Clone via our Web UI at 3:02:49 PM:

Start Clone

And here is the result showing the cloned db created at 3:03:55 PM. Barely a minute to create a clone of a 2TB database with 10 tables!

Clone Completed

And I am not a DBA let alone a “cloud” DBA. This is all part of Snowflake being a Data Warehouse as a Service (DWaaS). Another reason why I think it is way cool!

Want to see cloning in action? Check out this video:

Why is this hard to do in traditional systems?

In traditional RDBMS systems, if you wanted to clone an existing data warehouse and give a different set of users access to the environment, you typically have to create a whole new, separate, deployment of the environment. In Snowflake, as I have just shown, you can do this with ease with just a few commands (one of the many benefits resulting from having written Snowflake from scratch for the cloud). Moreover, while most traditional data warehouse systems allow you to create snapshots of the data, this generally results in consumption of more storage (which of course costs more $$).

All these reasons are why Fast Cloning made my Top 10 list of really cool features in the Snowflake Elastic Data Warehouse.

As always, keep an eye on this blog site, our Snowflake Twitter feeds (@SnowflakeDB), (@kentgraziano),  (@cloudsommelier), and #ElasticDW for more Top 10 Cool Things About Snowflake and for updates on all the action and activities here at Snowflake Computing.

Special thanks to Ashish Motivala from Snowflake Engineering for helping us with some of the more technical details on how this works so well.

Kent Graziano and Saqib Mustafa

Are Data Security Breaches Accelerating the Shift to the Cloud?

There is an old saying that there are two things certain in life: death and taxes. I would like to add a third one–data security breaches. The Identity Theft Resource Center (ITRC) defines a data security breach as “an incident in which an individual name plus a Social Security, driver’s license number, medical record or financial records (credit/debit cards included) is potentially put at risk because of exposure.” The ITRC reports that 717 data breaches have occurred this year exposing over 176 million records.

On the surface, finding a pattern across all such breaches may appear daunting considering how varied the targeted companies are. However, the ITRC argues that the impacted organizations are similar in that all of the data security breaches contained “personally identifiable information (PII) in a format easily read by thieves, in other words, not encrypted.” Based on my experience, I’d expect that a significant portion of the data breaches compromised data in on-premises systems. Being forced to realize the vulnerability of on-premises systems, organizations are beginning to rethink their cloud strategy.

For example, Tara Seals declares in her recent Infosecurity Magazine article that “despite cloud security fears, the ongoing epidemic data breaches is likely to simply push more enterprises towards the cloud.” Is the move to the cloud simply a temporary, knee-jerk reaction to the growing trend in security breaches or are we witnessing a permanent shift towards the cloud? Some industry experts conclude that a permanent shift is happening. Tim Jennings from Ovum for example, believes that a driving force behind enterprises’ move to the cloud is that they lack the in-house security expertise to deal with today’s threats and highly motivated bad actors. Perhaps the headline from the Onion, which declares “China Unable To Recruit Hackers Fast Enough To Keep Up With Vulnerabilities In U.S. Security Systems” is not so funny after all.

But are the cloud and cloud offerings more secure than their on-premises counterparts? Tara Seals appears to suggest that they can be when she writes that, “Modern cloud providers have invested large sums of money into end-to-end security” by providing sophisticated security intelligence.” Let’s consider data encryption as an illustration of her point.

The principle behind safeguarding information by leveraging encryption is as old as the Roman Empire, with most organizations agreeing that it is an effective way to minimize the impact of a security breach. But if that is true, what is behind ITRC’s observation that PII was not encrypted by the impacted organizations?

The truth of the matter is that encryption is hard. Take the example of storing encryption keys using Hardware Security Modules (HSMs). In general, using an HSM is a good security practice for safeguarding encryption keys and for meeting government standards and compliance requirements. However, its utility is as useful as an unlocked safe without the proper security and operational controls to protect it. To that end, organizations moving to the cloud need to understand their cloud provider’s encryption framework to measure their effectiveness in thwarting an intruder’s attack. Things to consider when assessing a cloud provider’s encryption solution include:

  1. Encryption key wrapping strategies
  2. Rotation encryption key frequency
  3. Methods for rekeying encryption keys
  4. Ability to monitor, log, and alert when suspicious activities are performed against the HSM

Tim Jennings and Tara Seals present compelling arguments for the possible security advantage of cloud providers over their on-premises counterparts. However, I feel that there are other equally or possibly more compelling reasons than just that cloud providers have more talented security experts.

The systems that organizations use to store and analyze data are often critical to the business. As a result, any planned or unplanned outage can significantly impact productivity and may even result in lost revenue. Now imagine the position that a CISO may find herself when requesting that an emergency security patch be deployed under the aforementioned situation. Even under the best conditions, coordinating and deploying a security update may take weeks if not months, which ultimately leaves the system vulnerable to a bad actor. That’s where a cloud solution can outperform its on-premises counterpart. An effective cloud solution allows one to almost instantly deploy security updates without impacting consumers of its services and thus reducing the time that the system is vulnerable.

Alas, PII data is so financially attractive of a target, whether the data is located on-premises or on the cloud, that one should more and more attempts—some of which will succeed—to breach systems in the cloud as organizations continue to leverage more cloud services. It is therefore imperative that organizations perform their due diligence when selecting the right security-focused cloud services partners.

Snowflake Query Result Sets Available to Users via History

As promised in the previous post from Kent Graziano, here is a deeper dive into one of the top 10 cool features from Snowflake:

#10 Result sets available via History

There are a lot of times when you want to make a small change to your large query, and want to be able to see the effect of a change quickly without rerunning the previous query. This is hard in most systems because you have to rerun the previous query, using up resources and time. Our solution allows users to view the result sets from queries that were executed previously, via history. One benefit users get is that if they had already executed a complex query that took some amount of time to execute, the user doesn’t have to run the query again to access the previous results. They can just go back to the history, and access the result set. This is also beneficial when working on a development project using the data warehouse. Developers can use the result set history to compare the effects of changes to the query or to the data set, without running the previous queries again.

How to access the Result Set History

Once you execute a query, the result set will persist in the history of the user for 24 hours. This even includes queries executed through interfaces outside the Snowflake UI, like BI tools, JDBC, ODBC, etc.

To get to these results, you can go to the History page in the Snowflake web UI:


Find the query you want to recall then click on the hyper-linked Query Id. This will then let you see the details about the query plus the data it pulled.

Persitant Result Set


How does it work?

Every time a query is run in Snowflake, the query is assigned an ID. This ID along with the resulting data set is typically stored in the same place as the Metadata repository. All data is encrypted at rest and in transit. This is performed as a simultaneous operation to sending the result back to the user. In case of a large resultant data set, the results may be stored in S3. Because of the elasticity of S3 storage, this approach allows Snowflake to retain any size result set. For a query, that was run successfully, the result is kept for 24 hours, after which the results are cleaned up based on a FIFO method.

What does this mean for the user?

It may seem like a minor addition to some but it is very convenient to pull up a resultset from a previous query without having to execute the query a second time. This saves time and processing (and therefore saves $$$), another way in which Snowflake provides significant value over traditional data warehouse solutions, slashing costs, and eliminating complexity.

P.S. Keep an eye on this blog site, our Snowflake Twitter feed (@SnowflakeDB), (@kentgraziano), and (@cloudsommelier) for more Top 10 Cool Things About Snowflake.

– Kent Graziano and Saqib Mustafa