How to Make Data Protection and High Availability for Analytics Fast and Easy

When moving enterprise data warehouse analytic workloads to the cloud, it’s important to consider data protection and high availability (HA) services that will keep your valuable data preserved and your analytics running. Possible events such as human error, infrastructure failure, an unfortunate act of nature or any activity that places your data at risk can’t be ignored.

All the while, data protection and HA should be fast and easy. It shouldn’t take days, weeks or months, nor an army of technical specialists or a big budget, to make these critical safety measures happen.  

What data-driven companies depend on

With real-time dashboards, organizations depend on data analytics to communicate the status of business operations. Increasingly, companies embed self-service analytics into customer-facing applications. Therefore, enterprises spend enormous amounts of effort, energy and resources to gather and cultivate data about their customers. With all this activity around data, loss of any data or processing capability could result in catastrophic consequences for an organization.

How Snowflake protects your data and services

For these reasons, Snowflake innovates and integrates data protection and HA as core capabilities of our cloud-built data warehouse-as-a-service. In Figure 1, you’ll see what makes Snowflake different: our protection capabilities are all built in and orchestrated with metadata across your entire service. The figure also illustrates how Snowflake resilience is automatically distributed across three availability zones.

  • Built-in data protection: Over and above standard cloud data protection, Snowflake Time Travel enables you to recover data from any point in the past, up to 90 days back. In addition, it’s all accomplished automatically. Other than specifying the number of days for Time Travel retention at setup (default is 24 hours for Snowflake Enterprise and above), you do not have to initiate a thing or manage snapshots.


        

    Figure 1. Snowflake Built-in Data Protection and High Availability

      

This brings significant advantages for business analysts performing scenario-based analytics on changed datasets, or for data scientists who want to train new models and algorithms on old data.

  • Built-in service protection against node failures:  The impact of node failures can be tricky to figure out with different cloud implementations offered by different cloud data warehouse vendors. While other cloud data warehouse or querying services may provide some level of redundancy for current data, mechanisms to protect against data corruption or data loss in the event of a node failure vary.

    In most cases, the burden is on you to create a cluster (i.e., a system with a node count greater than one) to protect against node failures. Typically, this means added cost (hardware, storage, and software instances), as well as added complexity, to account for the additional nodes. Some competing services may also impose a performance penalty on data writes, because under the covers the redundant nodes are written using compute resources. We see this most frequently with on-premises data warehouse environments retrofitted for the cloud. Moreover, there could be hidden costs if your cluster goes down and is not accessible for queries or updates while a failed node is being reconstructed.

    Because the Snowflake architecture separates the compute, storage and service layers, Snowflake assures resiliency and data consistency in the event of node failures. Depending on the severity of a failure, Snowflake may automatically reissue (retry) the affected operations without a user’s involvement, and there is no impact on write (or read) performance. In addition, you can take advantage of lower-cost storage, whereas competing services may strongly encourage you to use, or restrict you to, premium-cost storage.

  • Built-in high availability: Providing an even higher degree of data protection and service resilience, within the same deployment region, Snowflake provides standard failover protection across three availability zones (including the primary active zone). Your data and business are protected. As you ingest your data, it is synchronously and transparently replicated across availability zones. This protection is automatically extended from Snowflake to customers, at no added charge.

    Further, all the metadata, the magic of Snowflake services, is also protected.

 

Table 1. Snowflake Data Protection and High Availability

Summary

Bottom line, within the same deployment region, you do not have to configure or struggle with manually building an HA infrastructure. Our data warehouse-as-a-service takes care of this for you, automatically. Snowflake makes data protection and high availability fast and easy. You can mitigate risks with speed, cost-effectiveness, confidence and peace of mind.

 


Virtual Warehouse Billing Made Smarter

One of our most important commitments to our users is reducing or eliminating the management and tuning tasks imposed by other systems. We are always on the lookout for new and innovative ways of making our service easier to use.

Most recently, we looked at how our users manage their Snowflake virtual warehouses and realized we could be smarter about how we charge for the compute resources used in these warehouses.

First, Some Background Information…

To begin, here are some details about what happens when a virtual warehouse (or simply “warehouse”) is resumed and suspended in Snowflake:

  1. Each time the warehouse is resumed:
    • Snowflake provisions servers from AWS EC2 for a minimum of one hour. This is based on how AWS charges for EC2 resources.
    • The number of Snowflake credits charged depends on the number of servers provisioned, which is determined by the size of the warehouse (XS, S, M, L, XL, etc.) and the number of clusters in the warehouse (if multi-cluster warehouses are enabled).
  2. When the warehouse is suspended, the servers are shut down and credits are no longer charged.
  3. When the warehouse is resumed again, the servers are re-provisioned and the charges start over, regardless of how much time has passed since the warehouse was last charged. As a result, if the same warehouse is resumed multiple times within the same hour, credits are charged each time. As stated earlier, this follows the AWS EC2 model.
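
As a rough illustration of the charging model described above (before any change), the following Python sketch computes the credits charged per resume. The function names are hypothetical and not part of any Snowflake API. The server counts for Small (2) and Medium (4) come from this post; the other sizes assume that each size doubles the previous one. The sketch also assumes every run is shorter than an hour, so each resume results in exactly one charge.

```python
# Servers per cluster by warehouse size. Small (2) and Medium (4) come from
# this post; the remaining values ASSUME each size doubles the previous one.
SERVERS_PER_CLUSTER = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def credits_per_resume(size: str, clusters: int = 1) -> int:
    """Credits charged for one resume: 1 credit per provisioned server."""
    if clusters < 1:
        raise ValueError("a warehouse needs at least one cluster")
    return SERVERS_PER_CLUSTER[size] * clusters


def credits_before_wbc(size: str, resume_count: int, clusters: int = 1) -> int:
    """Every resume starts a new one-hour charge, even within the same hour.

    Assumes each run lasts less than an hour (one charge per resume).
    """
    return credits_per_resume(size, clusters) * resume_count


# A Small warehouse resumed three times within the same hour.
print(credits_before_wbc("S", resume_count=3))
```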

What We Changed

We noticed that some of our users were spending time and effort managing their virtual warehouses to ensure that credits were not consumed unnecessarily. We decided to eliminate this extra work by introducing Warehouse Billing Continuation (WBC).

With WBC, we now track the last time each individual server in a warehouse was charged and, if the warehouse is suspended and resumed within 60 minutes of the last charge, we don’t charge again for the server. The charge is continued from the last time as if the warehouse had never been suspended. This eliminates any extra charges, thereby reducing the need for strictly monitoring and controlling when warehouses are suspended and resumed.

How Does WBC Work?

The simple answer is it just works, regardless of how often you resume and suspend your virtual warehouses. If this answer satisfies your curiosity, you can skip now to the end of this post. Otherwise, read on for the gory details…

The best way to explain WBC is with examples. Say you have a Small warehouse (2 servers) that’s been suspended for longer than an hour. Now, imagine this warehouse goes through the following status changes:

Resumed   Suspended   Credits Charged before WBC   Credits Charged with WBC
09:15     09:25       2                            2
09:40     09:50       2                            -
10:05     10:10       2                            -
10:30     10:50       2                            2
11:20     11:40       2                            2 (at 11:30)

Before WBC, every time the warehouse was resumed, we charged 2 credits (1 credit per server in the warehouse). For example, in the scenario described above, between 09:15 and 11:40, the warehouse incurred 5 charges for a total of 10 credits.

With WBC, the behavior is different. The warehouse only incurs 3 charges for a total of 6 credits. The following diagram provides a more detailed explanation of what actually happens:

Figure 1: Example of charges for resuming and suspending a Small virtual warehouse
  1. At 09:15, 2 credits are charged for the servers in the warehouse because the warehouse has been suspended for longer than an hour and there’s no previous charge to continue.
  2. At 09:40 and 10:05, we recognize that the warehouse was charged within the last hour so no additional charges are incurred. In other words, the current charge doesn’t expire until 60 minutes after it was first incurred, so the earliest the warehouse will be charged again is 10:15, regardless of how many times the warehouse is suspended and resumed during this period.
  3. At 10:15, the warehouse isn’t charged because it’s not running at that time.
  4. At 10:30, the 2 servers are charged again because more than 60 minutes have elapsed since the initial charge. More importantly, this new time is now used to track all subsequent charges, i.e. the earliest time for the next charge is 11:30.
  5. At 11:20, no charge is incurred because 60 minutes haven’t elapsed since the last charge.
  6. At 11:30, the warehouse is charged again because 60 minutes have now elapsed since the last charge and the warehouse is running.
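
The steps above can be sketched in a few lines of Python. This is only a minimal model of the rules as described in this post, not Snowflake’s actual billing code; the function and variable names are hypothetical. It treats all servers of a fixed-size warehouse as sharing one charge time and assumes each run is shorter than an hour when counting the pre-WBC charges.

```python
CHARGE_PERIOD = 60  # minutes covered by one charge


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def wbc_charge_times(runs):
    """Return the times at which a fixed-size warehouse is charged under WBC.

    runs: (resumed, suspended) pairs as "HH:MM" strings, in chronological order.
    """
    charges = []
    expires = None  # when the current charge runs out; None = never charged
    for resumed, suspended in runs:
        start, end = to_minutes(resumed), to_minutes(suspended)
        # Continue the previous charge if it is still valid at resume time;
        # otherwise a new charge starts at the moment of the resume.
        due = start if expires is None or start >= expires else expires
        while due < end:  # a charge is incurred only while running
            charges.append(to_hhmm(due))
            expires = due + CHARGE_PERIOD
            due = expires
    return charges


RUNS = [("09:15", "09:25"), ("09:40", "09:50"), ("10:05", "10:10"),
        ("10:30", "10:50"), ("11:20", "11:40")]
SERVERS = 2  # Small warehouse, 1 credit per server per charge

times = wbc_charge_times(RUNS)
print(times)                 # charge times with WBC
print(len(RUNS) * SERVERS)   # credits before WBC: one charge per resume
print(len(times) * SERVERS)  # credits with WBC
```

For the table above, this yields charges at 09:15, 10:30 and 11:30, i.e. 6 credits with WBC compared with 10 credits before.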

This example covers a relatively simple case for a Small warehouse. Each successively larger warehouse has more servers, so the scenarios are slightly more involved, especially if the warehouse is resized (or multi-cluster warehouses are being used); however, the mechanics and calculations are all the same. The most important thing to remember is that every server in a warehouse is charged independently.

WBC Example with Virtual Warehouse Resizing

Consider the same example from earlier, but starting with a Medium warehouse (4 servers) and the following resize events:

  • Warehouse resized down to Small (2 servers) at 09:30, 10:00, and 11:00.
  • Warehouse resized back to Medium (4 servers) at 09:45, 10:45, and 11:15.

Resumed   Credits Charged for Resume   Resized          Credits Charged for Resize
09:15     4                            09:30 (Small)    -
09:40     -                            09:45 (Medium)   -
-         -                            10:00 (Small)    -
10:05     -                            -                -
10:30     2                            10:45 (Medium)   2
-         -                            11:00 (Small)    -
-         -                            11:15 (Medium)   -
11:20     2 (at 11:30)                 -                2 (at 11:45)

The total credits charged would be (4+2+2) + (2+2) = 12. The following diagram shows how the charges are incurred (remember that I warned you earlier about the gory details):

Figure 2: Example of charges for resuming and suspending a Medium virtual warehouse with resizing
  1. At 09:15, the initial charge for the Medium warehouse is 4 credits.
  2. At 10:30, the new charge is 2 credits (reflecting the Small size at the time the warehouse is resumed). These 2 servers will be charged next in 60 minutes at 11:30.
  3. At 10:45, a new, separate charge for 2 additional servers is incurred (due to the resize to Medium). These 2 additional servers will be charged next at 11:45, independently of the other 2 servers.
  4. All the other resizing events increase or decrease the number of servers running at that time, but incur no additional charges.

Note that this diagram illustrates how we remove servers from a warehouse: we always start with the most recently provisioned servers (i.e. LIFO), and we add them back in the same order. This is important because each server is charged according to its individual start time. It also means there’s no benefit to reducing the size of a warehouse within an hour that has already been charged, because the servers have already been paid for that hour.
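
To make the per-server accounting concrete, here is a minimal Python sketch that steps through the resizing example minute by minute. It is only a model of the rules described in this post, with hypothetical names, and not Snowflake’s implementation. Two assumptions are worth flagging: servers are numbered in provisioning order, so LIFO removal means the active servers are always the lowest-numbered ones; and the final run is assumed to last until 11:50, because the table does not list suspend times and the 11:45 charge can only occur if the warehouse is still running then.

```python
CHARGE_PERIOD = 60  # minutes covered by one charge


def to_minutes(hhmm: str) -> int:
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def to_hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def simulate(runs, resizes, initial_servers):
    """Return {"HH:MM": credits} with every server charged independently.

    runs: (resumed, suspended) "HH:MM" pairs in chronological order.
    resizes: ("HH:MM", server_count) pairs.
    """
    runs = [(to_minutes(a), to_minutes(b)) for a, b in runs]
    resizes = sorted((to_minutes(t), n) for t, n in resizes)
    max_servers = max([initial_servers] + [n for _, n in resizes])
    expires = [None] * max_servers  # per server: when its charge runs out
    charges = {}
    for now in range(runs[0][0], runs[-1][1]):
        # Current size = most recent resize at or before this minute.
        size = initial_servers
        for when, servers in resizes:
            if when <= now:
                size = servers
        if not any(start <= now < end for start, end in runs):
            continue  # suspended: nothing is charged
        # LIFO removal: the active servers are always 0 .. size-1.
        for server in range(size):
            if expires[server] is None or now >= expires[server]:
                expires[server] = now + CHARGE_PERIOD
                key = to_hhmm(now)
                charges[key] = charges.get(key, 0) + 1
    return charges


# Suspend times follow the earlier example, except the last one (ASSUMED 11:50).
RUNS = [("09:15", "09:25"), ("09:40", "09:50"), ("10:05", "10:10"),
        ("10:30", "10:50"), ("11:20", "11:50")]
RESIZES = [("09:30", 2), ("09:45", 4), ("10:00", 2),
           ("10:45", 4), ("11:00", 2), ("11:15", 4)]

charges = simulate(RUNS, RESIZES, initial_servers=4)
print(charges)
print(sum(charges.values()))  # total credits
```

Under these assumptions the sketch reproduces the 12 credits from the table. If the last run were instead suspended at 11:40, as in the earlier Small example, the two servers due at 11:45 would not be charged in this model.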

So, What Next?

There’s no need to make any changes to your virtual warehouses. We’ve enabled Warehouse Billing Continuation by default for all accounts and warehouses, including any existing warehouses. In fact, we implemented WBC near the end of March so you probably have already noticed reduced charges for some of your warehouses, particularly the ones that you resume and suspend frequently.

However, to take full advantage of this change, you might want to revisit your warehouse procedures and settings. For example, you can now set auto-suspend for a warehouse to a shorter value (e.g. 5 or 10 minutes) without worrying about being charged additional credits each time the warehouse resumes within the hour.

Interested in learning more? Check out the Snowflake documentation for more in-depth information about warehouses.

Also, keep an eye on this blog or follow us on Twitter (@snowflakedb) to keep up with all the news and happenings here at Snowflake Computing.