5 Must-Have Features for High Concurrency Environments

Whether you’re new to cloud data warehousing or comparing multiple cloud data warehouse technologies, it’s critical to assess whether your data warehouse environment will need to support concurrent processing of any sort. Unless you’re a lone database shop, in all likelihood, the answer is yes you will. Concurrency, or concurrent data processing, is simultaneous access and/or manipulation of the same data. This is not to be confused with parallel processing, which is multiple operations happening at the same time, but not against the same data.

Concurrency can take the form of multiple users interactively exploring and manipulating a particular data set, concurrent applications querying and visualizing the same data set, transactional updates to the data, or even concurrent loading of new data or change data into the data set. And if you thought to yourself, “What? I can concurrently load new data, while supporting queries on the same data set?,” then keep reading.

If you require concurrency, at any capacity — you will want these five cloud data warehouse features:

  • High relational performance across a broad range of data types: Of course, you want high performance–that’s a given. However, the more challenging aspect to plan for is fast queries on a broad range of data types, including semi-structured/JSON data. You don’t want your relational data warehouse to bog down and hold up corporate users because now it must handle non-traditional data from groups like the web team or product engineering. JSON is the new normal.  
  • Automatic and instant warehouse scaling: Let’s say your warehouse can handle a high number of concurrent accesses, but it bogs down during a period of high demand. Now what? This is a very important question to have answered as you compare technologies. Will you have to kick-off users? Will you have to schedule after-hour jobs? Will you have to add nodes? Will this require you to redistribute data? If so, redistributing data takes time and it’s a double-whammy because the existing data in the warehouse has to be unloaded. This is a huge disruption to your users. Instead, you want the ability to load balance across new virtual warehouses (compute engines) and have your cloud data warehouse continue to execute at fast speeds–including loading new data–all against the same data set. The more automatic this load balancing happens, the better.
  • ACID compliance: Right up there with poor performance is inaccurate results or reporting due to inconsistent data or dirty reads. With more people, workgroups, or applications accessing the same data simultaneously, the more pressure there is to maintain data consistency. A data warehouse with ACID compliance ensures consistency and data integrity are validated without having to write scripts or manually managing data integrity and consistency yourself.
  • Multi-statement transaction support: Tied to ACID compliance is multi-statement transaction support. If you have users that nest multiple transactions within a single query, you want a warehouse solution that you can trust will completely execute the transactions with integrity or will automatically rollback transactions should another transaction in the line fail.  
  • Data sharing: Data integrity and freshness should not stop just because you have to share data with an external stakeholder. Traditional approaches require engaging in an ETL process or spending time manually deconstructing, securing and transferring data. Data consumers on the receiving end must also spend time to reverse your steps and reconstruct data. This process is slow and manual, and does not ensure your data consumers are always working with live and up to date data. Data sharing allows you to eliminate all of this effort and ensure access to live data.

The business value for proactively planning for concurrency is that you want to ensure your cloud data warehouse can support your environment, regardless of what it throws at you. Especially during times of sudden, unpredictable, heavy query loads against a common data set.

Try Snowflake for free. Sign up and receive $400 US dollars worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.

Financial Services: Welcome to Virtual Private Snowflake

Correct, consistent data is the lifeblood of the financial services industry. If your data is correct and consistent, it’s valuable. If it’s wrong or inconsistent, it’s useless and may be dangerous to your organization.

I saw this firsthand during the financial meltdown of 2007/08. At that time, I had been working in the industry for nearly 20 years as a platform architect. Financial services companies needed that “single source of truth” more than ever. To remain viable, we needed to consolidate siloed data sets before we could calculate risk exposure. Most financial services companies were on the brink of collapse. Those that survived did so because they had access to the right data.

At my employer, we looked for a way to achieve this single source with in-house resources, but my team and I quickly realized it would be an extraordinary challenge. Multiple data marts were sprawled across the entire enterprise, and multiple sets of the same data existed in different places, so the numbers didn’t add up. In a global financial services company, even a one percent difference can represent billions of dollars and major risk.

We ultimately built an analytics platform powered by a data warehouse. It was a huge a success. It was so successful that everybody wanted to use it for wide-ranging production use cases. However, it couldn’t keep up with demand, and no amount of additional investment would solve that problem.

That’s when I began my quest to find a platform that could provide universal access, true data consistency and unlimited concurrency. And for financial services, it had to be more secure than anything enterprises were already using. I knew the cloud could address most of these needs. However, even with the right leap forward in technical innovation, would the industry accept it as secure? Then I found Snowflake. But my story doesn’t end there.

I knew Snowflake, the company, was onto something. So, I left financial services to join Snowflake and lead its product team. Snowflake represents a cloud-first approach to data warehousing, with a level of security and unlimited concurrency that financial services companies demand.

We’ve since taken that a step further with Virtual Private Snowflake (VPS) – our most secure version of Snowflake. VPS gives each customer a dedicated and managed instance of Snowflake within a separate Amazon Web Services (AWS) Virtual Private Cloud (VPC). In addition, customers get our existing, best-in-class Snowflake security features including end-to-end encryption, at rest and in-transit. VPS also includes Tri-Secret Secure, which combines a customer-provided encryption key, a Snowflake-provided encryption key and user credentials. Together, these features thwart an attempted data decryption attack by instantly rendering data unreadable. Tri-Secret Secure also includes user credentials to authenticate approved users.

VPS is more secure than any on-premises solution and provides unlimited access to a single source of data without degrading performance. This means financial services companies don’t have to look at the cloud as a compromise between security and performance.

To find out more, read our VPS white paper and solution brief: Snowflake for Sensitive Data.


Empowering Data Driven Businesses via Scalable Delivery

The accessibility of data has changed the way that businesses are run. Data has become essential to decision makers at all levels, and getting access to relevant data is now a top priority within many organizations. This change has created huge demands on those who manage the data itself. Data infrastructures often cannot handle the demands of the large numbers of business users, and the result is often widespread frustration.

The concept of an Enterprise Data Warehouse offers a framework to provide a cleansed and organized view of all critical business data within an organization. However, this concept requires a mechanism to distribute the data efficiently to a wide variety of audiences. Executives most often require a highly summarized version of the data for quick consumption. Analysts want the ability to ‘slice and dice’ the data to find interesting insights. Managers need a filtered view that allows them to track their team’s data in the context of the organization as a whole. Strategists and economists need highly specialized data that can require highly complex algorithms and tools to generate. If one system is to serve the data for all of these audiences, along with numerous others, it must be able to scale to provide access to all of them.

Traditional Data Warehouse architecture has not been able to adequately respond to this demand for data. Many organizations have been forced to move to hybrid architectures in response. OLAP tools and datamarts became commonplace as mechanisms to distribute data to business users. Statisticians and later data scientists often had to create and maintain entirely separate systems to handle data volumes and workloads that would overwhelm the data warehouse. Data is copied repeatedly into a variety of systems and formats in order to make it available to wider audiences. A vice president at a large retail company openly referred to their data warehouse as a data pump that was entirely consumed by the loading of data on one side and the copying of that data out to all sorts of systems on the other side. This also leads to challenges in ensuring there is one vision of the data inside the organization and synchronizing all the siloed data.

During my time working in a business intelligence role for a number of the largest organizations in the world, I encountered all of these problems. While supporting the Mobile Shopping Team at a major online retailer, I had access to a wide variety of powerful tools for delivering data to my coworkers that relied on that data for making key decisions about the business. However coordinating all of these systems so that they produced consistent results across the board was a huge headache. On top of that, many of the resources were shared with other internal teams, which meant that we were regularly competing for access to a limited pool of resources. There were a number of situations where I could not get data that was requested by my VP in a timely manner because other teams had consumed all available resources in the warehouse environment.
The architects behind Snowflake were all too familiar with these problems, and they came to the conclusion that the situation demanded an entirely new architecture. The rise of cloud computing provided the raw materials: vast, scalable storage completely abstracted from the idea of server hardware or storage arrays and elastic compute. More importantly, this storage is self-replicating, thus automatically creating additional copies of items in storage if they become ‘hot’. This new storage, combined with the availability to leverage nearly unlimited computing power, is at the heart of the multi-cluster shared data architecture that is central to Snowflake’s Elastic Data Warehouse.

data_warehouse (2)

At Snowflake, data is stored once, but can be accessed by as many different compute engines as are required to provide responses to any number of requests. Data workers no longer need to focus on building out frameworks to deal with concurrency issues. They can keep all of their data in one system and provide access without having to consider how different groups of users might impact one another. The focus can then be placed where it belongs, on deriving value from the data itself. Thus you can load data and at the same time have multiple groups, each with their own computing resources, query the data. No more waiting for some other groups to free up resources. Your business can now deliver analytics to the various users: executives, analysts, data scientists, and managers, without the environment getting in the way of performance.

As always, keep an eye on our Snowflake-related Twitter feeds (@SnowflakeDB and (@ToddBeauchene) for continuing updates on all the action and activities here at Snowflake Computing.