Why You Need a Cloud Data Warehouse

Are you new to the concepts of data warehousing? Do you want to know how your enterprise can benefit from using a data warehouse to streamline your business, better serve your customers and create new market opportunities? If so, this blog is for you. Let’s cover the basics.

It begins with production data

Day-to-day, nearly all enterprise-class companies process data as part of core business operations. Banks process debit and credit transactions for account holders. Brick-and-mortar and online retailers process in-store and website purchases, respectively. Insurance companies maintain and update customer profile and insurance policy information for policyholders.

The nature of these production systems is transactional and require databases that can capture, write, update or delete information at the pace of business operations. The systems behind these transactions are online transaction processing (OLTP) databases. For example, OLTP databases for investment firms must operate at lightning speed to keep up with high-volume stock and bond trading activity that occur in fractions of a second.

The need for a data warehouse solution

In addition to capturing transactions, another aspect of business operations is to understand what’s happening, or what has happened, based on the information captured with OLTP databases. By this, I mean companies must not only know how much revenue is coming in, they must know where revenue is coming from, the profile of customers making the purchases, business trends (up or down), the products and services being purchased and when those transactions are taking place. And, certainly businesses need to know what it will take for customers to remain loyal and buy more. Answers and insights to these questions are necessary to develop strategic business plans and develop new products that will keep businesses growing.

Why transactional (OLTP) systems are not optimized for data warehousing

Acquiring these insights requires accumulating, synthesizing and analyzing the influx of data from OLTP databases. The aggregation of all this data results in very large data sets for analytics. In contrast, when OLTP systems capture and update data, the amount of data transacted upon is actually very small. However, OLTP systems will execute thousands upon thousands of  small transactions at a time. This is what OLTP systems are optimized to do; however, OLTP systems are not optimized for the analysis of large to extremely large data sets.  

This is why data warehousing solutions emerged. Data warehouse solutions will hold a copy of data stored in OLTP databases. In addition, data warehouses also hold exponentially larger amounts of data accessed by enterprises, thanks to the enormous amount of Internet and cloud-born data. Ideally, data warehouses should be optimized to handle analytics on data sets of any size.  A typical data warehouse will have two primary components: One, a database (or a collection of databases) to store all of the data copied from the production system; and two, a query engine, which will enable a user, a program or an application to ask questions of the data and present an answer.

Benefits of deploying a data warehouse

As previously stated, with a data warehouse, you ask and find answers to questions such as:

  • What’s the revenue?
  • Who’s buying?
  • What’s the profile of customers?
  • What pages did they visit on our website?
  • What caught their attention?
  • Which customers are buying which products?

With native language processing and other deep learning capabilities gaining popularity, you can even develop insights about the sentiment of prospects and potential customers as they journey towards your enterprise.

Benefits of data warehousing… in the cloud

Many data warehouses deployed today were developed during the 1980s and were built for on-premises data centers typical of the time. These solutions still exist, including availability of “cloud-washed” versions. Both options typically involve upfront licensing charges to buy and to maintain these legacy data warehouses. Yet, neither legacy data warehouses (0r current generation data lakes based on Hadoop) can elastically scale up, down, or suspend as needed to meet the continuously varying demands of today’s enterprises.

 

As result, these types of solutions require a lot attention on low-level infrastructure tasks that divert IT and data science teams from truly strategic analytics projects that advance the business.

With modern, cloud-built data warehouse technology now available, such as Snowflake, you can gather even more data from a multitude of data sources and instantly and elastically scale to support virtually unlimited users and workloads.

All of this is accomplished while ensuring the integrity and consistency of a single source of truth without a fight for computing resources. This includes a mix of data varieties, such as structured data and semi-structured data. As a modern cloud service, you can have any number of users query data easily, in a fully relational manner using familiar tools, all with better security, performance, data protection and ease-of-use that are built-in.

For these reasons, you can expect enterprises to turn to companies like Snowflake to help propel insights from your data in new directions and at new speeds, regardless the size of the business or industry in which you compete.

subscribe to the snowflake blog

What makes Snowflake a data warehouse?

One of the most common questions I get when speaking to people about Snowflake is: “Why do you call it a data warehouse and not a database?” This is a very reasonable question given some of the characteristics of Snowflake.

At Snowflake, in part, we say we are a full relational database management system (RDBMS) built for the cloud. We are ACID compliant and we support standard SQL. Sounds like a database to me, too. Let’s take a closer look just to be sure.

What is a database?

A database is a collection of information organized to be easily accessed, managed and updated. While there are many types of databases available today, the most common is an RDBMS. But when most folks say “database”, they usually mean a traditional RDBMS that handles Online Transaction Processing (OLTP).

So, what are some of the defining characteristics of an OLTP database?

  • Designed for rapid storage and retrieval of small sets of current data records in support of transactions and interactions within an enterprise.
  • Data is organized in tables and columns, allowing users access via structured query language (SQL).
  • Handles quick, real-time activity such as entering a customer name, recording a sale and recording all accounting activity of that sale.
  • Works well for basic operational reporting of a limited number of records. Analytic reporting is relegated to simple, static reports often driven by IT.

What is a data warehouse?

Some of the defining characteristics of a data warehouse are:

  • A database designed to store and process large volumes of current and historical data collected from multiple sources inside and outside the enterprise for deep analysis.
  • Organizes data into tables and columns, and allows users access via SQL.
  • Optimized for loading, integrating and analyzing very large amounts of data.
  • Designed to support descriptive, diagnostic, predictive and prescriptive analytic workloads.

Snowflake definitely includes the overlapping characteristics of both a database and a data warehouse-ACID compliant, support for standard SQL, etc. But Snowflake also embodies all of the defining characteristics of a data warehouse.

One of the key differentiators of Snowflake, from other solutions, is that it’s specifically designed for data warehousing and high speed analytic processing. Rather than a generalized SQL database that has been “tuned” or even adapted to handle these type of workloads, Snowflake was built from the ground up for the cloud to optimize loading, processing and query performance for very large volumes of data. Therefore, hands down, Snowflake is a data warehouse.

So, why do we still need a specialized data warehouse engine?

As OLTP databases have been able to scale higher and innovations like in-memory databases have emerged, some organizations have questioned whether they still need a separate technology or specialized system for reporting and analytics. The answer, again, requires us to look at the basics: What benefits emerge from storing and analyzing data in a separate system?

  1. It eases the burden of reporting from transactional systems by removing the contention for limited and expensive resources.
  2. It produces more business-friendly data results by allowing the data to be restructured to a more suitable format.
  3. It provides access to a wider array of reports more quickly because all the resources in the data warehouse are dedicated to reporting and analysis.
  4. It integrates valuable data from across the enterprise for richer insight. Something that can’t (and shouldn’t) be done in an OLTP system.

For more information on how you can up your data warehousing game with a modern, built-for-the-cloud approach, check out some of our free resources such as our ebook The Data Warehouse: The Engine That Drives Analytics. We would love to help you on your journey to the cloud so keep an eye on this blog or follow us on Twitter (@snowflakedb and @kentgraziano) to keep up with all the news and happenings here at Snowflake.