How the Modern Data Warehouse Helps Ease the Burden of GDPR

While many activities driving an organization’s GDPR compliance are in the organization’s own hands, its IT vendors should help satisfy their customers’ compliance requirements. At a minimum, an organization’s SaaS vendors should satisfy the security requirements that wholly reside in their domain but impact their customers’ business and data security. Snowflake was built from the ground up in ways that can ease the burden of complying with GDPR, especially for security- and customer-focused organizations.

Snowflake was designed from the beginning to handle an enormous amount of structured and semi-structured data with the ease of standard SQL. The accessibility and simplicity of SQL gives organizations the flexibility to seamlessly make any updates, changes or deletions required under GDPR. Snowflake’s support for semi-structured data can make it easier to adapt to new fields and other record changes. In addition, delivering industry-best security has been fundamental to the architecture, implementation and operation of Snowflake’s data warehouse-as-a-service since day one.

A core principle of GDPR

A major GDPR compliance factor is understanding what data an organization holds and to whom it relates. This requirement demands that data is structured, organized and easy to search.

Snowflake’s relational, SQL database architecture provides a significantly simplified structure and organization, ensuring that each record has a unique and easily identified location within the database. Snowflake customers can also combine relational storage with Snowflake’s VARIANT column type for semi-structured data. This approach extends the simplicity of the relational format to the schema flexibility of semi-structured data.
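To make this concrete, here is a minimal sketch of combining relational columns with a VARIANT column; the table and field names are hypothetical:

```sql
-- A hedged sketch (table and field names hypothetical): relational columns
-- alongside a VARIANT column holding raw JSON.
CREATE TABLE customer_events (
    customer_id NUMBER,
    event_time  TIMESTAMP_NTZ,
    payload     VARIANT   -- semi-structured data; no up-front schema required
);

-- New JSON fields need no schema change and are queryable with standard SQL.
SELECT
    payload:email::STRING           AS email,
    payload:address.country::STRING AS country
FROM customer_events
WHERE customer_id = 42;
```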

Snowflake is made even more powerful by its ability to support massive concurrency. In larger organizations, there may be dozens or even hundreds of concurrent data modifications, queries and searches occurring at any one time. Traditional data warehouses can’t scale beyond a single cluster of compute at any given time, leading to long queues and delayed compliance. Snowflake’s multi-cluster, shared data architecture solves this problem by enabling as many dedicated clusters of compute resources as needed, for any purpose, leading to more efficient workload isolation and query throughput. Anyone can store, organize, modify, search and query very large amounts of data with as many concurrent users or operations as necessary.
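As an illustration, a multi-cluster virtual warehouse is declared in a single statement; the name and sizing below are a hypothetical sketch, not a recommendation:

```sql
-- A hedged sketch: an auto-scaling, multi-cluster warehouse so compliance
-- searches never queue behind other workloads. All values are illustrative.
CREATE WAREHOUSE compliance_wh
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 5     -- additional clusters spin up under heavy concurrency
  AUTO_SUSPEND      = 300   -- suspend after five idle minutes
  AUTO_RESUME       = TRUE;
```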

Data subject rights

Organizations affected by GDPR must ensure they can comply with data subject requests. Individuals now have significantly expanded rights to learn what type of data an organization holds about them, along with the right to request access to that data, correction of it, deletion of it, and/or porting of it to a new provider. Organizations must respond fairly quickly, generally within 30 days. Therefore, they must be able to quickly search their business systems and data warehouse to locate all personal data related to an individual and take action.

Organizations can greatly benefit from storing all their data in a data warehouse-as-a-service with full DML and SQL capabilities. It eases the burden of searching various discrete business systems and data stores to locate the relevant data. This helps to ensure that individual records can be searched, deleted, restricted, updated, truncated, split and otherwise manipulated to align with data subject requests. It also makes it possible to move data to comply with a “right to portability” request. From the beginning, Snowflake was architected with ANSI-standard SQL and full DML to ensure these types of operations are possible.
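As a sketch of how these data subject operations reduce to ordinary SQL, assuming a hypothetical customers table keyed by email and a hypothetical export stage:

```sql
-- Right of access: locate every record held about an individual.
SELECT * FROM customers WHERE email = 'jane@example.com';

-- Right to rectification: correct the data in place.
UPDATE customers SET postal_code = '75001' WHERE email = 'jane@example.com';

-- Right to erasure: delete the individual's records.
DELETE FROM customers WHERE email = 'jane@example.com';

-- Right to portability: export the records to a stage for handover.
COPY INTO @export_stage/jane_doe/
  FROM (SELECT * FROM customers WHERE email = 'jane@example.com')
  FILE_FORMAT = (TYPE = 'CSV');
```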

Security

Unfortunately, many traditional data warehouses require security to be home-built and cobbled together with services outside of the core offering. What’s more, they may not even enable encryption out of the box.

As the modern data warehouse built for the cloud, Snowflake was built to ensure stringent security as a key feature of the service. Snowflake has key, built-in protections and security features, including:

  • Zero management – Snowflake reduces complexity with built-in performance, security and high availability so there’s no infrastructure to tweak, no knobs to turn and no tuning required.
  • Encryption everywhere – Snowflake automatically encrypts all data at rest and in transit.
  • Comprehensive protection – Security features include multi-factor authentication, role-based access control (sketched after this list), IP address whitelisting, federated authentication and annual rekeying of encrypted data.
  • Tri-Secret Secure – Ensures customer control and data protection by combining a customer-provided encryption key along with a Snowflake-provided encryption key and user credentials.
  • Support for AWS PrivateLink – Customers can transmit data between their virtual private network and Snowflake without traversing the public Internet, making inter-network connectivity secure and easier to manage.
  • Stronger intra-company data demarcation through Snowflake Data Sharing – Leverage Snowflake’s data sharing features to share only non-PII data with other teams in your organization that don’t need access to personal data, enforcing stronger security and GDPR controls.
  • Private deployment – Enterprises can get a dedicated and managed instance of Snowflake within a separate AWS Virtual Private Cloud (VPC).
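To make the role-based access control item concrete, here is a minimal sketch; the role, table and user names are hypothetical:

```sql
-- A hedged RBAC sketch: analysts may read customer data, while only a
-- compliance role may modify or delete it.
CREATE ROLE analyst;
CREATE ROLE compliance_officer;

GRANT SELECT ON TABLE customers TO ROLE analyst;
GRANT SELECT, UPDATE, DELETE ON TABLE customers TO ROLE compliance_officer;

GRANT ROLE analyst TO USER jsmith;
```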

Accountability

To add to the complexity, organizations must also ensure that they, and the organizations and tools they work with, are able to demonstrate compliance. Snowflake audits and refines its security practices on an ongoing basis, including regular penetration testing. The Snowflake data warehouse-as-a-service is SOC 2 Type II certified and PCI DSS compliant, and it supports HIPAA compliance. Customers can also audit how data has been manipulated to comply with data subject requests.

In addition to these out-of-the-box capabilities and validations, Snowflake provides customers with our Data Protection Addendum, which is tightly aligned with GDPR requirements. Snowflake also adheres to robust contractual security commitments to facilitate more efficient transactions and simplified due diligence.

Conclusion

Under the GDPR, companies must implement technical measures that help them respond to the data protection and privacy needs of their customers. Snowflake not only provides the benefit of storing all critical customer data in a single location, it also enables rapid location and retrieval of that data so businesses can take action.


Snowflake Europe: What a Difference a Year Makes

Since Snowflake Computing first opened its doors in London a year ago, we’ve seen unprecedented growth. Starting from zero presence in Europe, we have onboarded 130 new customers in 9 countries, including Capital One and Deliveroo. In the first quarter of 2018 alone, we grew revenue nearly 1.5x compared to all of 2017. We’ve also opened offices in four additional European locations – Paris, Munich, Amsterdam and Stockholm.

This time last year, it was just four of us building the European arm of the business. We’ve now grown to 34 employees, spread across our five locations, developing an impressive portfolio of European customers and partners in the process.

In the past year, we’ve held 22 industry events in EMEA alone, including the European leg of our Cloud Analytics World Tour, helping us better connect with customers, partners and other industry leaders.

One of the key drivers of our success has been our ability to understand the value of data for businesses. Traditional, big data platform vendors are trying to adapt to the changing landscape of the cloud, but retrofitting their infrastructure isn’t enough. Rather than patching and adding to old systems, we’ve created a data warehouse built for the cloud to handle any data analytics challenge our customers face.

But this is only the beginning of our story. The future is very promising for Snowflake. We’ll continue to deliver new innovations for our data warehouse-as-a-service to meet every aspect of our customers’ needs, especially with the challenges GDPR legislation brings. There are already some impressive customers on the horizon that we’re excited to serve.

A big focus for us will be meeting the needs of the data-intensive financial services industry. Our product version dedicated to serving financial services helps our customers navigate the complex PSD2 and Open Banking regulations. The retail sector will also be a significant focus as we help retailers of all sizes capitalise on their vast data stores to better personalise their customers’ experiences.

Since our Silicon Valley inception in 2012, Snowflake has launched in eight countries, attracting more than 1000 customers globally. We also secured our most recent round of funding – US$263M in January 2018. With such unprecedented growth in just 12 months, we can’t wait to see where the next 12 months takes us!


Why You Need a Cloud Data Warehouse

Are you new to the concepts of data warehousing? Do you want to know how your enterprise can benefit from using a data warehouse to streamline your business, better serve your customers and create new market opportunities? If so, this blog is for you. Let’s cover the basics.

It begins with production data

Day-to-day, nearly all enterprise-class companies process data as part of core business operations. Banks process debit and credit transactions for account holders. Brick-and-mortar and online retailers process in-store and website purchases, respectively. Insurance companies maintain and update customer profile and insurance policy information for policyholders.

These production systems are transactional in nature and require databases that can capture, write, update or delete information at the pace of business operations. The systems behind these transactions are online transaction processing (OLTP) databases. For example, OLTP databases for investment firms must operate at lightning speed to keep up with high-volume stock and bond trading activity that occurs in fractions of a second.

The need for a data warehouse solution

In addition to capturing transactions, another aspect of business operations is to understand what’s happening, or what has happened, based on the information captured with OLTP databases. By this, I mean companies must not only know how much revenue is coming in, they must know where revenue is coming from, the profile of customers making the purchases, business trends (up or down), the products and services being purchased and when those transactions are taking place. And, certainly businesses need to know what it will take for customers to remain loyal and buy more. Answers and insights to these questions are necessary to develop strategic business plans and develop new products that will keep businesses growing.

Why transactional (OLTP) systems are not optimized for data warehousing

Acquiring these insights requires accumulating, synthesizing and analyzing the influx of data from OLTP databases. The aggregation of all this data results in very large data sets for analytics. In contrast, when OLTP systems capture and update data, the amount of data transacted upon is actually very small. However, OLTP systems will execute thousands upon thousands of small transactions at a time. This is what OLTP systems are optimized to do; they are not optimized for the analysis of large to extremely large data sets.

This is why data warehousing solutions emerged. A data warehouse holds a copy of the data stored in OLTP databases. Data warehouses also hold the exponentially larger amounts of data enterprises now access, thanks to the enormous volume of Internet and cloud-born data. Ideally, data warehouses should be optimized to handle analytics on data sets of any size. A typical data warehouse has two primary components: one, a database (or a collection of databases) to store all of the data copied from the production system; and two, a query engine, which enables a user, a program or an application to ask questions of the data and present an answer.

Benefits of deploying a data warehouse

As previously stated, with a data warehouse you can ask and find answers to questions such as:

  • What’s the revenue?
  • Who’s buying?
  • What’s the profile of customers?
  • What pages did they visit on our website?
  • What caught their attention?
  • Which customers are buying which products?
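Each of these questions maps to a straightforward query. Here is a hedged sketch against a hypothetical purchases table:

```sql
-- Revenue and distinct buyers by product (schema hypothetical).
SELECT
    product_id,
    SUM(amount)                 AS revenue,
    COUNT(DISTINCT customer_id) AS buyers
FROM purchases
WHERE purchase_date >= '2018-01-01'
GROUP BY product_id
ORDER BY revenue DESC;
```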

With natural language processing and other deep learning capabilities gaining popularity, you can even develop insights about the sentiment of prospects and potential customers as they journey toward your enterprise.

Benefits of data warehousing… in the cloud

Many data warehouses deployed today were developed during the 1980s and were built for on-premises data centers typical of the time. These solutions still exist, including “cloud-washed” versions. Both options typically involve upfront licensing charges to buy and maintain these legacy data warehouses. Yet neither legacy data warehouses nor current-generation data lakes based on Hadoop can elastically scale up, down, or suspend as needed to meet the continuously varying demands of today’s enterprises.


As a result, these types of solutions require a lot of attention to low-level infrastructure tasks, diverting IT and data science teams from truly strategic analytics projects that advance the business.

With modern, cloud-built data warehouse technology now available, such as Snowflake, you can gather even more data from a multitude of data sources and instantly and elastically scale to support virtually unlimited users and workloads.

All of this is accomplished while ensuring the integrity and consistency of a single source of truth, without a fight for computing resources, and across a mix of data varieties such as structured and semi-structured data. With a modern cloud service, any number of users can query data easily and in a fully relational manner using familiar tools, with better security, performance, data protection and ease-of-use built in.

For these reasons, you can expect enterprises to turn to companies like Snowflake to help propel insights from their data in new directions and at new speeds, regardless of the size of the business or the industry in which it competes.


Top 9 Best Practices for Data Warehouse Development

When planning a modern cloud data warehousing development project, a clear outline of the business and IT needs and pain points will be key to the ultimate success of your venture. Being able to tell the right story will give the business the structure it needs to be successful in its data warehousing efforts.

Here are 9 things you should know about staying current in data warehouse development, but won’t necessarily hear from your current IT staff and consultants.

1) Have a data model. Getting a common understanding of what information is important to the business will be vital to the success of the data warehouse. Sometimes the businesses themselves don’t know their own data needs or landscape. They will be using different words for the same data sets, the same words for different data sets, etc. Modeling the business’ information can be a real eye opener for all parties concerned.

2) Have a data flow diagram. Knowing where all the business’ data repositories are and how the data travels within the company in a diagram format allows everyone to determine the best steps for moving forward. You can’t get where you want to be if you don’t know where you are.

3) Build a source-agnostic integration layer. The integration layer’s sole purpose is to pull together information from multiple sources. This is generally done to allow better business reporting. Unless the company has a custom application developed with a business-aligned data model on the back end, choosing a 3rd-party source to align to defeats that purpose. Integration MUST align with the business model.

4) Adopt a recognized data warehouse architecture standard (e.g., 3NF, star schema [dimensional], Data Vault). Regardless of the actual approach chosen, picking a standard and sticking with it will enable efficiency within a data warehouse development approach. Supporting a singular methodology for support and troubleshooting allows new staff to join the team and ramp up faster.

5) Consider adopting an agile data warehouse methodology. Data warehouses no longer have to be large, monolithic, multi-quarter or multi-year efforts. With proper planning aligned to a single integration layer, data warehouse projects can be broken down into smaller, faster deliverable pieces that return value much more quickly. This also allows you to re-prioritize the warehouse as business needs change.

6) Favor ELT over ETL. Moving corporate data, as is, to a single platform should be job #1. Then legacy systems can be bypassed and retired along the way, helping the business realize savings faster. Once data is colocated, it is much more efficient to let the power of a single cloud engine do integrations and transformations (i.e., fewer moving parts, pushdown optimizations, etc.); see the ELT sketch after this list.

7) Adopt a data warehouse automation tool (e.g., Wherescape, AnalytixDS, Ajilius, homespun, etc.). Automation allows you to leverage your IT resources more fully, iterate faster through projects and enforce coding standards for easier support and ramp-up.

8) Get your staff trained in modern approaches. Giving your team knowledge of the advantages of newer technologies and approaches lets your IT staff become more self-sufficient and effective. This will also open up more understanding and options in hiring and contracting with the best resources that the IT industry has to offer.

9) Pick a cloud-based data warehouse environment. For the least initial investment, the storage and compute elasticity coupled with the pay-as-you-go nature of cloud-based services provide the most flexible data warehousing solution on the market. 
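To illustrate item 6, here is a minimal ELT sketch: land the raw data first, then transform it inside the warehouse. The stage, table and field names are hypothetical.

```sql
-- Extract/Load: copy raw JSON into the platform as-is.
CREATE TABLE raw_orders (payload VARIANT);

COPY INTO raw_orders
  FROM @s3_orders_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- Transform: shape the data with SQL after loading, inside the warehouse.
CREATE TABLE orders AS
SELECT
    payload:id::NUMBER          AS order_id,
    payload:customer.id::NUMBER AS customer_id,
    payload:total::FLOAT        AS total
FROM raw_orders;
```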


Boost Your Analytics with Machine Learning and Advanced Data Preparation

Enterprises can now harness the power of Apache Spark to quickly and easily prepare data and build Machine Learning (ML) models directly against that data in Snowflake. Snowflake and Qubole make it easy to get started by embedding required drivers, securing credentials, simplifying connection setup and optimizing in-database processing. Customers can focus on getting started quickly with their data preparation and ML initiatives instead of worrying about complex integrations and the cost of moving large data sets.

Setting up the Snowflake connection and getting started takes only a few minutes. Customers first create a Snowflake data store in Qubole and enter details for their Snowflake data warehouse. All drivers and packages are preloaded and kept up to date, eliminating manual bootstrapping of jars into the Spark cluster. There is no further configuration or tuning required and there are no added costs for the integration. Once the connection is saved, customers can browse their Snowflake tables, view metadata and see a preview of the Snowflake data, all from the Qubole interface. They can then use Zeppelin notebooks to get started reading and writing data to Snowflake as they begin exploring data preparation and ML use cases.

[Screenshot: the Qubole object browser view showing the available Snowflake tables and their properties.]

Security is also handled seamlessly, so customers can focus on getting started with their data without worrying about protecting their credentials. Qubole provides centralized and secure credential management, which eliminates the need to specify any credentials in plain text. Username and password are entered only when setting up the data store, and are otherwise inaccessible.

The solution is also designed for enterprise requirements and allows customers to use federated authentication and SSO via the embedded Snowflake drivers. With SSO enabled, customers can authenticate through an external, SAML 2.0-compliant identity provider (IdP) and achieve a higher level of security and simplicity. These capabilities help customers more easily share notebooks and collaborate on projects with little risk of sensitive information being exposed.

[Code sample: a Scala program reading from Snowflake using the data store object, with no credentials specified in plain text.]

Beyond the simplicity and security of the integration, customers also benefit from a highly optimized Spark integration that uses Snowflake query pushdown to balance the query-processing power of Snowflake with the computational capabilities of Apache Spark. From simple projection and filter operations to advanced operations such as joins, aggregations and even scalar SQL functions, query pushdown runs these operations in Snowflake (where the data resides) to help refine and pre-filter the data before it is read into Spark. The traditional performance and cost challenges associated with moving large amounts of data for processing are thereby eliminated, without additional overhead or management.

With Snowflake and Qubole, customers get an optimized platform for advanced data preparation and ML that makes it simple to get started. Customers can complement their existing cloud data warehouse strategy or get more value out of ML initiatives by opening access to more data. To learn more about ML and advanced data preparation with Qubole, visit the Qubole blog.

Try Snowflake for free. Sign up and receive US$400 worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.

Accelerate Your Way to a Modern Cloud Analytics Platform

Many times, I see clients who have no current cloud footprint use the power of analytics to pull the rest of the organization toward the cloud. These new adopters leverage many powerful tools without making a massive, upfront investment. This can seriously accelerate their analytics abilities and their cloud maturity.

The depth of maturity on the curve determines the richness of your cloud analytics capabilities.

[Figure: the cloud and analytics maturity curve, progressing from descriptive to predictive to prescriptive analytics.]

The cloud and analytics maturity curve explained

The graph shown above illustrates a typical analytics maturity curve. As you can see, an analytics department that’s on the lower end of the maturity curve is typically relying upon “descriptive” analytics that can provide the business with information about the past. An analytics department that’s in the mid-range of the maturity curve can provide a more “predictive” (insights-driven) view of the enterprise that offers a glimpse into what the future may look like for the business. An analytics department that’s on the mature end of the curve, however, is positioned to provide “prescriptive” analytics that offer deeper, actionable insights that enable a business to operate with greater agility and plan for the future with a higher degree of accuracy.

You can take the same concept of the analytics maturity curve and apply it directly to the cloud. For example, an enterprise with an immature cloud analytics platform can only source and provide descriptive analytics. An enterprise working with a cloud analytics platform in the mid-range of maturity, on the other hand, has typically migrated a critical mass of data but is only just starting to dive into all of the analytical capabilities the cloud can provide. The enterprise with a high degree of maturity on a cloud platform, however, is typically operating with proven storage and processing capabilities and is now starting to look towards cloud-native tools (e.g. Amazon Web Services Sagemaker, Azure Machine Learning, or Google’s Cloud AI) that can significantly accelerate an analytics platform. It’s only when a platform is mature enough that an organization can begin to fully leverage groundbreaking analytics concepts, such as Machine Learning (ML) and artificial intelligence (AI), to which every enterprise aspires.

Accelerate your journey along the maturity curve

For many organizations, the journey along both the analytics and cloud maturity curves can seem incredibly daunting, expensive and unfeasible. When a business finally reaches the point where it must start the journey (or risk falling behind the competition), it generally has trouble getting past the initial hurdle: resistance to change. Or, the business doesn’t want to provide any resources to help launch the journey. It then becomes purely an IT initiative, which often lacks the support to match the velocity needed.

So, how can an organization accelerate itself along the cloud maturity curve and get to advanced analytics faster? By leveraging strong, cloud-native tools like Snowflake. What is essential, however, is a plan.

Too many times, a cloud analytics platform falls down because there is no concrete plan to constantly innovate. I’ve seen enterprises invest in a single solution, such as an on-premises data warehouse that’s been shifted to the cloud, because they think it will solve all their problems. They soon realize, however, that it’s not that simple. The problem with a “cloudified” version of a data warehouse solution is that it flies directly in the face of what a modern data architecture is. In fact, one of the key components of a modern data architecture is that it’s decoupled. That means you should be able to replace any one piece of your architecture, such as your ETL solution, with the newest, best-of-breed tool and do so with minimal impact to the rest of your architecture.

This is where a data warehouse built for the cloud like Snowflake becomes such a great accelerator. Snowflake’s multi-cluster, shared data architecture separates storage and compute, making it possible to scale up and down on-the-fly, without downtime or disruption. By providing a central storage repository that’s separate from your computing resources, only then can you move quickly through to the higher levels of the maturity curve. When you’re moving towards the prescriptive and predictive phases, Snowflake provides a foundation that allows you to build out a truly modern data architecture, leveraging all of the benefits of the cloud with tremendous benefits to the enterprise.
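As a small illustration of that separation (the warehouse name is hypothetical), compute can be resized on the fly without touching storage:

```sql
-- A hedged sketch: scale compute up for a demanding workload, then back
-- down, with no effect on stored data and no downtime.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';
-- ... run the heavy jobs ...
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';
```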

No matter where you are on the cloud maturity curve, data storage and warehousing will always be a central component of your architecture. By leveraging cloud-built solutions like Snowflake, you remove much of the developmental overhead that comes with a solution originally built to be deployed on-premises. And, by working with implementation partners like Slalom, you can get the most out of your cloud data warehouse, accelerate your journey to the top of the cloud maturity curve, and begin using truly revolutionary analytics, such as ML and AI, to which every enterprise aspires.

Author Bio: James Anderson is a Solution Architect at Slalom, a Snowflake implementation partner headquartered in Seattle. James specializes in analytical data platforms built in the cloud, helping enterprises get the most out of their data and unlocking new and exciting technologies for them. James is based out of Slalom Boston and is a frequent contributor to the Slalom technology blog.

Gartner Positions Snowflake as a Challenger in Magic Quadrant

The number of different metrics that enterprise SaaS providers must measure themselves against changes and grows by the week. Most of those metrics are generated internally. To balance that mix, it’s crucial that SaaS providers undergo rigorous reviews from reputable and time-honored industry analysts.

For example, Gartner’s 2018 Magic Quadrant for Data Management Solutions for Analytics (DMSA) report* has named Snowflake as a Challenger. We feel this is a reflection of how revolutionary the built-for-the-cloud data warehouse is for our customers.

We believe Snowflake’s improved position on the Ability to Execute and Completeness of Vision axes of Gartner’s Magic Quadrant has been achieved by increasing our business and market presence, and delivering the technology and innovation our customers need.

It’s our assertion that this latest achievement for Snowflake further reveals that our cloud-built data warehouse continues to challenge the legacy providers in the data warehousing industry, and continues to serve the current and future needs of the data-driven enterprise.

Snowflake continues to achieve great things, and our internal metrics reveal that. We at Snowflake are dedicated to serving our customers to the best of our abilities, in keeping with our number one company value: Put the customer first.

In addition, Snowflake recently received its largest round of growth funding, $263 million, advancing our company valuation to $1.5 billion – another indicator from outside of Snowflake, from the venture capital community, that Snowflake continues to enable the data economy.

But there is much more work to do. Many barriers still exist for enterprises to access all of their data, and to share live, secure and governed data between themselves and their business partners. We’re keen to remove these barriers on a global scale with one of our latest features, Snowflake Data Sharing.

In addition, Snowflake Data Sharing enables enterprises to transform their data into a business asset. Snowflake customers can now monetize and easily share data, creating new market opportunities that were previously unforeseen. They can also benefit from data shared with them to enhance their products and services, lead their industries and streamline their operations.

Gartner’s DMSA report is a welcome evaluation of Snowflake’s continued pursuit of the innovative solutions enterprises need in order to get all the insight from all their data. We are challenging the large, legacy data warehouse vendors, but they are not our primary focus. Instead, we’re targeting the evolving needs of our customers, who must be the true winners in all of this.

*Gartner, “Magic Quadrant for Data Management Solutions for Analytics,” by Adam Ronthal, Roxane Edjlali, Rick Greenwald and Donald Feinberg. February 2, 2018.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.


Dresner ADI Report: Snowflake Recommended by 100% of Customer Survey Respondents

Snowflake has ranked highly among the data warehouse vendors in the quadrants of the 2018 Analytical Data Infrastructure Market Study. Snowflake has achieved the highest or second highest marks across all of the metrics within the study’s two models: Customer Experience (product/technology, sales and service), and Vendor Credibility (product value and ethics).

The ADI study is based on a detailed survey completed by customers of each competing vendor. The customer responses formed the basis for how each vendor’s data warehouse was ranked across the two models.

The first is the Customer Experience Model. Within the survey, customers report their experience with the sales and service staff of their data warehouse vendor, along with their experience with their vendor’s technology. Snowflake had the best combined score across both of these attributes. Download the report to see our placement.

The Vendor Credibility Model illustrates the overall perception of the value customers receive from a product along with their confidence in their chosen data warehouse vendor. Vendors that rated highly on the vertical axis of this quadrant are perceived as highly trustworthy, while a placement to the right shows significant value being driven by the product within the customer organization. Snowflake also ranked high in this model. Download the report to see our placement.

Most importantly, 100 percent of the Snowflake customer respondents said they would recommend Snowflake to others. We’re thankful for the confidence our customers have placed in us. Snowflake is a values-driven company and our most important value is “Put Customers First”. It’s the value that drove us to create the only cloud-built data warehouse so customers could experience an infinitely scalable cloud data warehouse. More recently, it’s the value that led us to develop Instant Elasticity, lowering customer costs by up to 80% in some cases. We believe this report validates our approach and our focus on our customers. Download the 2018 Analytical Data Infrastructure Market Study to learn more.

Try Snowflake for free. Sign up and receive US$400 worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.

Data is Only Transformative with Transformative Technology

At the recent AWS re:Invent show in Las Vegas, The Cube host, Lisa Martin, had the chance to sit down with Bob Muglia, CEO and President of Snowflake. Bob shared his thoughts on Snowflake’s latest addition to its cloud-built data warehouse, Snowpipe, while looking back at Snowflake’s origins and ahead to its future in order to enable the data-driven enterprise.

What is Snowpipe, and how do customers get started with it?

Muglia: Snowpipe is a way of ingesting data into Snowflake in a streaming, continuous way. You simply drop new data that’s coming into S3 and we ingest it for you automatically. Snowpipe makes it simple to bring the data into your data warehouse on a continuous basis, ensuring that you’re always up-to-date and that your analysts are getting the latest insights and the latest data.

In the five years since you launched, how has the opportunity around cloud data warehousing changed? How has Snowflake evolved to become a leader in this space?

Muglia: If you go back five years, this was a timeframe where NoSQL was all the rage. Everybody was talking about how SQL was passé and something you’re not going to see in the future. Our founders had a different view. They had been working on true relational databases for almost 20 years, and they recognized the power of SQL and relational database technology. But they also saw that customers were experiencing significant limitations with existing technology. They saw in the cloud, and in what Amazon had done, the ability to build an all new database that takes advantage of the full elasticity and power of the cloud to deliver whatever analytics the business requires. However much data you want, however many queries you want to run simultaneously, Snowflake takes what you love about a relational database and allows you to operate in a very different way. Our founders had that vision five years ago and successfully executed on it. The product has worked beyond the dreams of our customers, and that response from our customers is what we get so excited about.

How did you identify what data should even be streamed to Snowpipe?

Muglia: As an example, in entertainment we’re experiencing a data explosion. You have streaming video data, subscription data, billing data, social media data and on and on. None of this is arriving in any sort of regular format. It’s coming as semi-structured data, like JSON or XML. Up until Snowflake came onto the scene with a truly cloud-based solution for data warehousing, everyone was struggling to wrangle all these data sets. Snowpipe lets you bring in multiple data sets, merge them in real-time and get the analytics back to your business in an agile way that’s never been seen before.

How does your partnership with AWS extend Snowflake’s capabilities?

Muglia: People don’t want their data scattered all over the place. With the cloud, with what Amazon’s done and with a product like Snowflake, you can bring all of your data together. That can change the culture of a company and the way people work. All of a sudden, data is not power. Data is available to everyone, and it’s democratized so every person can work with that data and help to bring the business forward. It can really change the dynamics around the way people work.

Tell us little bit about Snowflake’s collaboration with its customers. How are they helping to influence your future?

Muglia: As a company, we run Snowflake on Snowflake. All of our data is in Snowflake, all of our sales data, our financial data, our marketing data, our product support data and our engineering data. Every time a user runs a query, that query is logged in Snowflake and the intrinsics about it are logged. When you have a tool with the power of Snowflake, you can effectively answer any business question in just a matter of minutes. And that’s transformative to the way people work. And to me, that’s what it means to build a data-driven culture: The answers to business questions are inside what customers are doing and are encapsulated in the data.

Try Snowflake for free. Sign up and receive US$400 worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.

How Snowpipe Streamlines Your Continuous Data Loading and Your Business

For anyone who harbors a love-hate relationship with data loading, it’s time to tip the scales.

We all know data can be difficult to work with. The challenges start with the varying formats and complexity of the data itself. This is especially the case with semi-structured data such as JSON, Avro and XML, and it continues with the significant programming skills needed to extract and process data from multiple sources. Making matters worse, traditional on-premises and cloud data warehouses require batch loading of data (with limitations on the size of data files ingested) and huge manual efforts to run and manage servers.

The results? Poor, slow performance and the inability to extract immediate insights from all your data. Data scientists and analysts are forced to wait days or even weeks before they can use the data to develop accurate models, spot trends and identify opportunities. Consequently, executives don’t get the up-to-the-minute insights necessary to make real-time decisions with confidence and speed.

Common problems that affect data loading include:

  • Legacy architecture – Tightly coupled storage and compute force data loading to contend with queries for resources.
  • Stale data – Batch loading prevents organizations from acquiring instant, data-driven insight.
  • Limited data – Lack of support for semi-structured data requires transforming newer data types and defining a schema before loading, which introduces delays.
  • Management overhead – Dedicated clusters or warehouses are required just to handle the loading of data.
  • High-maintenance – Traditional data warehouse tools result in unnecessary overhead in the form of constant indexing, tuning, sorting and vacuuming.

These obstacles all point to the need for a solution that allows continuous data loading without impacting other workloads, without requiring the management of servers and without crippling the performance of your data warehouse.

Introducing Snowpipe, our continuous, automated and cost-effective service that loads all of your data quickly and efficiently without any manual effort. How does Snowpipe work?

Snowpipe automatically listens for new data as it arrives in your cloud storage environment and continuously loads it into Snowflake. With Snowpipe’s unlimited concurrency, other workloads are never impacted, and you benefit from serverless, continuous loading without ever worrying about provisioning. That’s right: there are no servers to manage and no manual effort is required. Snowpipe makes all this happen automatically.
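For a sense of how little setup this requires, here is a hedged sketch; the stage, pipe and table names are hypothetical:

```sql
-- A minimal Snowpipe sketch: auto-ingest new files as they land in an
-- external S3 stage.
CREATE PIPE orders_pipe AUTO_INGEST = TRUE AS
  COPY INTO orders_raw
  FROM @s3_orders_stage
  FILE_FORMAT = (TYPE = 'JSON');
```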

The direct benefits of Snowpipe’s continuous data loading include:

  • Instant insights – Immediately provide fresh data to all your business users without contention.
  • Cost-effectiveness – Pay only for the per-second compute utilized to load data rather than running a warehouse continuously or by the hour. 
  • Ease-of-use – Point Snowpipe at an S3 bucket from within the Snowflake UI and data will automatically load asynchronously as it arrives.
  • Flexibility – Technical resources can interface directly with the programmatic REST API, using Java and Python SDKs to enable highly customized loading use cases.
  • Zero management – Snowpipe automatically provisions the correct capacity for the data being loaded. No servers or management to worry about.

Snowpipe frees up resources across your organization so you can focus on analyzing your data, not managing it. Snowpipe puts your data on pace with near real-time analytics. At Snowflake, we tip the scales on your love-hate relationship with data so you can cherish your data without reservation.

Read more about the technical aspects of Snowpipe on our engineering blog. For an in-depth look at Snowpipe in action, you can also join us for a live webinar on December 14th.

Try Snowflake for free. Sign up and receive US$400 worth of free usage. You can create a sandbox or launch a production implementation from the same Snowflake environment.