Modern Data Sharing: The Opportunities Are Endless

Data sharing between organizations for commercial purposes has been around for over 100 years. But until very recently, enterprises have been forced to rely on traditional data sharing methods that are labor-intensive, costly and error-prone. These methods are also more vulnerable to attackers and often deliver stale data. Snowflake Data Sharing, one of the newest innovations in Snowflake’s cloud-built data warehouse, has eliminated those barriers and enables enterprises to easily share live data in real time via one-to-one, one-to-many and many-to-many relationships. Best of all, the data shared between data providers and data consumers never moves.

Below is an example of how Snowflake Data Sharing created a live, secure data share at a fraction of the time and cost of a standard method. Most interestingly, this application of Snowflake Data Sharing reveals that the problems modern data sharing can solve are virtually endless.

The Challenge

The Federal Risk and Authorization Management Program (FedRAMP) is “a government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services.” Complying with the program’s approximately 670 security requirements and collecting supporting evidence is a significant challenge. But having to do so monthly, as required, is a Sisyphean task if you attempt it manually. Cloud providers must complete an inventory of all of their FedRAMP assets, which includes binaries, running network services, asset types and more. Automation is really the only logical approach to solving this FedRAMP inventory conundrum.

The Run-of-the-Mill Method

Often, people gravitate to what’s familiar, so it’s no surprise we initially considered a solution that combined an AWS tool, an IT automation tool and some Python to clean up the data. However, we estimated that developing, testing and deploying the code would require significant effort, on top of the ongoing maintenance it would demand. Instead, we took a different approach.

The Solution

Snowflake’s Data Sharing technology is a fast, simple and powerful solution that allows us to maintain our security governance posture while sharing live data in real time, without having to move the data. Modern data sharing introduces the concepts of a data provider and a data consumer. For this project we engaged Lacework, a Snowflake security partner, as our initial data provider.

Lacework provides us with advanced threat detection by running their agents on our servers to capture relevant activities, organize events logically, baseline behaviors and identify deviations. In doing so, Lacework looks for file modifications, running processes, installed software packages and network sockets, and monitors for suspicious activity in our AWS account. All of that data is then analyzed and stored in their Snowflake account. Lacework is both a Snowflake vendor and a Snowflake data warehouse-as-a-service customer: they use Snowflake to provide their security services to Snowflake. In short, Lacework already collects the data required to complete the FedRAMP system inventory task.

We contacted Lacework, presented our FedRAMP challenge and suggested leveraging data sharing between their Snowflake account and our Security Snowflake account. Within a couple of hours, they provided us with live FedRAMP data through Snowflake Data Sharing. Yes, you read that right. It took only a couple of hours. The following describes the steps for creating, sharing and consuming the data, along with illustrative SQL sketches:

Data Provider (Lacework) steps

  1. Create a share and give it a name
  2. Grant privileges on the database and its objects (schemas, tables, views) to the share
  3. Alter the share to add other Snowflake accounts
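
As a minimal sketch, the provider-side commands might look like the following Snowflake SQL (the share, database, table and account names here are hypothetical):

  -- Create the share, grant access to the objects to be shared,
  -- then add the consumer's Snowflake account to the share.
  CREATE SHARE fedramp_inventory_share;

  GRANT USAGE ON DATABASE security_db TO SHARE fedramp_inventory_share;
  GRANT USAGE ON SCHEMA security_db.fedramp TO SHARE fedramp_inventory_share;
  GRANT SELECT ON TABLE security_db.fedramp.asset_inventory TO SHARE fedramp_inventory_share;

  ALTER SHARE fedramp_inventory_share ADD ACCOUNTS = consumer_account;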

Data Consumer (Snowflake Security) steps

  1. Create a database from the share
  2. Perform some SQL commands and voila! We have our FedRAMP system inventory data (results redacted for security reasons)
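
On the consumer side, a correspondingly minimal sketch (again with hypothetical share, database and account names) looks like this:

  -- Create a database from the share, then query the shared data directly.
  CREATE DATABASE lacework_fedramp FROM SHARE lacework_account.fedramp_inventory_share;

  SELECT asset_type, COUNT(*) AS asset_count
  FROM lacework_fedramp.fedramp.asset_inventory
  GROUP BY asset_type;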

Again, the whole data sharing effort took just a few hours.

Beyond FedRAMP

You are probably asking yourself at this point, “Why didn’t you just ask Lacework to generate a FedRAMP report instead of using data sharing?” I would wholeheartedly agree with you if we were dealing with a conventional data warehouse built on the 1990s philosophy of sharing nothing. But Snowflake is the farthest thing from a conventional data warehouse, and data sharing is nothing short of transformational. How so?

In addition to consuming data from Lacework, we also consume data from other data providers that share application logs, JIRA cases and more. We combine these data sources to automatically track software packages and determine whether or not they are approved. Before data sharing, this activity was a time-consuming, manual process. Now the team is free to focus on more critical security activities, since data sharing has not only helped with FedRAMP but also improved our overall security posture.

Conclusion

As I wrote this blog, I watched my kids enthusiastically playing with their new Lego set. It’s remarkable how simple blocks can form such complex structures. Modern data sharing has similar properties: it offers data consumers and data providers virtually unlimited ways to solve challenges and thus create boundless business opportunities.

Learn more about Lacework and FedRAMP.

Let History Halve Your Next Data Analytics Purchase

If you’re lucky, you’ll spend just six to 12 months considering and buying your next enterprise software solution. If it’s a seven-figure purchase, plan for an additional six to 12 months to confirm your organization has made the best investment possible.

During that process, dozens of your IT and business leaders will engage your shortlist of on-premises and software-as-a-service (SaaS) vendors to compare these competing technologies based on architecture, features, performance, business benefits and cost of ownership. Buying a data warehouse is no different.

But what if you could shorten that process? What if you had fact-based information arranged in a non-traditional but highly effective method to help you confidently narrow your search and quicken your time to market (TTM) with your chosen solution? What would that be worth to your organization and your peace of mind?

All enterprise software decisions include alternatives that span the decades. That means your organization is likely to consider upgrading an existing technology it already owns, as well as solutions that represent today’s latest SaaS offerings.

How so? One of your alternatives might be to upgrade the existing, on-premises data warehouse your organization purchased 10 to 15 years ago, a solution that wasn’t much different when it first emerged in the 1990s and hasn’t advanced much since it was first deployed in your data center.

You may also own, or be considering, an on-premises solution such as Hadoop. This technology emerged just over a decade ago, challenging the very existence of the legacy data warehouse in order to accommodate the exponential increase in the volume, variety and velocity of existing and new data types.

Since the advent of Hadoop, many traditional data warehouse vendors now offer a cloud version of their on-premises solution. With all this said, there are now many more solution categories, and many new vendors, that you must consider for your next data warehouse purchase.

Herein lies the rub. Nearly all data warehouse purchase decisions, and all enterprise software decisions for that matter, take the form of a side-by-side-by-side, laundry-list comparison. That’s a significant amount of ground to cover, especially for data warehousing, which is four decades old. Your review of competing alternatives becomes even more protracted when the architectures and features of traditional and more modern products don’t align, which they never do.

Instead, think linear. Think like a historian. Many of the solutions you’ll consider are a response to the drawbacks, and benefits, of preceding, competing technologies. For example, at Snowflake, many of our customers had previously used a legacy data warehouse for years and then added a Hadoop solution to expand their data analytics platform. Unfortunately, that combination of technologies did not meet their ever-increasing requirements.

So, before you kick off a necessary, side-by-side comparison, consider your initial group of alternative technologies as building blocks, from bottom to top, from the oldest to the most recent. Then eliminate from contention those that do not add anything new or innovative. This approach enables you to focus on more recent technologies that truly deliver better value and will enable you to continue to innovate well into the future. In the end, you’ll arrive at a shorter shortlist, and quickly, thus speeding your decision-making process and TTM by up to 50 percent.

At Snowflake, we’ve done the hard work for you. We invite you to read this short but revealing ebook that details the benefits and drawbacks of each succeeding data analytics technology – from the birth of the legacy data warehouse, all the way to today’s modern, built-for-the-cloud data warehouse. We’re confident it will provide the insight you need to quicken your next data warehouse purchase by rapidly reviewing every technology that got us to where we are today.

Financial Services: Welcome to Virtual Private Snowflake

Correct, consistent data is the lifeblood of the financial services industry. If your data is correct and consistent, it’s valuable. If it’s wrong or inconsistent, it’s useless and may be dangerous to your organization.

I saw this firsthand during the financial meltdown of 2007/08. At that time, I had been working in the industry for nearly 20 years as a platform architect. Financial services companies needed that “single source of truth” more than ever. To remain viable, we needed to consolidate siloed data sets before we could calculate risk exposure. Most financial services companies were on the brink of collapse. Those that survived did so because they had access to the right data.

At my employer, we looked for a way to achieve this single source with in-house resources, but my team and I quickly realized it would be an extraordinary challenge. Multiple data marts were sprawled across the entire enterprise, and multiple sets of the same data existed in different places, so the numbers didn’t add up. In a global financial services company, even a one percent difference can represent billions of dollars and major risk.

We ultimately built an analytics platform powered by a data warehouse. It was a huge success. It was so successful that everybody wanted to use it for wide-ranging production use cases. However, it couldn’t keep up with demand, and no amount of additional investment would solve that problem.

That’s when I began my quest to find a platform that could provide universal access, true data consistency and unlimited concurrency. And for financial services, it had to be more secure than anything enterprises were already using. I knew the cloud could address most of these needs. However, even with the right leap forward in technical innovation, would the industry accept it as secure? Then I found Snowflake. But my story doesn’t end there.

I knew Snowflake, the company, was onto something. So, I left financial services to join Snowflake and lead its product team. Snowflake represents a cloud-first approach to data warehousing, with a level of security and unlimited concurrency that financial services companies demand.

We’ve since taken that a step further with Virtual Private Snowflake (VPS), our most secure version of Snowflake. VPS gives each customer a dedicated and managed instance of Snowflake within a separate Amazon Web Services (AWS) Virtual Private Cloud (VPC). In addition, customers get our existing, best-in-class Snowflake security features, including end-to-end encryption of data at rest and in transit. VPS also includes Tri-Secret Secure, which combines a customer-provided encryption key, a Snowflake-provided encryption key and user credentials. Together, these features thwart an attempted data decryption attack by rendering the data unreadable, while the user credentials ensure that only approved users are authenticated.

VPS is more secure than any on-premises solution and provides unlimited access to a single source of data without degrading performance. This means financial services companies don’t have to look at the cloud as a compromise between security and performance.

To find out more, read our VPS white paper and solution brief: Snowflake for Sensitive Data.

Rethink What You Know About Creating a Data Lake for JSON

Over the last 10 years, the notion has been that to quickly and cost-effectively gain insights from a variety of data sources, you need a Hadoop platform. Sources of data could be weblogs, clickstreams, events, IoT and other machine-generated JSON or semi-structured data. The proposition with Hadoop-based data processing is having a single repository (a data lake) with the flexibility, capacity and performance to store and analyze an array of data types.

It shouldn’t be complicated

In reality, analyzing data with a Hadoop-based platform is not simple. Hadoop platforms start you with an HDFS file system, or equivalent. You then must piece together about a half-dozen software packages (at a minimum) just to provide basic enterprise-level functionality, such as provisioning, security, system management, data protection, database management and the necessary interface to explore and query data.

Despite the efforts of open-source communities to provide tools that bring Hadoop platforms up to the highest enterprise-class level, there is a constant need for highly skilled resources to keep Hadoop up and running while enabling users to do more than just explore data. This all adds up to unnecessary complexity.

A much simpler proposition

Snowflake, which is built for the cloud and delivered as a service, provides you with a different option for handling JSON and semi-structured data. Just point your data pipelines to Snowflake, land the data in our elastic storage repository and you have instant access to a bottomless data lake. You also have access to a full-fledged data warehouse. With Snowflake, you can easily load JSON and query the data with robust, relational SQL. You can mix JSON with traditional structured data and data from other sources, all within the same database. Moreover, you can support endless concurrent analytic workloads and work groups against the JSON data. Whether it is one workload or 1,000 workloads, Snowflake handles it all with ease.
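
As an illustrative sketch (the table, stage and column names below are hypothetical), landing JSON in a VARIANT column and querying it alongside structured data can be as simple as:

  -- Land raw JSON into a VARIANT column from a named stage.
  CREATE TABLE weblog_events (event VARIANT);

  COPY INTO weblog_events
    FROM @json_stage
    FILE_FORMAT = (TYPE = 'JSON');

  -- Query JSON paths with standard SQL, casting to relational types,
  -- and join the result with a traditional structured table.
  SELECT e.event:user_id::STRING   AS user_id,
         e.event:page.url::STRING  AS page_url,
         e.event:ts::TIMESTAMP_NTZ AS event_time,
         c.customer_name
  FROM weblog_events e
  JOIN customers c
    ON c.customer_id = e.event:user_id::STRING;

Because the JSON lives in an ordinary table, the same warehouse, security and concurrency capabilities apply to it as to any structured data.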

As a combined data lake and data warehouse platform, Snowflake allows you to do much more. Read more about it in our new eBook, Beyond Hadoop: Modern Cloud Data Warehousing.

Try Snowflake for free. Sign up and receive $400 of free usage credits. You can create a sandbox or launch a production implementation from the same Snowflake account.

Rethink what you’ve been told

To gain insights from JSON or other machine data, Hadoop is not a prerequisite.

When you need to store, warehouse and analyze JSON and other machine data, rethink what you’ve been told. Snowflake easily allows you to develop insights or uncover relationships that can drive your business forward. You can support all of your structured and semi-structured data warehousing and analytic workload needs with a single tool that is built for the cloud and is ACID-compliant. Unlike Hadoop platforms, which often require special skills to operate, Snowflake is a fully relational SQL environment that uses the familiar semantics and commands known to millions of SQL users and programmers, and thousands of SQL tools.

Be sure to keep an eye on this blog or follow us on Twitter (@snowflakedb and @miclnixon1) for all the news and happenings here at Snowflake.

3 Ways a Data Sharehouse™ Can Catapult Your Data-Driven Initiatives

How would your organization benefit from effortlessly sharing limitless amounts of data with business partners and commercializing that data to share with other organizations?

With most data sharing methods today, the best you can do is imagine these benefits, because it’s cumbersome, time-consuming and costly to share even small slices of data. If you share data using an FTP approach, you will spend time deconstructing, scripting, securing and governing the data, and your data consumers will spend time reconstructing, scripting and rebuilding it. Sharing data via email can be slow and insecure. Email is not even a practical option for large data sets, and it is exponentially more difficult if you’re sharing a large database.

Sharing data using a basic cloud storage service is equally inefficient, not least because neither you nor your data consumers can query the data directly. And if you want to enable “direct” data queries, without loading, you will likely evaluate a traditional cloud data warehouse or Hadoop platform that is “integrated” with a data lake storage platform. Do you think this will be simple? Not so fast! You’ll have to factor in the need to manage a separate data catalogue and external tables to share data, or you’ll have to contend with data inconsistency and performance issues.

These limitations all add up to mean that traditional options make it difficult to take data sharing to the next level of capability, which goes beyond finding insights to commercializing your data. If any of this feels familiar, consider these three reasons why Snowflake Data Sharing – the technology making it possible to create a Data Sharehouse™ – is so compelling.

1 – Drive simplicity: Allocate more of your time and resources to strategic data sharing projects

Traditional options to share data create unnecessary data processing and platform complexity. That complexity adds burdens and requires a lot of extra time and resources, including infrastructure costs, for you and for your data consumers.

Snowflake Data Sharing, on the other hand, is an all-in-one, data warehouse-as-a-service solution that makes sharing data simple and easy. The vision for the Snowflake architecture, from day one, featured separation of compute, storage and services to allow unlimited scaling, concurrency and data sharing.

For example, with Snowflake there is no external data catalogue to manage and no separate effort required to build data security; it’s all built in. In addition, Snowflake’s metadata management keeps track of all data warehouse activity. Thus, to share data, follow these steps, sketched in SQL below:

  • Point your pipeline to land your data into Snowflake and set up your warehouse.
  • To share any portion of your data warehouse, use Snowflake’s included SQL client to CREATE a share, which references the entire database or any portion of it (the objects you wish to share).
  • Issue GRANT statements that enable access to the shared database and any objects referenced within the share.

These are all the steps you will need to quickly and easily share data. All of the related metadata activity, including cataloguing, happens automatically within the Snowflake data warehouse service. No additional infrastructure or separate cataloguing effort required.
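
As a hypothetical sketch of these steps, sharing a governed slice of a database through a secure view might look like this (all object and account names are made up):

  -- Expose only the columns and rows you want to share via a secure view.
  CREATE SECURE VIEW sales_db.public.partner_orders_v AS
    SELECT order_id, order_date, amount
    FROM sales_db.public.orders
    WHERE region = 'EMEA';

  -- Create the share, grant access to the view (and its database and schema),
  -- then add the consumer account.
  CREATE SHARE partner_share;
  GRANT USAGE ON DATABASE sales_db TO SHARE partner_share;
  GRANT USAGE ON SCHEMA sales_db.public TO SHARE partner_share;
  GRANT SELECT ON VIEW sales_db.public.partner_orders_v TO SHARE partner_share;
  ALTER SHARE partner_share ADD ACCOUNTS = partner_account;

The same pattern supports the granular secure views discussed in the third point below.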

2 – Enable more valuable insights: Your data consumers will always operate with live and current data

Most options to share data require you to unload the data from your data warehouse in order to transmit or email it to your data consumers. A cloud data warehouse platform that avoids this instead relies on a physically separate storage pool in order to scale and share data. The downside with either option is that the shared data set is read-only and disconnected from any updates that occur within the data warehouse. If you modify the data set, or if the data set is regularly updated, you must reload (ETL) it into the data warehouse, operate on it, and then unload it again to retransmit it or place it in the shared storage pool.

For data consumers that received, or were connected to, the old data set, this means there is a period of time during which they’re exposed to stale and inconsistent data. Fresh data won’t be available until a new transmission is made or a new connection is established. No one wants to run analytics on stale data.

Snowflake Data Sharing delivers a better choice. Because data sets shared within the Snowflake environment are live and served in real time from the data warehouse, data consumers see fresh data as soon as an update or transaction is successfully committed. Data consistency is maintained without any extra effort from you, and your data consumers will not have to struggle with the decision to run analytics now or wait for a new update.

3 – Support any scale, with ease: Seamlessly and cost-effectively share data with any number of data consumers

At the end of the day, it’s all about growing and expanding your business and services while providing excellent experiences for the customers and consumers of your data. If you anticipate sharing data with tens, hundreds or thousands of data consumers, each with unique data sharing requirements, how can you easily support this? And how can you support that growth without manually building more clusters, managing external tables or metadata stores, suffering through performance penalties or creating data inconsistencies? Accomplishing these objectives with other architectural approaches would be very difficult, if not impossible.

Snowflake Data Sharing allows you to scale easily, add data consumers and specify granular secure views, all at the highest performance profile and with full data consistency. On the other end, your data consumers can immediately use and query the shared data, also at the highest performance profile.

Summary: Share and imagine more

These are just a few compelling examples of how Snowflake Data Sharing, and the Data Sharehouse approach, can transform data into high-value business assets. It’s a new, exciting, powerful and easy-to-use feature of all Snowflake data warehouse deployments.

Also check out data sharing blogs from Snowflake customers Snagajob and Playfab. To learn more, you can easily sign up for Snowflake here, or jump to our webpage to review our data sharing ebook, white paper, demo and other resources. In future posts, we’ll cover more details about the technical capabilities and business-value advantages of Snowflake Data Sharing – built for the cloud. Be sure to follow us on Twitter (@snowflakedb and @miclnixon1) for all the news and happenings here at Snowflake. Stay tuned!