Snowflake is a fully relational SQL data warehouse, built from scratch for the cloud on Amazon Web Services (AWS). It provides complete relational database support for both structured and semi-structured data (JSON, Avro, XML) and implements comprehensive support for the SQL language. It requires no administration and is delivered as a turnkey cloud service. Snowflake provides broad support for ETL and BI tools and enables developers to build modern data applications. It is secure by design.
Beyond these attributes, what makes Snowflake different from a traditional legacy data warehouse, Hadoop system, or other cloud database?
In one word, architecture.
Snowflake has introduced a patent-pending, multi-cluster, shared data architecture that was born and built for the cloud to revolutionize data analysis.
Built on Amazon S3 storage, Snowflake uses micro-partitions to store customer data securely and efficiently. When data is loaded into Snowflake, it is automatically split into modest-sized micro-partitions, and metadata is extracted from each one to enable efficient query processing. The micro-partitions are then compressed in columnar form and fully encrypted using a secure key hierarchy.
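The loading step described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Snowflake's implementation: the names `MicroPartition` and `build_micro_partitions`, the three-row partition size, and the choice of per-column minimum and maximum as the extracted metadata are all hypothetical, and compression and encryption are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class MicroPartition:
    # Hypothetical structure: values are stored per column (columnar layout).
    columns: Dict[str, List[Any]]
    # Assumed metadata: (min, max) of every column in this partition.
    min_max: Dict[str, Tuple[Any, Any]]


def build_micro_partitions(rows: List[Dict[str, Any]],
                           partition_rows: int = 3) -> List[MicroPartition]:
    """Split rows into fixed-size chunks and record per-column metadata."""
    partitions = []
    for start in range(0, len(rows), partition_rows):
        chunk = rows[start:start + partition_rows]
        # Pivot the row-oriented chunk into one list of values per column.
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        min_max = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        partitions.append(MicroPartition(columns, min_max))
    return partitions


rows = [{"order_id": i, "amount": i * 10} for i in range(1, 8)]
partitions = build_micro_partitions(rows)
```

With seven input rows and three rows per partition, the sketch yields three partitions, each carrying the value range of every column it holds.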
All data processing within Snowflake is performed by virtual warehouses. A virtual warehouse is one or more clusters of compute nodes. When executing a query, the virtual warehouse retrieves only the minimum data required from the micro-partitions in storage to satisfy the query. As data is retrieved, it is cached locally to improve the performance of future queries. The Compute layer is designed to process enormous quantities of data with maximum speed and efficiency.
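The local caching behavior can be sketched as a read-through cache. The classes `RemoteStorage` and `VirtualWarehouse` below are hypothetical stand-ins, and the sketch assumes a simple unbounded in-memory cache keyed by partition id; the source does not describe the real cache policy.

```python
class RemoteStorage:
    """Stands in for the shared storage layer and counts remote fetches."""

    def __init__(self, partitions):
        self._partitions = partitions
        self.fetch_count = 0

    def fetch(self, partition_id):
        self.fetch_count += 1
        return self._partitions[partition_id]


class VirtualWarehouse:
    """Toy compute node that keeps fetched partitions in a local cache."""

    def __init__(self, storage):
        self._storage = storage
        self._cache = {}

    def read(self, partition_id):
        # Go to remote storage only on a cache miss.
        if partition_id not in self._cache:
            self._cache[partition_id] = self._storage.fetch(partition_id)
        return self._cache[partition_id]


storage = RemoteStorage({"p1": [10, 20, 30], "p2": [40, 50]})
warehouse = VirtualWarehouse(storage)

first_total = sum(warehouse.read("p1"))   # fetched from remote storage
second_total = sum(warehouse.read("p1"))  # served from the local cache
```

Both reads return the same result, but `storage.fetch_count` stays at 1 because the second read never leaves the warehouse.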
Unique to Snowflake, multiple virtual warehouses can operate on the same data at the same time while fully enforcing global, system-wide transactional integrity. Read operations (SELECT) always see a consistent view of the data, and write operations never block readers. Transactional integrity across virtual warehouses is achieved by maintaining all transaction state within the Service layer.
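One way to picture readers that see a consistent view without being blocked by writers is a versioned, copy-on-write table state. The text only says that transaction state is held centrally, so the mechanism below is an assumption chosen for illustration, and `TableState`, `begin_read`, and `commit_write` are hypothetical names.

```python
import threading


class TableState:
    """Toy central state: each committed write publishes a new immutable snapshot."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self._versions = [()]  # version 0 is an empty table

    def begin_read(self):
        # Readers take no lock; they pin the latest committed snapshot.
        return self._versions[-1]

    def commit_write(self, new_partition):
        # Writers serialize among themselves and never modify old snapshots.
        with self._write_lock:
            snapshot = self._versions[-1] + (new_partition,)
            self._versions.append(snapshot)
            return len(self._versions) - 1


state = TableState()
state.commit_write("p1")

reader_view = state.begin_read()  # a query starts on one warehouse
state.commit_write("p2")          # a load commits from another warehouse
later_view = state.begin_read()   # a query that starts after the commit
```

The first reader keeps seeing only `p1` for the lifetime of its query, while a query that starts after the second commit sees both partitions.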
The ability to simultaneously operate on the same data across multiple virtual warehouses enables Snowflake to achieve effectively unlimited scale and concurrency.
In addition to fully separating Storage and Compute, Snowflake utilizes a Service layer to authenticate user sessions, provide management and security functions, perform query compilation and optimization, and coordinate all transactions. The Service layer consists of a set of stateless nodes running across multiple AWS availability zones and utilizes a highly available, distributed metadata store for global state management.
If the Compute layer is the brawn of Snowflake, then the Service layer is the brain. It provides all security and encryption key management and enables all DDL functions. Queries are compiled within the Service layer, and metadata is used to determine which micro-partitions and columns need to be scanned. All operational state is maintained within the Service layer, which performs transaction coordination across all virtual warehouses.
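Metadata-driven pruning can be sketched as a range-overlap check. This assumes the metadata holds a minimum and maximum value per column for each micro-partition, which the text does not spell out; the function `prune` and the sample partition ids and dates are hypothetical.

```python
def prune(partition_metadata, column, low, high):
    """Return ids of partitions whose [min, max] range overlaps [low, high]."""
    return [
        partition_id
        for partition_id, stats in partition_metadata.items()
        if stats[column][0] <= high and stats[column][1] >= low
    ]


# Hypothetical metadata: one (min, max) pair per column per micro-partition.
# ISO-formatted date strings compare correctly as plain strings.
metadata = {
    "p1": {"order_date": ("2016-01-01", "2016-01-31")},
    "p2": {"order_date": ("2016-02-01", "2016-02-29")},
    "p3": {"order_date": ("2016-03-01", "2016-03-31")},
}

# Filter: order_date BETWEEN '2016-02-10' AND '2016-03-05'
to_scan = prune(metadata, "order_date", "2016-02-10", "2016-03-05")
```

Here `to_scan` contains only `p2` and `p3`; `p1` is skipped without reading any of its data, because its recorded range cannot contain a matching row.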
(Snowflake) just works, getting us answers an order of magnitude faster, without manual tuning and management. As a result we can do 100 times more queries per day, helping us give our clients richer analysis.