Snowflake builds a bigger, simpler data warehouse

Snowflake's mission: Add new, powerful enterprise features to a cloud-native data warehouse system without forcing the customer to turn knobs for the best performance

Cloud-based data warehousing system Snowflake has unveiled a slew of new features designed to make it more powerful for enterprises without adding complexity.

Led by CEO Bob Muglia, a Microsoft alumnus, the service opened its doors to the public last year with its Elastic Data Warehouse system. Its main appeal is to analytics users who want to work with large volumes of cloud-native data but don't want the management and performance-tuning hassles that usually come with it.

Size (and speed and convenience) matters

The new Snowflake features continue to push the idea that a system can deliver high performance by default -- that neither the user nor Snowflake needs to tweak anything to get the best possible results.

One such new feature is the multicluster warehouse function, where databases can be automatically scaled out across multiple clusters to satisfy incoming demand. If similar queries come in from multiple users, query data can be cached and reused between them to further accelerate performance. Data is also automatically sharded and partitioned to give even more of a speed boost.
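The caching idea can be illustrated with a small Python sketch. This is not Snowflake's implementation, which is proprietary and not described in detail here; it is a toy model that assumes results are keyed on lightly normalized query text and discarded whenever the underlying data changes. The names ResultCache, run_query, execute, and invalidate are hypothetical.

```python
class ResultCache:
    """Toy query-result cache shared by all users (illustrative only)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes SQL text
        self._results = {}           # normalized query text -> cached result
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(sql):
        # Assumption: only surrounding whitespace and a trailing semicolon
        # are ignored, so the text inside the query is never altered.
        return sql.strip().rstrip(";").strip()

    def execute(self, sql):
        key = self.normalize(sql)
        if key in self._results:
            self.hits += 1           # another user already ran this query
            return self._results[key]
        self.misses += 1
        result = self._run_query(key)
        self._results[key] = result
        return result

    def invalidate(self):
        # Called when the underlying data changes; stale results are dropped.
        self._results.clear()
```

In this sketch, a second user who submits the same query gets the stored result without the query being executed again, which is the kind of reuse the multicluster feature is described as exploiting.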

Snowflake didn't address data protection when it debuted. Now, in addition to automatically syncing data across multiple availability zones, it automatically retains previous versions of stored data for a period of time set by the customer.

To retrieve that data, the customer doesn't restore and mount an earlier copy of a database. Instead, she uses a proprietary SQL syntax -- SELECT AS OF -- to access earlier versions of a given table. Other proprietary commands, CLONE and UNDELETE, allow earlier versions of data to be re-created in place or recovered.
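A rough way to picture this kind of versioned access is the Python sketch below. It is a toy model under stated assumptions, not Snowflake's engine or its SQL syntax: each write stores a full snapshot with a timestamp, and old snapshots are dropped once they fall outside the retention window. The class VersionedTable and its methods as_of, clone, and restore are hypothetical names chosen to mirror the operations described above.

```python
import bisect


class VersionedTable:
    """Toy table that retains earlier versions for a fixed period."""

    def __init__(self, retention):
        self.retention = retention  # retention period, same units as timestamps
        self._times = []            # commit timestamps, ascending
        self._snapshots = []        # one immutable snapshot per commit

    def commit(self, rows, now):
        if self._times and now <= self._times[-1]:
            raise ValueError("timestamps must increase")
        self._times.append(now)
        self._snapshots.append(tuple(rows))
        self._expire(now)

    def _expire(self, now):
        # Keep the version that was current at the start of the retention
        # window, plus everything newer; drop anything older than that.
        cutoff = now - self.retention
        keep_from = max(bisect.bisect_right(self._times, cutoff) - 1, 0)
        del self._times[:keep_from]
        del self._snapshots[:keep_from]

    def as_of(self, when):
        # Latest version committed at or before `when`.
        i = bisect.bisect_right(self._times, when) - 1
        if i < 0:
            raise LookupError("no retained version at or before that time")
        return self._snapshots[i]

    def clone(self, when, now):
        # New, independent table seeded from an earlier version.
        copy = VersionedTable(self.retention)
        copy.commit(self.as_of(when), now)
        return copy

    def restore(self, when, now):
        # Re-create an earlier version in place as the newest version.
        self.commit(self.as_of(when), now)
```

The point of the design is that recovery is a read against retained history rather than a restore-and-mount of a backup copy; a real system would presumably share storage between versions instead of copying whole snapshots.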

Touch me not

Snowflake CEO Muglia pointed out in a phone call how the company's aggressive use of monitoring allowed it to accomplish this hands-off approach.

"We instrument all the user queries that we do," he said. "We don't see the user data, as that's all encrypted automatically. But we have forensic information about the queries they run." This, he said, allows the company to continually improve its algorithms and query planning mechanisms.

One possible downside to Snowflake's hands-off system is that its query processing and fulfillment mechanisms are a black box compared to other database platforms, cloud or not. That's doubly true given that the entire Snowflake platform is proprietary -- a custom-written engine developed in C++ and Java.

Muglia's view is that the company's instrumentation of its own product lets it make changes to satisfy customer requests as they arise.

"It's not uncommon if a user comes to us and says, 'Hey, can you let us know what's happening on this query?'" said Muglia. "We can explain to them quickly what's going on, and we can make changes and improve things for them."

This in turn, he explained, allows Snowflake to iterate quickly and make changes based on the feedback.

Serdar Yegulalp is a senior writer at InfoWorld, focused on machine learning, containerization, devops, the Python ecosystem, and periodic reviews.