When data is spread across S3, Azure Blob, and Google Cloud Storage, analytics teams face several issues. These span inconsistent formats, duplicated pipelines, and fragmented governance across providers.
The illustration below visualizes this shift as data scattered across silos moves through ingestion, processing, and analysis. It is ultimately centralized into a unified data platform with Snowflake.
In this newsletter, we will cover:
Snowflake’s three-layer architecture and how separating storage, compute, and services improves elasticity and consistency.
Multi-cloud deployment patterns that enable secure data sharing and disaster recovery.
Snowpark, Unistore, and the shift from analytics to broader application workloads.
Architecture lessons for designing reliable and portable multi-cloud data platforms.
This unification is made possible by a unique architecture that decouples a data system’s core functions. We will break down its layers to understand how they work together.
Snowflake’s performance and scalability come from its hybrid design, which separates storage, compute, and services into independent, scalable layers. This design moves away from the past’s tightly coupled shared-disk or shared-nothing architectures and offers greater flexibility. Traditional data warehouses often combined compute and storage, meaning you had to scale both, even if you only needed more processing power. Snowflake’s model breaks this constraint, allowing you to scale each resource independently.
To understand how this works in practice, let’s look at the three core layers of Snowflake’s architecture.
At the foundation is a storage layer that uses the object storage of the underlying cloud (S3, Azure Blob Storage, or GCS). Data is automatically converted into compressed, columnar micro-partitions, typically 50–500 MB of uncompressed data each, along with metadata describing the contents of every partition.
For example, suppose that you load a table of 100 million sales transactions. Snowflake automatically breaks it into thousands of micro-partitions, each carrying metadata such as the range of values in every column (e.g., order_date = Jan 1–Jan 7, 2023). If you run a query filtering for WHERE order_date = '2023-01-05', the query engine scans only the relevant micro-partitions instead of the entire dataset.
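To make partition pruning concrete, here is a minimal Python sketch. It is illustrative only (Snowflake's real metadata format and pruning engine are far more sophisticated), but it shows the core idea: per-column min/max metadata lets the engine skip partitions without reading their rows.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MicroPartition:
    """An immutable chunk of a table plus min/max metadata for one column."""
    min_order_date: date
    max_order_date: date
    rows: list  # the actual row data (never touched during pruning)

def prune(partitions, target: date):
    """Keep only partitions whose date range can contain the target value,
    mirroring how WHERE order_date = '2023-01-05' skips irrelevant data."""
    return [p for p in partitions
            if p.min_order_date <= target <= p.max_order_date]

# Two partitions covering different weeks of January 2023.
parts = [
    MicroPartition(date(2023, 1, 1), date(2023, 1, 7), rows=["week-1 sales"]),
    MicroPartition(date(2023, 1, 8), date(2023, 1, 14), rows=["week-2 sales"]),
]

scanned = prune(parts, date(2023, 1, 5))
print(len(scanned))  # 1: only the week-1 partition needs to be scanned
```

Note that pruning reads only metadata; the `rows` payload of the skipped partition is never loaded.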
Note: Building on cloud object storage gives Snowflake virtually unlimited scalability, durability, and cost-effectiveness. It abstracts the underlying complexity, providing a simple, SQL-based interface for data physically stored in S3, Azure Blob Storage, or GCS.
This storage layer also powers two critical features:
Zero-copy cloning: Create instant clones of tables, schemas, or entire databases without duplicating data. A clone is essentially a new set of metadata pointers referencing existing micro-partitions; only changes result in creating new ones. This makes it easy to spin up development or test environments without extra storage costs.
Time travel: Preserve historical versions of data. Standard time travel offers a 1-day retention, while extended time travel with Snowflake Enterprise Edition and above allows up to 90 days. You can query past snapshots, recover dropped tables with UNDROP, or roll back accidental changes using the immutable micro-partitions.
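Both features fall out of the same idea: immutable micro-partitions referenced by metadata pointers. The toy model below (an illustrative sketch, not Snowflake's actual implementation) shows why a clone is free and why old versions remain queryable: writes swap in new partitions rather than mutating existing ones.

```python
class Table:
    """Toy model: a table is just a list of pointers to immutable partitions."""
    def __init__(self, partition_ids):
        self.partition_ids = list(partition_ids)
        self.history = [list(partition_ids)]  # snapshots enable "time travel"

    def clone(self):
        # Zero-copy clone: duplicate only the metadata pointers, not the data.
        return Table(self.partition_ids)

    def update(self, old_id, new_id):
        # Writes never mutate a partition; they swap in a new one.
        self.partition_ids = [new_id if p == old_id else p
                              for p in self.partition_ids]
        self.history.append(list(self.partition_ids))

    def at_version(self, v):
        # Time travel: read an earlier snapshot of the pointer list.
        return self.history[v]

storage = {"p1": "jan rows", "p2": "feb rows"}  # shared immutable partitions

orig = Table(["p1", "p2"])
dev = orig.clone()                    # instant: no data copied
storage["p3"] = "feb rows (fixed)"
dev.update("p2", "p3")                # only the clone's pointers change

print(orig.partition_ids)  # ['p1', 'p2']: the original is untouched
print(dev.at_version(0))   # ['p1', 'p2']: the clone's pre-update snapshot
```

Because "p1" and "p2" are never rewritten, the clone costs only metadata, and any historical version can be reconstructed from its snapshot of pointers.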
The slides below show how zero-copy cloning works in Snowflake:
This storage architecture provides scalability and enables powerful features for data management and recovery.
The compute layer is where queries are executed, powered by independent compute clusters called virtual warehouses. Each warehouse reads from the shared storage layer and caches data locally to accelerate repeated queries.
Different teams can run workloads of varying sizes against the same data simultaneously. For example, a data science team might spin up a large warehouse for machine learning while the BI team uses a smaller one for dashboards, without either impacting the other.
These warehouses are stateless. They hold no persistent data and can be started, stopped, resized, or scaled on demand. When concurrency spikes, Snowflake can automatically add more clusters to a warehouse and scale them back down when demand subsides. This elasticity ensures that you only pay for the compute resources that you need.
The diagram below illustrates this model. A query scheduler distributes requests across multiple compute clusters inside a virtual warehouse, allowing workloads to scale seamlessly without bottlenecks.
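The scaling logic described above can be sketched in a few lines. This is a simplified model under assumed parameters (Snowflake's real multi-cluster policy also weighs credit budgets and scaling modes), but it captures the behavior: add clusters as the queue grows, suspend everything when idle, and spread queries across whatever clusters are running.

```python
import math

def clusters_needed(queued_queries, per_cluster_capacity, max_clusters):
    """Scale out when the queue grows; scale in (to zero) when it subsides."""
    if queued_queries == 0:
        return 0  # auto-suspend: stateless clusters cost nothing when idle
    return min(max_clusters, math.ceil(queued_queries / per_cluster_capacity))

def dispatch(queries, n_clusters):
    """Round-robin queries across the warehouse's active clusters."""
    assignments = {c: [] for c in range(n_clusters)}
    for i, query in enumerate(queries):
        assignments[i % n_clusters].append(query)
    return assignments

# 25 queued queries, each cluster handling ~8 concurrently, capped at 10 clusters.
n = clusters_needed(25, per_cluster_capacity=8, max_clusters=10)
print(n)  # 4
print(dispatch(["q1", "q2", "q3"], 2))  # {0: ['q1', 'q3'], 1: ['q2']}
```

Because clusters are stateless, scaling in simply discards them; no data has to be drained or rebalanced.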
To make this possible at scale, Snowflake relies on a services layer that manages query optimization, metadata, transactions, and security across the platform.
The cloud services layer acts as the regional control plane for Snowflake. It coordinates and manages all operations within a given region, tying together storage, compute, and metadata services to process user requests from login to query dispatch. These services run on compute instances provisioned by Snowflake from the underlying cloud provider.
Its core responsibilities include the following:
Authentication, security, and access control: All security functions are managed within this framework. This includes user authentication, role-based access control (RBAC) over every object, and encryption of data in transit and at rest.
Query optimization and management: When you submit a query, the services layer parses it, optimizes it using the metadata from the storage layer, and generates an efficient execution plan. It then dispatches the compiled plan to the appropriate virtual warehouse for execution.
Infrastructure and transaction management: It ensures ACID-compliant transactions across the platform, coordinating concurrent reads and writes so that every query sees a consistent view of the data, and it provisions and monitors the virtual warehouses themselves.
Metadata management: Maintains a global catalog of all objects and micro-partitions. This centralized store enables advanced features such as zero-copy cloning and time travel.
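The responsibilities above form a single request path. The sketch below strings them together in simplified form; the names (`handle_query`, the dictionary-shaped catalog and plan) are hypothetical stand-ins for illustration, not Snowflake APIs.

```python
def handle_query(user, sql, catalog, warehouses, sessions):
    """Sketch of the control-plane path: authenticate the caller, build a
    plan using catalog metadata, then hand the plan to a virtual warehouse."""
    # 1. Authentication and access control
    if user not in sessions:
        raise PermissionError("not authenticated")

    # 2. Optimization: use catalog metadata to prune partitions up front,
    #    so the warehouse receives a plan that touches minimal data.
    table = sql["table"]
    relevant = [p for p in catalog[table]
                if p["min"] <= sql["value"] <= p["max"]]
    plan = {"table": table, "partitions": [p["id"] for p in relevant]}

    # 3. Dispatch the compiled plan to a running warehouse.
    warehouse = next(w for w, state in warehouses.items() if state == "running")
    return warehouse, plan

catalog = {"sales": [{"id": "p1", "min": 1, "max": 10},
                     {"id": "p2", "min": 11, "max": 20}]}
warehouses = {"wh_small": "suspended", "wh_large": "running"}
sessions = {"alice"}

wh, plan = handle_query("alice", {"table": "sales", "value": 5},
                        catalog, warehouses, sessions)
print(wh, plan["partitions"])  # wh_large ['p1']
```

The key point is that pruning happens in the services layer, before any compute is spent: the warehouse only ever sees the already-minimized plan.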
The following diagram provides a high-level view of how these three layers interact to form Snowflake’s unique architecture.
Note: Snowgrid is Snowflake’s cloud-agnostic layer that connects the services layer across AWS, Azure, and GCP. It enables unified governance, replication, and secure data sharing across regions, creating a single global data cloud. The underlying cloud services layer still manages security, metadata, and queries within each region.
Technical quiz!
Snowflake’s decoupled architecture separates storage, compute, and services. From a System Design perspective, what trade-off does this decoupling introduce?
Increased latency in metadata access due to an extra coordination layer.
Reduced elasticity in scaling compute clusters.
Loss of fault isolation between workloads.
Inability to perform near-real-time analytics.
This architectural foundation solves problems of scalability and concurrency. It also directly enables Snowflake’s powerful multi-cloud capabilities. Next, we will explore how this design pattern extends across different public clouds.
Snowflake’s design philosophy naturally extends into a multi-cloud model. The same separation of storage, compute, and services that enables scalability within one cloud also allows the platform to operate consistently across many. Each deployment integrates closely with the native infrastructure of providers such as AWS, Azure, and GCP, while the cloud services layer abstracts those details from users.
This multi-cloud deployment strategy provides organizations with significant flexibility. You can deploy Snowflake in the same cloud region as your existing applications to minimize data latency, or you can choose a different provider to avoid vendor lock-in or meet specific regulatory requirements.
Educative byte: The consistency of Snowflake’s platform across clouds simplifies a multi-cloud data strategy immensely. You can use the same SQL, tools, and governance policies for your data on AWS as you do for your data on Azure. This eliminates the need to manage disparate systems.
Building on this multi-cloud foundation, Snowflake enables two powerful cross-cloud capabilities that were previously difficult, if not impossible, to implement.
Secure data sharing: Providers can grant consumers live, read-only access to their data without moving or copying it. The consumer queries the data as if it were local, but the underlying micro-partitions stay with the provider.
The cloud services layer manages this process securely, ensuring that access is governed and revocable. As Snowflake operates a globally connected platform, sharing can span both regions and clouds. For example, an account on AWS us-east-1 can share data directly with a partner on Azure West Europe without ETL pipelines or data transfer.
Cross-cloud replication and failover: The same architecture also supports asynchronous replication of databases across regions or providers. You can configure one or more replicas of a primary account to ensure high availability and disaster recovery.
For instance, a primary deployment in AWS us-west-2 could replicate to Azure East US. If AWS experiences an outage, workloads can quickly fail over to Azure, maintaining business continuity.
To visualize this, the image below depicts a common cross-cloud data sharing pattern, in which a provider first replicates its database to the consumer’s region before securely sharing it.
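The replicate-then-share pattern can be modeled in a short sketch. This is a conceptual toy (region names and helper functions are hypothetical), but it highlights the two distinct mechanisms: replication copies data across clouds, while sharing grants only revocable read metadata without moving anything.

```python
class Region:
    """Toy model of one Snowflake deployment region."""
    def __init__(self, name):
        self.name = name
        self.databases = {}  # database name -> partition data
        self.grants = {}     # database name -> accounts with read access

def replicate(primary, secondary, db):
    """Asynchronously copy a database's partitions to another region/cloud."""
    secondary.databases[db] = dict(primary.databases[db])

def share(region, db, consumer):
    """Grant live, read-only access: metadata only, no data is copied."""
    region.grants.setdefault(db, set()).add(consumer)

def read(region, db, account):
    """A consumer's query: governed by the grant, served from shared storage."""
    if account not in region.grants.get(db, set()):
        raise PermissionError("share revoked or never granted")
    return region.databases[db]

aws = Region("aws-us-east-1")
azure = Region("azure-west-europe")
aws.databases["sales"] = {"p1": "rows"}

replicate(aws, azure, "sales")    # provider replicates to the consumer's cloud
share(azure, "sales", "partner")  # then shares it; revocable at any time
print(read(azure, "sales", "partner"))  # {'p1': 'rows'}
```

Failover follows the same replication mechanism: if the primary region goes down, the replica already holds current data and can be promoted.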
Snowflake abstracts cloud boundaries through Snowgrid, but what architectural risks remain when scaling across providers?
While this architecture has redefined the data warehouse, Snowflake’s ambitions extend further. The platform is evolving to support a wider range of workloads, effectively transforming it into a comprehensive data platform.
Snowflake’s decoupled architecture enables new features that push the platform beyond the confines of traditional data warehousing. By leveraging the scalable storage and elastic compute layers, Snowflake is building capabilities to handle a much broader set of data workloads, including data engineering, machine learning, and application development.
This evolution is driven by the core idea of bringing computation to the data, rather than moving data to the computation. Moving large datasets for processing is expensive, slow, and creates security risks. Snowflake is evolving into a unified environment by enabling developers to run complex, non-SQL code directly within the platform.
The diagram below illustrates how multi-language processing and data consolidation move from disjointed pipelines to a single integrated architecture:
Several key initiatives, outlined below, are driving this transformation.
Snowpark: Extends Snowflake to languages like Python, Java, and Scala, letting developers write DataFrame-style transformations and user-defined functions that execute directly on Snowflake’s compute layer, next to the data.
Unistore: Introduces hybrid tables that support both transactional (OLTP) and analytical (OLAP) workloads on a single copy of the data, enabling fast single-row operations alongside large-scale analytics.
Native app framework: Allows developers to build and distribute applications that run inside customer Snowflake accounts. This creates a marketplace for secure, data-native solutions ranging from analytics to governance tools.
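The "bring computation to the data" idea behind Snowpark can be illustrated with a tiny lazy-evaluation sketch. This is not the real `snowflake.snowpark` API; it is a hypothetical miniature showing the underlying technique: method calls build a query plan, and SQL is generated for server-side execution only when you finally run it.

```python
class LazyFrame:
    """Minimal sketch of a Snowpark-style lazy DataFrame. Method calls
    accumulate a plan; nothing executes until the plan is turned into SQL
    that runs inside the data platform. (Illustrative only.)"""
    def __init__(self, table, predicate=None, columns=None):
        self.table, self.predicate, self.columns = table, predicate, columns

    def filter(self, predicate):
        # Return a new frame; the original plan stays immutable.
        return LazyFrame(self.table, predicate, self.columns)

    def select(self, *columns):
        return LazyFrame(self.table, self.predicate, list(columns))

    def to_sql(self):
        # "Execution": compile the accumulated plan into pushed-down SQL.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.table}"
        if self.predicate:
            sql += f" WHERE {self.predicate}"
        return sql

df = LazyFrame("sales").filter("order_date = '2023-01-05'").select("id", "amount")
print(df.to_sql())
# SELECT id, amount FROM sales WHERE order_date = '2023-01-05'
```

Because the filter and projection are pushed down into the generated SQL, only the matching rows and columns ever leave storage; the large dataset itself never moves to the client.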
This evolution sets the stage for broader architectural takeaways that can be leveraged beyond Snowflake itself.
Snowflake’s architecture is a strong and efficient model for building scalable, resilient, and flexible data systems in a multi-cloud environment. By decoupling storage and compute and coordinating them through a services layer, it avoids the constraints of legacy warehouse designs that forced everything to scale together.
The main takeaway is the power of abstraction and the separation of concerns. Each layer can scale independently, improving performance and isolating workloads. Multi-cloud and cross-cloud capabilities emerge naturally from this structure. Features such as secure data sharing work because a global metadata layer governs access without physically moving data.
For anyone designing modern data platforms, Snowflake demonstrates how scalability, concurrency, and resilience follow when these architectural principles are in place from the start.
To deepen your understanding of System Design and data-intensive platforms, explore the following resources: