Snowflake
POWERS THE DATA CLOUD
Snowflake is a cloud-based data warehousing platform that is built on top of AWS and is a true SaaS offering.With its unique features, it soon became a leader in data management solutions for analytics.
Architecture
Snowflake architecture is built for the cloud. Its unique multi-cluster shared data architecture delivers the performance, concurrency, and elasticity that organizations require. It handles all aspects of authentication, resource management, optimization, data protection, configuration, availability, and more. Snowflake features compute, storage, and global service layers which are physically separated but logically integrated.Snowflake has central data repository for storage of structured and semi-structured data. These data can be accessed from all available compute nodes in the Snowflake platform. It uses virtual warehouse as compute environment for processing the queries. While processing queries, it utilizes multi-cluster, micro-partitioning and advanced cache concepts. Snowflake’s cloud services are responsible to provide end to end solution to the user like logging validation of user to result of select queries.
Database Storage
Snowflake supports Amazon S3, Azure and Google Cloud to load data into Snowflake using file system. User should upload a file (.csv, .txt, .xlsx etc.) into the cloud and after they create a connection in Snowflake to bring the data. Data size is unlimited, but file size is up to 5GB as per cloud services. Once data is loaded into Snowflake, it utilizes its internal optimization and compression techniques to store the data into central repository as columnar format. The central repository is based on cloud where data stores.
Query Processing
Query execution is a part of processing layer or compute layer. To process a query, Snowflake requires compute environment, known as “Virtual Warehouse” in Snowflake’s world. Virtual warehouse is a compute cluster. A virtual warehouse consists of CPU, Memory and temporary storage system so that it could perform SQL execution and DML (Data Manipulation Language) operations.
- SQL SELECT executions
- Updating of data using Update, Insert, Update
- Loading data into tables using COPY INTO <tables>
- Unloading data from tables using COPY INTO <locations>
Cloud Services
Cloud Service is the ‘Brain’ of the Snowflake. It coordinates and manages activities across Snowflake. It brings all components of Snowflake together to process user requests from logging validation to deliver query’s response.
The following services are managed at this layer
- It is the centralized management for all storage.
- It manages the compute environments to work with storage.
- It is responsible for upgrades, updates, patching and configuration of Snowflake at cloud.
- It performs cost-based optimizers on SQL queries.
- It gathers statistics automatically like credit used, storage capacity utilization
- Security like Authentication, Access controls based on roles and users
- It performs encryption as well as key management services.
- It stores metadata as data is loaded into the system.
Methods to load data to snowflake
Method 1: Using SQL Commands for Loading Data to Snowflake:You can bulk load large amounts of data using SQL commands in SnowSQL using the Snowflake CLI. Although many different formats can be used as input in this method, CSV Files are used most commonly.
Method 2: Using Snowpipe for Loading Data to Snowflake:You can also automate the bulk loading of data using Snowpipe in Snowflake. It uses the COPY command and is beneficial when you need to input files from external sources into Snowflake.
Method 3: Using the Web Interface for Loading Data to Snowflake:You can use the web interface to load a limited amount of data. It has an in-built LOAD button that can input data into Snowflake. This method works only for small-sized data.