First and foremost, this blog deals with the basic nuances of Snowflake software. This blog will provide you with a cakewalk to learn how external applications can connect with Snowflake and work alongside and load data into and unload from Snowflake tables. You can also know how to use Snowflake and securely manage it in detail. Apart from the mentioned, a lot of information awaits you in this blog. Let’s see them in detail.
Snowflake is the SaaS-based data warehouse cloud platform. It provides advanced, faster, and easier data processing, storing, and analytic solutions. And it is built up of a new SQL query engine designed for cloud applications with an innovative architecture. On top of all, you don't need to install, configure and manage any hardware or software because all these activities can be managed by Snowflake itself.
Now, we shall be looking into right from the features of Snowflake architecture up to securely managing Snowflake software in this Snowflake documentation blog. Let's dive in now!
Snowflake has a hybrid shared disk and shared-nothing architecture in which computing nodes can access the required data from a central repository. Here, you can store a portion of data in computing nodes of Snowflake internally based on shared-nothing architecture. According to the Snowflake architecture, three layers are Database storage, Query processing, and Cloud services.
Let’s know the functions of these layers now.
Database Storage: This is the data storage layer. In general, the data is stored in a columnar format in cloud storage while data loading. And Snowflake manages all the aspects of data storage such as data organisation, file size, data compression, structure, and statistics. Here, you can access data using SQL query operations only.
Query Processing: This is the query processing layer in which virtual warehouses process queries. A virtual warehouse is nothing but an MPP (Massively Parallel Processing) compute cluster composed of multiple computing nodes. Especially, warehouses are separated from each other, so that performance of a warehouse cannot affect the performance of other warehouses.
Cloud Services: This layer helps coordinate Snowflake's various activities. The activities include authentication, infrastructure management, metadata management, access control, and query parsing and optimisation.
Want to enhance your skills to become a master in Snowflake Certification, Enroll in our Snowflake Certification Online Course |
The cloud services such as AWS, Google Cloud Platform, and Microsoft Azure can host Snowflake accounts. Snowflake offers account provisions in the location, known as regions, of the mentioned cloud services. Also, Snowflake loads data stored in stages such as Snowflake internal stages, Amazon S3, Google cloud storage, and Microsoft Azure blob storage. In this way, you can make both bulk and continuous loading in Snowflake, similar to Snowpipe. Similarly, Snowflake supports data unloading from tables to internal stages too.
In general, Snowflake forms an ecosystem including various tools and technologies. This ecosystem includes various data integration tools, BI tools, ML and Data Science tools, etc. Snowflake can work with these technologies and tools with the help of Snowflake connectors and drivers. For instance, data integration is achieved in Snowflake based on the ETL method – Extract, Transform, and Load. Similarly, BI tools help analyse and retrieve insights and analytics from the data. BI tools help to make informed decisions by which you can improve your business performance. And ML and data science help analyse data to derive statistical outputs and make predictions. Likewise, other technologies can be connected with Snowflake and effectively perform their operations with Snowflake.
In general, Snowflake connectors serve as the interface between Snowflake and other applications such as Python, Spark, and Kafka. The connector for Python helps to develop python applications, and it is a pure, native python package. And it supports developing applications using python database API v2 specification. Connector for Spark supports reading data from Snowflake and writing data into Snowflake. Similarly, the connector for Kafka helps to load data into Snowflake tables.
When drivers are concerned, many drivers connect Snowflake with external applications. On this note, the drivers that connect Snowflake with other applications are Node.js driver, Go Snowflake driver,.NET driver, JDBC driver, ODBC driver, and PHP PDO Driver.
[Also Read: Snowflake Tutorial]
Let’s have a look into the types of drivers now.
Node.js Driver – It provides an asynchronous node.js interface to Snowflake
GoSnowflake Driver – It helps to interface GO programming platform to connect with Snowflake for developing applications
.NET Driver – It supports to interface .NET open-source software framework to connect with Snowflake for developing applications
JDBC Driver – It helps you to interface Snowflake with the applications that use JDBC to connect with a database server
ODBC Driver – It allows connecting Snowflake with ODBC based applications
PHP PDO Driver – It helps you to connect Snowflake with the PHP platform for developing applications
Snowflake supports both bulk and continuous data loading. It allows loading data from any external stages such as Amazon S3, Google cloud storage, and Microsoft Azure. In addition, Snowflake uses internal stages such as user, table, and named stage to load data. The data file is compressed before loading it. While compressing files, you must specify the compression format of the data files without missing it. Besides, Snowflake supports both structured as well as semi-structured file formats. Here, structured formats include CSV and TSV, whereas Semi-structured formats include JSON, Avro, ORC, Parquet, and XML.
Snowflake loads data into database tables in bulk form using the COPY INTO <TABLE> command. In this loading type, data can be loaded from either a local storage system or external cloud storage locations such as Amazon S3, Google cloud storage, or Microsoft Azure. While making data transfer from a local file system, data is loaded into Snowflake tables through internal stages. On the contrary, data is directly loaded into tables while transferring them from external stages.
Snowpipe has a serverless computing model that supports managing the load capacity and optimising the use of resources according to the demands. You can load data into Snowflake tables from the internal stages as soon as they are available. The Snowflake web interface supports loading a small set of flat files into Snowflake tables. Though Snowflake uses PUT and COPY commands to load data into Snowflake tables, it executes the commands as a single operation.
[Related Article: Snowflake Architecture]
Like data loading, Snowflake executes data unloading from Snowflake database tables. Here, the COPY INTO command exports data from Snowflake tables into Snowflake or external stages. When you look into the partitioned unloading, it is executed by using the COPY INTO command that includes the PARTITION BY copy option. Here, both structured and semi-structured file formats can be used to unload data into internal stages. Besides, Snowflake supports Named file formats, which will contain all the information about the format.
At first, Snowflake uses the COPY INTO command to load data into internal stages; then, it uses the GET command to load the data into local files. Similarly, you can unload data into the cloud storage services using the COPY INTO command, either specifying the external stages or directly into the storage location using the URI (Uniform Resource Identifier).
For example, the following steps can be followed to unload data into Named internal stages from a Snowflake table.
Let’s get into the steps right away.
Step:1 - Creating internal stages using named file format object my_csv_format, created to unload data.
create or replace stage my_unload_stage
file_format my_csv_unload_format;
Step:2 - Use the COPY INTO command to unload the rows from a table into named internal stages. The command is given by:
copy into @mystage/unload/ from mytable;
Step:3 - You can use the LIST command to see the files that have been stored in the internal stages. The command is given by:
list @mystage;
+----------------------------------+------+----------------------------------+-------------------------------+
| name | size | md5 | last_modified |
|----------------------------------+------+----------------------------------+-------------------------------|
| mystage/unload/data_0_0_0.csv.gz | 112 | 6f77daba007a643bdff4eae10de5bed3 | Mon, 11 Sep 2017 18:13:07 GMT |
+----------------------------------+------+----------------------------------+-------------------------------+
Step:4 - Use the GET command to download the files into the local storage. The command is given by:
get @mystage/unload/data_0_0_0.csv.gz file:///data/unload;
get @mystage/unload/data_0_0_0.csv.gz file://c:\data\unload;
Classic web interfaces of Snowflake support to carry out all the tasks that SQL and the command line can perform. It also helps carry out various operations, such as creating and managing users, creating and managing account-level objects, creating and using virtual warehouses, creating and modifying databases and database objects, loading data into Snowflake tables, and managing queries. In short, you can say, Snowflake classic web interface is a powerful and easy tool.
With classic web interfaces, you can change passwords, perform session-related tasks, and manage user preferences of the Snowflake accounts. Also, you can add MFA (Multi-Factor Authentication) to increase the security of your account in addition to the standard credentials. MFA is the second level of security control, which can be leveraged into your account to secure the ‘logging in’ process to the Snowflake account. To use this feature, you need to enroll to use MFA as the additional security control in the following two steps:
Step:1 – You can view the dropdown menu next to your login name, where you choose 'preferences'.Then, the preference page will be appeared as below:
Step:2 – Next, enroll in MFY by clicking the link. To complete this, you need a smartphone with a Duo mobile application.
For more details click on this URL: https://docs.snowflake.com/en/user-guide/ui-preferences.html
This feature allows you to switch roles if you have more than one. You can access and manage databases, tables, and other objects based on the chosen role. And you can use Snowsight, which is nothing but the replacement for SQL worksheets; it allows executing data analytic tasks. Also, using the compute area in the new interface, you can manage warehouses and resource monitors. In addition, the feature Snowflake marketplace supports to analyse the query available datasets and derive insights to make informed decisions through automated updates.
A virtual warehouse is nothing but a cluster of computing resources such as CPUs, memories, and storage. And it supports executing operations such as SQL SELECT and DML. Here, SQL SELECT helps retrieve rows from Snowflake tables; DML operations include updating rows in tables and loading and unloading data into tables. Precisely, warehouses can be scaled according to the need for resources, even during running.
You can create a warehouse using the following command or option.
SQL: Executing the command CREATE WAREHOUSE
Web Interface: Click on Warehouses » Create
Also, you can resume the inactive warehouse using the following command and option.
SQL: Executing the command ALTER WAREHOUSE with the keyword RESUME
Web Interface: Click on Warehouses » <suspended_warehouse_name1>» Resume
Similarly, a warehouse can be suspended using the following command or option.
SQL: Executing the command ALTER WAREHOUSE command with the keyword suspend
Web Interface: Click on Warehouses » <started_warehouse_name1>» Suspend
Next, you can resize a warehouse as follows.
SQL: Executing the command ALTER WAREHOUSE with SET WAREHOUSE_SIZE = …..
Web Interface: Click on Warehouses » <warehouse_name1>» Configure
In general, each database in Snowflake consists of one or more schemas. And there is no limit to using databases, schemas, and objects. Also, you can create temporary and transient tables, besides permanent tables. When external tables are concerned, they have the metadata information about the data files, which in turn simplifies the querying process. Here, the metadata includes file path, version identifier, and partitioning information.
Let’s see how to create a temporary table using a temporary keyword.
create temporary table mytemptable (id number, creation_date date);
And the transient table can be created as follows;
create transient table mytranstable (id number, creation_date date);
Similarly, external tables can be created using the command below;
CREATE EXTERNAL TABLE
<table name>
( <part_col_name> <col_type> AS <part_expr> )
[, ...]
[ PARTITION BY ( <part_col_name> [, <part_col_name> ... ])]
..
Snowflake can recover the data that was deleted or modified. Using Snowflake time travel, you can clone, query, or restore historical data for up to 90 days. Similarly, using the Snowflake fail-safe mechanism, you can recover historical data when there is a disaster. According to Snowflake time travel, the data retention period is essential because the deleted or modified data can be preserved only for the retention period. After that, it will be moved to the Snowflake failsafe. Here, time-travel commands such as SELECT, CREATE, CLONE, UNDROP are used to recover the data.
If you would like to change the retention period of an object, then you can use the ‘alter table’ command to execute the same. For example:
create table mytable (coll number, col2 date) data_retention_time_in_days=90;
alter table mytable set data_retention_time_in_days=30;
Always, you can share database objects such as tables, external tables, secure views, and UDFs with other Snowflake account holders. A database share will consist of privileges to access databases and schema, specific objects, and the consumer account details. Besides, privileges can be given to other roles as and when required.
Here, the command GRANT <privileges>...TO ROLE can be used to grant access privileges to other roles
Also, you can offer MODIFY privileges to a user by using the ‘grant modify’ command.
grant the privilege to the SYSADMIN role
grant modify on data exchange profile "<provider_profile_name>" to role sysadmin;
Similarly, ownership for a profile can be given to a user using the ‘grant ownership’ command. For example:
grant the privilege to the SYSADMIN role
grant ownership on data exchange profile "<provider_profile_name>" to role sysadmin;
If you have the ACCOUNT ADMIN role, then you can access the ‘shared data page’ in the Snowflake web interface to proceed with the tasks related to data sharing. This page has the options of ‘data shared with me’, ‘data shared by you’, requests, and manage exchanges. Here, the ‘data shared with me’ mentions the data of direct shares, data exchange, and the Snowflake marketplace. Moreover, there are two types of data requests - Inbound and Outbound. Inbound is the data request made by consumers to access your data. On the other hand, outbound is the data you request from other providers in data exchange.
As a Snowflake account holder, you can share data with the group members that you have chosen. In this way, you can publish data that consumers can access. Using the data exchange feature, data can be shared between internal groups, as well as with external parties. As a result, you can avoid data silos and get business insights. And you can secure the data by managing membership, granting and revoking access for consumers, auditing how they use data, and applying security controls to the data.
Typically, account identifiers help to identify your Snowflake account within your organisation; and it also helps to identify from cloud platforms and cloud regions externally. In this scenario, you can see two types of identifiers. One identifier is the account name in the organisation, and the other is the Snowflake-assigned locator in a cloud region. Moreover, you can use account identifiers in URLs, SnowSQL, drivers, connectors, third-party applications, secure data sharing, and failover to connect with Snowflake.
You can measure the cost for Snowflake usage in terms of usages on data storage, virtual warehouse, and cloud services. In general, a Snowflake credit is a measure that helps to calculate how a Snowflake account holder uses various resources. So, payment for the service is made by calculating the Snowflake credits. Also, the cost is accounted for when data is transferred from one region to another within a cloud platform as well as out of the cloud platform.
There are three types of parameters used in Snowflake accounts. They are described as follows:
In general, parameters have default values that account administrators can override.
If you want to see the list of parameters and their current and default values, you can use the SHOW PARAMETERS commands to achieve the same. It is given as below:
SHOW PARAMETERS [ LIKE ‘<pattern>’] IN ACCOUNT
For example, you can see a sample list of account level parameters as below:
show parameters in account;
+-------------------------------------+----------------------------------+----------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| key | value | default | level | description |
|-------------------------------------+----------------------------------+----------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ABORT_DETACHED_QUERY | false | false | | If true, Snowflake will automatically abort queries when it detects that the client has disappeared. |
| AUTOCOMMIT | true | true | | The autocommit property determines whether is statement should to be implicitly |
| | | | | wrapped within a transaction or not. If autocommit is set to true, then a |
| | | | | statement that requires a transaction is executed within a transaction |
| | | | | implicitly. If autocommit is off then an explicit commit or rollback is required |
| | | | | to close a transaction. The default autocommit value is true. |
| AUTOCOMMIT_API_SUPPORTED | true | true | | Whether autocommit feature is enabled for this client. This parameter is for |
| | | | | Snowflake use only. |
| BINARY_INPUT_FORMAT | HEX | HEX | | input format for binary |
| BINARY_OUTPUT_FORMAT | HEX | HEX | | display format for binary |
| CLIENT_ENCRYPTION_KEY_SIZE | 128 | 128 | | Client-side encryption key size in bits. Either 128 or 256. |
| CLIENT_SESSION_KEEP_ALIVE | false | false | | If true, client session will not expire automatically |
| DATA_RETENTION_TIME_IN_DAYS | 1 | 1 | | number of days to retain the old version of deleted/updated data |
| DATE_INPUT_FORMAT | AUTO | AUTO | | input format for date |
| DATE_OUTPUT_FORMAT | YYYY-MM-DD | YYYY-MM-DD | | display format for date |
| ERROR_ON_NONDETERMINISTIC_MERGE | true | true |
For more details Click on this URL: https://docs.Snowflake.com/en/user-guide/admin-account-management.html
SQL or the web interface achieves user management in Snowflake. Using SQL, administrators can perform user-related tasks such as creating login credentials as well as defaults. Unlike SQL, web interface wizards provide limited capabilities to create users and reset passwords only.
According to Snowflake’s resetting password policy, the password must have the following requirements by default whenever passwords are reset using the web interface and SQL.
Snowflake provides high-level security features such as network access control, user and group administration, account authentication, object security, data security, and many more. In short, Snowflake provides secure authentication to resources and accounts. In this regard, federated authentication helps connect users with Snowflake using secure SSO. When SSO is enabled, users are authenticated through an external and SAML 2.0 compliant identity provider. This option helps initiate one or multiple sessions in Snowflake without logging in. Similar to SSO, Snowflake supports high-level authentication through key pair authentication. It requires a 2048 bit RSA key pair, and it helps create a private-public key pair using Open SSL.
You can manage network configurations in Snowflake by creating network policies. These policies help to restrict access based on IP addresses. Mainly, a network administrator can create any number of policies based on the requirements to ensure the security of a network. For example, a network policy helps you create a ‘IP allowed list’ and ‘IP blocked list’ in confidence.
Snowflake uses the SCIM feature that helps to carry out the automated management of user identities and groups. Here, SCIM is known as the System for Cross-domain Identity Management. In addition, Snowflake provides you with a very fine level of access management by which access to objects, operations, and changing access policies can be effectively managed.
Snowflake offers you many interfaces that support programming applications to connect with Snowflake. In this way, connectors, drivers, and client APIs act as the interface to connect developing applications with Snowflake. And UDF is known as user-defined functions, which help perform operations out of the boundary of Snowflake. Note that UDF can only support the languages such as SQL, JavaScript, and Java as of now.
Snowpark is the library that supports querying data and processing data in a data pipeline using intuitive APIs. It is one of the libraries that help build applications that process data in Snowflake. Mainly, it avoids moving data into applications where application codes run. Simply put, Snowflake doesn’t require separate clusters outside of Snowflake; on the other hand, all the computations will be carried out inside Snowflake.
External functions are the User Defined Functions (UDFs) stored and executed outside Snowflake. It simplifies the process of data importing and exporting while working in third-party applications. And they support access API services such as geocoders and ML Models outside of Snowflake.
The database object is created in a database and schema, representing the external function. For example:
select my_database.my_schema.my_external_function (coll) from tablel;
They allow you to write procedural code that helps to execute SQL.They can be written once and used many times. They support the dynamic creation of SQL statements and their execution. Stored procedures help to achieve looping and branching using procedural logic. Above all, a stored procedure is an independent statement. You can understand this by seeing the below codes that explain the difference between calling a function and calling a procedure.
CALL MyStoredProcedure_1 (argument_1);
SELECT MyFunction_1(column_1) FROM tablel;
To sum up, this snowflake documentation blog must have been a helpful reading to know how Snowflake simplifies cloud operations of users without provisioning, installing, and managing resources. Mainly, we have covered the total bandwidth of Snowflake’s features such as how to load and unload data, security features, various uses of Snowflake, and how to develop applications connecting with Snowflake in detail. Eventually, you can go forth by learning Snowflake thoroughly via our training and put it into action in full throttle.
Stay updated with our newsletter, packed with Tutorials, Interview Questions, How-to's, Tips & Tricks, Latest Trends & Updates, and more ➤ Straight to your inbox!
Name | Dates | |
---|---|---|
Snowflake Training | Aug 05 to Aug 20 | |
Snowflake Training | Aug 08 to Aug 23 | |
Snowflake Training | Aug 12 to Aug 27 | |
Snowflake Training | Aug 15 to Aug 30 |
Madhuri is a Senior Content Creator at MindMajix. She has written about a range of different topics on various technologies, which include, Splunk, Tensorflow, Selenium, and CEH. She spends most of her time researching on technology, and startups. Connect with her via LinkedIn and Twitter .
1 /10
Copyright © 2013 - 2023 MindMajix Technologies