Snowflake Cost Control Best Practices

By Kalyan Chander K

Snowflake has become a well-known cloud-based data warehousing tool. While Snowflake differs from conventional data warehousing solutions in several ways, cost control is one area that needs consideration. If not handled carefully, costs under Snowflake’s usage-based pricing model can rapidly spiral out of control.

In this blog, we’ll examine some of the top techniques for limiting Snowflake expenses. Companies use Snowflake to store, process, and analyze large volumes of data, and although its design is scalable and adaptable, a cost control strategy is crucial for keeping spending in check.

Choose the Right Size of your Warehouses

The size of your Snowflake warehouse has a direct impact on your monthly bill. Your use case determines whether to run large or small warehouses. In general, running large queries on large warehouses and small queries on small warehouses is one of the most cost-effective practices.
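To make the sizing trade-off concrete, here is a minimal sketch of the arithmetic. It assumes the standard warehouse credit rates (which double with each size step) and per-second billing with a 60-second minimum each time a warehouse starts. The query runtimes are hypothetical, and the heavy query is assumed to scale perfectly with warehouse size, which real workloads may not do.

```python
# Credits per hour for standard warehouses double with each size step.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}

# Assumption: each start or resume is billed for at least 60 seconds.
MIN_BILLED_SECONDS = 60


def credits_for_run(size: str, runtime_seconds: float) -> float:
    """Estimate the credits for one query on a freshly resumed warehouse."""
    billed_seconds = max(runtime_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# Hypothetical heavy query: 40 minutes on S, 10 minutes on L.
print(f"heavy on S: {credits_for_run('S', 40 * 60):.3f} credits")
print(f"heavy on L: {credits_for_run('L', 10 * 60):.3f} credits")

# Hypothetical 5-second lookup: both sizes bill the 60-second minimum.
print(f"lookup on S: {credits_for_run('S', 5):.3f} credits")
print(f"lookup on L: {credits_for_run('L', 5):.3f} credits")
```

Under these assumptions the heavy query costs the same on either size but finishes sooner on the larger one, while the short lookup is four times more expensive on the large warehouse.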

Warehouses cost nothing while they are turned off, though, which is why suspending idle warehouses (covered below) matters just as much as sizing them correctly.

Pic credits: Snowflake
Optimize Data Storage

Data storage significantly adds to the cost of a Snowflake account. In order to reduce costs, businesses should optimize their data storage by:

Storing only Relevant Data:

Businesses should only store relevant data in Snowflake. Storing unnecessary data can increase storage costs.

Using Compression:

Snowflake compresses table data automatically, which reduces the storage size of the data and therefore the storage you are billed for.

Using the Right Data Type:

Choosing the right data type can also reduce storage costs. For instance, using the smallest data type that can hold the required data can save storage space.

Avoid Large Queries:

Large queries can consume a lot of computing power and increase costs. Businesses should optimize their queries by breaking them down into smaller and more efficient queries.

Use Materialized Views:

Materialized views are pre-computed query results that can reduce the computing power required for queries.

Use Caching:

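Snowflake caches query results, so a repeated identical query against unchanged data can be answered from the cache instead of being recomputed. This speeds up query performance and saves warehouse compute.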

Suspend Warehouses that are Idle
If you have virtual warehouses that are inactive, you can suspend them so that you’re not charged for unused compute power. Auto-suspend is enabled by default, but you can shorten the idle time after which a warehouse shuts down once it has finished executing queries. In the Snowflake web interface, go to the Warehouses tab and set the auto-suspend time limit for a warehouse. You can also turn on auto-resume so that the warehouse resumes when it is queried again.
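The same settings can also be applied with SQL. The sketch below is a small hypothetical helper (`auto_suspend_sql` is not part of any Snowflake library) that builds an `ALTER WAREHOUSE` statement. The warehouse name and the 120-second idle limit are placeholder values.

```python
def auto_suspend_sql(warehouse: str, idle_seconds: int = 60) -> str:
    """Build an ALTER WAREHOUSE statement that sets auto-suspend and auto-resume."""
    # Keep the sketch simple: accept only plain alphanumeric/underscore names.
    if not warehouse.replace("_", "").isalnum():
        raise ValueError(f"unexpected warehouse name: {warehouse!r}")
    if idle_seconds < 0:
        raise ValueError("idle_seconds must be non-negative")
    # AUTO_SUSPEND is expressed in seconds of inactivity.
    return (
        f"ALTER WAREHOUSE {warehouse} SET "
        f"AUTO_SUSPEND = {idle_seconds} AUTO_RESUME = TRUE;"
    )


print(auto_suspend_sql("REPORTING_WH", 120))
```

The generated statement can be run in a worksheet or through whichever client you already use to connect to Snowflake.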
Use Snowflake’s Usage Reports

Snowflake provides usage reports that allow you to track your usage patterns and identify areas where you can optimize your costs. You can use these reports to identify queries that are consuming a lot of resources, tables that are not being used, and other areas where you can reduce your costs.
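As a rough illustration of this kind of analysis, the sketch below totals credits per warehouse and ranks the heaviest consumers first. The rows are hypothetical sample data, shaped like a two-column export (warehouse name, credits used) from the `SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY` view.

```python
from collections import defaultdict

# Hypothetical (warehouse_name, credits_used) rows, not real account data.
rows = [
    ("ETL_WH", 12.5),
    ("BI_WH", 3.0),
    ("ETL_WH", 7.5),
    ("ADHOC_WH", 0.5),
]


def credits_by_warehouse(rows):
    """Sum credits per warehouse and sort from highest to lowest consumer."""
    totals = defaultdict(float)
    for name, credits in rows:
        totals[name] += credits
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for name, credits in credits_by_warehouse(rows):
    print(f"{name}: {credits:.1f} credits")
```

The warehouses at the top of the list are usually the first place to look for oversized compute or missing auto-suspend settings.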

Use Clustering Keys

Clustering keys are a powerful feature in Snowflake that can improve query performance and reduce costs. A clustering key stores rows with similar values in the key columns close together, so that the data layout matches the filters your queries use most. Queries that filter on those columns then scan only the data they need, which reduces the amount of data scanned and thus the cost.
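The effect can be illustrated with a toy model. This is not how Snowflake is implemented. It is a simplified simulation in which each "partition" is a list of values and a partition is skipped when its minimum-to-maximum range cannot overlap the query filter.

```python
def partitions_scanned(partitions, low, high):
    """Count partitions whose [min, max] range overlaps the filter [low, high]."""
    return sum(1 for p in partitions if min(p) <= high and max(p) >= low)


values = list(range(100))

# Clustered layout: each partition holds a contiguous range of 10 values.
clustered = [values[i:i + 10] for i in range(0, 100, 10)]

# Unclustered layout: values are dealt round-robin, so every partition
# spans almost the whole value range.
unclustered = [values[i::10] for i in range(10)]

# Filter: WHERE value BETWEEN 20 AND 29
print(partitions_scanned(clustered, 20, 29))    # 1
print(partitions_scanned(unclustered, 20, 29))  # 10
```

In the clustered layout only one of the ten partitions has to be read, whereas the unclustered layout forces a scan of all ten for the same filter.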

Use Resource Management

Snowflake provides resource management features that allow businesses to manage their computing resources efficiently. For instance, businesses can allocate resources to specific users, warehouses, or queries. They can also set limits on resource usage to prevent overconsumption.

Use Resource Monitors

Resource monitors are a key Snowflake feature for controlling the number of credits your warehouses consume. By creating resource monitors, you can keep credit usage in line with your workloads and avoid over-provisioning, which can lead to unnecessary costs. For example, a resource monitor can suspend a warehouse when it reaches its credit limit. A useful trick is to set credit thresholds at different levels: for example, one alert when 70% of the credit quota is consumed and another when 90% is consumed.
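The layered thresholds can be sketched as a small simulation. The function and variable names below are hypothetical, and the code only models the threshold logic. It does not call Snowflake. The 100-credit quota is a placeholder value.

```python
# Roughly mirrors a monitor defined in SQL as:
#   CREATE RESOURCE MONITOR my_monitor WITH CREDIT_QUOTA = 100
#     TRIGGERS ON 70 PERCENT DO NOTIFY
#              ON 90 PERCENT DO NOTIFY
#              ON 100 PERCENT DO SUSPEND;
TRIGGERS = [(70, "NOTIFY"), (90, "NOTIFY"), (100, "SUSPEND")]


def triggered_actions(credit_quota, credits_used, triggers):
    """Return the actions whose percentage threshold has been reached."""
    percent_used = 100 * credits_used / credit_quota
    return [
        action
        for threshold, action in sorted(triggers)
        if percent_used >= threshold
    ]


print(triggered_actions(100, 65, TRIGGERS))   # []
print(triggered_actions(100, 92, TRIGGERS))   # ['NOTIFY', 'NOTIFY']
print(triggered_actions(100, 100, TRIGGERS))  # ['NOTIFY', 'NOTIFY', 'SUSPEND']
```

The early notifications give you time to investigate before the suspend threshold stops the warehouse.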

Use Time-Travel and Fail-Safe

Snowflake’s Time-Travel and Fail-Safe features allow you to recover from accidental deletions, updates, or drops. However, the historical data kept for these features adds to your storage costs. To reduce costs, you should set a retention period that aligns with your needs, and periodically review the amount of storage used for these features.

Use Appropriate Compute Resources

Snowflake provides a wide range of compute resources that are designed for different workloads. By selecting the appropriate compute resources for your workloads, you can optimize performance and reduce costs.

Use Resource Sharing

Resource Sharing is a feature in Snowflake that allows you to share compute resources with other Snowflake accounts. By sharing resources, you can reduce costs and improve overall utilization of resources.

Use Zero-Copy Cloning

This unique feature lets you create clones of databases, schemas, and tables that use pointers to the live data and don’t need additional storage. As a result, you save on storage costs and on the time it takes to configure the cloned environment. Note that if you delete the original table, the storage fee is transferred to the cloned table, so delete both the original and the cloned tables once neither is in use.

Create Alerts for Reader Accounts

To share data with non-Snowflake customers, you can create reader accounts. This will let them execute queries on any data shared with them, but it’s you who will bear the cost for their usage. Keeping track of reader accounts can help you prevent unexpected cost spikes caused by active warehouses that are no longer in use. You can always set resource monitors to limit credit usage for reader accounts.

Split Large Files to Minimize Processing Overheads

To distribute the load across the compute resources in an active warehouse, export large files in smaller chunks using a split utility. This allows Snowflake to divide the workload into parallel threads and load multiple files simultaneously, reducing the compute time of your virtual warehouse. The number of load operations that run in parallel cannot exceed the number of data files to be loaded. To optimize the number of parallel operations for a load, we recommend creating data files that are roughly 100-250 MB (or larger) in size when compressed.
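If no split utility is at hand, the idea can be sketched in a few lines of Python. The helper below is a hypothetical example that splits a text file by line count. The file name and the tiny chunk size are for demonstration only, and in practice you would choose a line count that yields chunks in the size range recommended above.

```python
import os
import tempfile


def split_by_lines(path, lines_per_chunk, out_dir):
    """Split a text file into numbered chunks of at most lines_per_chunk lines."""
    chunk_paths = []
    chunk = None
    with open(path, "r", encoding="utf-8") as src:
        for index, line in enumerate(src):
            if index % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk lines.
                if chunk:
                    chunk.close()
                name = f"{os.path.basename(path)}.part{len(chunk_paths):04d}"
                chunk_paths.append(os.path.join(out_dir, name))
                chunk = open(chunk_paths[-1], "w", encoding="utf-8")
            chunk.write(line)
    if chunk:
        chunk.close()
    return chunk_paths


# Demo on a small hypothetical CSV: 10 rows split into chunks of 4 rows.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "orders.csv")
    with open(source, "w", encoding="utf-8") as f:
        f.writelines(f"{i},item{i}\n" for i in range(10))
    parts = split_by_lines(source, 4, tmp)
    print(len(parts))  # 3
```

This sketch does not repeat a CSV header row in each chunk, so it suits header-less exports. Files with a header would need that line copied into every chunk or skipped at load time.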

For a quick and easy method, we at Boolean Data have developed a Snowflake Cost Estimator application to estimate compute and storage costs.

We have also designed a new application called Snowflake Health Check that can help you control the cost of your Snowflake account by surfacing various insights into account usage.

You may also check “Controlling Cost in Snowflake” in the Snowflake Documentation for more information on cost control.

Kalyan Chander K

Data Engineer

Boolean Data Systems


Kalyan works as a Data Engineer at Boolean Data Systems and has built many end-to-end data engineering solutions. He is a Matillion Associate certified engineer with proven skills and expertise in Snowflake, Matillion, Python, Streamlit, and AWS, to name a few.

Conclusion:

Snowflake offers a number of features and best practices that can help you control costs. By using Snowflake’s usage reports, clustering keys, resource monitors, auto-suspend, time-travel, fail-safe, appropriate compute resources, and resource sharing, you can optimize your Snowflake environment and reduce costs. It’s important to regularly review your usage patterns and adjust your Snowflake account settings to ensure that you are using the right number of resources for your workloads. With these best practices in mind, you can maximize the value of Snowflake while minimizing costs.

About Boolean Data Systems

Boolean Data Systems is a Snowflake Select Services partner that implements solutions on cloud platforms. We help enterprises make better business decisions with data and solve real-world business analytics and data problems.

Global Headquarters

1255 Peachtree Parkway, Suite #4204, Alpharetta, GA 30041, USA.
Ph. : +1 678-261-8899
Fax : (470) 560-3866