Mocking up data for Snowflake using Python

By Shiva Laccaraju

With Snowflake's arrival in the data science market, the need to mock up data has become increasingly important. With Snowflake, developers and data scientists can easily create a variety of databases from scratch and fill them with data for development and testing. However, creating accurate and reliable data by hand can be time-consuming and tedious.

Fortunately, there are tools available that make the process easier. For more information, please refer to "Data science as a tool for business growth".

Faker is one such tool that helps in mocking up data for Snowflake. It is a Python library that generates fake data resembling real data. It does this through a set of built-in providers (names, addresses, dates, and so on) and options such as locale. It also supports a variety of data types such as text, numbers, dates, and images.

Using Faker to generate data for Snowflake is easy. You call the Faker method for each type of data you want and decide how many records to generate. Faker returns ordinary Python values, which you can then write out in whatever format you need. Once the data is generated, you can use a Snowflake SQL script or procedure to create the database and tables that will store the data.
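As a minimal sketch of that workflow, the helpers below take a mapping of column names to Faker method names plus a row count, and serialize the result as CSV. The names generate_rows, rows_to_csv, demo, and the schema mapping are illustrative and not part of Faker; first_name, email, and date are standard Faker methods.

```python
import csv
import io


def generate_rows(fake, schema, size):
    """Build `size` dicts, calling one method on `fake` per column.

    `schema` maps a column name to the name of a method on `fake`;
    {"email": "email"} calls fake.email() for every row.
    """
    return [
        {column: getattr(fake, provider)() for column, provider in schema.items()}
        for _ in range(size)
    ]


def rows_to_csv(rows, columns):
    """Serialize the generated rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def demo(size=5):
    """Print a small sample. Faker is imported here so the helpers above
    have no hard dependency on it."""
    from faker import Faker

    schema = {"first_name": "first_name", "email": "email", "signup_date": "date"}
    rows = generate_rows(Faker(), schema, size)
    print(rows_to_csv(rows, list(schema)))
```

Calling demo() prints a header and five random rows. The values differ on every run unless you seed the generator first, for example with Faker.seed(42).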

Faker does not design the schema for you, but deciding which Faker method feeds each column is a natural point to define a data model for the data. This data model defines the data's structure and how it should be organized in the database. Writing it down first helps to ensure that the data is properly structured and organized, which makes it easier to query and analyze.
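One way to keep the data model and the generator in step is to describe each column once and derive both the table definition and the fake values from that description. The sketch below assumes a small Column class and a CUSTOMER_MODEL list, both hypothetical names introduced for illustration; only the provider names refer to real Faker methods.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str      # column name in the Snowflake table
    sql_type: str  # Snowflake data type, e.g. VARCHAR(50)
    provider: str  # name of the Faker method that fills this column


# Hypothetical model for a customer table
CUSTOMER_MODEL = [
    Column("first_name", "VARCHAR(50)", "first_name"),
    Column("last_name", "VARCHAR(50)", "last_name"),
    Column("email", "VARCHAR(100)", "email"),
    # Faker phone numbers can carry extensions, so leave some headroom
    Column("phone", "VARCHAR(30)", "phone_number"),
]


def create_table_sql(table, model):
    """Render a CREATE TABLE statement from the column model."""
    body = ",\n".join(f"    {col.name} {col.sql_type}" for col in model)
    return f"CREATE TABLE {table} (\n{body}\n);"


def fake_row(fake, model):
    """Produce one tuple of values, in column order, from a Faker-like object."""
    return tuple(getattr(fake, col.provider)() for col in model)


print(create_table_sql("customer", CUSTOMER_MODEL))
```

Because the table definition and the row generator read from the same list, adding a column means editing one place rather than two.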

By using Faker to mock up data for Snowflake, developers and data scientists can quickly generate realistic data without touching production records. This makes it easier to develop and test applications that use the data, and to build reports and analyses on top of it. All in all, Faker is a great tool to use when creating data for Snowflake. Please refer to the Faker documentation for more information.

Snowflake is a popular cloud-based data warehousing solution that allows organizations to store, analyze, and query large amounts of data in a scalable and efficient manner. When working with Snowflake, it is often necessary to create test data to validate the performance and accuracy of SQL scripts and procedures. One way to accomplish this is by mocking up data using Faker, a Python library that generates realistic fake data such as company names, addresses, bank details, credit card numbers, currencies, and phone numbers. For more information on Snowflake warehousing solutions, please refer to Snowflake accelerator.

In this blog, we will explore how to use Faker to create mock data for Snowflake, and how to design SQL scripts or procedures that create data similar to what Faker produces.

STEP 1: Install Faker

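pip install Faker

The script in Step 2 also imports the Snowflake connector, which is installed separately:

pip install snowflake-connector-python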

STEP 2: Create a Python script

Once Faker is installed, we can create a Python script to generate fake data. Here’s an example script that generates mock data for a customer table:

from faker import Faker
import snowflake.connector

# Connect to Snowflake (replace the placeholders with your own values;
# also pass warehouse=, database= and schema= if your user has no defaults)
conn = snowflake.connector.connect(
    user='<username>',
    password='<password>',
    account='<account_name>'
)

# Create a cursor object
cur = conn.cursor()

# Instantiate Faker
fake = Faker()

# Generate 1000 customer records
rows = [
    (fake.first_name(), fake.last_name(), fake.email(), fake.phone_number())
    for _ in range(1000)
]

# Insert the records using bound parameters, so that values containing
# quotes (for example the last name O'Brien) cannot break the statement
cur.executemany(
    "INSERT INTO customer (first_name, last_name, email, phone) VALUES (%s, %s, %s, %s)",
    rows
)

# Commit changes
conn.commit()

# Close connection
conn.close()

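This script connects to Snowflake, generates 1000 customer records using Faker, and inserts them into the customer table. The customer table must already exist (Step 3 shows a matching CREATE TABLE statement), and the session needs a current warehouse, database, and schema. You can modify this script to generate data for other tables as well.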
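For other tables, the insert step generalizes readily. The sketch below is one possible helper, assuming a cursor from the Snowflake connector with its default %s placeholder style; insert_rows, its batch_size parameter, and the orders table in the comment are hypothetical names used only for illustration.

```python
def insert_rows(cur, table, columns, rows, batch_size=500):
    """Insert `rows` (a list of tuples) into `table` in batches.

    `cur` is a DB-API cursor whose driver uses %s placeholders. Table and
    column names are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(sql, batch)
        inserted += len(batch)
    return inserted


# Example with a hypothetical orders table:
# rows = [(fake.uuid4(), fake.date(), fake.random_int(min=1, max=500))
#         for _ in range(1000)]
# insert_rows(cur, "orders", ["order_id", "order_date", "quantity"], rows)
```

Batching keeps each statement to a bounded size, which can help when generating far more than a thousand rows.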

STEP 3: Design SQL scripts or procedures

Once you have generated mock data using Faker, you can use it to design SQL scripts or procedures that create data with the same shape. For example, here's a SQL script that creates a table whose columns match the customer records we generated earlier:

CREATE TABLE customer (
    customer_id INTEGER AUTOINCREMENT,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    phone VARCHAR(30)  -- Faker phone numbers with extensions can exceed 20 characters
);

This script creates a customer table with columns for customer ID, first name, last name, email, and phone number. You can modify this script to create tables with columns for other types of data as well.
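Mock rows can also be produced inside Snowflake itself, without Faker. The sketch below builds an INSERT ... SELECT statement around Snowflake's GENERATOR table function together with the RANDSTR, UNIFORM, RANDOM, and LPAD functions. The function name generator_insert_sql and the value formats are assumptions chosen for illustration, and the resulting values are random strings rather than realistic names.

```python
def generator_insert_sql(table="customer", row_count=1000):
    """Return an INSERT ... SELECT that fabricates rows inside Snowflake.

    The column list matches the customer table defined above; customer_id
    is left out so that AUTOINCREMENT fills it in.
    """
    return f"""
INSERT INTO {table} (first_name, last_name, email, phone)
SELECT
    RANDSTR(8, RANDOM()),
    RANDSTR(10, RANDOM()),
    LOWER(RANDSTR(12, RANDOM())) || '@example.com',
    '555-' || LPAD(UNIFORM(0, 9999, RANDOM())::VARCHAR, 4, '0')
FROM TABLE(GENERATOR(ROWCOUNT => {int(row_count)}))
""".strip()


print(generator_insert_sql(row_count=10))
```

The returned text can be passed to cur.execute(...) or pasted into a SQL script. This approach trades realism for speed: it avoids sending rows over the network, but the names and emails will not look like real ones the way Faker's do.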

Shiva Laccaraju

Data Engineer

Boolean Data Systems

Shiva works as a Data Engineer at Boolean Data Systems. He is a SnowPro Core and Matillion Associate certified engineer who has worked on several projects and has experience in Snowflake, Streamlit, and Python, to name a few.

Conclusion:

Mocking up data for Snowflake using Faker can save you time and effort when designing SQL scripts and procedures. By generating realistic fake data, you can test the performance and accuracy of your code without using real data. Once you have generated mock data, you can use it to design SQL scripts or procedures that create data similar to what Faker produces. With these tools at your disposal, you can check that your Snowflake data warehouse is functioning properly and efficiently.

About Boolean Data Systems

Boolean Data Systems is a Snowflake Select Services partner that implements solutions on cloud platforms. We help enterprises make better business decisions with data and solve real-world business analytics and data problems.

Global Headquarters

1255 Peachtree Parkway, Suite #4204, Alpharetta, GA 30041, USA.
Ph. : +1 678-261-8899
Fax : (470) 560-3866