Google Cloud Firestore to Snowflake
This guide walks you through the process of creating an end-to-end real-time Data Flow from Google Cloud Firestore to Snowflake using Estuary.
Prerequisites
You'll need:
- (Recommended) An understanding of Estuary's basic concepts.
- Access to the Estuary dashboard through an Estuary account. If you don't have one, visit the web app to register for free.
- A Firestore database that contains the data you'd like to move to Snowflake. You create this as part of a Google Firebase project.
- A Google service account with:
  - Read access to your Firestore database, via roles/datastore.viewer. You can assign this role when you create the service account or add it to an existing service account.
  - A generated JSON service account key for the account.
- A Snowflake account with:
  - A target database, schema, and virtual warehouse.
  - A user with a role that grants the appropriate access to these resources, and a public key assigned to the user for JWT (key-pair) authentication. You can use a script to quickly create all of these items. Have these details on hand for setup with Estuary.
  - The host URL noted. The URL is formatted using the account identifier. For example, if your account identifier is orgname-accountname, your host URL is orgname-accountname.snowflakecomputing.com.
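The Snowflake details above can be sanity-checked before you open the Estuary form. The following is a minimal Python sketch, not part of Estuary or Snowflake tooling: it derives the host URL from an account identifier and turns a PEM-encoded public key into the single-line value that Snowflake's ALTER USER ... SET RSA_PUBLIC_KEY statement expects (the base64 key body without the BEGIN/END delimiter lines). The function names, the user name estuary_user, and the short placeholder key body are hypothetical.

```python
SNOWFLAKE_DOMAIN = "snowflakecomputing.com"
PEM_BEGIN = "-----BEGIN PUBLIC KEY-----"
PEM_END = "-----END PUBLIC KEY-----"


def snowflake_host_url(account_identifier: str) -> str:
    """Build the host URL from an account identifier such as 'orgname-accountname'."""
    account_identifier = account_identifier.strip()
    if not account_identifier:
        raise ValueError("account identifier must not be empty")
    return f"{account_identifier}.{SNOWFLAKE_DOMAIN}"


def rsa_public_key_body(pem: str) -> str:
    """Return the base64 body of a PEM public key, without delimiters or line breaks."""
    lines = [line.strip() for line in pem.strip().splitlines() if line.strip()]
    if len(lines) < 3 or lines[0] != PEM_BEGIN or lines[-1] != PEM_END:
        raise ValueError(f"expected a PEM block between {PEM_BEGIN} and {PEM_END}")
    return "".join(lines[1:-1])


def alter_user_statement(user: str, pem: str) -> str:
    """Hypothetical helper: SQL that assigns the public key to a Snowflake user."""
    return f"ALTER USER {user} SET RSA_PUBLIC_KEY='{rsa_public_key_body(pem)}';"


if __name__ == "__main__":
    print(snowflake_host_url("orgname-accountname"))
    # Placeholder key body for illustration only; use the contents of your real public key file.
    example_pem = f"{PEM_BEGIN}\nMIIBIjAN\nBgkqhkiG\n{PEM_END}\n"
    print(alter_user_statement("estuary_user", example_pem))
```

Running it prints orgname-accountname.snowflakecomputing.com followed by ALTER USER estuary_user SET RSA_PUBLIC_KEY='MIIBIjANBgkqhkiG';. The private key that pairs with this public key is what you later paste into the Private key field of the Snowflake materialization.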
Introduction
In Estuary, you create Data Flows to transfer data from source systems to destination systems in real time. In this use case, your source is a Google Cloud Firestore NoSQL database and your destination is a Snowflake data warehouse.
After following this guide, you'll have a Data Flow that comprises:
- A capture, which ingests data from Firestore
- Several collections, cloud-backed copies of Firestore collections in the Estuary system
- A materialization, which pushes the collections to Snowflake
The capture and materialization rely on plug-in components called connectors. We'll walk through how to configure the Firestore and Snowflake connectors to integrate these systems with Estuary.
Capture from Firestore
You'll first create a capture to connect to your Firestore database, which will yield one Estuary collection for each Firestore collection in your database.
- Go to Estuary's dashboard at dashboard.estuary.dev and sign in.
- Click the Sources tab and choose New Capture.
- Find the Google Firestore tile and click Capture.
  A form appears with the properties required for a Firestore capture.
- Type a name for your capture.
  Your capture name must begin with a prefix to which you have access.
  In the Name field, use the drop-down to select your prefix. Append a unique capture name after the slash (/) to create the full name, for example, acmeCo/myFirestoreCapture.
- Fill out the required properties for Firestore.
  - Database: Estuary can autodetect the database name, but you may optionally specify it here. This is helpful if the service account has access to multiple Firebase projects. Your database name usually follows the format projects/$PROJECTID/databases/(default).
  - Credentials: The JSON service account key created per the prerequisites.
- Click Next.
  Estuary uses the provided configuration to initiate a connection with Firestore.
  It maps each available Firestore collection to a possible Estuary collection. It also infers a schema for each collection.
  You can remove or modify collections in the Source Collections section. For each collection, you can rename or redact fields from its Collection tab.
  If you make any changes to collections, click Next again.
- Once you're satisfied with the collections to be captured, click Save and Publish.
  You'll see a notification when the capture publishes successfully.
  The data currently in your Firestore database has been captured, and future updates to it will be captured continuously.
- Click Materialize to continue.
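If the connection step fails, it can help to check the form values locally. The sketch below is a hypothetical Python helper, not part of Estuary (which validates the configuration itself when you click Next). It assumes the credentials are the standard JSON key that Google Cloud generates for a service account, which includes type, project_id, private_key, and client_email fields. It checks that the capture name is your prefix followed by a non-empty name, and builds the usual default database name from the key's project_id.

```python
import json

# Fields present in the JSON key that Google Cloud generates for a service account.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_capture_name(name: str, prefix: str) -> None:
    """Raise if the name is not the selected prefix followed by a non-empty name."""
    prefix = prefix if prefix.endswith("/") else prefix + "/"
    suffix = name[len(prefix):] if name.startswith(prefix) else ""
    # A trailing slash would make the value a prefix rather than a name.
    if not suffix or suffix.endswith("/"):
        raise ValueError(f"capture name must look like {prefix}<name>, got {name!r}")


def load_service_account_key(raw: str) -> dict:
    """Parse the JSON key and confirm it is a service account key."""
    key = json.loads(raw)
    missing = [f for f in REQUIRED_KEY_FIELDS if not key.get(f)]
    if missing:
        raise ValueError(f"key is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"expected type 'service_account', got {key['type']!r}")
    return key


def default_database_name(key: dict) -> str:
    """Build projects/$PROJECTID/databases/(default) from the key's project_id."""
    return f"projects/{key['project_id']}/databases/(default)"
```

For example, pass the contents of your downloaded key file to load_service_account_key, then call default_database_name on the result; for a project ID of my-project, it returns projects/my-project/databases/(default). You only need to enter this in the Database field if autodetection picks the wrong project.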
Materialize to Snowflake
Next, you'll add a Snowflake materialization to connect the captured data to its destination: your data warehouse.
- Locate the Snowflake tile and click Materialization.
  A form appears with the properties required for a Snowflake materialization.
- You will be prompted to select a default naming strategy for your materialization.
  For this demo, select Set a default schema and enter the schema name you created per the prerequisites.
  Click Continue to set your selection.
- Choose a unique name for your materialization like you did when naming your capture; for example, acmeCo/mySnowflakeMaterialization.
- Fill out the required properties for Snowflake (you should have most of these handy from the prerequisites).
  - Host URL
  - Database
  - Schema
  - Warehouse: optional
  - Role: optional
  - Timestamp type
  - User
  - Private key
- Click Next.
  Estuary uses the provided configuration to initiate a connection to Snowflake.
  You'll be notified if there's an error. In that case, fix the configuration form or Snowflake setup as needed and click Next to try again.
  Once the connection is successful, the Endpoint Config collapses and the Source Collections section becomes prominent. It shows the collections you captured previously. Each of them will be mapped to a Snowflake table.
- In the Source Collections section, optionally modify the Resource Configuration for each collection.
  This includes the table name, an alternative schema name, and field selection options.
- For each table, choose whether to enable delta updates.
- Click Next to apply the changes you made to collections.
- Click Save and Publish. You'll see a notification when the full Data Flow publishes successfully.
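The choice between standard and delta updates affects what ends up in each table. The Python sketch below is a simplified model, not the connector's implementation. With standard updates, the table holds one row per collection key and a newer document for a key replaces the stored row (assuming the collection's default last-write-wins reduction). With delta updates, each incoming document is written as a new row without reading the existing table, so a key can appear in several rows. In practice the connector may combine documents that share a key within one transaction before writing. The function names and the sample id/status documents are hypothetical.

```python
from typing import Any

Doc = dict[str, Any]


def standard_updates(table: dict[str, Doc], docs: list[Doc], key: str) -> dict[str, Doc]:
    """Simplified standard updates: one row per key, and the latest document wins."""
    merged = dict(table)
    for doc in docs:
        merged[doc[key]] = doc
    return merged


def delta_updates(table: list[Doc], docs: list[Doc]) -> list[Doc]:
    """Simplified delta updates: every incoming document becomes a new row."""
    return table + list(docs)


changes = [
    {"id": "u1", "status": "active"},
    {"id": "u2", "status": "active"},
    {"id": "u1", "status": "closed"},
]

if __name__ == "__main__":
    print(standard_updates({}, changes, key="id"))
    print(delta_updates([], changes))
```

Here standard_updates leaves two rows, with u1 in its closed state, while delta_updates keeps all three rows. Skipping the read of existing rows is the usual reason to choose delta updates; the trade-off is that queries against the table must resolve the latest row per key themselves.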
What's next?
Your Data Flow has been deployed and will run continuously until it's stopped. Updates in your Firestore database will be reflected in your Snowflake tables based on your defined sync schedule.
You can advance your Data Flow by adding a derivation. Derivations are real-time data transformations. See the guide to create a derivation.