
Google Cloud Firestore to Snowflake

This guide walks you through the process of creating an end-to-end real-time Data Flow from Google Cloud Firestore to Snowflake using Estuary.

Prerequisites

You'll need:

Introduction

In Estuary, you create Data Flows to transfer data from source systems to destination systems in real time. In this use case, your source is a Google Cloud Firestore NoSQL database and your destination is a Snowflake data warehouse.

After following this guide, you'll have a Data Flow that comprises:

  • A capture, which ingests data from Firestore
  • Several collections, cloud-backed copies of Firestore collections in the Estuary system
  • A materialization, which pushes the collections to Snowflake

The capture and materialization rely on plug-in components called connectors. We'll walk through how to configure the Firestore and Snowflake connectors to integrate these systems with Estuary.

Capture from Firestore

You'll first create a capture to connect to your Firestore database, which will yield one Estuary collection for each Firestore collection in your database.

  1. Go to Estuary's dashboard at dashboard.estuary.dev and sign in.

  2. Click the Sources tab and choose New Capture.

  3. Find the Google Firestore tile and click Capture.

    A form appears with the properties required for a Firestore capture.

  4. Type a name for your capture.

    Your capture name must begin with a prefix to which you have access.

    In the Name field, use the drop-down to select your prefix. Append a unique capture name after the / to create the full name, for example, acmeCo/myFirestoreCapture.

  5. Fill out the required properties for Firestore.

    • Database: Estuary can autodetect the database name, but you may optionally specify it here. This is helpful if the service account has access to multiple Firebase projects. Your database name usually follows the format projects/$PROJECTID/databases/(default).

    • Credentials: The JSON service account key you created in the prerequisites.

  6. Click Next.

    Estuary uses the provided configuration to initiate a connection with Firestore.

    It maps each available Firestore collection to a possible Estuary collection. It also infers schemas for each collection.

    You can remove or modify collections in the Source Collections section. For each collection, you can rename or redact fields from its Collection tab.

    Tip: If you make any changes to collections, click Next again.

  7. Once you're satisfied with the collections to be captured, click Save and Publish.

    You'll see a notification when the capture publishes successfully.

    The data currently in your Firestore database has been captured, and future updates to it will be captured continuously.

  8. Click Materialize to continue.
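The Database value in step 5 usually follows the format projects/$PROJECTID/databases/(default). If you prefer to fill it in explicitly, the project ID can be read from the service account key itself. The following is a minimal sketch, assuming a standard Google Cloud service account key (which includes a project_id field) and the default database; the function name firestore_database_name is hypothetical and not part of Estuary.

```python
import json


def firestore_database_name(key_json: str, database_id: str = "(default)") -> str:
    """Build a Firestore database name from a service account key's project_id.

    Hypothetical helper: assumes the key JSON has a top-level "project_id" field.
    """
    key = json.loads(key_json)
    project_id = key.get("project_id")
    if not project_id:
        raise ValueError("service account key has no project_id field")
    return f"projects/{project_id}/databases/{database_id}"


if __name__ == "__main__":
    # Truncated placeholder key; a real key also has private_key, client_email, etc.
    sample_key = '{"type": "service_account", "project_id": "my-project"}'
    print(firestore_database_name(sample_key))
    # projects/my-project/databases/(default)
```

To use it with a downloaded key file, pass the file's contents, for example firestore_database_name(open("key.json").read()), where key.json is a placeholder path.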

Materialize to Snowflake

Next, you'll add a Snowflake materialization to connect the captured data to its destination: your data warehouse.

  1. Locate the Snowflake tile and click Materialization.

    A form appears with the properties required for a Snowflake materialization.

  2. You will be prompted to select a default naming strategy for your materialization.

    For this demo, select Set a default schema and enter the schema name you created in the prerequisites.

    Click Continue to set your selection.

  3. Choose a unique name for your materialization like you did when naming your capture; for example, acmeCo/mySnowflakeMaterialization.

  4. Fill out the required properties for Snowflake (you should have most of these handy from the prerequisites).

    • Host URL
    • Database
    • Schema
    • Warehouse: optional
    • Role: optional
    • Timestamp type
    • User
    • Private key
  5. Click Next.

    Estuary uses the provided configuration to initiate a connection to Snowflake.

    You'll be notified if there's an error. In that case, fix the configuration form or Snowflake setup as needed and click Next to try again.

    Once the connection is successful, the Endpoint Config collapses and the Source Collections section becomes prominent. It shows the collections you captured previously. Each of them will be mapped to a Snowflake table.

  6. In the Source Collections section, optionally modify the Resource Configuration for each collection.

    These include the table name, an alternative schema name, and field selection options.

  7. For each table, choose whether to enable delta updates.

  8. Click Next to apply the changes you made to collections.

  9. Click Save and Publish. You'll see a notification when the full Data Flow publishes successfully.
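If the connection attempt in step 5 fails, a quick local check of the values you gathered for step 4 can help narrow down the problem. This is a minimal sketch, not part of Estuary: the dictionary keys are hypothetical stand-ins for the form labels in step 4 (the connector's actual configuration keys may differ), normalize_host assumes the Host URL field expects a bare hostname such as orgname-accountname.snowflakecomputing.com without https://, and the private key check only looks for a PEM header.

```python
from typing import Dict, List

# Hypothetical keys standing in for the step 4 form fields;
# the connector's actual configuration keys may differ.
REQUIRED_FIELDS = ["host", "database", "schema", "timestamp_type", "user", "private_key"]
OPTIONAL_FIELDS = ["warehouse", "role"]


def missing_fields(config: Dict[str, str]) -> List[str]:
    """Return required fields that are absent or blank, in form order."""
    return [name for name in REQUIRED_FIELDS if not config.get(name, "").strip()]


def unknown_fields(config: Dict[str, str]) -> List[str]:
    """Return keys that match neither list, e.g. a misspelled field name."""
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    return sorted(key for key in config if key not in known)


def normalize_host(host: str) -> str:
    """Strip any URL scheme and trailing slash (assumes a bare hostname is expected)."""
    host = host.strip()
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    return host.rstrip("/")


def looks_like_pem_private_key(key: str) -> bool:
    """Rough check that the value is a PEM-encoded private key block."""
    text = key.strip()
    return text.startswith("-----BEGIN") and "PRIVATE KEY-----" in text


if __name__ == "__main__":
    # Placeholder values only.
    config = {
        "host": "https://orgname-accountname.snowflakecomputing.com/",
        "database": "ANALYTICS",
        "schema": "ESTUARY",
        "user": "ESTUARY_USER",
        "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----",
    }
    print(missing_fields(config))  # ['timestamp_type']
    print(normalize_host(config["host"]))  # orgname-accountname.snowflakecomputing.com
    print(looks_like_pem_private_key(config["private_key"]))  # True
```

Warehouse and Role are treated as optional here because the form marks them optional; everything else in step 4 is treated as required.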

What's next?

Your Data Flow has been deployed and will run continuously until it's stopped. Updates in your Firestore database will be reflected in your Snowflake tables based on your defined sync schedule.
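How those updates appear in each table depends on the delta updates choice from step 7 of the materialization. Roughly speaking, with standard updates a table keeps one row per collection key that is updated as new documents arrive, while with delta updates new documents are written as additional rows instead of being merged into existing ones. The sketch below is a simplified model of that difference, not Estuary's implementation: the "path" key field and the event data are hypothetical, merging is simplified to last-write-wins, and the real connector may combine several changes to the same key within one transaction.

```python
from typing import Dict, List

# Hypothetical change events for two Firestore documents, keyed by a
# hypothetical "path" field; real collection keys and fields may differ.
EVENTS = [
    {"path": "users/alice", "plan": "free"},
    {"path": "users/bob", "plan": "free"},
    {"path": "users/alice", "plan": "pro"},
]


def standard_updates(events: List[dict]) -> Dict[str, dict]:
    """Simplified model: one row per key; a later event replaces the earlier row."""
    table: Dict[str, dict] = {}
    for event in events:
        table[event["path"]] = event
    return table


def delta_updates(events: List[dict]) -> List[dict]:
    """Simplified model: every event is appended as its own row."""
    table: List[dict] = []
    for event in events:
        table.append(event)
    return table


if __name__ == "__main__":
    print(len(standard_updates(EVENTS)))  # 2
    print(len(delta_updates(EVENTS)))  # 3
```

In this model, standard updates leave a table that reflects the current state of each document, while delta updates keep a row for every change, which you would typically deduplicate or aggregate in Snowflake if you need current state.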

You can advance your Data Flow by adding a derivation. Derivations are real-time data transformations. See the guide to create a derivation.