Create a basic Data Flow
This guide walks you through the process of creating an end-to-end Data Flow.
This guide is intended for new Flow users and briefly introduces Flow's key concepts. Though it's not required, you may find it helpful to read the high-level concepts documentation for more detail before you begin.
In Estuary Flow, you create Data Flows to connect data source and destination systems.
The simplest Data Flow comprises three types of entities:
- A data capture, which ingests data from an external source
- One or more collections, which store that data in a cloud-backed data lake
- A materialization, to push the data to an external destination
The capture and materialization each rely on a connector. A connector is a plug-in component that interfaces between Flow and whatever data system you need to connect to. Here, we'll walk through how to leverage various connectors, configure them, and deploy your Data Flow.
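To make the relationship between the three entity types concrete, here is a minimal Python sketch. It is only an illustrative model, not Flow's actual specification format or API: the class names, the fields, the `is_end_to_end` helper, the `acmeCo/...` names, and the connector labels are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A named dataset that holds captured documents."""
    name: str
    documents: list = field(default_factory=list)


@dataclass
class Capture:
    """Reads from a source through a connector and writes to collections."""
    name: str
    connector: str  # hypothetical connector label
    targets: list   # names of the collections it writes to


@dataclass
class Materialization:
    """Reads from collections and pushes to a destination through a connector."""
    name: str
    connector: str  # hypothetical connector label
    sources: list   # names of the collections it reads from


def is_end_to_end(capture, collections, materialization):
    """True when at least one known collection links capture to materialization."""
    known = {c.name for c in collections}
    shared = set(capture.targets) & set(materialization.sources) & known
    return len(shared) > 0


# Hypothetical names, used only for illustration.
orders = Collection(name="acmeCo/orders")
capture = Capture("acmeCo/source-orders", "example-source", ["acmeCo/orders"])
materialization = Materialization(
    "acmeCo/dest-orders", "example-destination", ["acmeCo/orders"]
)
print(is_end_to_end(capture, [orders], materialization))
```

The point of the model is that collections sit in the middle: a capture and a materialization never reference each other directly, only the collections they share.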
Create a capture
You'll first create a capture to connect to your data source system. This process will create one or more collections in Flow, which you can then materialize to another system.
Go to the Flow web application at dashboard.estuary.dev and sign in using the credentials provided by your Estuary account manager.
Click the Sources tab and choose New Capture.
Choose the appropriate Connector for your desired data source.
A form appears with the properties required for that connector. A documentation page with details about that connector appears in the side panel. You can also browse the connectors reference in your browser.
Type a name for your capture.
In the Name field, click the drop-down arrow and select an available prefix. Append a unique capture name after the / to create the full name, for example
Fill out the required properties and click Next.
Flow uses the provided information to initiate a connection to the source system. It identifies one or more data resources — these may be tables, data streams, or something else, depending on the connector. These are each mapped to a collection.
The Output Collections browser appears, showing this list of available collections. You can decide which ones you want to capture.
Look over the list of available collections. All are selected by default. You can remove collections you don't want to capture, change collection names, and for some connectors, modify other properties.
Narrow down a large list of available collections by typing in the Search Bindings box.
If you're unsure which collections you want to keep or remove, you can look at their schemas.
In the Output Collections browser, select a collection and click the Collection tab to view its schema and collection key. For many source systems, you'll notice that the collection schemas are quite permissive. You'll have the option to apply more restrictive schemas later, when you materialize the collections.
If you made any changes to output collections, click Next again.
Once you're satisfied with the configuration, click Save and Publish. You'll see a notification when the capture publishes successfully.
Click Materialize collections to continue.
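The naming and selection steps in this section can be sketched in a few lines of Python. This is a hedged illustration only: the web application does this work for you, and the helper names (`full_name`, `map_resources`), the `acmeCo` prefix, the resource names, and the assumption that each collection is named after its prefix plus the resource name are all hypothetical.

```python
def full_name(prefix: str, name: str) -> str:
    """Join a prefix and a name with exactly one '/' between them."""
    return prefix.rstrip("/") + "/" + name.strip("/")


def map_resources(prefix, resources, excluded=()):
    """Map each discovered resource to a collection name, skipping removed ones."""
    return {r: full_name(prefix, r) for r in resources if r not in excluded}


# Hypothetical prefix and capture name.
capture_name = full_name("acmeCo/", "orders-capture")

# Hypothetical resources found in the source system; one is deselected.
bindings = map_resources(
    "acmeCo",
    ["orders", "customers", "audit_log"],
    excluded={"audit_log"},
)
print(capture_name)
print(bindings)
```

Every discovered resource starts out selected, which is why the sketch takes an exclusion set rather than an inclusion set.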
Create a materialization
Now that you've captured data into one or more collections, you can materialize it to a destination.
Find the tile for your desired data destination and click Materialization.
The page populates with the properties required for that connector. More details on each connector are provided in the connectors reference.
Choose a unique name for your materialization like you did when naming your capture; for example,
Fill out the required properties in the Endpoint Configuration.
Flow initiates a connection with the destination system.
The Endpoint Config collapses and the Source Collections browser becomes prominent. It shows each collection you captured previously. All of them will be mapped to a resource in the destination. Again, these may be tables, data streams, or something else. When you publish the Data Flow, Flow will create these new resources in the destination.
Now's your chance to make changes to the collections before you materialize them.
Optionally remove some collections or add additional collections.
Type in the Search Collections box to find a collection.
To remove a collection, click the x in its table row. You can also click the Remove All button.
Optionally apply a stricter schema to each collection to use for the materialization.
Depending on the data source, you may have captured data with a fairly permissive schema. You can tighten up the schema so it'll materialize to your destination in the correct shape. (This isn't necessary for database and SaaS data sources, so the option won't be available.)
Choose a collection from the list and click its Collection tab.
Click Schema Inference.
The Schema Inference window appears. Flow scans the data in your collection and infers a new schema, called the readSchema, to use for the materialization.
Review the new schema and click Apply Inferred Schema.
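To give a feel for what inferring a stricter schema from existing data means, here is a simplified Python sketch. It is not Flow's inference algorithm: the function names, the sample documents, and the rule that a field is required only when every scanned document contains it are assumptions chosen for illustration.

```python
def json_type(value):
    """Return the JSON Schema type name for a Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def infer_schema(documents):
    """Build a JSON-Schema-like dict from the top-level fields of the documents."""
    types, counts = {}, {}
    for doc in documents:
        for key, value in doc.items():
            types.setdefault(key, set()).add(json_type(value))
            counts[key] = counts.get(key, 0) + 1
    properties = {}
    for key in sorted(types):
        observed = sorted(types[key])
        properties[key] = {"type": observed[0] if len(observed) == 1 else observed}
    # A field is treated as required only if every document has it.
    required = sorted(k for k, n in counts.items() if n == len(documents))
    return {"type": "object", "properties": properties, "required": required}


# Hypothetical sample documents.
docs = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None, "tag": "x"},
]
schema = infer_schema(docs)
print(schema)
```

A permissive schema would accept any object here; the inferred one records that `id` is always an integer and that `tag` is optional, which is the kind of shape a destination table needs.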
You can exert even more control over the output data structure using the Field Selector on the Config tab. Learn how.
If you've made any changes to source fields, click Next again.
Click Save and Publish. You'll see a notification when the full Data Flow publishes successfully.
Now that you've deployed your first Data Flow, you can explore more possibilities.
Read the high-level concepts to better understand how Flow works and what's possible.
Create more complex Data Flows by mixing and matching collections in your captures and materializations. For example:
Materialize the same collection to multiple destinations.
If a capture produces multiple collections, materialize each one to a different destination.
Materialize collections that came from different sources to the same destination.
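The mix-and-match patterns above amount to a many-to-many mapping between collections and destinations. The short Python sketch below shows that idea under stated assumptions: the destination labels, the collection names, and the `destinations_by_collection` helper are hypothetical and are not part of Flow.

```python
from collections import defaultdict


def destinations_by_collection(materializations):
    """Invert {destination: [collections]} into {collection: [destinations]}."""
    index = defaultdict(list)
    for destination, sources in materializations.items():
        for collection in sources:
            index[collection].append(destination)
    return dict(index)


# Hypothetical setup: one destination reads several collections,
# and one collection is sent to two destinations.
materializations = {
    "warehouse": ["acmeCo/orders", "acmeCo/customers", "acmeCo/clicks"],
    "search-index": ["acmeCo/orders"],
}
fan_out = destinations_by_collection(materializations)
print(fan_out)
```

In this setup `acmeCo/orders` goes to two destinations, while `warehouse` combines collections that could have come from different captures.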