Edit a Flow specification locally
The Flow web application is designed to make the most common Flow tasks quick and easy. With the app, you're able to create, monitor, and manage captures, materializations, and more. For creating basic Data Flows, the web app is by far the most efficient option, and basic editing capabilities are provided.
However, advanced editing tasks are only possible using flowctl. These include:
- Manually editing collection schemas, for example, to add projections or change the reduction strategy.
- Editing, testing, and publishing multiple entities at once.
- Creating and editing derivations.
A simplified development experience for derivations is available. You can use the web app to create a cloud-based development environment pre-populated with the components you need. Learn how here.
This guide covers the basic procedure of pulling one or more live Flow entities to your local development environment, editing their specifications, and re-publishing them.
Prerequisites
To complete this workflow, you need:
-
One or more published Flow entities. (To edit unpublished drafts, use this guide.)
Pull specifications locally
Every entity (including active tasks, like captures and materializations, and static collections) has a globally unique name in the Flow catalog.
For example, a given Data Flow may comprise:
- A capture,
myOrg/marketing/leads
, which writes to... - Two collections,
myOrg/marketing/emailList
andmyOrg/marketing/socialMedia
, which are materialized as part of... - A materialization,
myOrg/marketing/contacts
.
Using these names, you'll identify and pull the relevant specifications for editing.
-
Authorize flowctl.
-
Go to the CLI-API tab of the web app and copy your access token.
-
Run
flowctl auth token --token <paste-token-here>
-
-
Determine which entities you need to pull from the catalog. You can:
-
Check the web app's Sources, Collections, and Destinations pages. All published entities to which you have access are listed and can be searched.
-
Run
flowctl catalog list
. This command returns a complete list of entities to which you have access. You can refine by specifying a--prefix
and filter by entity type:--captures
,--collections
,--materializations
, or--tests
.
From the above example,
flowctl catalog list --prefix myOrg/marketing --captures --materializations
would returnmyOrg/marketing/leads
andmyOrg/marketing/contacts
. -
-
Pull the specifications you need by running
flowctl catalog pull-specs
:-
Pull one or more specifications by name, for example:
flowctl catalog pull-specs --name myOrg/marketing/emailList
-
Pull a group of specifications by prefix or type filter, for example:
flowctl catalog pull-specs --prefix myOrg/marketing --collections
-
The source files are written to your current working directory.
- Browse the source files.
flowctl pulls specifications into subdirectories organized by entity name, and specifications sharing a catalog prefix are written to the same YAML file.
Regardless of what you pull, there is always a top-level file called flow.yaml
that imports all other nested YAML files.
These, in turn, contain the entities' specifications.
Edit source files and re-publish specifications
Next, you'll complete your edits, test that they were performed correctly, and re-publish everything.
-
Open the YAML files that contain the specification you want to edit.
-
Make changes. For guidance on how to construct Flow specifications, see the documentation for the task type:
-
When you're done, you can test your changes:
flowctl catalog test --source flow.yaml
You'll almost always use the top-level flow.yaml
file as the source here because it imports all other Flow specifications
in your working directory.
Once the test has passed, you can publish your specifications.
- Re-publish all the specifications you pulled:
flowctl catalog publish --source flow.yaml
Again you'll almost always want to use the top-level flow.yaml
file. If you want to publish only certain specifications,
you can provide a path to a different file.
- Return to the web app or use
flowctl catalog list
to check the status of the entities you just published. Their publication time will be updated to reflect the work you just did.
If you're not satisfied with the results of your edits, repeat the process iteratively until you are.