When you work on a draft Data Flow using flowctl,
your Flow specifications may be spread across multiple files.
For example, you may have multiple materializations that read from collections defined in separate files,
or you could store a derivation separately from its tests.
You might also reference specifications that aren't in your local draft.
For example, you might create a derivation with a source collection that is not in your local draft.
When you publish your draft, Flow automatically resolves references to specifications across the entirety of the catalog. This is possible because every entity in Flow has a globally unique name.
Alternatively, you can explicitly add other local specification files to the Data Flow's build process by including an
import section in the Flow specification file you'll publish.
When the draft is published, the imported specifications are treated as part of the file
into which they are imported.
All entities in the draft will be used to overwrite any existing version of those entities in the global catalog.
Explicit imports are useful when you need to update multiple components of a Data Flow at the same time
but those components live in separate files.
For example, when you update a derivation, you must also update its test(s) at the same time to prevent failures.
You could import the file that contains the tests into
my-derivation.yaml and then publish
my-derivation.yaml to update both entities in the catalog.
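As a minimal sketch of that pattern, my-derivation.yaml could list the tests file in its import section. The file name my-derivation-tests.yaml is hypothetical, chosen only for illustration, and the derivation's own definition is omitted.

```yaml
# my-derivation.yaml
# The imported file name is a hypothetical placeholder for whichever
# file holds the derivation's tests.
import:
  - my-derivation-tests.yaml

# The collection and derivation definitions would follow here;
# they are left out of this sketch.
```

Publishing my-derivation.yaml would then carry the imported tests along with it.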
A common pattern for a given draft is to have a single top-level specification
file which explicitly imports all the others.
Flow automatically generates such a top-level file for your draft when you begin a local work session
using flowctl draft develop.
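A top-level file of this kind might look roughly like the sketch below. The file names are hypothetical, and the file that Flow actually generates may differ in name and contents.

```yaml
# flow.yaml: a top-level specification file whose only job is to
# import the other files in the draft. All paths are hypothetical.
import:
  - acmeCo/collections.flow.yaml
  - acmeCo/materializations.flow.yaml
  - acmeCo/my-derivation.yaml
```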
The import section is structured as a list of partial or absolute URIs,
which Flow always evaluates relative to the base directory of the current source file.
For example, these are possible imports within a collection:
# Suppose we're in file "/path/dir/flow.yaml"
import:
  - sub/directory/flow.yaml # Resolves to "file:///path/dir/sub/directory/flow.yaml".
  - ../sibling/directory/flow.yaml # Resolves to "file:///path/sibling/directory/flow.yaml".
  - https://example/path/flow.yaml # Uses the absolute URL.
The import rule is flexible: a collection doesn't have to do anything special
to be imported by another,
and flowctl can even directly build remote sources:
# Test an example from a GitHub repository.
$ flowctl draft test --source https://raw.githubusercontent.com/estuary/flow-template/main/word-counts.flow.yaml
Flow resolves, fetches, and validates all imports in your local environment during the catalog build process, and then includes their fetched contents within the published catalog on the Estuary servers. The resulting catalog entities are thus self-contained snapshots of all resources as they were at the time of publication.
This means it's both safe and recommended to directly reference an authoritative source of a resource, such as a third-party JSON schema, as well as resources within your private network. It will be fetched and verified locally at build time, and thereafter that fetched version will be used for execution, regardless of whether the authority URL itself later changes or errors.
Almost always, the
import stanza is used to import other Flow catalog source files.
This is the default when given a string path.
A long-form variant also accepts a content type of the imported resource:
- url: path/to/source/catalog.flow.yaml
Other permitted content types include JSON_SCHEMA,
but these are not typically used and are needed only for advanced use cases.
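The short and long forms can be compared side by side in the sketch below. The contentType key and its CATALOG value are assumptions about the long-form syntax, and the second path is a hypothetical file.

```yaml
import:
  # Short form: a plain string path, treated by default as a
  # Flow catalog source file.
  - path/to/source/catalog.flow.yaml

  # Long form: the same kind of import with the content type stated
  # explicitly. The contentType key and CATALOG value are assumed.
  - url: path/to/other/catalog.flow.yaml
    contentType: CATALOG
```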
Certain catalog entities, like collections, commonly reference JSON schemas.
It's not necessary to explicitly add these to the import section;
they are automatically resolved and treated as an import.
You can think of this as an analog to the JSON Schema $ref keyword,
which is used to reference a schema that may
be contained in another file.
The one exception is schemas that use the $id keyword
at their root to define an alternative canonical URL.
In this case, the schema must be referenced through its canonical URL,
and then explicitly added to the import section with a
JSON_SCHEMA content type.
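The sketch below illustrates this exception under stated assumptions. The file names, the example.com URL, and the acmeCo/orders collection are all hypothetical, and the contentType key is assumed to be the long-form field that carries the content type. The --- separator only divides the two files for display; they would be separate files on disk.

```yaml
# schemas/order.schema.yaml (hypothetical file)
# The root $id declares a canonical URL that differs from the
# file's own location.
$id: https://example.com/schemas/order.schema.yaml
type: object
properties:
  id: { type: string }
required: [id]
---
# flow.yaml (hypothetical file)
import:
  # Explicitly import the local schema file and state its content type.
  - url: schemas/order.schema.yaml
    contentType: JSON_SCHEMA

collections:
  acmeCo/orders:
    # Reference the schema through its canonical URL, not its file path.
    schema: https://example.com/schemas/order.schema.yaml
    key: [/id]
```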
Importing derivation resources
In many cases, derivations in your catalog will need to import resources. Usually, these are TypeScript modules that define the lambda functions of a transformation, and, in certain cases, the NPM dependencies of that TypeScript module.
These imports are specified in the derivation specification, not in the
import section of the specification file.
If a catalog source file
foo.flow.yaml references a collection in
bar.flow.yaml, for example as a target of a capture,
there must be an import path where either
foo.flow.yaml imports bar.flow.yaml or vice versa.
When you omit the
import section, Flow chooses an import path for you.
When you explicitly include the
import section, you have more control over the import path.
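Import paths can be direct, where one file imports the other.
Or they can be indirect, where the path runs through one or more intermediate files.
The sources must still have an import path even if referenced from a common parent. A parent file that imports both sources, with no import between the two, would not work.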
These rules make your catalog sources more self-contained
and less brittle to refactoring and reorganization.
Consider what might otherwise happen if
foo.flow.yaml were imported in another project without
bar.flow.yaml.
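The direct and indirect import paths described above can be sketched with import sections alone. The intermediate file other.flow.yaml is a hypothetical name, and the --- separators only divide the files for display.

```yaml
# Direct path: foo.flow.yaml imports bar.flow.yaml.
# --- foo.flow.yaml ---
import:
  - bar.flow.yaml
---
# Indirect path: foo.flow.yaml imports a hypothetical intermediate
# file, which in turn imports bar.flow.yaml.
# --- foo.flow.yaml ---
import:
  - other.flow.yaml
---
# --- other.flow.yaml ---
import:
  - bar.flow.yaml
```

In both cases, anyone who imports foo.flow.yaml also pulls in bar.flow.yaml, so the reference between them resolves wherever the file is reused.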