Imports
The YAML files that comprise a catalog specification may include an import
section.
This is what allows you to organize your catalog spec across multiple
interlinked files.
When a catalog is deployed, the imported resources are treated as part of the file
into which they are imported.
The import section is structured as a list of partial or absolute URLs,
which Flow always evaluates relative to the base directory of the current source file.
For example, these are possible imports within a collection:
# Suppose we're in file "/path/dir/flow.yaml"
import:
- sub/directory/flow.yaml # Resolves to "file:///path/dir/sub/directory/flow.yaml".
- ../sibling/directory/flow.yaml # Resolves to "file:///path/sibling/directory/flow.yaml".
- https://example/path/flow.yaml # Uses the absolute URL.
The import rule is flexible; a collection doesn’t have to do anything special
to be imported by another, and flowctl can even directly build remote sources:
# Test an example from the flow-template repository.
$ flowctl draft test --source https://raw.githubusercontent.com/estuary/flow-template/main/word-counts.flow.yaml
Fetch behavior
Flow resolves, fetches, and validates all imports during the catalog build process, and then includes their fetched contents within the built catalog. The built catalog is thus a self-contained snapshot of all resources as they were at the time the catalog was built.
This means it's both safe and recommended to directly reference an authoritative source of a resource, such as a third-party JSON schema. It will be fetched and verified only at catalog build time, and thereafter that fetched version will be used for execution, regardless of whether the authority URL itself later changes or errors.
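As a minimal sketch of referencing an authoritative source directly, a collection might point its schema at a remote URL. The collection name, the key, and the example.com URL here are hypothetical and chosen only for illustration:

```yaml
# flow.yaml (hypothetical file)
collections:
  # Hypothetical collection name.
  acmeCo/events:
    # Hypothetical third-party schema URL. Per the text above, it is fetched
    # at catalog build time and that fetched copy is used afterwards.
    schema: https://example.com/schemas/event.schema.json
    key: [/id]
```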
Import types
Almost always, the import stanza is used to import other Flow catalog source files.
This is the default when given a string path:
import:
- path/to/source/catalog.flow.yaml
A long-form variant also accepts a content type of the imported resource:
import:
- url: path/to/source/catalog.flow.yaml
  contentType: CATALOG
Other permitted content types include JSON_SCHEMA,
but these are not typically used and are needed only for advanced use cases.
JSON Schema $ref
Certain catalog entities, like collections, commonly reference JSON schemas.
It's not necessary to explicitly add these to the import section;
they are automatically resolved and treated as an import.
You can think of this as an analog to the JSON Schema $ref keyword,
which is used to reference a schema that may be contained in another file.
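For instance, a source file along these lines would need no import section at all for its schema. This is only a sketch: the collection name and the relative schema path are hypothetical.

```yaml
# flow.yaml (hypothetical file). Note that there is no import section.
collections:
  acmeCo/orders:
    # Hypothetical relative path. It is resolved against this file's
    # directory and treated as an implicit import.
    schema: schemas/order.schema.yaml
    key: [/order_id]
```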
The one exception is schemas that use the $id keyword
at their root to define an alternative canonical URL.
In this case, the schema must be referenced through its canonical URL,
and then explicitly added to the import section with JSON_SCHEMA content type.
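A sketch of how this exception might look in practice follows. The file names, collection name, and canonical URL are all hypothetical, and the two files are shown in one block separated by `---` purely for readability:

```yaml
# schemas/event.schema.yaml (hypothetical file)
# The root $id declares a canonical URL that differs from the file's location.
$id: https://example.com/schemas/event.schema.json
type: object
properties:
  id: {type: string}
required: [id]
---
# flow.yaml (hypothetical file)
import:
# Explicitly import the schema file with the JSON_SCHEMA content type.
- url: schemas/event.schema.yaml
  contentType: JSON_SCHEMA
collections:
  acmeCo/events:
    # Reference the schema through its canonical URL, not its file path.
    schema: https://example.com/schemas/event.schema.json
    key: [/id]
```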
Importing derivation resources
In many cases, derivations in your catalog will need to import resources. Usually, these are TypeScript modules that define the lambda functions of a transformation, and, in certain cases, the NPM dependencies of that TypeScript module.
These imports are specified in the derivation specification, not in the import section of the catalog spec.
For more information, see Derivation specification and creating TypeScript modules.
Import paths
If a catalog source file foo.flow.yaml references a collection in bar.flow.yaml,
for example as a target of a capture,
there must be an import path where either foo.flow.yaml imports bar.flow.yaml
or vice versa.
Import paths can be direct:
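A minimal sketch of a direct import path, assuming both files sit in the same directory. The collection name and schema path are hypothetical, and the two files are separated by `---` for readability:

```yaml
# foo.flow.yaml: imports bar.flow.yaml directly.
import:
- bar.flow.yaml
---
# bar.flow.yaml: defines the collection that foo.flow.yaml references.
collections:
  acmeCo/bar/events:   # hypothetical collection name
    schema: bar.schema.yaml
    key: [/id]
```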
Or they can be indirect:
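A sketch of an indirect path, using a hypothetical intermediate file baz.flow.yaml in the same directory. The files are separated by `---` for readability:

```yaml
# foo.flow.yaml: imports the hypothetical intermediate file.
import:
- baz.flow.yaml
---
# baz.flow.yaml (hypothetical): imports bar.flow.yaml, which completes
# the path foo -> baz -> bar.
import:
- bar.flow.yaml
```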
The sources must still have an import path even if referenced from a common parent. The following would not work:
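A sketch of the arrangement that fails, using a hypothetical parent.flow.yaml that imports both files while neither imports the other:

```yaml
# parent.flow.yaml (hypothetical): the common parent.
import:
- foo.flow.yaml
- bar.flow.yaml
---
# foo.flow.yaml: references a collection defined in bar.flow.yaml but has
# no import section, so no import path exists between foo and bar.
```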
These rules make your catalog sources more self-contained
and less brittle to refactoring and reorganization.
Consider what might otherwise happen if foo.flow.yaml
were imported in another project without bar.flow.yaml.