Customize materialized fields

When you first materialize a collection to an endpoint like a database or data warehouse, the resulting table columns might not be formatted how you want. You might notice missing columns, extra columns, or columns with names you don't like. This happens when the collection's JSON schema doesn't map to a table schema appropriate for your use case.

You can control the shape and appearance of materialized tables using a two-step process.

First, you modify the source collection schema. You can change column names by adding projections: JSON pointers that turn locations in a document's JSON structure into custom named fields.

Then, you add the fields stanza to the materialization specification, telling Flow which fields to materialize.

The following sections break down the process in more detail.

Hint

If you just need to add a field that isn't included by default and it's already present in the schema with a name you like, skip ahead to include desired fields in your materialization.

Capture desired fields and generate projections

Any field you eventually want to materialize must be included in the collection's schema. It's ok if the field is nested in the JSON structure; you'll flatten the structure with projections.

caution

In this workflow, you'll edit a collection. This change can impact other downstream materializations and derivations. Use caution and be mindful of any edit's consequences before publishing.

Captured collections

If the collection you're using was captured directly, follow these steps.

Go to the Captures page of the Flow web app and locate the capture that produced the collection.
Click the Options button and choose Edit Specification.
Under Output Collections, choose the binding that corresponds to the collection. Then, click the Collection tab.
In the list of fields, look for the fields you want to materialize. If they're present and correctly named, you can skip to including them in the materialization.

hint:

Compare the field name and pointer. For nested pointers, you'll probably want to change the field name to omit slashes.

If your desired fields aren't present or need to be re-named, edit the collection schema manually:
1. Click Edit.
2. Add missing fields to the schema in the correct location based on the source data structure.
3. Click Close.
Generate projections for new or incorrectly named fields.
1. If available, click the Schema Inference button. The Schema Inference Window appears. Flow cleans up your schema and adds projections for new fields.
2. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.
3. Click Next.
info
Schema Inference isn't available for all capture types. You can also add projections manually with flowctl. Refer to the guide to editing with flowctl and how to format projections.
Repeat steps 3 through 6 with other collections, if necessary.
Click Save and Publish.

Derived collections

If the collection you're using came from a derivation, follow these steps.

Pull the derived collection's specification locally using flowctl.

flowctl catalog pull-specs --name <yourOrg/full/collectionName>

Review the collection's schema to see if the fields of interest are included. If they're present, you can skip to including them in the materialization.
If your desired fields aren't present or are incorrectly named, add any missing fields to the schema in the correct location based on the source data structure.
Use schema inference to generate projections for the fields.

flowctl preview --infer-schema --source <full\path\to\flow.yaml> --collection <yourOrg/full/collectionName>

Review the updated schema. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.
Re-publish the collection specification.

Include desired fields in your materialization

Now that all your fields are present in the collection schema as projections, you can choose which ones to include in the materialization.

Every included field will be mapped to a table column or equivalent in the endpoint system.

If you haven't created the materialization, begin the process. Pause once you've selected the collections to materialize.

If your materialization already exists, navigate to the edit materialization page.
In the Collection Selector, choose the collection whose output fields you want to change. Click its Collection tab.
Review the listed fields.

In most cases, Flow automatically detects all fields to materialize, projected or otherwise. However, a projected field may still be missing, or you may want to exclude other fields.

By default, Estuary's recommended field selection generally includes:
- Scalars (simple data types including strings, numbers, booleans, nulls), and
- Natively supported types for the destination (e.g. arrays in the case of SQL destinations)
When dealing with objects in your data, Estuary:
- Flattens objects: Estuary flattens nested structures and includes the scalar fields within them by default.
- Excludes top-level objects: Top-level objects need to be explicitly selected to be included in the materialization.
Complex data structures like nested objects and maps are excluded by default.
Choose whether to start with one of Flow's field selection modes. You can customize individual fields later. Available modes include:
- Select Scalars: Include all scalar fields using the default setting
- Exclude All: Only required fields
For each individual field, you can choose one of these options:
- Select: The field is included based on the chosen mode; if the field becomes unavailable, it may be dropped silently.
- Require: Ensure the field is materialized; Flow will raise an error if the field cannot be materialized.
- Exclude: Prevent the field from being materialized to the destination.
Repeat steps 2 through 5 with other collections, if necessary.
Click Save and Publish.

The named, included fields will be reflected in the endpoint system.

Capture desired fields and generate projections​

Captured collections​

Derived collections​

Include desired fields in your materialization​

Capture desired fields and generate projections

Captured collections

Derived collections

Include desired fields in your materialization