Customize materialized fields
When you first materialize a collection to an endpoint like a database or data warehouse, the resulting table columns might not be formatted how you want. You might notice missing columns, extra columns, or columns with names you don't like. This happens when the collection's JSON schema doesn't map to a table schema appropriate for your use case.
You can control the shape and appearance of materialized tables using a two-step process.
First, you modify the source collection schema. You can change column names by adding projections: JSON pointers that turn locations in a document's JSON structure into custom named fields.
Then, you add the fields stanza to the materialization specification, telling Flow which fields to materialize.
The following sections break down the process in more detail.
If you just need to add a field that isn't included by default, and it's already present in the schema with a name you like, skip ahead to "Include desired fields in your materialization."
Capture desired fields and generate projections
Any field you eventually want to materialize must be included in the collection's schema. It's OK if the field is nested in the JSON structure; you'll flatten the structure with projections.
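As a rough illustration of a schema that contains nested fields, the sketch below shows a collection specification in Flow's YAML format. The collection name acmeCo/example/orders and every property in it are hypothetical, chosen only to show where nested fields sit in the schema and which JSON pointers address them.

```yaml
collections:
  # Hypothetical collection name, for illustration only.
  acmeCo/example/orders:
    key: [/id]
    schema:
      type: object
      required: [id]
      properties:
        id: { type: string }
        customer:
          type: object
          properties:
            # Nested field; its JSON pointer is /customer/email.
            email: { type: string }
            address:
              type: object
              properties:
                # Deeper nesting; its JSON pointer is /customer/address/city.
                city: { type: string }
```

Both nested fields are declared in the schema, so each one can later be given a flat name through a projection.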
In this workflow, you'll edit a collection. This change can impact other downstream materializations and derivations. Use caution and be mindful of any edit's consequences before publishing.
Captured collections
If the collection you're using was captured directly, follow these steps.
1. Go to the Captures page of the Flow web app and locate the capture that produced the collection.
2. Click the Options button and choose Edit Specification.
3. Under Output Collections, choose the binding that corresponds to the collection. Then, click the Collection tab.
4. In the list of fields, look for the fields you want to materialize. If they're present and correctly named, you can skip to including them in the materialization.
   Compare the field name and pointer. For nested pointers, you'll probably want to change the field name to omit slashes.
5. If your desired fields aren't present or need to be renamed, edit the collection schema manually:
   1. Click Edit.
   2. Add missing fields to the schema in the correct location based on the source data structure.
   3. Click Close.
6. Generate projections for new or incorrectly named fields.
   1. If available, click the Schema Inference button. The Schema Inference window appears. Flow cleans up your schema and adds projections for new fields.
   2. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.
   3. Click Next.
   Info: Schema Inference isn't available for all capture types. You can also add projections manually with flowctl. Refer to the guide to editing with flowctl and how to format projections.
7. Repeat steps 3 through 6 with other collections, if necessary.
8. Click Save and Publish.
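For reference, manually added projections might look like the following sketch. The collection name, the schema file name, and the field names are all hypothetical. The projections stanza maps each custom field name to the JSON pointer of the location it should read from.

```yaml
collections:
  # Hypothetical collection name, for illustration only.
  acmeCo/example/orders:
    key: [/id]
    # Hypothetical schema file that declares the nested customer fields.
    schema: orders.schema.yaml
    projections:
      # <custom field name>: <JSON pointer into the document>
      customer_email: /customer/email
      customer_city: /customer/address/city
```

With projections like these, the endpoint would see columns named customer_email and customer_city rather than names that contain slashes.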
Derived collections
If the collection you're using came from a derivation, follow these steps.
1. Pull the derived collection's specification locally using flowctl.
       flowctl catalog pull-specs --name <yourOrg/full/collectionName>
2. Review the collection's schema to see if the fields of interest are included. If they're present, you can skip to including them in the materialization.
3. If your desired fields aren't present or are incorrectly named, add any missing fields to the schema in the correct location based on the source data structure.
4. Use schema inference to generate projections for the fields.
       flowctl preview --infer-schema --source <full\path\to\flow.yaml> --collection <yourOrg/full/collectionName>
5. Review the updated schema. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.
Include desired fields in your materialization
Now that all your fields are present in the collection schema as projections, you can choose which ones to include in the materialization.
Every included field will be mapped to a table column or equivalent in the endpoint system.
1. If you haven't created the materialization, begin the process. Pause once you've selected the collections to materialize.
   If your materialization already exists, navigate to the edit materialization page.
2. In the Collection Selector, choose the collection whose output fields you want to change. Click its Collection tab.
3. Review the listed fields.
   In most cases, Flow automatically detects all fields to materialize, projected or otherwise. However, a projected field may still be missing, or you may want to exclude other fields.
4. If you want to make changes, click Edit.
5. Use the editor to add the fields stanza to the collection's binding specification.
   Learn more about configuring fields and view a sample specification.
6. Choose whether to start with Flow's recommended fields. Under fields, set recommended to true or false. If you choose true, you can exclude fields later.
7. Use include to add missing projections, or exclude to remove fields.
8. Click Close.
9. Repeat steps 2 through 8 with other collections, if necessary.
10. Click Save and Publish.
The named, included fields will be reflected in the endpoint system.
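The fields stanza described in the steps above could look roughly like this sketch of a single binding. The source collection, the table name, and the field names are hypothetical, and the contents of resource depend on the connector; a table setting is assumed here because it is typical of SQL-style endpoints.

```yaml
bindings:
  # Hypothetical source collection, for illustration only.
  - source: acmeCo/example/orders
    resource:
      # Assumed endpoint-specific setting; varies by connector.
      table: orders
    fields:
      # Start from the fields Flow recommends for this collection.
      recommended: true
      # Add projected fields that were not picked up automatically.
      # Each entry maps a field name to its endpoint-specific config,
      # left empty here.
      include:
        customer_email: {}
        customer_city: {}
      # Remove fields you don't want materialized (hypothetical name).
      exclude:
        - internal_notes
```

Setting recommended to false instead would materialize only the fields listed under include, plus any that the connector itself requires, which is the stricter option when the table shape must be controlled exactly.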