Skip to main content

Customize materialized fields

When you first materialize a collection to an endpoint like a database or data warehouse, the resulting table columns might not be formatted how you want. You might notice missing columns, extra columns, or columns with names you don't like. This happens when the collection's JSON schema doesn't map to a table schema appropriate for your use case.

You can control the shape and appearance of materialized tables using a two-step process.

First, you modify the source collection schema. You can change column names by adding projections: JSON pointers that turn locations in a document's JSON structure into custom named fields.

Then, you add the fields stanza to the materialization specification, telling Flow which fields to materialize.

The following sections break down the process in more detail.

Hint

If you just need to add a field that isn't included by default and it's already present in the schema with a name you like, skip ahead to include desired fields in your materialization.

Capture desired fields and generate projections

Any field you eventually want to materialize must be included in the collection's schema. It's ok if the field is nested in the JSON structure; you'll flatten the structure with projections.

caution

In this workflow, you'll edit a collection. This change can impact other downstream materializations and derivations. Use caution and be mindful of any edit's consequences before publishing.

Captured collections

If the collection you're using was captured directly, follow these steps.

  1. Go to the Captures page of the Flow web app and locate the capture that produced the collection.

  2. Click the Options button and choose Edit Specification.

  3. Under Output Collections, choose the binding that corresponds to the collection. Then, click the Collection tab.

  4. In the list of fields, look for the fields you want to materialize. If they're present and correctly named, you can skip to including them in the materialization.

hint:

Compare the field name and pointer. For nested pointers, you'll probably want to change the field name to omit slashes.

  1. If your desired fields aren't present or need to be re-named, edit the collection schema manually:

    1. Click Edit.

    2. Add missing fields to the schema in the correct location based on the source data structure.

    3. Click Close.

  2. Generate projections for new or incorrectly named fields.

    1. If available, click the Schema Inference button. The Schema Inference Window appears. Flow cleans up your schema and adds projections for new fields.

    2. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.

    3. Click Next.

    info

    Schema Inference isn't available for all capture types. You can also add projections manually with flowctl. Refer to the guide to editing with flowctl and how to format projections.

  3. Repeat steps 3 through 6 with other collections, if necessary.

  4. Click Save and Publish.

Derived collections

If the collection you're using came from a derivation, follow these steps.

  1. Pull the derived collection's specification locally using flowctl.
flowctl catalog pull-specs --name <yourOrg/full/collectionName>
  1. Review the collection's schema to see if the fields of interest are included. If they're present, you can skip to including them in the materialization.

  2. If your desired fields aren't present or are incorrectly named, add any missing fields to the schema in the correct location based on the source data structure.

  3. Use schema inference to generate projections for the fields.

flowctl preview --infer-schema --source <full\path\to\flow.yaml> --collection <yourOrg/full/collectionName>

  1. Review the updated schema. Manually change the names of projected fields. These names will be used by the materialization and shown in the endpoint system as column names or the equivalent.

  2. Re-publish the collection specification.

Include desired fields in your materialization

Now that all your fields are present in the collection schema as projections, you can choose which ones to include in the materialization.

Every included field will be mapped to a table column or equivalent in the endpoint system.

  1. If you haven't created the materialization, begin the process. Pause once you've selected the collections to materialize.

    If your materialization already exists, navigate to the edit materialization page.

  2. In the Collection Selector, choose the collection whose output fields you want to change. Click its Collection tab.

  3. Review the listed field.

    In most cases, Flow automatically detects all fields to materialize, projected or otherwise. However, a projected field may still be missing, or you may want to exclude other fields.

  4. If you want to make changes, click Edit.

  5. Use the editor to add the fields stanza to the collection's binding specification.

    Learn more about configuring fields and view a sample specification.

  6. Choose whether to start with Flow's recommended fields. Under fields, set recommended to true or false. If you choose true, you can exclude fields later.

  7. Use include to add missing projections, or exclude to remove fields.

  8. Click Close.

  9. Repeat steps 2 through 8 with other collections, if necessary.

  10. Click Save and Publish.

The named, included fields will be reflected in the endpoint system.