How to flatten an array using TypeScript
This guide will show you how to flatten an array field in a collection by creating a TypeScript derivation in Estuary Flow.
We'll use GitPod and TypeScript for the derivation in this guide. Check out our other guides if you're interested in creating derivations locally with flowctl or using SQL for transformations.
The collection we'll be working with (user_content) contains a field called tags, which is an array of objects. Each object in the array has a name and a value. We'll flatten this array into a new collection with two separate fields: tag_name and tag_value.
The original data looks like this:
{
  "id": "1",
  "name": "example",
  "tags": [
    {
      "name": "tag1",
      "value": "value1"
    },
    {
      "name": "tag2",
      "value": "value2"
    }
  ]
}
The resulting data will have the following structure:
{
  "tag_name": "tag1",
  "tag_value": "value1"
}
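
To make the shape change concrete, here is a minimal sketch, independent of Flow, that maps one source document to one output record per tag. The type names (Tag, UserContent, FlatTag) and the flattenTags function are hypothetical names chosen for this illustration; they are not part of the code that Flow generates.

```typescript
// Hypothetical types mirroring the example documents above.
interface Tag {
  name: string;
  value: string;
}

interface UserContent {
  id: string;
  name: string;
  tags: Tag[];
}

interface FlatTag {
  tag_name: string;
  tag_value: string;
}

// Emit one flat record for each element of the `tags` array.
function flattenTags(doc: UserContent): FlatTag[] {
  return doc.tags.map((tag) => ({
    tag_name: tag.name,
    tag_value: tag.value,
  }));
}

const example: UserContent = {
  id: "1",
  name: "example",
  tags: [
    { name: "tag1", value: "value1" },
    { name: "tag2", value: "value2" },
  ],
};

// One input document with two tags yields two output records.
console.log(flattenTags(example));
```

Note that a single source document produces as many output records as it has tags, so the derived collection will usually contain more documents than the source.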
Step 1: Set up your GitPod environment
- In the Estuary Flow dashboard, click on the Collections tab.
- Select the checkbox next to the collection you want to work with.
- Click on the Transform button at the top of the table.
- Select TypeScript as the language, and give your new derived collection a name.
- Click Proceed to GitPod to open the GitPod environment.
The GitPod environment will generate a file structure and stub files to get you started. This may take a few moments.
Step 2: Set up your schema
In a folder called <your_tenant>, you'll find a file called flow.yaml. This file contains the schema for your derived collection. You'll need to modify this file to match the structure of the data you're working with.
- Open the flow.yaml file in the GitPod environment. Your schema should look something like this:

    ---
    collections:
      <your_tenant>/<derivation_name>:
        schema:
          type: object
          properties:
            your_key:
              type: string
          required:
            - your_key
        key:
          - /your_key
        derive:
          using:
            typescript:
              module: <derivation_name>.ts
          transforms:
            - name: user_content
              source: <your_tenant>/<capture_name>/public/<collection_name>
              shuffle: any

- We need to modify the schema to match what we want our derived collection to look like. We'll be using the tags field from the original data, so we'll need to add a new property for each field we want to include in the derived collection. We'll also need to set a key for the derived collection. These updates to the flow.yaml file will look something like this:

    ---
    collections:
      <your_tenant>/<derivation_name>:
        schema:
          type: object
          properties:
            tag_name:
              type: string
            tag_value:
              type: string
          required:
            - tag_name
            - tag_value
        key:
          - /tag_name
        derive:
          using:
            typescript:
              module: <derivation_name>.ts
          transforms:
            - name: user_content
              source: <your_tenant>/<capture_name>/public/<collection_name>
              shuffle: any

- Save the flow.yaml file.
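
The choice of key matters because documents that share a key value are combined. The sketch below is a simplified model of that effect, not Flow's actual implementation: it assumes last-write-wins behavior for documents with the same key (our understanding of the default when no other reduction strategy is configured). TagDoc and latestByKey are hypothetical names used only for this illustration.

```typescript
// Hypothetical type mirroring the derived collection's schema.
interface TagDoc {
  tag_name: string;
  tag_value: string;
}

// Simplified model (an assumption, not Flow's code): keep only the most
// recent document for each value of the key field, /tag_name.
function latestByKey(docs: TagDoc[]): Map<string, TagDoc> {
  const byKey = new Map<string, TagDoc>();
  for (const doc of docs) {
    byKey.set(doc.tag_name, doc); // later documents overwrite earlier ones
  }
  return byKey;
}

const keyed = latestByKey([
  { tag_name: "tag1", tag_value: "value1" },
  { tag_name: "tag2", tag_value: "value2" },
  { tag_name: "tag1", tag_value: "value3" },
]);

console.log(keyed.size); // 2
console.log(keyed.get("tag1")?.tag_value); // value3
```

If every tag value must be retained when the same tag name appears more than once, a key that includes more than /tag_name may be a better fit.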
Step 3: Write your TypeScript derivation
In the GitPod environment, you'll find a file called <derivation_name>.ts in the same folder as the flow.yaml file you just edited. This is where you'll write your TypeScript code to flatten the array.
- Open the <derivation_name>.ts file in the GitPod environment. You'll see a basic structure for your TypeScript code. It should look something like this:

    import {
      IDerivation,
      Document,
      SourceUserContent,
    } from "flow/sean-estuary/test-derivation.ts";

    export class Derivation extends IDerivation {
      userContent(_read: { doc: SourceUserContent }): Document[] {
        throw new Error("Not implemented");
      }
    }

- Now, let's modify the userContent function to flatten the array. The function is called with each source document, typed as SourceUserContent. For each document, we'll loop through the tags array, and for each tag we'll create a new document with the tag_name and tag_value fields. In this example, tags arrives as a JSON-encoded string because the source is Google Sheets, so the code parses it before looping. Update the userContent function to look like this:

    import {
      IDerivation,
      Document,
      SourceUserContent,
    } from "flow/sean-estuary/test-derivation.ts";

    export class Derivation extends IDerivation {
      userContent(_read: { doc: SourceUserContent }): Document[] {
        const doc = _read.doc;
        // Collect one output document per tag.
        const output: Document[] = [];
        if (doc.tags) {
          const tagsJson = JSON.parse(doc.tags); // Since our tags are arriving as a string from Google Sheets
          for (const tag of tagsJson) {
            output.push({
              tag_name: tag.name,
              tag_value: tag.value,
            });
          }
        }
        return output;
      }
    }

- Save the <derivation_name>.ts file.
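
The function above assumes that tags is always a valid JSON string. As a hedged variation, the standalone sketch below accepts tags either as a native array or as a JSON-encoded string, and skips malformed input instead of throwing. The names OutputDoc, normalizeTags, and toOutputDocs are hypothetical helpers for this illustration, not part of the generated Flow module; the same logic could be adapted inside userContent.

```typescript
// Hypothetical output type mirroring the derived collection's schema.
interface OutputDoc {
  tag_name: string;
  tag_value: string;
}

// Accept `tags` as a real array or as a JSON-encoded string.
// Anything else, including malformed JSON, yields an empty list.
function normalizeTags(tags: unknown): unknown[] {
  if (Array.isArray(tags)) {
    return tags;
  }
  if (typeof tags === "string") {
    try {
      const parsed: unknown = JSON.parse(tags);
      return Array.isArray(parsed) ? parsed : [];
    } catch {
      return []; // malformed JSON: emit nothing for this document
    }
  }
  return [];
}

// Build one output document per well-formed tag object.
function toOutputDocs(tags: unknown): OutputDoc[] {
  const output: OutputDoc[] = [];
  for (const tag of normalizeTags(tags)) {
    if (typeof tag !== "object" || tag === null) {
      continue; // skip entries that are not objects
    }
    const { name, value } = tag as { name?: unknown; value?: unknown };
    if (typeof name === "string" && typeof value === "string") {
      output.push({ tag_name: name, tag_value: value });
    }
  }
  return output;
}

console.log(toOutputDocs('[{"name":"tag1","value":"value1"}]'));
console.log(toOutputDocs([{ name: "tag2", value: "value2" }]));
console.log(toOutputDocs("not json"));
```

Silently skipping bad input is a design choice. If malformed tags should stop the derivation instead, rethrow the error rather than returning an empty list.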
Step 4: Preview your derivation
- In the GitPod environment, open a terminal.

- Run the following command to test your derivation:

    flowctl preview --source flow.yaml

- This will show you a preview of the derived collection, including the flattened fields. Check that the output matches what you expect. For example, an original row like this:

    {
      "_meta": {
        ...
      },
      "id": "1",
      "name": "test1",
      "tags": "[{\"name\":\"PFJUjs6Wec\",\"value\":\"HB668r7MfN\"},{\"name\":\"aIWpjtpNnj\",\"value\":\"elQ9948Wpf\"}]"
    }

  should appear in your preview as two individual records:

    {
      "_meta": {
        ...
      },
      "tag_name": "PFJUjs6Wec",
      "tag_value": "HB668r7MfN"
    }
    {
      "_meta": {
        ...
      },
      "tag_name": "aIWpjtpNnj",
      "tag_value": "elQ9948Wpf"
    }

- Once you've confirmed your results, you can proceed to publish your derivation to Estuary Flow:

    flowctl catalog publish --source flow.yaml
Congratulations! You've successfully flattened an array in TypeScript using Estuary Flow. You can now use this technique to flatten other arrays in your data as well.