How to join two collections (TypeScript)
This guide will teach you how to write and publish a TypeScript derivation, which will join two collections together on a common key.
Introduction
This tutorial will show you how to implement a stateless transformation using TypeScript. You’ll learn how to implement a flow that matches orders to customers in real-time.
Setting up your development environment
The data sources used in this tutorial are available in two Google Sheets. This one for the orders and this one for the customers. Make a copy of each so you’ll be able to test out the pipeline by adding, editing or removing records.
Customers table sample
customer_id | name | phone | |
101 | customer1@email.com | John Doe | 123-456-7890 |
102 | customer2@email.com | Jane Smith | 987-654-3210 |
103 | customer3@email.com | Alex Lee | 555-123-4567 |
Orders table sample
order_id | customer_id | order_date | total_amount |
1 | 101 | 2024-05-10 8:00:00 | 50 |
2 | 102 | 2024-05-09 12:00:00 | 75.5 |
3 | 103 | 2024-05-08 15:30:00 | 100.25 |
As you can see, both tables contain a field called customer_id
. This is what we’re going to use as the key in our join operation. One customer can have multiple orders, but one order can only belong to one customer. There are also some customers without any orders.
Let’s say you want to see all customers and all of their orders in the results. This means, you’ll be looking to implement a full outer join.
To create the collections in Estuary Flow, head over to the dashboard and create a new Google Sheet capture. Give it a name and add one of the previously copied sheet’s URL as the “Spreadsheet Link”. Repeat this process for the other sheet, which should leave you with 2 collections.
You can take a look into each via the data preview window on the Collections page to verify that the sample data has already landed in Flow.
In order to implement transformations through derivations, you’ll need to set up your development environment. You’ll need a text editor and flowctl, the CLI-tool for Flow installed on your machine. Check out the docs page on installation instructions.
To verify that you’re able to access Flow via flowctl
, see if you can execute these commands successfully to view documents in your collections.
flowctl collections read --collection <your_collection_id> --uncommitted
{"_meta":{"uuid":"9790e50e-0ed3-11ef-8401-6d9be407e4b8"},"customer_id":"101","order_date":"2024-05-10 8:00:00","order_id":"1","row_id":203,"total_amount":"50"}
{"_meta":{"uuid":"9790e50e-0ed3-11ef-8801-6d9be407e4b8"},"customer_id":"102","order_date":"2024-05-09 12:00:00","order_id":"2","row_id":204,"total_amount":"75.5"}
{"_meta":{"uuid":"9790e50e-0ed3-11ef-8c01-6d9be407e4b8"},"customer_id":"103","order_date":"2024-05-08 15:30:00","order_id":"3","row_id":205,"total_amount":"100.25"}
{"_meta":{"uuid":"9790e50e-0ed3-11ef-9001-6d9be407e4b8"},"customer_id":"101","order_date":"2024-05-07 10:00:00","order_id":"4","row_id":206,"total_amount":"25.75"}
{"_meta":{"uuid":"9790e50e-0ed3-11ef-9401-6d9be407e4b8"},"customer_id":"103","order_date":"2024-05-06 14:45:00","order_id":"5","row_id":207,"total_amount":"60.2"}
{"_meta":{"uuid":"9790e50e-0ed3-11ef-9801-6d9be407e4b8"},"customer_id":"102","order_date":"2024-05-05 11:20:00","order_id":"6","row_id":208,"total_amount":"45.9"}
{"_meta":{"uuid":"9790e50e-0ed3-11ef-9c01-6d9be407e4b8"},"customer_id":"104","order_date":"2024-05-04 9:30:00","order_id":"7","row_id":209,"total_amount":"80.1"}
If you see something similar, you’re good to continue!
Writing the derivation
Set up your folder structure so you can organize the resources required for the derivation. Create a working directory to follow along, and inside, create a flow.yaml
file.
Inside your flow.yaml
file, add the following contents:
collections:
Dani/join-tutorial-typescript/customers_with_orders:
schema:
description: >-
A document that represents the joined result of orders with customer
information
type: object
properties:
customer_id:
type: string