Storage mappings
Flow stores the documents that comprise your collections in a cloud storage bucket. Your storage mapping tells Flow which bucket to use.
When you first register for Flow, your storage mapping is Estuary's secure Google Cloud Storage bucket. Data in Flow's cloud storage bucket is deleted 20 days after collection.
For production workflows, you should set up your own cloud storage bucket as a storage mapping.
You can set up a bucket lifecycle policy to manage data retention in your storage mapping; for example, to remove data after six months.
Recovery logs
In addition to collection data, Flow uses your storage mapping to temporarily store recovery logs.
Flow tasks (captures, derivations, and materializations) use recovery logs to durably store their processing context as a backup. Recovery logs are opaque binary logs, but they may contain user data.
The recovery logs of a task are always prefixed by recovery/, so a task named acmeCo/produce-TNT would have a recovery log called recovery/acmeCo/produce-TNT.
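The naming rule can be expressed as a short Python sketch. The function and constant names here are hypothetical and are not part of Flow; the snippet only illustrates how a task name maps to its recovery log name, and how a script might recognize paths it should never delete.

```python
RECOVERY_PREFIX = "recovery/"  # prefix stated in the text above


def recovery_log_name(task_name: str) -> str:
    """Return the recovery log name for a task (illustrative helper)."""
    return RECOVERY_PREFIX + task_name


def is_recovery_path(path: str) -> bool:
    """True if a path falls under the recovery/ prefix."""
    return path.startswith(RECOVERY_PREFIX)


print(recovery_log_name("acmeCo/produce-TNT"))  # recovery/acmeCo/produce-TNT
```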
Flow prunes data from recovery logs once it is no longer required.
Deleting data from recovery logs while it is still in use can cause Flow processing tasks to fail permanently.
Bucket lifecycle policies
You can add a lifecycle policy to your storage bucket to limit how long to keep collection data. This is similar to Estuary's 20-day limit on collection data when using the trial bucket.
Bucket lifecycle policies should only be applied to the collection-data/ subfolder in a bucket. Deleting data in the recovery/ folder can cause tasks to fail.
You can apply a lifecycle policy to your storage bucket in:
Add a lifecycle policy in AWS
To add a lifecycle policy in AWS:
- Select your storage bucket in the AWS S3 console.
- If collection-data/ isn't located at the top level of your bucket, note the full path for the directory.
- In the Management tab, click Create lifecycle rule.
- Add a name for the rule.
- Choose to Limit the scope to specific prefixes or tags and enter the full path for collection-data/ as the prefix.
- Select a desired action to take, such as deleting data or setting it to a different storage class.
- Enter the number of days for the action to take effect and any other information required for the action.
- Click Create rule.
For full instructions on creating and managing a lifecycle policy in AWS, see the AWS docs.
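If you prefer to script the rule rather than use the console, the same configuration can be sketched in Python with the boto3 library. This is a minimal sketch under stated assumptions, not an official Estuary procedure: the bucket name, rule ID, and the 180-day value (roughly six months) are placeholders, and the helper names are hypothetical. Note that boto3's put_bucket_lifecycle_configuration replaces the bucket's entire existing lifecycle configuration, so merge in any rules you already have before applying.

```python
def build_lifecycle_rule(prefix="collection-data/", days=180,
                         rule_id="expire-collection-data"):
    """Build an S3 lifecycle rule that expires objects under collection-data/.

    The prefix may be a full path (for example "some/path/collection-data/")
    when the directory is not at the top level of the bucket.
    """
    # Guard: refuse any prefix whose last segment is not collection-data,
    # so the rule can never be scoped to recovery/ by mistake.
    if prefix.rstrip("/").split("/")[-1] != "collection-data":
        raise ValueError("lifecycle rules should only target collection-data/")
    if not prefix.endswith("/"):
        prefix += "/"
    return {
        "ID": rule_id,                  # placeholder rule name
        "Filter": {"Prefix": prefix},   # limit the rule's scope to this prefix
        "Status": "Enabled",
        "Expiration": {"Days": days},   # delete objects after this many days
    }


def apply_lifecycle_rule(bucket, rule):
    """Apply the rule to a bucket. Overwrites existing lifecycle rules."""
    import boto3  # imported here so the rule builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )


rule = build_lifecycle_rule()
# apply_lifecycle_rule("my-flow-bucket", rule)  # hypothetical bucket name
```

Building the rule separately from applying it lets you inspect or review the configuration before it touches the bucket.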