Capture Multiple Paths with File Source Connectors

File source connectors such as Amazon S3, Google Cloud Storage, SFTP, Google Drive, Azure Blob Storage, Dropbox, and HTTP File all support capturing from multiple paths within a single capture task. However, the Estuary web app creates only one binding during initial setup. To capture more paths, add extra bindings manually with flowctl or the Advanced Specification Editor.

This is useful when you need to:

Capture files from multiple directories or prefixes into separate collections (e.g., /invoices/ and /receipts/ into their own collections).
Capture files from multiple directories or prefixes into the same collection (e.g., /region-us/ and /region-eu/ merged together).

Prerequisites

An existing file source capture created through the Estuary web app.
flowctl installed and authenticated (for Options A and B via CLI).

Option A: Multiple paths into the same collection

Add extra bindings that capture from different paths but write to the same target collection.

You can also make this change in the web app instead of using flowctl: edit your capture, open the Advanced Specification Editor, and add new entries to the bindings array, each with a different stream value but the same target collection. This route only works for Option A; Option B requires creating new collections via flowctl.

1. Pull the capture specification

flowctl catalog pull-specs --name your-org/your-capture

This creates a local flow.yaml file and subdirectories with your capture's specification.

2. Disable auto-discover

Open the capture YAML file. Auto-discover periodically re-runs discovery and rewrites the capture's bindings to match what the connector reports. File source connectors only ever discover a single binding (the bucket/prefix root), so on its next run auto-discover will remove the extra bindings you add below.

Setting addNewBindings to false does not prevent this — it only stops new bindings from being added, not existing ones from being removed. Remove the autoDiscover block entirely (or set it to null) so auto-discover never runs:

captures:
  acmeCo/sftp-capture:
    autoDiscover: null
    endpoint: {...}

Disabling auto-discover does not affect schema inference, which is driven by your collection's read schema and continues regardless.

Tip: In the web app, this corresponds to unchecking Automatically keep schemas up to date (under Schema Evolution) when editing the capture.
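
As a quick local check before publishing, the rule described in this step can be sketched in Python. This assumes the pulled capture YAML has already been parsed into a dict (the parsing step is not shown), and the function name is hypothetical rather than part of flowctl.

```python
def auto_discover_disabled(capture: dict) -> bool:
    """Return True when the capture has no autoDiscover block or it is null."""
    # A missing key and an explicit null both come back as None here.
    return capture.get("autoDiscover") is None


# Removing the block or setting it to null both count as disabled.
removed = {"endpoint": {}, "bindings": []}
nulled = {"autoDiscover": None, "endpoint": {}, "bindings": []}
# addNewBindings: false leaves auto-discover running, so it is not enough.
still_on = {"autoDiscover": {"addNewBindings": False}, "endpoint": {}, "bindings": []}

print(auto_discover_disabled(removed))   # True
print(auto_discover_disabled(nulled))    # True
print(auto_discover_disabled(still_on))  # False
```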

3. Add a new binding

In the bindings array, add a new entry with a different stream value but the same target collection:

bindings:
  # Existing binding
  - resource:
      stream: "invoices/2024"
    target: your-org/your-collection
  # New binding — different path, same target
  - resource:
      stream: "invoices/2025"
    target: your-org/your-collection

4. Publish

flowctl catalog publish --source flow.yaml

Both paths now feed into the same collection.
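
If you script this edit rather than making it by hand, the binding step might look like the following minimal sketch. It assumes the capture has been parsed into a dict; add_binding is a hypothetical helper, and its duplicate check simply mirrors the "different stream value" instruction above rather than any documented platform rule.

```python
def add_binding(capture: dict, stream: str, target: str) -> dict:
    """Append a binding for `stream` unless that stream is already bound."""
    bindings = capture.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("resource", {}).get("stream") == stream:
            raise ValueError(f"stream {stream!r} is already bound")
    bindings.append({"resource": {"stream": stream}, "target": target})
    return capture


capture = {
    "bindings": [
        {"resource": {"stream": "invoices/2024"}, "target": "your-org/your-collection"},
    ]
}
add_binding(capture, "invoices/2025", "your-org/your-collection")

# Both bindings point at the same collection.
targets = {b["target"] for b in capture["bindings"]}
print(len(capture["bindings"]), targets)
```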

Option B: Multiple paths into separate collections

Use this when you want each path captured into its own distinct collection.

1–2. Pull the specification and disable auto-discover

Same as Option A above.

3. Add a new binding with a new target

Add a binding that points to a new collection name:

bindings:
  - resource:
      stream: "invoices/"
    target: your-org/invoices
  - resource:
      stream: "receipts/"
    target: your-org/receipts
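
When there are many paths, the bindings list can be generated from a mapping instead of typed out. The sketch below is illustrative only: build_bindings is a hypothetical helper, and the result still has to be written back into the capture YAML by hand or with a YAML library.

```python
def build_bindings(stream_to_target: dict[str, str]) -> list[dict]:
    """Turn a {stream: target collection} mapping into binding entries."""
    return [
        {"resource": {"stream": stream}, "target": target}
        for stream, target in stream_to_target.items()
    ]


bindings = build_bindings({
    "invoices/": "your-org/invoices",
    "receipts/": "your-org/receipts",
})

# Each distinct target here needs a collection definition before publishing.
targets = sorted({b["target"] for b in bindings})
print(targets)  # ['your-org/invoices', 'your-org/receipts']
```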

4. Define the new collection

The new target collection must exist before you publish. Add its definition to your flow.yaml (or a file imported by it). The easiest approach is to copy the schema and key from your existing collection, found in the YAML files that pull-specs created:

collections:
  your-org/receipts:
    schema:
      # Copy from your existing collection's schema
      type: object
      properties:
        _meta:
          type: object
          properties:
            file:
              type: string
            offset:
              type: integer
          required: [file, offset]
      required: [_meta]
    key: [/_meta/file, /_meta/offset]

Tip: If you're unsure what schema to use, pull the existing collection's spec and copy it:

flowctl catalog pull-specs --name your-org/your-existing-collection
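
Once the existing collection's spec is available, the copy itself can be sketched as follows. This assumes the collections section has been parsed into a dict; clone_collection is a hypothetical helper, and a deep copy is used so later edits to one definition do not leak into the other.

```python
import copy


def clone_collection(collections: dict, source: str, new_name: str) -> dict:
    """Define `new_name` with the same schema and key as `source`."""
    if new_name in collections:
        raise ValueError(f"{new_name} is already defined")
    collections[new_name] = copy.deepcopy(collections[source])
    return collections


collections = {
    "your-org/invoices": {
        "schema": {
            "type": "object",
            "properties": {
                "_meta": {
                    "type": "object",
                    "properties": {
                        "file": {"type": "string"},
                        "offset": {"type": "integer"},
                    },
                    "required": ["file", "offset"],
                }
            },
            "required": ["_meta"],
        },
        "key": ["/_meta/file", "/_meta/offset"],
    }
}
clone_collection(collections, "your-org/invoices", "your-org/receipts")
print(collections["your-org/receipts"]["key"])  # ['/_meta/file', '/_meta/offset']
```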

5. Publish

flowctl catalog publish --source flow.yaml

Worked example: SFTP with two directories

This example captures CSV files from two directories on an SFTP server into separate collections.

captures:
  acmeCo/sftp-capture:
    endpoint:
      connector:
        image: "ghcr.io/estuary/source-sftp:dev"
        config:
          address: sftp.example.com:22
          username: estuary
          password: <SECRET>
          directory: /data
          parser:
            format:
              type: csv
              config:
                delimiter: ","
                encoding: UTF-8
    bindings:
      - resource:
          stream: /data/invoices
        target: acmeCo/invoices
      - resource:
          stream: /data/receipts
        target: acmeCo/receipts

collections:
  acmeCo/invoices:
    schema:
      type: object
      properties:
        _meta:
          type: object
          properties:
            file:
              type: string
            offset:
              type: integer
          required: [file, offset]
      required: [_meta]
    key: [/_meta/file, /_meta/offset]

  acmeCo/receipts:
    schema:
      type: object
      properties:
        _meta:
          type: object
          properties:
            file:
              type: string
            offset:
              type: integer
          required: [file, offset]
      required: [_meta]
    key: [/_meta/file, /_meta/offset]

Worked example: S3 with two prefixes

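This example captures JSON files from two S3 prefixes into the same collection. The acmeCo/all-events collection is assumed to exist already, so its definition is not repeated here.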

captures:
  acmeCo/s3-capture:
    endpoint:
      connector:
        image: "ghcr.io/estuary/source-s3:dev"
        config:
          bucket: acme-data-lake
          region: us-east-1
          credentials:
            auth_type: AWSAccessKey
            aws_access_key_id: <SECRET>
            aws_secret_access_key: <SECRET>
          parser:
            format:
              type: json
    bindings:
      - resource:
          stream: acme-data-lake/events/region-us
        target: acmeCo/all-events
      - resource:
          stream: acme-data-lake/events/region-eu
        target: acmeCo/all-events

Connector-specific notes

Google Drive

For Google Drive, the stream value in each binding is the Google Drive folder ID — the long string at the end of the folder's URL (e.g., 1aBcDeFgHiJkLmNoPqRsTuV from https://drive.google.com/drive/folders/1aBcDeFgHiJkLmNoPqRsTuV).

Each binding can point to a different folder. The folderUrl in the endpoint config is the folder used during initial discovery, but bindings are not limited to that folder.

The folderUrl must use the format https://drive.google.com/drive/folders/FOLDER_ID. URLs with /u/0/ or /u/1/ (from Google's multi-account switcher) will be rejected — remove the /u/N segment if present.
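
A small sketch of that cleanup, for anyone collecting folder URLs programmatically. It assumes the account-switcher segment appears as /drive/u/N/folders/ in the URL; both function names are hypothetical.

```python
import re


def normalize_folder_url(url: str) -> str:
    """Drop the /u/N account-switcher segment from a Drive folder URL."""
    return re.sub(r"/drive/u/\d+/folders/", "/drive/folders/", url, count=1)


def folder_id(url: str) -> str:
    """Return the folder ID that follows /drive/folders/ in the URL."""
    match = re.search(r"/drive/folders/([^/?#]+)", normalize_folder_url(url))
    if match is None:
        raise ValueError(f"not a Drive folder URL: {url}")
    return match.group(1)


url = "https://drive.google.com/drive/u/0/folders/1aBcDeFgHiJkLmNoPqRsTuV"
print(normalize_folder_url(url))
# https://drive.google.com/drive/folders/1aBcDeFgHiJkLmNoPqRsTuV
print(folder_id(url))
# 1aBcDeFgHiJkLmNoPqRsTuV
```

The value returned by folder_id is what goes into a binding's stream field; the normalized URL is the form expected for folderUrl.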

SFTP

The stream value is the full path to the directory on the SFTP server (relative to the SFTP chroot). For example, if your SFTP directory config is /data, a binding might use stream: /data/invoices.
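
As an illustration, the stream value can be built by joining the configured directory and a subdirectory with POSIX path rules. The helper name is hypothetical, and the sketch only covers subdirectories of the configured directory.

```python
import posixpath


def sftp_stream(directory: str, subdirectory: str) -> str:
    """Join the configured directory and a subdirectory into a stream path."""
    # Strip a leading slash so the subdirectory does not reset the join.
    return posixpath.join(directory, subdirectory.lstrip("/"))


print(sftp_stream("/data", "invoices"))   # /data/invoices
print(sftp_stream("/data", "/receipts"))  # /data/receipts
```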

Amazon S3 and Google Cloud Storage

The stream value is formatted as bucket-name/prefix. For example: my-bucket/events/region-us.
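
A minimal sketch of composing that value follows. The helper name is hypothetical, and stripping a leading slash from the prefix is a convenience assumed here to avoid a doubled slash, not documented connector behavior.

```python
def object_store_stream(bucket: str, prefix: str) -> str:
    """Format a stream value as bucket-name/prefix."""
    return f"{bucket}/{prefix.lstrip('/')}"


print(object_store_stream("my-bucket", "events/region-us"))
# my-bucket/events/region-us
print(object_store_stream("my-bucket", "/events/region-eu"))
# my-bucket/events/region-eu
```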

Important caveats

Parser config is shared. All bindings in a capture share the same endpoint-level parser configuration (compression, format, CSV options, etc.). You cannot mix file formats within a single capture — for example, capturing CSV from one path and JSON from another requires two separate captures.
Disable auto-discover. File source discovery returns a single binding (the bucket/prefix root), so if auto-discover runs it will remove your manually-added bindings on the next discovery cycle. Setting addNewBindings: false does not prevent this — remove the autoDiscover block (or set autoDiscover: null). This does not affect schema inference, which is driven by the collection's read schema.
Target collections must exist. When using Option B (separate collections), the target collection must be defined and published. If you reference a collection that doesn't exist, the publish will fail.
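
The last caveat lends itself to a local pre-publish check. The sketch below assumes the pulled files have been parsed and merged into one dict with captures and collections sections; missing_targets is a hypothetical helper. It only sees local definitions, so a target that already exists on the platform but was not pulled would also be reported.

```python
def missing_targets(spec: dict) -> set[str]:
    """Return binding targets that have no collection definition in `spec`."""
    defined = set(spec.get("collections") or {})
    referenced = {
        binding["target"]
        for capture in (spec.get("captures") or {}).values()
        for binding in capture.get("bindings") or []
    }
    return referenced - defined


spec = {
    "captures": {
        "acmeCo/sftp-capture": {
            "bindings": [
                {"resource": {"stream": "/data/invoices"}, "target": "acmeCo/invoices"},
                {"resource": {"stream": "/data/receipts"}, "target": "acmeCo/receipts"},
            ]
        }
    },
    # acmeCo/receipts is deliberately left undefined here.
    "collections": {"acmeCo/invoices": {}},
}
print(missing_targets(spec))  # {'acmeCo/receipts'}
```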
