Working with logs and statistics

Your logs and stats collections are useful for debugging and monitoring catalog tasks.

Accessing logs and statistics

You can access logs and statistics in the Flow web app, by materializing them to an external endpoint, or from the command line.

Logs and statistics in the Flow web app

You can view a subset of logs and statistics for individual tasks in the Flow web app.

Logs

After you publish a new capture or materialization, a pop-up window appears that displays the task's logs. Once you close the window, you can't regain access to the full logs in the web app. For a complete view of logs, use flowctl or materialize the logs collection to an outside system.

However, if a task fails, you can view the logs associated with the error(s) that caused the failure. In the Details view of the published capture or materialization, click the name of its shard to display the logs.

Statistics

Two statistics are shown for each capture, collection, and materialization: a byte count (Bytes Written or Bytes Read) and a document count (Docs Written or Docs Read).

These fields have slightly different meanings for each Flow entity type:

  • For captures, Bytes Written and Docs Written represent the total data written across all of the capture's associated collections.
  • For collections, Bytes Written and Docs Written represent the data written to the collection from its associated capture or derivation.
  • For materializations, Bytes Read and Docs Read represent the total data read from all of the materialization's associated collections.

Accessing logs and statistics from the command line

The flowctl logs and flowctl stats subcommands allow you to print logs and stats, respectively, from the command line. This method allows more flexibility and is ideal for debugging.

You can retrieve logs and stats for any published Flow task. For example:

flowctl logs --task acmeCo/anvils/capture-one

flowctl stats --task acmeCo/anvils/capture-one --uncommitted

Beta

The --uncommitted flag is currently required for flowctl stats. With it, all statistics are read, whether they describe a successfully committed transaction or one that was rolled back or never committed. In the future, committed reads will be the default.

Printing logs or stats since a specific time

To limit output, you can retrieve logs or stats starting at a specific time in the past. For example:

flowctl stats --task acmeCo/anvils/materialization-one --since 1h

...will retrieve stats from approximately the last hour. The actual start time is always the previous fragment boundary, so the output can begin significantly earlier than the requested time.

Additional options for flowctl logs and flowctl stats can be accessed through command-line help.
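
Because the start time snaps back to a fragment boundary, the output may include documents older than you asked for. The following is a minimal Python sketch of trimming them on the client side. It rests on two assumptions not confirmed above: that flowctl stats prints one JSON document per line, and that /ts is an RFC 3339 string with a UTC offset or a trailing Z. All helper names are hypothetical.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def build_stats_command(task, since=None):
    """Build a flowctl stats invocation using only the flags shown above."""
    cmd = ["flowctl", "stats", "--task", task, "--uncommitted"]
    if since is not None:
        cmd += ["--since", since]
    return cmd


def parse_json_lines(text):
    """Parse newline-delimited JSON (an assumed output format), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_ts(ts):
    """Parse /ts, assuming RFC 3339 with an offset or a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def drop_older_than(docs, cutoff):
    """Keep only the documents whose /ts is at or after the cutoff."""
    return [doc for doc in docs if parse_ts(doc["ts"]) >= cutoff]


def stats_since(task, hours):
    """Run flowctl (must be on PATH) and trim the result to the requested window."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    result = subprocess.run(
        build_stats_command(task, since=f"{hours}h"),
        capture_output=True,
        text=True,
        check=True,
    )
    return drop_older_than(parse_json_lines(result.stdout), cutoff)
```

Since /ts is rounded to the nearest minute, the cutoff applied here is approximate as well. For example, stats_since("acmeCo/anvils/materialization-one", 1) would correspond to the --since 1h command above.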

Available statistics

Available statistics include information about the amount of data in inputs and outputs of each transaction. They also include temporal information about the transaction. Statistics vary by task type (capture, materialization, or derivation).

A thorough knowledge of Flow's advanced concepts is necessary to effectively leverage these statistics.

stats collection documents include the following properties.

Shard information

A stats document begins with data about the shard processing the transaction. Each processing shard is uniquely identified by the combination of its name, keyBegin, and rClockBegin. This information is important for tasks with multiple shards: it allows you to determine whether data throughput is evenly distributed amongst those shards.

Property | Description | Data Type | Applicable Task Type
/shard | Flow shard information | object | All
/shard/kind | The type of catalog task. One of "capture", "derivation", or "materialization" | string | All
/shard/name | The name of the catalog task (without the task type prefix) | string | All
/shard/keyBegin | With rClockBegin, this comprises the shard ID. The inclusive beginning of the shard's assigned key range. | string | All
/shard/rClockBegin | With keyBegin, this comprises the shard ID. The inclusive beginning of the shard's assigned rClock range. | string | All
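
To check whether throughput is evenly distributed, you can group stats documents by shard ID and total the documents each shard processed. The sketch below is one possible approach in Python, not an official tool. It assumes each stats document has already been parsed into a dict whose nesting mirrors the property paths on this page: /shard/... here, and per-collection counts under /capture or /materialize as described in the transaction information section. Derivations are skipped because their stats are laid out differently. The function names are hypothetical.

```python
from collections import defaultdict

# Maps /shard/kind to the top-level property holding per-collection stats.
# Derivations are omitted because /derive is not organized by collection here.
TASK_KEY = {"capture": "capture", "materialization": "materialize"}


def shard_id(doc):
    """Return the tuple that uniquely identifies the processing shard."""
    shard = doc["shard"]
    return (shard["name"], shard["keyBegin"], shard["rClockBegin"])


def input_docs_by_shard(stats_docs):
    """Sum right/docsTotal (input from the task's source) for each shard."""
    totals = defaultdict(int)
    for doc in stats_docs:
        key = TASK_KEY.get(doc["shard"]["kind"])
        if key is None:
            continue  # skip derivations and unknown task kinds
        for collection_stats in doc.get(key, {}).values():
            right = collection_stats.get("right", {})
            totals[shard_id(doc)] += right.get("docsTotal", 0)
    return dict(totals)
```

A large gap between the per-shard totals suggests that one shard is handling most of the data.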

Transaction information

stats documents include information about a transaction: its inputs and outputs, the amount of data processed, and the time taken. You can use this information to ensure that your Flow tasks are running efficiently, and that the amount of data processed matches your expectations.

Property | Description | Data Type | Applicable Task Type
/ts | Timestamp corresponding to the start of the transaction, rounded to the nearest minute | string | All
/openSecondsTotal | Total time that the transaction was open before starting to commit | number | All
/txnCount | Total number of transactions represented by this stats document. Used for reduction. | integer | All
/capture | Capture stats, organized by collection | object | Capture
/materialize | Materialization stats, organized by collection | object | Materialization
/derive | Derivation statistics | object | Derivation
/<task-type>/<collection-name>/right/ | Input documents from the task's source | object | Capture, materialization
/<task-type>/<collection-name>/left/ | Input documents from an external destination; used for reduced updates in materializations | object | Materialization
/<task-type>/<collection-name>/out/ | Output documents from the transaction | object | All
/<task-type>/{}/docsTotal | Total number of documents | integer | All
/<task-type>/{}/bytesTotal | Total number of bytes representing the JSON encoded documents | integer | All
/derivations/transforms/transformStats | Stats for a specific transform of a derivation, which will have an update, publish, or both | object | Derivation
/derivations/transforms/transformStats/input | The input documents that were fed into this transform | object | Derivation
/derivations/transforms/transformStats/update | The outputs from update lambda invocations, which were combined into registers | object | Derivation
/derivations/transforms/transformStats/publish | The outputs from publish lambda invocations. | object | Derivation
/derivations/registers/createdTotal | The total number of new register keys that were created | integer | Derivation
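
To compare the amount of data processed against your expectations, you can roll several stats documents up into one summary. The Python sketch below sums /txnCount, /openSecondsTotal, and the per-collection docsTotal and bytesTotal values. It assumes the documents are already parsed into dicts nested the way the property paths above suggest, and that plain summation is a reasonable way to combine them; that is an interpretation of "used for reduction", not a description of Flow's own reduction logic. The function name is hypothetical.

```python
def summarize(stats_docs, task_key):
    """Combine stats documents for one task.

    task_key is the top-level property that holds per-collection stats,
    for example "capture" or "materialize".
    """
    summary = {"txnCount": 0, "openSecondsTotal": 0.0, "collections": {}}
    for doc in stats_docs:
        summary["txnCount"] += doc.get("txnCount", 0)
        summary["openSecondsTotal"] += doc.get("openSecondsTotal", 0.0)
        for name, stats in doc.get(task_key, {}).items():
            collection = summary["collections"].setdefault(name, {})
            # left, right, and out each carry docsTotal and bytesTotal.
            for side in ("left", "right", "out"):
                if side not in stats:
                    continue
                agg = collection.setdefault(side, {"docsTotal": 0, "bytesTotal": 0})
                agg["docsTotal"] += stats[side].get("docsTotal", 0)
                agg["bytesTotal"] += stats[side].get("bytesTotal", 0)
    return summary
```

Dividing a collection's bytesTotal by openSecondsTotal from the summary gives a rough throughput figure for the period covered.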