Logs and statistics
Flow collects logs and statistics of catalog tasks to aid in debugging and refinement of your workflows.
Logs
Each organization that uses Flow has a `logs` collection under the global `ops` prefix. For the organization Acme Co, it would have the name `ops/acmeCo/logs`.
These can be thought of as standard application logs: they store information about events that occur at runtime. They’re distinct from recovery logs, which track the state of various task shards.
Regardless of how many Data Flows your organization has, all logs are stored in the same collection, which is read-only and logically partitioned on tasks. Logs are collected from events that occur within the Flow runtime, as well as the capture and materialization connectors your Data Flow is using.
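Because all tasks share one logically partitioned collection, reading logs for a single task means filtering documents by their task partition. A minimal sketch, assuming hypothetical newline-delimited JSON log documents with `ts`, `level`, `message`, and `shard.name` fields (the field names are illustrative assumptions, not a documented schema):

```python
import json

# Hypothetical log documents, shaped loosely like entries that might be
# read from an ops/acmeCo/logs collection. Field names are assumptions.
raw = """\
{"ts": "2023-01-01T00:00:00Z", "level": "info", "message": "started", "shard": {"name": "acmeCo/debugMaterialization"}}
{"ts": "2023-01-01T00:00:01Z", "level": "error", "message": "connector exited", "shard": {"name": "acmeCo/otherTask"}}"""

def logs_for_task(lines, task):
    """Keep only the log documents emitted by the given task."""
    for line in lines:
        doc = json.loads(line)
        if doc["shard"]["name"] == task:
            yield doc

matches = list(logs_for_task(raw.splitlines(), "acmeCo/debugMaterialization"))
print([d["message"] for d in matches])  # → ['started']
```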
Log level
You can set the log level for each catalog task to control the level of detail at which logs are collected for that task. The available levels, listed from least to most detailed, are:
- `error`: Non-recoverable errors from the Flow runtime or connector that are critical to know about
- `warn`: Errors that can be re-tried, but likely require investigation
- `info`: Task lifecycle events, or information you might want to collect on an ongoing basis
- `debug`: Details that will help debug an issue with a task
- `trace`: Maximum level of detail, which may yield gigabytes of logs
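The levels form an ordered threshold: a task configured at a given level emits that level and everything less detailed, but nothing more detailed. A minimal sketch of this ordering (the `emitted_at` helper is hypothetical, for illustration only):

```python
# The five log levels, ordered least to most detailed, per the list above.
LEVELS = ["error", "warn", "info", "debug", "trace"]

def emitted_at(task_level: str, event_level: str) -> bool:
    """True if a task configured at task_level would emit an event
    logged at event_level."""
    return LEVELS.index(event_level) <= LEVELS.index(task_level)

print(emitted_at("info", "warn"))   # → True: warn is less detailed than info
print(emitted_at("info", "debug"))  # → False: debug exceeds the threshold
```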
The default log level is `info`. You can change a task's log level by adding the `shards` keyword to its definition in the catalog spec:
```yaml
materializations:
  acmeCo/debugMaterialization:
    shards:
      logLevel: debug
    endpoint:
      {}
```
Statistics
Each organization that uses Flow has a `stats` collection under the global `ops` prefix. For the organization Acme Co, it would have the name `ops/acmeCo/stats`.
Regardless of how many Data Flows your organization has, all stats are stored in the same collection, which is read-only and logically partitioned on tasks.
A new document is published to the `stats` collection for each task transaction.
Each document includes information about the time and quantity of data inputs and outputs.
Statistics vary by task type (capture, materialization, or derivation).
Use stats to:
- Evaluate the data throughput of a task; for example, a derivation.
- Compare the data throughput of a task between platforms; for example, compare the data capture reported by Flow to the change rate detected in a source system.
- Access the same information used by Estuary for billing.
- Optimize your tasks for increased efficiency.
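As one example of the throughput use case, per-transaction stats documents can be summed over a window. A minimal sketch, assuming hypothetical stats documents with `ts` and nested `docsTotal`/`bytesTotal` counters (the field names and nesting are illustrative assumptions, not a documented schema):

```python
# Hypothetical stats documents, shaped loosely like what a capture task
# might publish per transaction. All field names are assumptions.
stats = [
    {"ts": "2023-01-01T00:00:00Z",
     "capture": {"acmeCo/orders": {"out": {"docsTotal": 100, "bytesTotal": 50_000}}}},
    {"ts": "2023-01-01T00:05:00Z",
     "capture": {"acmeCo/orders": {"out": {"docsTotal": 140, "bytesTotal": 70_000}}}},
]

def totals(docs, collection):
    """Sum documents and bytes captured into a collection across transactions."""
    docs_n = sum(d["capture"][collection]["out"]["docsTotal"] for d in docs)
    bytes_n = sum(d["capture"][collection]["out"]["bytesTotal"] for d in docs)
    return docs_n, bytes_n

print(totals(stats, "acmeCo/orders"))  # → (240, 120000)
```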
See a detailed table of the properties included in `stats` documents.