Custom Chart Debug Page

The Custom Chart debug page in the DB Console lets you create one or multiple custom charts showing any combination of available metrics.

The definition of the customized dashboard is encoded in the URL. To share the dashboard with someone, send them the URL. Like any other URL, it can be bookmarked, sit in a pinned tab in your browser, etc.

To view the Custom Chart page, access the DB Console, click Advanced Debug in the left-hand navigation bar, and select Custom Time Series Chart in the Reports section.

Use the Custom Chart page

On the Custom Chart page, you can set the time span for all charts, add new custom charts, and customize each chart:

  • To set the time span for the page, use the dropdown menu above the charts and select the desired time span. In addition, once you have selected a metric to display, you can drag within the chart itself to set a custom time range.

  • To add a chart, click Add Chart and customize the new chart.

  • To customize each chart, use the Units dropdown menu to set the units to display. Then use the table below the chart to select the metrics being queried, and how they'll be combined and displayed. Options include:

    Column Description
    Metric Name How the system refers to this metric, e.g., sql.bytesin.
    Downsampler

    The "Downsampler" operation combines the individual data points collected over a longer period into a single data point. We store one data point every ten seconds, but for queries over long time spans the backend lowers the resolution of the returned data, perhaps returning only one data point for every minute, five minutes, or even an entire hour in the case of the 30-day view.

    Options:

    • AVG: Returns the average value over the time period.
    • MIN: Returns the lowest value seen.
    • MAX: Returns the highest value seen.
    • SUM: Returns the sum of all values seen.

    Aggregator

    Used to combine data points from different nodes. It has the same operations available as the Downsampler.

    Options:

    • AVG: Returns the average value over the time period.
    • MIN: Returns the lowest value seen.
    • MAX: Returns the highest value seen.
    • SUM: Returns the sum of all values seen.

    Rate

    Determines how to display the rate of change during the selected time period.

    Options:

    • Normal: Returns the actual recorded value.
    • Rate: Returns the rate of change of the value per second.
    • Non-negative Rate: Returns the rate of change, but returns 0 instead of negative values. Many of the stats we track are monotonically increasing counters, so each sample is just the total value of that counter. The rate of change of that counter represents the rate of events being counted, which is usually what you want to graph. "Non-negative Rate" is needed because the counters are stored in memory, so if a node resets, its counters go back to zero (whereas normally they only increase).

    Source The set of nodes being queried, which is either:
    • The entire cluster.
    • A single, named node.
    Per Node If checked, the chart will show a line for each node's value of this metric.
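
The Downsampler, Aggregator, and Rate options described above can be illustrated with a minimal Python sketch. This is not the DB Console's implementation: the function names (`downsample`, `aggregate`, `rate`), the bucket alignment, and the sample values are all assumptions made for illustration. It only shows the general idea of reducing ten-second samples to one point per period, combining nodes, and clamping negative rates to zero.

```python
from statistics import mean

# The four operations offered by both the Downsampler and the Aggregator.
OPS = {"AVG": mean, "MIN": min, "MAX": max, "SUM": sum}

def downsample(samples, period, op):
    """Combine (timestamp_sec, value) samples into one data point per period."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % period  # assumed alignment: bucket starts at a multiple of period
        buckets.setdefault(start, []).append(value)
    return [(start, OPS[op](values)) for start, values in sorted(buckets.items())]

def aggregate(per_node, op):
    """Combine already-downsampled series from several nodes, timestamp by timestamp."""
    merged = {}
    for series in per_node.values():
        for ts, value in series:
            merged.setdefault(ts, []).append(value)
    return [(ts, OPS[op](values)) for ts, values in sorted(merged.items())]

def rate(series, non_negative=False):
    """Per-second rate of change between consecutive data points."""
    out = []
    for (t0, v0), (t1, v1) in zip(series, series[1:]):
        r = (v1 - v0) / (t1 - t0)
        out.append((t1, max(r, 0.0) if non_negative else r))
    return out

# Hypothetical counter sampled every 10 seconds; it drops to 0 at t=40,
# as it would if the node reset.
counter = [(0, 100), (10, 150), (20, 200), (30, 260), (40, 0), (50, 30)]
print(rate(counter))                     # the reset shows up as a negative rate
print(rate(counter, non_negative=True))  # the negative rate is replaced by 0
```

In this sketch, the plain rate reports a large negative value at the reset, which would distort a chart of events per second; the non-negative variant reports zero for that interval instead.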

Examples

Query user and system CPU usage

To compare system vs. userspace CPU usage, select the following values under Metric Name:

  • sys.cpu.sys.percent
  • sys.cpu.user.percent

The Y-axis label is Count. A count of 1 represents 100% utilization. With an Aggregator of Sum, the count can rise above 1, which means the combined CPU utilization of the queried nodes is greater than 100%.

Checking Per Node displays statistics for each node, which could show whether an individual node's CPU usage was higher or lower than the average.
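
As a rough illustration of how a summed count can exceed 1, the following Python sketch uses made-up per-node readings (the node names and values are hypothetical, not output from a real cluster) and compares each node to the average, similar to what the Per Node view lets you do visually.

```python
# Hypothetical readings of sys.cpu.user.percent at one instant (1.0 means 100%).
per_node = {"n1": 0.5, "n2": 0.25, "n3": 0.75}

total = sum(per_node.values())       # what an Aggregator of Sum would plot
average = total / len(per_node)      # what an Aggregator of Avg would plot
above_average = sorted(n for n, v in per_node.items() if v > average)

print(f"sum={total:.0%} avg={average:.0%} above average: {above_average}")
# prints: sum=150% avg=50% above average: ['n3']
```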

Essential Metrics to Monitor

For important metrics to visualize in a custom dashboard, refer to:

Available metrics

Note:

Some of the metrics listed below are already visible in other areas of the DB Console.

CockroachDB Metric Name Description Type Unit
addsstable.applications
Number of SSTable ingestions applied (i.e. applied by Replicas) COUNTER COUNT
addsstable.copies
Number of SSTable ingestions that required copying files during application COUNTER COUNT
addsstable.proposals
Number of SSTable ingestions proposed (i.e. sent to Raft by lease holders) COUNTER COUNT
admission.io.overload
1-normalized float indicating whether IO admission control considers the store as overloaded with respect to compaction out of L0 (considers sub-level and file counts). GAUGE PERCENT
auth.cert.conn.latency
Latency to establish and authenticate a SQL connection using certificate HISTOGRAM NANOSECONDS
auth.gss.conn.latency
Latency to establish and authenticate a SQL connection using GSS HISTOGRAM NANOSECONDS
auth.jwt.conn.latency
Latency to establish and authenticate a SQL connection using JWT Token HISTOGRAM NANOSECONDS
auth.ldap.conn.latency
Latency to establish and authenticate a SQL connection using LDAP HISTOGRAM NANOSECONDS
auth.password.conn.latency
Latency to establish and authenticate a SQL connection using password HISTOGRAM NANOSECONDS
auth.scram.conn.latency
Latency to establish and authenticate a SQL connection using SCRAM HISTOGRAM NANOSECONDS
build.timestamp
Build information GAUGE TIMESTAMP_SEC
capacity
Total storage capacity GAUGE BYTES
capacity.available
Available storage capacity GAUGE BYTES
capacity.reserved
Capacity reserved for snapshots GAUGE BYTES
capacity.used
Used storage capacity GAUGE BYTES
changefeed.aggregator_progress
The earliest timestamp up to which any aggregator is guaranteed to have emitted all values for GAUGE TIMESTAMP_NS
changefeed.backfill_count
Number of changefeeds currently executing backfill GAUGE COUNT
changefeed.backfill_pending_ranges
Number of ranges in an ongoing backfill that are yet to be fully emitted GAUGE COUNT
changefeed.checkpoint_progress
The earliest timestamp of any changefeed's persisted checkpoint (values prior to this timestamp will never need to be re-emitted) GAUGE TIMESTAMP_NS
changefeed.commit_latency
Event commit latency: the difference between the event's MVCC timestamp and the time it was acknowledged by the downstream sink. If the sink batches events, the difference between the oldest event in the batch and the acknowledgement is recorded. Excludes latency during backfill. HISTOGRAM NANOSECONDS
changefeed.emitted_bytes
Bytes emitted by all feeds COUNTER BYTES
changefeed.emitted_messages
Messages emitted by all feeds COUNTER COUNT
changefeed.error_retries
Total retryable errors encountered by all changefeeds COUNTER COUNT
changefeed.failures
Total number of changefeed jobs which have failed COUNTER COUNT
changefeed.lagging_ranges
The number of ranges considered to be lagging behind GAUGE COUNT
changefeed.max_behind_nanos
(Deprecated in favor of checkpoint_progress) The most any changefeed's persisted checkpoint is behind the present GAUGE NANOSECONDS
changefeed.message_size_hist
Message size histogram HISTOGRAM BYTES
changefeed.running
Number of currently running changefeeds, including sinkless GAUGE COUNT
clock-offset.meannanos
Mean clock offset with other nodes GAUGE NANOSECONDS
clock-offset.stddevnanos
Stddev clock offset with other nodes GAUGE NANOSECONDS
cluster.preserve-downgrade-option.last-updated
Unix timestamp of last updated time for cluster.preserve_downgrade_option GAUGE TIMESTAMP_SEC
distsender.batches
Number of batches processed COUNTER COUNT
distsender.batches.partial
Number of partial batches processed after being divided on range boundaries COUNTER COUNT
distsender.errors.notleaseholder
Number of NotLeaseHolderErrors encountered from replica-addressed RPCs COUNTER COUNT
distsender.rpc.sent
Number of replica-addressed RPCs sent COUNTER COUNT
distsender.rpc.sent.local
Number of replica-addressed RPCs sent through the local-server optimization COUNTER COUNT
distsender.rpc.sent.nextreplicaerror
Number of replica-addressed RPCs sent due to per-replica errors COUNTER COUNT
exec.error
Number of batch KV requests that failed to execute on this node.

This count excludes transaction restart/abort errors. However, it will include other errors expected during normal operation, such as ConditionFailedError. This metric is thus not an indicator of KV health.

COUNTER COUNT
exec.latency
Latency of batch KV requests (including errors) executed on this node.

This measures requests already addressed to a single replica, from the moment at which they arrive at the internal gRPC endpoint to the moment at which the response (or an error) is returned.

This latency includes in particular commit waits, conflict resolution and replication, and end-users can easily produce high measurements via long-running transactions that conflict with foreground traffic. This metric thus does not provide a good signal for understanding the health of the KV layer.

HISTOGRAM NANOSECONDS
exec.success
Number of batch KV requests executed successfully on this node.

A request is considered to have executed 'successfully' if it either returns a result or a transaction restart/abort error.

COUNTER COUNT
gcbytesage
Cumulative age of non-live data GAUGE SECONDS
gossip.bytes.received
Number of received gossip bytes COUNTER BYTES
gossip.bytes.sent
Number of sent gossip bytes COUNTER BYTES
gossip.connections.incoming
Number of active incoming gossip connections GAUGE COUNT
gossip.connections.outgoing
Number of active outgoing gossip connections GAUGE COUNT
gossip.connections.refused
Number of refused incoming gossip connections COUNTER COUNT
gossip.infos.received
Number of received gossip Info objects COUNTER COUNT
gossip.infos.sent
Number of sent gossip Info objects COUNTER COUNT
intentage
Cumulative age of locks GAUGE SECONDS
intentbytes
Number of bytes in intent KV pairs GAUGE BYTES
intentcount
Count of intent keys GAUGE COUNT
jobs.auto_config_env_runner.currently_paused
Number of auto_config_env_runner jobs currently considered Paused GAUGE COUNT
jobs.auto_config_env_runner.protected_age_sec
The age of the oldest PTS record protected by auto_config_env_runner jobs GAUGE SECONDS
jobs.auto_config_env_runner.protected_record_count
Number of protected timestamp records held by auto_config_env_runner jobs GAUGE COUNT
jobs.auto_config_runner.currently_paused
Number of auto_config_runner jobs currently considered Paused GAUGE COUNT
jobs.auto_config_runner.protected_age_sec
The age of the oldest PTS record protected by auto_config_runner jobs GAUGE SECONDS
jobs.auto_config_runner.protected_record_count
Number of protected timestamp records held by auto_config_runner jobs GAUGE COUNT
jobs.auto_config_task.currently_paused
Number of auto_config_task jobs currently considered Paused GAUGE COUNT
jobs.auto_config_task.protected_age_sec
The age of the oldest PTS record protected by auto_config_task jobs GAUGE SECONDS
jobs.auto_config_task.protected_record_count
Number of protected timestamp records held by auto_config_task jobs GAUGE COUNT
jobs.auto_create_partial_stats.currently_paused
Number of auto_create_partial_stats jobs currently considered Paused GAUGE COUNT
jobs.auto_create_partial_stats.protected_age_sec
The age of the oldest PTS record protected by auto_create_partial_stats jobs GAUGE SECONDS
jobs.auto_create_partial_stats.protected_record_count
Number of protected timestamp records held by auto_create_partial_stats jobs GAUGE COUNT
jobs.auto_create_stats.currently_paused
Number of auto_create_stats jobs currently considered Paused GAUGE COUNT
jobs.auto_create_stats.currently_running
Number of auto_create_stats jobs currently running in Resume or OnFailOrCancel state GAUGE COUNT
jobs.auto_create_stats.protected_age_sec
The age of the oldest PTS record protected by auto_create_stats jobs GAUGE SECONDS
jobs.auto_create_stats.protected_record_count
Number of protected timestamp records held by auto_create_stats jobs GAUGE COUNT
jobs.auto_create_stats.resume_failed
Number of auto_create_stats jobs which failed with a non-retriable error COUNTER COUNT
jobs.auto_schema_telemetry.currently_paused
Number of auto_schema_telemetry jobs currently considered Paused GAUGE COUNT
jobs.auto_schema_telemetry.protected_age_sec
The age of the oldest PTS record protected by auto_schema_telemetry jobs GAUGE SECONDS
jobs.auto_schema_telemetry.protected_record_count
Number of protected timestamp records held by auto_schema_telemetry jobs GAUGE COUNT
jobs.auto_span_config_reconciliation.currently_paused
Number of auto_span_config_reconciliation jobs currently considered Paused GAUGE COUNT
jobs.auto_span_config_reconciliation.protected_age_sec
The age of the oldest PTS record protected by auto_span_config_reconciliation jobs GAUGE SECONDS
jobs.auto_span_config_reconciliation.protected_record_count
Number of protected timestamp records held by auto_span_config_reconciliation jobs GAUGE COUNT
jobs.auto_sql_stats_compaction.currently_paused
Number of auto_sql_stats_compaction jobs currently considered Paused GAUGE COUNT
jobs.auto_sql_stats_compaction.protected_age_sec
The age of the oldest PTS record protected by auto_sql_stats_compaction jobs GAUGE SECONDS
jobs.auto_sql_stats_compaction.protected_record_count
Number of protected timestamp records held by auto_sql_stats_compaction jobs GAUGE COUNT
jobs.auto_update_sql_activity.currently_paused
Number of auto_update_sql_activity jobs currently considered Paused GAUGE COUNT
jobs.auto_update_sql_activity.protected_age_sec
The age of the oldest PTS record protected by auto_update_sql_activity jobs GAUGE SECONDS
jobs.auto_update_sql_activity.protected_record_count
Number of protected timestamp records held by auto_update_sql_activity jobs GAUGE COUNT
jobs.backup.currently_paused
Number of backup jobs currently considered Paused GAUGE COUNT
jobs.backup.currently_running
Number of backup jobs currently running in Resume or OnFailOrCancel state GAUGE COUNT
jobs.backup.protected_age_sec
The age of the oldest PTS record protected by backup jobs GAUGE SECONDS
jobs.backup.protected_record_count
Number of protected timestamp records held by backup jobs GAUGE COUNT
jobs.changefeed.currently_paused
Number of changefeed jobs currently considered Paused GAUGE COUNT
jobs.changefeed.expired_pts_records
Number of expired protected timestamp records owned by changefeed jobs COUNTER COUNT
jobs.changefeed.protected_age_sec
The age of the oldest PTS record protected by changefeed jobs GAUGE SECONDS
jobs.changefeed.protected_record_count
Number of protected timestamp records held by changefeed jobs GAUGE COUNT
jobs.changefeed.resume_retry_error
Number of changefeed jobs which failed with a retriable error COUNTER COUNT
jobs.create_stats.currently_paused
Number of create_stats jobs currently considered Paused GAUGE COUNT
jobs.create_stats.currently_running
Number of create_stats jobs currently running in Resume or OnFailOrCancel state GAUGE COUNT
jobs.create_stats.protected_age_sec
The age of the oldest PTS record protected by create_stats jobs GAUGE SECONDS
jobs.create_stats.protected_record_count
Number of protected timestamp records held by create_stats jobs GAUGE COUNT
jobs.history_retention.currently_paused
Number of history_retention jobs currently considered Paused GAUGE COUNT
jobs.history_retention.protected_age_sec
The age of the oldest PTS record protected by history_retention jobs GAUGE SECONDS
jobs.history_retention.protected_record_count
Number of protected timestamp records held by history_retention jobs GAUGE COUNT
jobs.import.currently_paused
Number of import jobs currently considered Paused GAUGE COUNT
jobs.import.protected_age_sec
The age of the oldest PTS record protected by import jobs GAUGE SECONDS
jobs.import.protected_record_count
Number of protected timestamp records held by import jobs GAUGE COUNT
jobs.import_rollback.currently_paused
Number of import_rollback jobs currently considered Paused GAUGE COUNT
jobs.import_rollback.protected_age_sec
The age of the oldest PTS record protected by import_rollback jobs GAUGE SECONDS
jobs.import_rollback.protected_record_count
Number of protected timestamp records held by import_rollback jobs GAUGE COUNT
jobs.key_visualizer.currently_paused
Number of key_visualizer jobs currently considered Paused GAUGE COUNT
jobs.key_visualizer.protected_age_sec
The age of the oldest PTS record protected by key_visualizer jobs GAUGE SECONDS
jobs.key_visualizer.protected_record_count
Number of protected timestamp records held by key_visualizer jobs GAUGE COUNT
jobs.logical_replication.currently_paused
Number of logical_replication jobs currently considered Paused GAUGE COUNT
jobs.logical_replication.protected_age_sec
The age of the oldest PTS record protected by logical_replication jobs GAUGE SECONDS
jobs.logical_replication.protected_record_count
Number of protected timestamp records held by logical_replication jobs GAUGE COUNT
jobs.migration.currently_paused
Number of migration jobs currently considered Paused GAUGE COUNT
jobs.migration.protected_age_sec
The age of the oldest PTS record protected by migration jobs GAUGE SECONDS
jobs.migration.protected_record_count
Number of protected timestamp records held by migration jobs GAUGE COUNT
jobs.mvcc_statistics_update.currently_paused
Number of mvcc_statistics_update jobs currently considered Paused GAUGE COUNT
jobs.mvcc_statistics_update.protected_age_sec
The age of the oldest PTS record protected by mvcc_statistics_update jobs GAUGE SECONDS
jobs.mvcc_statistics_update.protected_record_count
Number of protected timestamp records held by mvcc_statistics_update jobs GAUGE COUNT
jobs.new_schema_change.currently_paused
Number of new_schema_change jobs currently considered Paused GAUGE COUNT
jobs.new_schema_change.protected_age_sec
The age of the oldest PTS record protected by new_schema_change jobs GAUGE SECONDS
jobs.new_schema_change.protected_record_count
Number of protected timestamp records held by new_schema_change jobs GAUGE COUNT
jobs.poll_jobs_stats.currently_paused
Number of poll_jobs_stats jobs currently considered Paused GAUGE COUNT
jobs.poll_jobs_stats.protected_age_sec
The age of the oldest PTS record protected by poll_jobs_stats jobs GAUGE SECONDS
jobs.poll_jobs_stats.protected_record_count
Number of protected timestamp records held by poll_jobs_stats jobs GAUGE COUNT
jobs.replication_stream_ingestion.currently_paused
Number of replication_stream_ingestion jobs currently considered Paused GAUGE COUNT
jobs.replication_stream_ingestion.protected_age_sec
The age of the oldest PTS record protected by replication_stream_ingestion jobs GAUGE SECONDS
jobs.replication_stream_ingestion.protected_record_count
Number of protected timestamp records held by replication_stream_ingestion jobs GAUGE COUNT
jobs.replication_stream_producer.currently_paused
Number of replication_stream_producer jobs currently considered Paused GAUGE COUNT
jobs.replication_stream_producer.protected_age_sec
The age of the oldest PTS record protected by replication_stream_producer jobs GAUGE SECONDS
jobs.replication_stream_producer.protected_record_count
Number of protected timestamp records held by replication_stream_producer jobs GAUGE COUNT
jobs.restore.currently_paused
Number of restore jobs currently considered Paused GAUGE COUNT
jobs.restore.protected_age_sec
The age of the oldest PTS record protected by restore jobs GAUGE SECONDS
jobs.restore.protected_record_count
Number of protected timestamp records held by restore jobs GAUGE COUNT
jobs.row_level_ttl.currently_paused
Number of row_level_ttl jobs currently considered Paused GAUGE COUNT
jobs.row_level_ttl.currently_running
Number of row_level_ttl jobs currently running in Resume or OnFailOrCancel state GAUGE COUNT
jobs.row_level_ttl.delete_duration
Duration for delete requests during row level TTL. HISTOGRAM NANOSECONDS
jobs.row_level_ttl.num_active_spans
Number of active spans the TTL job is deleting from. GAUGE COUNT
jobs.row_level_ttl.protected_age_sec
The age of the oldest PTS record protected by row_level_ttl jobs GAUGE SECONDS
jobs.row_level_ttl.protected_record_count
Number of protected timestamp records held by row_level_ttl jobs GAUGE COUNT
jobs.row_level_ttl.resume_completed
Number of row_level_ttl jobs which successfully resumed to completion COUNTER COUNT
jobs.row_level_ttl.resume_failed
Number of row_level_ttl jobs which failed with a non-retriable error COUNTER COUNT
jobs.row_level_ttl.rows_deleted
Number of rows deleted by the row level TTL job. COUNTER COUNT
jobs.row_level_ttl.rows_selected
Number of rows selected for deletion by the row level TTL job. COUNTER COUNT
jobs.row_level_ttl.select_duration
Duration for select requests during row level TTL. HISTOGRAM NANOSECONDS
jobs.row_level_ttl.span_total_duration
Duration for processing a span during row level TTL. HISTOGRAM NANOSECONDS
jobs.row_level_ttl.total_expired_rows
Approximate number of rows that have expired the TTL on the TTL table. GAUGE COUNT
jobs.row_level_ttl.total_rows
Approximate number of rows on the TTL table. GAUGE COUNT
jobs.schema_change.currently_paused
Number of schema_change jobs currently considered Paused GAUGE COUNT
jobs.schema_change.protected_age_sec
The age of the oldest PTS record protected by schema_change jobs GAUGE SECONDS
jobs.schema_change.protected_record_count
Number of protected timestamp records held by schema_change jobs GAUGE COUNT
jobs.schema_change_gc.currently_paused
Number of schema_change_gc jobs currently considered Paused GAUGE COUNT
jobs.schema_change_gc.protected_age_sec
The age of the oldest PTS record protected by schema_change_gc jobs GAUGE SECONDS
jobs.schema_change_gc.protected_record_count
Number of protected timestamp records held by schema_change_gc jobs GAUGE COUNT
jobs.standby_read_ts_poller.currently_paused
Number of standby_read_ts_poller jobs currently considered Paused GAUGE COUNT
jobs.standby_read_ts_poller.protected_age_sec
The age of the oldest PTS record protected by standby_read_ts_poller jobs GAUGE SECONDS
jobs.standby_read_ts_poller.protected_record_count
Number of protected timestamp records held by standby_read_ts_poller jobs GAUGE COUNT
jobs.typedesc_schema_change.currently_paused
Number of typedesc_schema_change jobs currently considered Paused GAUGE COUNT
jobs.typedesc_schema_change.protected_age_sec
The age of the oldest PTS record protected by typedesc_schema_change jobs GAUGE SECONDS
jobs.typedesc_schema_change.protected_record_count
Number of protected timestamp records held by typedesc_schema_change jobs GAUGE COUNT
jobs.update_table_metadata_cache.currently_paused
Number of update_table_metadata_cache jobs currently considered Paused GAUGE COUNT
jobs.update_table_metadata_cache.protected_age_sec
The age of the oldest PTS record protected by update_table_metadata_cache jobs GAUGE SECONDS
jobs.update_table_metadata_cache.protected_record_count
Number of protected timestamp records held by update_table_metadata_cache jobs GAUGE COUNT
keybytes
Number of bytes taken up by keys GAUGE BYTES
keycount
Count of all keys GAUGE COUNT
leases.epoch
Number of replica leaseholders using epoch-based leases GAUGE COUNT
leases.error
Number of failed lease requests COUNTER COUNT
leases.expiration
Number of replica leaseholders using expiration-based leases GAUGE COUNT
leases.success
Number of successful lease requests COUNTER COUNT
leases.transfers.error
Number of failed lease transfers COUNTER COUNT
leases.transfers.success
Number of successful lease transfers COUNTER COUNT
livebytes
Number of bytes of live data (keys plus values) GAUGE BYTES
livecount
Count of live keys GAUGE COUNT
liveness.epochincrements
Number of times this node has incremented its liveness epoch COUNTER COUNT
liveness.heartbeatfailures
Number of failed node liveness heartbeats from this node COUNTER COUNT
liveness.heartbeatlatency
Node liveness heartbeat latency HISTOGRAM NANOSECONDS
liveness.heartbeatsuccesses
Number of successful node liveness heartbeats from this node COUNTER COUNT
liveness.livenodes
Number of live nodes in the cluster (will be 0 if this node is not itself live) GAUGE COUNT
node-id
Node ID with labels for advertised RPC and HTTP addresses GAUGE CONST
physical_replication.logical_bytes
Logical bytes (sum of keys + values) ingested by all replication jobs COUNTER BYTES
physical_replication.replicated_time_seconds
The replicated time of the physical replication stream in seconds since the unix epoch. GAUGE SECONDS
queue.consistency.pending
Number of pending replicas in the consistency checker queue GAUGE COUNT
queue.consistency.process.failure
Number of replicas which failed processing in the consistency checker queue COUNTER COUNT
queue.consistency.process.success
Number of replicas successfully processed by the consistency checker queue COUNTER COUNT
queue.consistency.processingnanos
Nanoseconds spent processing replicas in the consistency checker queue COUNTER NANOSECONDS
queue.gc.info.abortspanconsidered
Number of AbortSpan entries old enough to be considered for removal COUNTER COUNT
queue.gc.info.abortspangcnum
Number of AbortSpan entries fit for removal COUNTER COUNT
queue.gc.info.abortspanscanned
Number of transactions present in the AbortSpan scanned from the engine COUNTER COUNT
queue.gc.info.clearrangefailed
Number of failed ClearRange operations during GC COUNTER COUNT
queue.gc.info.clearrangesuccess
Number of successful ClearRange operations during GC COUNTER COUNT
queue.gc.info.intentsconsidered
Number of 'old' intents COUNTER COUNT
queue.gc.info.intenttxns
Number of associated distinct transactions COUNTER COUNT
queue.gc.info.numkeysaffected
Number of keys with GC'able data COUNTER COUNT
queue.gc.info.pushtxn
Number of attempted pushes COUNTER COUNT
queue.gc.info.resolvesuccess
Number of successful intent resolutions COUNTER COUNT
queue.gc.info.resolvetotal
Number of attempted intent resolutions COUNTER COUNT
queue.gc.info.transactionspangcaborted
Number of GC'able entries corresponding to aborted txns COUNTER COUNT
queue.gc.info.transactionspangccommitted
Number of GC'able entries corresponding to committed txns COUNTER COUNT
queue.gc.info.transactionspangcpending
Number of GC'able entries corresponding to pending txns COUNTER COUNT
queue.gc.info.transactionspanscanned
Number of entries in transaction spans scanned from the engine COUNTER COUNT
queue.gc.pending
Number of pending replicas in the MVCC GC queue GAUGE COUNT
queue.gc.process.failure
Number of replicas which failed processing in the MVCC GC queue COUNTER COUNT
queue.gc.process.success
Number of replicas successfully processed by the MVCC GC queue COUNTER COUNT
queue.gc.processingnanos
Nanoseconds spent processing replicas in the MVCC GC queue COUNTER NANOSECONDS
queue.raftlog.pending
Number of pending replicas in the Raft log queue GAUGE COUNT
queue.raftlog.process.failure
Number of replicas which failed processing in the Raft log queue COUNTER COUNT
queue.raftlog.process.success
Number of replicas successfully processed by the Raft log queue COUNTER COUNT
queue.raftlog.processingnanos
Nanoseconds spent processing replicas in the Raft log queue COUNTER NANOSECONDS
queue.raftsnapshot.pending
Number of pending replicas in the Raft repair queue GAUGE COUNT
queue.raftsnapshot.process.failure
Number of replicas which failed processing in the Raft repair queue COUNTER COUNT
queue.raftsnapshot.process.success
Number of replicas successfully processed by the Raft repair queue COUNTER COUNT
queue.raftsnapshot.processingnanos
Nanoseconds spent processing replicas in the Raft repair queue COUNTER NANOSECONDS
queue.replicagc.pending
Number of pending replicas in the replica GC queue GAUGE COUNT
queue.replicagc.process.failure
Number of replicas which failed processing in the replica GC queue COUNTER COUNT
queue.replicagc.process.success
Number of replicas successfully processed by the replica GC queue COUNTER COUNT
queue.replicagc.processingnanos
Nanoseconds spent processing replicas in the replica GC queue COUNTER NANOSECONDS
queue.replicagc.removereplica
Number of replica removals attempted by the replica GC queue COUNTER COUNT
queue.replicate.addreplica
Number of replica additions attempted by the replicate queue COUNTER COUNT
queue.replicate.addreplica.error
Number of failed replica additions processed by the replicate queue COUNTER COUNT
queue.replicate.addreplica.success
Number of successful replica additions processed by the replicate queue COUNTER COUNT
queue.replicate.pending
Number of pending replicas in the replicate queue GAUGE COUNT
queue.replicate.process.failure
Number of replicas which failed processing in the replicate queue COUNTER COUNT
queue.replicate.process.success
Number of replicas successfully processed by the replicate queue COUNTER COUNT
queue.replicate.processingnanos
Nanoseconds spent processing replicas in the replicate queue COUNTER NANOSECONDS
queue.replicate.purgatory
Number of replicas in the replicate queue's purgatory, awaiting allocation options GAUGE COUNT
queue.replicate.rebalancereplica
Number of replica rebalancer-initiated additions attempted by the replicate queue COUNTER COUNT
queue.replicate.removedeadreplica
Number of dead replica removals attempted by the replicate queue (typically in response to a node outage) COUNTER COUNT
queue.replicate.removedeadreplica.error
Number of failed dead replica removals processed by the replicate queue COUNTER COUNT
queue.replicate.removedeadreplica.success
Number of successful dead replica removals processed by the replicate queue COUNTER COUNT
queue.replicate.removedecommissioningreplica.error
Number of failed decommissioning replica removals processed by the replicate queue COUNTER COUNT
queue.replicate.removedecommissioningreplica.success
Number of successful decommissioning replica removals processed by the replicate queue COUNTER COUNT
queue.replicate.removereplica
Number of replica removals attempted by the replicate queue (typically in response to a rebalancer-initiated addition) COUNTER COUNT
queue.replicate.removereplica.error
Number of failed replica removals processed by the replicate queue COUNTER COUNT
queue.replicate.removereplica.success
Number of successful replica removals processed by the replicate queue COUNTER COUNT
queue.replicate.replacedeadreplica.error
Number of failed dead replica replacements processed by the replicate queue COUNTER COUNT
queue.replicate.replacedeadreplica.success
Number of successful dead replica replacements processed by the replicate queue COUNTER COUNT
queue.replicate.replacedecommissioningreplica.error
Number of failed decommissioning replica replacements processed by the replicate queue COUNTER COUNT
queue.replicate.replacedecommissioningreplica.success
Number of successful decommissioning replica replacements processed by the replicate queue COUNTER COUNT
queue.replicate.transferlease
Number of range lease transfers attempted by the replicate queue COUNTER COUNT
queue.split.pending
Number of pending replicas in the split queue GAUGE COUNT
queue.split.process.failure
Number of replicas which failed processing in the split queue COUNTER COUNT
queue.split.process.success
Number of replicas successfully processed by the split queue COUNTER COUNT
queue.split.processingnanos
Nanoseconds spent processing replicas in the split queue COUNTER NANOSECONDS
queue.tsmaintenance.pending
Number of pending replicas in the time series maintenance queue GAUGE COUNT
queue.tsmaintenance.process.failure
Number of replicas which failed processing in the time series maintenance queue COUNTER COUNT
queue.tsmaintenance.process.success
Number of replicas successfully processed by the time series maintenance queue COUNTER COUNT
queue.tsmaintenance.processingnanos
Nanoseconds spent processing replicas in the time series maintenance queue COUNTER NANOSECONDS
raft.commandsapplied
Number of Raft commands applied.

This measurement is taken on the Raft apply loops of all Replicas (leaders and followers alike), meaning that it does not measure the number of Raft commands proposed (in the hypothetical extreme case, all Replicas may apply all commands through snapshots, thus not increasing this metric at all). Instead, it is a proxy for how much work is being done advancing the Replica state machines on this node.

COUNTER COUNT
raft.heartbeats.pending
Number of pending heartbeats and responses waiting to be coalesced GAUGE COUNT
raft.process.commandcommit.latency
Latency histogram for applying a batch of Raft commands to the state machine.

This metric is misnamed: it measures the latency for applying a batch of committed Raft commands to a Replica state machine. This requires only non-durable I/O (except for replication configuration changes).

Note that a "batch" in this context is really a sub-batch of the batch received for application during raft ready handling. The 'raft.process.applycommitted.latency' histogram is likely more suitable in most cases, as it measures the total latency across all sub-batches (i.e. the sum of commandcommit.latency for a complete batch).

HISTOGRAM NANOSECONDS
raft.process.logcommit.latency
Latency histogram for committing Raft log entries to stable storage

This measures the latency of durably committing a group of newly received Raft entries as well as the HardState entry to disk. This excludes any data processing, i.e. we measure purely the commit latency of the resulting Engine write. Homogeneous bands of p50-p99 latencies (in the presence of regular Raft traffic) make it likely that the storage layer is healthy. Spikes in the latency bands can hint either at large sets of Raft entries being received or at performance issues at the storage layer.

HISTOGRAM NANOSECONDS
raft.process.tickingnanos
Nanoseconds spent in store.processRaft() processing replica.Tick() COUNTER NANOSECONDS
raft.process.workingnanos
Nanoseconds spent in store.processRaft() working.

This is the sum of the measurements passed to the raft.process.handleready.latency histogram.

COUNTER NANOSECONDS
raft.rcvd.app
Number of MsgApp messages received by this store COUNTER COUNT
raft.rcvd.appresp
Number of MsgAppResp messages received by this store COUNTER COUNT
raft.rcvd.dropped
Number of incoming Raft messages dropped (due to queue length or size) COUNTER COUNT
raft.rcvd.heartbeat
Number of (coalesced, if enabled) MsgHeartbeat messages received by this store COUNTER COUNT
raft.rcvd.heartbeatresp
Number of (coalesced, if enabled) MsgHeartbeatResp messages received by this store COUNTER COUNT
raft.rcvd.prevote
Number of MsgPreVote messages received by this store COUNTER COUNT
raft.rcvd.prevoteresp
Number of MsgPreVoteResp messages received by this store COUNTER COUNT
raft.rcvd.prop
Number of MsgProp messages received by this store COUNTER COUNT
raft.rcvd.snap
Number of MsgSnap messages received by this store COUNTER COUNT
raft.rcvd.timeoutnow
Number of MsgTimeoutNow messages received by this store COUNTER COUNT
raft.rcvd.transferleader
Number of MsgTransferLeader messages received by this store COUNTER COUNT
raft.rcvd.vote
Number of MsgVote messages received by this store COUNTER COUNT
raft.rcvd.voteresp
Number of MsgVoteResp messages received by this store COUNTER COUNT
raft.ticks
Number of Raft ticks queued COUNTER COUNT
raftlog.behind
Number of Raft log entries followers on other stores are behind.

This gauge provides a view of the aggregate number of log entries the Raft leaders on this node think the followers are behind. Since a raft leader may not always have a good estimate for this information for all of its followers, and since followers are expected to be behind (when they are not required as part of a quorum) and the aggregate thus scales like the count of such followers, it is difficult to meaningfully interpret this metric.

GAUGE COUNT
raftlog.truncated
Number of Raft log entries truncated COUNTER COUNT
range.adds
Number of range additions COUNTER COUNT
range.merges
Number of range merges COUNTER COUNT
range.raftleadertransfers
Number of raft leader transfers COUNTER COUNT
range.removes
Number of range removals COUNTER COUNT
range.snapshots.generated
Number of generated snapshots COUNTER COUNT
range.snapshots.rcvd-bytes
Number of snapshot bytes received COUNTER BYTES
range.snapshots.rebalancing.rcvd-bytes
Number of rebalancing snapshot bytes received COUNTER BYTES
range.snapshots.rebalancing.sent-bytes
Number of rebalancing snapshot bytes sent COUNTER BYTES
range.snapshots.recovery.rcvd-bytes
Number of raft recovery snapshot bytes received COUNTER BYTES
range.snapshots.recovery.sent-bytes
Number of raft recovery snapshot bytes sent COUNTER BYTES
range.snapshots.recv-in-progress
Number of non-empty snapshots being received GAUGE COUNT
range.snapshots.recv-queue
Number of snapshots queued to receive GAUGE COUNT
range.snapshots.recv-total-in-progress
Number of total snapshots being received GAUGE COUNT
range.snapshots.send-in-progress
Number of non-empty snapshots being sent GAUGE COUNT
range.snapshots.send-queue
Number of snapshots queued to send GAUGE COUNT
range.snapshots.send-total-in-progress
Number of total snapshots being sent GAUGE COUNT
range.snapshots.sent-bytes
Number of snapshot bytes sent COUNTER BYTES
range.snapshots.unknown.rcvd-bytes
Number of unknown snapshot bytes received COUNTER BYTES
range.snapshots.unknown.sent-bytes
Number of unknown snapshot bytes sent COUNTER BYTES
range.splits
Number of range splits COUNTER COUNT
rangekeybytes
Number of bytes taken up by range keys (e.g. MVCC range tombstones) GAUGE BYTES
rangekeycount
Count of all range keys (e.g. MVCC range tombstones) GAUGE COUNT
ranges
Number of ranges GAUGE COUNT
ranges.overreplicated
Number of ranges with more live replicas than the replication target GAUGE COUNT
ranges.unavailable
Number of ranges with fewer live replicas than needed for quorum GAUGE COUNT
ranges.underreplicated
Number of ranges with fewer live replicas than the replication target GAUGE COUNT
rangevalbytes
Number of bytes taken up by range key values (e.g. MVCC range tombstones) GAUGE BYTES
rangevalcount
Count of all range key values (e.g. MVCC range tombstones) GAUGE COUNT
rebalancing.queriespersecond
Number of kv-level requests received per second by the store, considering the last 30 minutes, as used in rebalancing decisions. GAUGE COUNT
rebalancing.readbytespersecond
Number of bytes read recently per second, considering the last 30 minutes. GAUGE BYTES
rebalancing.readspersecond
Number of keys read recently per second, considering the last 30 minutes. GAUGE COUNT
rebalancing.requestspersecond
Number of requests received recently per second, considering the last 30 minutes. GAUGE COUNT
rebalancing.writebytespersecond
Number of bytes written recently per second, considering the last 30 minutes. GAUGE BYTES
rebalancing.writespersecond
Number of keys written (i.e. applied by raft) per second to the store, considering the last 30 minutes. GAUGE COUNT
replicas
Number of replicas GAUGE COUNT
replicas.leaders
Number of raft leaders GAUGE COUNT
replicas.leaders_invalid_lease
Number of replicas that are Raft leaders whose lease is invalid GAUGE COUNT
replicas.leaders_not_leaseholders
Number of replicas that are Raft leaders whose range lease is held by another store GAUGE COUNT
replicas.leaseholders
Number of lease holders GAUGE COUNT
replicas.quiescent
Number of quiesced replicas GAUGE COUNT
replicas.reserved
Number of replicas reserved for snapshots GAUGE COUNT
requests.backpressure.split
Number of backpressured writes waiting on a Range split.

A Range will backpressure (roughly) non-system traffic when the range is above the configured size until the range splits. When the rate of this metric is nonzero over extended periods of time, investigate why splits are not occurring.

GAUGE COUNT
requests.slow.distsender
Number of range-bound RPCs currently stuck or retrying for a long time.

Note that this is not a good signal for KV health. The remote side of the RPCs tracked here may experience contention, so an end user can easily cause values for this metric to be emitted by leaving a transaction open for a long time and contending with it using a second transaction.

GAUGE COUNT
requests.slow.lease
Number of requests that have been stuck for a long time acquiring a lease.

A nonzero value for this gauge usually indicates range or replica unavailability, and should be investigated. In the common case, we also expect 'requests.slow.raft' to register a nonzero value, indicating that the lease requests are not getting a timely response from the replication layer.

GAUGE COUNT
requests.slow.raft
Number of requests that have been stuck for a long time in the replication layer.

An (evaluated) request has to pass through the replication layer, notably the quota pool and raft. If it fails to do so within a highly permissive duration, the gauge is incremented (and decremented again once the request is either applied or returns an error).

A nonzero value indicates range or replica unavailability, and should be investigated.

GAUGE COUNT
rocksdb.block.cache.hits
Count of block cache hits COUNTER COUNT
rocksdb.block.cache.misses
Count of block cache misses COUNTER COUNT
rocksdb.block.cache.usage
Bytes used by the block cache GAUGE BYTES
rocksdb.bloom.filter.prefix.checked
Number of times the bloom filter was checked COUNTER COUNT
rocksdb.bloom.filter.prefix.useful
Number of times the bloom filter helped avoid iterator creation COUNTER COUNT
rocksdb.compactions
Number of table compactions COUNTER COUNT
rocksdb.flushes
Number of table flushes COUNTER COUNT
rocksdb.memtable.total-size
Current size of memtable in bytes GAUGE BYTES
rocksdb.num-sstables
Number of storage engine SSTables GAUGE COUNT
rocksdb.read-amplification
Number of disk reads per query GAUGE COUNT
rocksdb.table-readers-mem-estimate
Memory used by index and filter blocks GAUGE BYTES
round-trip-latency
Distribution of round-trip latencies with other nodes.

This only reflects successful heartbeats and measures gRPC overhead as well as possible head-of-line blocking. Elevated values in this metric may hint at network issues and/or saturation, but they do not prove them. CPU overload can similarly elevate this metric. The operator should look at OS-level metrics such as packet loss, retransmits, etc., to conclusively diagnose network issues. Heartbeats are not very frequent (~seconds), so they may not capture rare or short-lived degradations.

HISTOGRAM NANOSECONDS
rpc.connection.avg_round_trip_latency
Sum of exponentially weighted moving average of round-trip latencies, as measured through a gRPC RPC.

Dividing this gauge by rpc.connection.healthy gives an approximation of average latency, but the top-level round-trip-latency histogram is more useful. Instead, users should consult the label families of this metric if they are available (which requires Prometheus and the cluster setting 'server.child_metrics.enabled'); these provide per-peer moving averages.

This metric does not track failed connections. A failed connection's contribution is reset to zero.

GAUGE NANOSECONDS
rpc.connection.failures
Counter of failed connections.

This includes both the event in which a healthy connection terminates and unsuccessful reconnection attempts.

Connections that are terminated as part of local node shutdown are excluded. Decommissioned peers are excluded.

COUNTER COUNT
rpc.connection.healthy
Gauge of current connections in a healthy state (i.e. bidirectionally connected and heartbeating) GAUGE COUNT
rpc.connection.healthy_nanos
Gauge of nanoseconds of healthy connection time

On the Prometheus endpoint scraped with the cluster setting 'server.child_metrics.enabled' set, the constituent parts of this metric are available on a per-peer basis, and one can read off how long a given peer has been connected.

GAUGE NANOSECONDS
rpc.connection.heartbeats
Counter of successful heartbeats. COUNTER COUNT
rpc.connection.unhealthy
Gauge of current connections in an unhealthy state (not bidirectionally connected or heartbeating) GAUGE COUNT
rpc.connection.unhealthy_nanos
Gauge of nanoseconds of unhealthy connection time.

On the Prometheus endpoint scraped with the cluster setting 'server.child_metrics.enabled' set, the constituent parts of this metric are available on a per-peer basis, and one can read off how long a given peer has been unreachable.

GAUGE NANOSECONDS
schedules.BACKUP.failed
Number of BACKUP jobs failed COUNTER COUNT
schedules.BACKUP.last-completed-time
The unix timestamp of the most recently completed backup by a schedule specified as maintaining this metric GAUGE TIMESTAMP_SEC
schedules.BACKUP.protected_age_sec
The age of the oldest PTS record protected by BACKUP schedules GAUGE SECONDS
schedules.BACKUP.protected_record_count
Number of PTS records held by BACKUP schedules GAUGE COUNT
schedules.BACKUP.started
Number of BACKUP jobs started COUNTER COUNT
schedules.BACKUP.succeeded
Number of BACKUP jobs succeeded COUNTER COUNT
schedules.scheduled-row-level-ttl-executor.failed
Number of scheduled-row-level-ttl-executor jobs failed COUNTER COUNT
security.certificate.expiration.ca
Expiration for the CA certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.ca-client-tenant
Expiration for the Tenant Client CA certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.client
Minimum expiration for client certificates, labeled by SQL user. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.client-ca
Expiration for the client CA certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.client-tenant
Expiration for the Tenant Client certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.node
Expiration for the node certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.node-client
Expiration for the node's client certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.ui
Expiration for the UI certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.expiration.ui-ca
Expiration for the UI CA certificate. 0 means no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.ca
Seconds till expiration for the CA certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.ca-client-tenant
Seconds till expiration for the Tenant Client CA certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.client
Seconds till expiration for the client certificates, labeled by SQL user. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.client-ca
Seconds till expiration for the client CA certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.client-tenant
Seconds till expiration for the Tenant Client certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.node
Seconds till expiration for the node certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.node-client
Seconds till expiration for the node's client certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.ui
Seconds till expiration for the UI certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
security.certificate.ttl.ui-ca
Seconds till expiration for the UI CA certificate. 0 means expired, no certificate or error. GAUGE TIMESTAMP_SEC
sql.bytesin
Number of SQL bytes received COUNTER BYTES
sql.bytesout
Number of SQL bytes sent COUNTER BYTES
sql.conn.latency
Latency to establish and authenticate a SQL connection HISTOGRAM NANOSECONDS
sql.conns
Number of open SQL connections GAUGE COUNT
sql.crud_query.count
Number of SQL SELECT, INSERT, UPDATE, DELETE statements successfully executed COUNTER COUNT
sql.crud_query.started.count
Number of SQL SELECT, INSERT, UPDATE, DELETE statements started COUNTER COUNT
sql.ddl.count
Number of SQL DDL statements successfully executed COUNTER COUNT
sql.delete.count
Number of SQL DELETE statements successfully executed COUNTER COUNT
sql.distsql.contended_queries.count
Number of SQL queries that experienced contention COUNTER COUNT
sql.distsql.exec.latency
Latency of DistSQL statement execution HISTOGRAM NANOSECONDS
sql.distsql.flows.active
Number of distributed SQL flows currently active GAUGE COUNT
sql.distsql.flows.total
Number of distributed SQL flows executed COUNTER COUNT
sql.distsql.queries.active
Number of SQL queries currently active GAUGE COUNT
sql.distsql.queries.total
Number of SQL queries executed COUNTER COUNT
sql.distsql.select.count
Number of DistSQL SELECT statements COUNTER COUNT
sql.distsql.service.latency
Latency of DistSQL request execution HISTOGRAM NANOSECONDS
sql.exec.latency
Latency of SQL statement execution HISTOGRAM NANOSECONDS
sql.failure.count
Number of statements resulting in a planning or runtime error COUNTER COUNT
sql.full.scan.count
Number of full table or index scans COUNTER COUNT
sql.guardrails.max_row_size_err.count
Number of rows observed violating sql.guardrails.max_row_size_err COUNTER COUNT
sql.guardrails.max_row_size_log.count
Number of rows observed violating sql.guardrails.max_row_size_log COUNTER COUNT
sql.insert.count
Number of SQL INSERT statements successfully executed COUNTER COUNT
sql.mem.distsql.current
Current sql statement memory usage for distsql GAUGE BYTES
sql.mem.distsql.max
Memory usage per sql statement for distsql HISTOGRAM BYTES
sql.mem.internal.session.current
Current sql session memory usage for internal GAUGE BYTES
sql.mem.internal.session.max
Memory usage per sql session for internal HISTOGRAM BYTES
sql.mem.internal.txn.current
Current sql transaction memory usage for internal GAUGE BYTES
sql.mem.internal.txn.max
Memory usage per sql transaction for internal HISTOGRAM BYTES
sql.mem.root.current
Current sql statement memory usage for root GAUGE BYTES
sql.mem.root.max
Memory usage per sql statement for root HISTOGRAM BYTES
sql.misc.count
Number of other SQL statements successfully executed COUNTER COUNT
sql.new_conns
Number of SQL connections created COUNTER COUNT
sql.pgwire_cancel.ignored
Number of pgwire query cancel requests that were ignored due to rate limiting COUNTER COUNT
sql.pgwire_cancel.successful
Number of pgwire query cancel requests that were successful COUNTER COUNT
sql.pgwire_cancel.total
Number of pgwire query cancel requests COUNTER COUNT
sql.query.count
Number of SQL operations started, including queries and transaction control statements COUNTER COUNT
sql.select.count
Number of SQL SELECT statements successfully executed COUNTER COUNT
sql.service.latency
Latency of SQL request execution HISTOGRAM NANOSECONDS
sql.statements.active
Number of currently active user SQL statements GAUGE COUNT
sql.txn.abort.count
Number of SQL transaction abort errors COUNTER COUNT
sql.txn.begin.count
Number of SQL transaction BEGIN statements successfully executed COUNTER COUNT
sql.txn.commit.count
Number of SQL transaction COMMIT statements successfully executed COUNTER COUNT
sql.txn.contended.count
Number of SQL transactions that experienced contention COUNTER COUNT
sql.txn.latency
Latency of SQL transactions HISTOGRAM NANOSECONDS
sql.txn.rollback.count
Number of SQL transaction ROLLBACK statements successfully executed COUNTER COUNT
sql.txns.open
Number of currently open user SQL transactions GAUGE COUNT
sql.update.count
Number of SQL UPDATE statements successfully executed COUNTER COUNT
storage.keys.range-key-set.count
Approximate count of RangeKeySet internal keys across the storage engine. GAUGE COUNT
storage.l0-level-score
Compaction score of level 0 GAUGE COUNT
storage.l0-level-size
Size of the SSTables in level 0 GAUGE BYTES
storage.l0-num-files
Number of SSTables in Level 0 GAUGE COUNT
storage.l0-sublevels
Number of Level 0 sublevels GAUGE COUNT
storage.l1-level-score
Compaction score of level 1 GAUGE COUNT
storage.l1-level-size
Size of the SSTables in level 1 GAUGE BYTES
storage.l2-level-score
Compaction score of level 2 GAUGE COUNT
storage.l2-level-size
Size of the SSTables in level 2 GAUGE BYTES
storage.l3-level-score
Compaction score of level 3 GAUGE COUNT
storage.l3-level-size
Size of the SSTables in level 3 GAUGE BYTES
storage.l4-level-score
Compaction score of level 4 GAUGE COUNT
storage.l4-level-size
Size of the SSTables in level 4 GAUGE BYTES
storage.l5-level-score
Compaction score of level 5 GAUGE COUNT
storage.l5-level-size
Size of the SSTables in level 5 GAUGE BYTES
storage.l6-level-score
Compaction score of level 6 GAUGE COUNT
storage.l6-level-size
Size of the SSTables in level 6 GAUGE BYTES
storage.marked-for-compaction-files
Count of SSTables marked for compaction GAUGE COUNT
storage.write-stalls
Number of instances of intentional write stalls to backpressure incoming writes GAUGE COUNT
sys.cgo.allocbytes
Current bytes of memory allocated by cgo GAUGE BYTES
sys.cgo.totalbytes
Total bytes of memory allocated by cgo, but not released GAUGE BYTES
sys.cgocalls
Total number of cgo calls COUNTER COUNT
sys.cpu.combined.percent-normalized
Current user+system cpu percentage consumed by the CRDB process, normalized 0-1 by number of cores GAUGE PERCENT
sys.cpu.host.combined.percent-normalized
Current user+system cpu percentage across the whole machine, normalized 0-1 by number of cores GAUGE PERCENT
sys.cpu.sys.ns
Total system cpu time consumed by the CRDB process COUNTER NANOSECONDS
sys.cpu.sys.percent
Current system cpu percentage consumed by the CRDB process GAUGE PERCENT
sys.cpu.user.ns
Total user cpu time consumed by the CRDB process COUNTER NANOSECONDS
sys.cpu.user.percent
Current user cpu percentage consumed by the CRDB process GAUGE PERCENT
sys.fd.open
Process open file descriptors GAUGE COUNT
sys.fd.softlimit
Process open FD soft limit GAUGE COUNT
sys.gc.count
Total number of GC runs COUNTER COUNT
sys.gc.pause.ns
Total GC pause COUNTER NANOSECONDS
sys.gc.pause.percent
Current GC pause percentage GAUGE PERCENT
sys.go.allocbytes
Current bytes of memory allocated by go GAUGE BYTES
sys.go.totalbytes
Total bytes of memory allocated by go, but not released GAUGE BYTES
sys.goroutines
Current number of goroutines GAUGE COUNT
sys.host.disk.iopsinprogress
IO operations currently in progress on this host (as reported by the OS) GAUGE COUNT
sys.host.disk.read.bytes
Bytes read from all disks since this process started (as reported by the OS) COUNTER BYTES
sys.host.disk.read.count
Disk read operations across all disks since this process started (as reported by the OS) COUNTER COUNT
sys.host.disk.write.bytes
Bytes written to all disks since this process started (as reported by the OS) COUNTER BYTES
sys.host.disk.write.count
Disk write operations across all disks since this process started (as reported by the OS) COUNTER COUNT
sys.host.net.recv.bytes
Bytes received on all network interfaces since this process started (as reported by the OS) COUNTER BYTES
sys.host.net.send.bytes
Bytes sent on all network interfaces since this process started (as reported by the OS) COUNTER BYTES
sys.rss
Current process RSS GAUGE BYTES
sys.runnable.goroutines.per.cpu
Average number of goroutines that are waiting to run, normalized by number of cores GAUGE COUNT
sys.totalmem
Total memory (both free and used) GAUGE BYTES
sys.uptime
Process uptime COUNTER SECONDS
sysbytes
Number of bytes in system KV pairs GAUGE BYTES
syscount
Count of system KV pairs GAUGE COUNT
tenant.consumption.cross_region_network_ru
Total number of RUs charged for cross-region network traffic COUNTER COUNT
tenant.consumption.external_io_egress_bytes
Total number of bytes written to external services such as cloud storage providers GAUGE COUNT
tenant.consumption.pgwire_egress_bytes
Total number of bytes transferred from a SQL pod to the client GAUGE COUNT
tenant.consumption.read_batches
Total number of KV read batches GAUGE COUNT
tenant.consumption.read_bytes
Total number of bytes read from KV GAUGE COUNT
tenant.consumption.read_requests
Total number of KV read requests GAUGE COUNT
tenant.consumption.request_units
Total RU consumption COUNTER COUNT
tenant.consumption.sql_pods_cpu_seconds
Total amount of CPU used by SQL pods GAUGE SECONDS
tenant.consumption.write_batches
Total number of KV write batches GAUGE COUNT
tenant.consumption.write_bytes
Total number of bytes written to KV GAUGE COUNT
tenant.consumption.write_requests
Total number of KV write requests GAUGE COUNT
tenant.sql_usage.cross_region_network_ru
Total number of RUs charged for cross-region network traffic COUNTER COUNT
tenant.sql_usage.estimated_cpu_seconds
Estimated amount of CPU consumed by a virtual cluster COUNTER SECONDS
tenant.sql_usage.external_io_egress_bytes
Total number of bytes written to external services such as cloud storage providers COUNTER COUNT
tenant.sql_usage.external_io_ingress_bytes
Total number of bytes read from external services such as cloud storage providers COUNTER COUNT
tenant.sql_usage.kv_request_units
RU consumption attributable to KV COUNTER COUNT
tenant.sql_usage.pgwire_egress_bytes
Total number of bytes transferred from a SQL pod to the client COUNTER COUNT
tenant.sql_usage.provisioned_vcpus
Number of vcpus available to the virtual cluster GAUGE COUNT
tenant.sql_usage.read_batches
Total number of KV read batches COUNTER COUNT
tenant.sql_usage.read_bytes
Total number of bytes read from KV COUNTER COUNT
tenant.sql_usage.read_requests
Total number of KV read requests COUNTER COUNT
tenant.sql_usage.request_units
RU consumption COUNTER COUNT
tenant.sql_usage.sql_pods_cpu_seconds
Total amount of CPU used by SQL pods COUNTER SECONDS
tenant.sql_usage.write_batches
Total number of KV write batches COUNTER COUNT
tenant.sql_usage.write_bytes
Total number of bytes written to KV COUNTER COUNT
tenant.sql_usage.write_requests
Total number of KV write requests COUNTER COUNT
timeseries.write.bytes
Total size in bytes of metric samples written to disk COUNTER BYTES
timeseries.write.errors
Total errors encountered while attempting to write metrics to disk COUNTER COUNT
timeseries.write.samples
Total number of metric samples written to disk COUNTER COUNT
totalbytes
Total number of bytes taken up by keys and values including non-live data GAUGE BYTES
txn.aborts
Number of aborted KV transactions COUNTER COUNT
txn.commits
Number of committed KV transactions (including 1PC) COUNTER COUNT
txn.commits1PC
Number of KV transaction one-phase commits COUNTER COUNT
txn.durations
KV transaction durations HISTOGRAM NANOSECONDS
txn.restarts
Number of restarted KV transactions HISTOGRAM COUNT
txn.restarts.asyncwritefailure
Number of restarts due to async consensus writes that failed to leave intents COUNTER COUNT
txn.restarts.readwithinuncertainty
Number of restarts due to reading a new value within the uncertainty interval COUNTER COUNT
txn.restarts.serializable
Number of restarts due to a forwarded commit timestamp and isolation=SERIALIZABLE COUNTER COUNT
txn.restarts.txnaborted
Number of restarts due to an abort by a concurrent transaction (usually due to deadlock) COUNTER COUNT
txn.restarts.txnpush
Number of restarts due to a transaction push failure COUNTER COUNT
txn.restarts.unknown
Number of restarts due to unknown reasons COUNTER COUNT
txn.restarts.writetooold
Number of restarts due to a concurrent writer committing first COUNTER COUNT
txnwaitqueue.deadlocks_total
Number of deadlocks detected by the txn wait queue COUNTER COUNT
valbytes
Number of bytes taken up by values GAUGE BYTES
valcount
Count of all values GAUGE COUNT
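
Some of the descriptions above are most useful when two values are combined, for example dividing rpc.connection.avg_round_trip_latency by rpc.connection.healthy, turning two samples of a COUNTER into a per-second rate, or comparing a security.certificate.expiration.* timestamp with the current time. The Python sketch below illustrates that arithmetic only. The function names and any sample values are hypothetical and are not part of CockroachDB or the DB Console, and the block cache hit ratio is a conventional derived quantity rather than something this table defines.

```python
from typing import Optional


def approx_avg_rpc_latency_ns(latency_sum_ns: float, healthy_conns: int) -> Optional[float]:
    """Approximate average RPC latency in nanoseconds.

    latency_sum_ns: value of rpc.connection.avg_round_trip_latency (a sum of
    per-connection moving averages).
    healthy_conns: value of rpc.connection.healthy.
    Returns None when there are no healthy connections to average over.
    """
    if healthy_conns <= 0:
        return None
    return latency_sum_ns / healthy_conns


def counter_rate_per_sec(prev: float, curr: float, interval_sec: float) -> Optional[float]:
    """Per-second rate between two samples of a COUNTER metric.

    A decrease is treated as a counter reset (for example after a process
    restart), in which case no rate is reported for that interval.
    """
    if interval_sec <= 0 or curr < prev:
        return None
    return (curr - prev) / interval_sec


def block_cache_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Fraction of block cache lookups that were hits.

    hits and misses are assumed to be deltas of rocksdb.block.cache.hits and
    rocksdb.block.cache.misses over the same time window.
    """
    total = hits + misses
    if total == 0:
        return None
    return hits / total


def seconds_until_expiration(expiration_unix_sec: int, now_unix_sec: int) -> Optional[int]:
    """Seconds left before a certificate expires.

    expiration_unix_sec: value of a security.certificate.expiration.* gauge.
    The table states that 0 means no certificate or an error, so that case
    is reported as None instead of a large negative number.
    """
    if expiration_unix_sec == 0:
        return None
    return expiration_unix_sec - now_unix_sec
```

Each helper returns None instead of raising when the inputs cannot produce a meaningful value, which makes it easier to skip empty intervals when post-processing exported data points.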
