Skip to main content
Skip to main content

Sampling query profiler

ClickHouse runs a sampling profiler that allows analyzing query execution. Using the profiler, you can find the source code routines that are used the most frequently during query execution. You can trace CPU time and wall-clock time spent including idle time.

The query profiler is automatically enabled in ClickHouse Cloud. The following example query finds the most frequent stack traces for a profiled query, with resolved function names and source locations:

Tip

Replace the query_id value with the ID of the query you want to profile.

In ClickHouse Cloud, you can obtain the query ID by clicking "..." on the far right of the bar above the query result table (next to the table/chart toggle). This opens a context menu where you can click "Copy query ID".

Use clusterAllReplicas(default, system.trace_log) to select from all nodes of the cluster:

SELECT
    count(),
    arrayStringConcat(arrayMap(x -> concat(demangle(addressToSymbol(x)), '\n    ', addressToLine(x)), trace), '\n') AS sym
FROM clusterAllReplicas(default, system.trace_log)
WHERE query_id = '<query_id>' AND trace_type = 'CPU' AND event_date = today()
GROUP BY trace
ORDER BY count() DESC
LIMIT 10
SETTINGS allow_introspection_functions = 1

Using the query profiler in self-managed deployments

In self-managed deployments, to use the query profiler follow the steps below:

Install ClickHouse with debug info

Install the clickhouse-common-static-dbg package:

  1. Follow the instructions in step "Set up the Debian repository"
  2. Run sudo apt-get install clickhouse-server clickhouse-client clickhouse-common-static-dbg to install ClickHouse compiled binary files with debug info
  3. Run sudo service clickhouse-server start to start the server
  4. Run clickhouse-client. The debug symbols from clickhouse-common-static-dbg will automatically be picked up by the server - you don't need to do anything special to enable them

Check server config

Ensure that the trace_log section of your server configuration file is set up. It is enabled by default:

<!-- Trace log. Stores stack traces collected by query profilers.
     See query_profiler_real_time_period_ns and query_profiler_cpu_time_period_ns settings. -->
<trace_log>
    <database>system</database>
    <table>trace_log</table>

    <partition_by>toYYYYMM(event_date)</partition_by>
    <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    <max_size_rows>1048576</max_size_rows>
    <reserved_size_rows>8192</reserved_size_rows>
    <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
    <!-- Indication whether logs should be dumped to the disk in case of a crash -->
    <flush_on_crash>false</flush_on_crash>
    <symbolize>true</symbolize>
</trace_log>

This section configures the trace_log system table containing the results of the profiler functioning. Remember that data in this table is valid only for a running server. After the server restart, ClickHouse does not clean up the table and all the stored virtual memory address may become invalid.

Configure profile timers

Set up the query_profiler_cpu_time_period_ns or query_profiler_real_time_period_ns settings. Both settings can be used simultaneously.

These settings allow you to configure profiler timers. As these are the session settings, you can get different sampling frequency for the whole server, individual users or user profiles, for your interactive session, and for each individual query.

The default sampling frequency is one sample per second, and both CPU and real timers are enabled. This frequency allows you to collect sufficient information about your ClickHouse cluster whilst not affecting your server's performance. If you need to profile each individual query, use a higher sampling frequency.

Analyze the trace_log system table

To analyze the trace_log system table allow introspection functions with the allow_introspection_functions setting:

SET allow_introspection_functions=1
Note

For security reasons, introspection functions are disabled by default

Use the addressToLine, addressToLineWithInlines, addressToSymbol and demangle introspection functions to get function names and their positions in ClickHouse code. To get a profile for some query, you need to aggregate data from the trace_log table. You can aggregate data by individual functions or by the whole stack traces.

Tip

If you need to visualize trace_log info, try flamegraph and speedscope.

Building flame graphs with the flameGraph function

ClickHouse provides the flameGraph aggregate function which builds a flame graph directly from stack traces stored in trace_log. The output is an array of strings in a format compatible with flamegraph.pl.

Syntax:

flameGraph(traces, [size = 1], [ptr = 0])

Arguments:

  • traces — a stacktrace. Array(UInt64).
  • size — an allocation size for memory profiling. Int64.
  • ptr — an allocation address. UInt64.

When ptr is non-zero, flameGraph maps allocations (size > 0) and deallocations (size < 0) with the same size and pointer. Only allocations that were not freed are shown. Unmatched deallocations are ignored.

CPU flame graph

Note

The queries below require you to have flamegraph.pl installed.

You can do so by running:

git clone https://github.com/brendangregg/FlameGraph
# Then use it as:
# ~/FlameGraph/flamegraph.pl

Replace flamegraph.pl in the following queries with the path where flamegraph.pl is located on your machine

SET query_profiler_cpu_time_period_ns = 10000000;

Run your query, then build the flame graph:

clickhouse client --allow_introspection_functions=1 \
    -q "SELECT arrayJoin(flameGraph(arrayReverse(trace)))
        FROM system.trace_log
        WHERE trace_type = 'CPU' AND query_id = '<query_id>'" \
    | flamegraph.pl > flame_cpu.svg

Memory flame graph — all allocations

SET memory_profiler_sample_probability = 1, max_untracked_memory = 1;

Run your query, then build the flame graph:

clickhouse client --allow_introspection_functions=1 \
    -q "SELECT arrayJoin(flameGraph(trace, size))
        FROM system.trace_log
        WHERE trace_type = 'MemorySample' AND query_id = '<query_id>'" \
    | flamegraph.pl --countname=bytes --color=mem > flame_mem.svg

Memory flame graph — unfreed allocations

This variant matches allocations against deallocations by pointer and shows only memory that was not freed during the query.

SET memory_profiler_sample_probability = 1, max_untracked_memory = 1,
    use_uncompressed_cache = 1,
    merge_tree_max_rows_to_use_cache = 100000000000,
    merge_tree_max_bytes_to_use_cache = 1000000000000;

Run the following query to build the flame graph:

clickhouse client --allow_introspection_functions=1 \
    -q "SELECT arrayJoin(flameGraph(trace, size, ptr))
        FROM system.trace_log
        WHERE trace_type = 'MemorySample' AND query_id = '<query_id>'" \
    | flamegraph.pl --countname=bytes --color=mem > flame_mem_unfreed.svg

Memory flame graph — active allocations at a point in time

This approach lets you find peak memory usage and visualize what was allocated at that moment.

SET memory_profiler_sample_probability = 1, max_untracked_memory = 1;

Find memory usage over time

SELECT
    event_time,
    formatReadableSize(max(s)) AS m
FROM (
    SELECT
        event_time,
        sum(size) OVER (ORDER BY event_time) AS s
    FROM system.trace_log
    WHERE query_id = '<query_id>' AND trace_type = 'MemorySample'
)
GROUP BY event_time
ORDER BY event_time;

Find the time point with maximum memory usage

SELECT
    argMax(event_time, s),
    max(s)
FROM (
    SELECT
        event_time,
        sum(size) OVER (ORDER BY event_time) AS s
    FROM system.trace_log
    WHERE query_id = '<query_id>' AND trace_type = 'MemorySample'
);

Build a flame graph of active allocations at that time point

clickhouse client --allow_introspection_functions=1 \
    -q "SELECT arrayJoin(flameGraph(trace, size, ptr))
        FROM (
            SELECT * FROM system.trace_log
            WHERE trace_type = 'MemorySample'
              AND query_id = '<query_id>'
              AND event_time <= '<time_point>'
            ORDER BY event_time
        )" \
    | flamegraph.pl --countname=bytes --color=mem > flame_mem_time_point_pos.svg

Build a flame graph of deallocations after that time point (to understand what was freed later)

clickhouse client --allow_introspection_functions=1 \
    -q "SELECT arrayJoin(flameGraph(trace, -size, ptr))
        FROM (
            SELECT * FROM system.trace_log
            WHERE trace_type = 'MemorySample'
              AND query_id = '<query_id>'
              AND event_time > '<time_point>'
            ORDER BY event_time DESC
        )" \
    | flamegraph.pl --countname=bytes --color=mem > flame_mem_time_point_neg.svg

Example

The code snippet below:

  • Filters trace_log data by a query identifier and the current date.
  • Aggregates by stack trace.
  • Uses introspection functions to get a report of:
    • The names of symbols and corresponding source code functions.
    • The source code locations of these functions.
SELECT
    count(),
    arrayStringConcat(arrayMap(x -> concat(demangle(addressToSymbol(x)), '\n    ', addressToLine(x)), trace), '\n') AS sym
FROM system.trace_log
WHERE (query_id = '<query_id>') AND (event_date = today())
GROUP BY trace
ORDER BY count() DESC
LIMIT 10