Skip to main content
Skip to main content
Edit this page

Importing and exporting custom text data using Templates and Regex in ClickHouse

We often have to deal with data in custom text formats. That could be a non-standard format, invalid JSON, or a broken CSV. Using standard parsers like CSV or JSON won't work in all such cases. But ClickHouse has us covered here with powerful Template and Regex formats.

Importing based on a template

Suppose we want to import data from the following log file:

We can use a Template format to import this data. We have to define a template string with values placeholders for each row of input data:

Let's create a table to import our data into:

To import data using a given template, we have to save our template string in a file (row.template in our case):

We define a name of a column and escaping rule in a ${name:escaping} format. Multiple options are available here, like CSV, JSON, Escaped, or Quoted, which implement respective escaping rules.

Now we can use the given file as an argument to the format_template_row settings option while importing data (note, that template and data files should not have an extra \n symbol at the end of file):

And we can make sure our data was loaded into the table:

Skipping whitespaces

Consider using TemplateIgnoreSpaces, which allows skipping whitespaces between delimiters in a template:

Exporting data using templates

We can also export data to any text format using templates as well. In this case, we have to create two files:

Result set template, which defines the layout for the whole result set:

Here, rows_read and time are system metrics available for each request. While data stands for generated rows (${data} should always come as a first placeholder in this file), based on a template defined in a row template file:

Now let's use these templates to export the following query:

Exporting to HTML files

Template-based results can also be exported to files using an INTO OUTFILE clause. Let's generate HTML files based on given resultset and row formats:

Exporting to XML

Template format can be used to generate all imaginable text format files, including XML. Just put a relevant template and do the export.

Also consider using an XML format to get standard XML results including metadata:

Importing data based on regular expressions

Regexp format addresses more sophisticated cases when input data needs to be parsed in a more complex way. Let's parse our error.log example file, but capture the file name and protocol this time to save them into separate columns. First, let's prepare a new table for that:

Now we can import data based on a regular expression:

ClickHouse will insert data from each capture group into the relevant column based on its order. Let's check the data:

By default, ClickHouse will raise an error in case of unmatched rows. If you want to skip unmatched rows instead, enable it using format_regexp_skip_unmatched option:

Other formats

ClickHouse introduces support for many formats, both text, and binary, to cover various scenarios and platforms. Explore more formats and ways to work with them in the following articles:

And also check clickhouse-local - a portable full-featured tool to work on local/remote files without the need for Clickhouse server.