
Chapter 11: Output and Export

TL;DR

MarkQL result semantics stay the same across sinks; only the serialization format changes. Pick the sink that matches your downstream workflow.

What are output sinks in MarkQL?

Output sinks are query targets that serialize result rows: TO LIST(), TO CSV(...), TO JSON(...), and TO NDJSON(...). They are not just convenience features; they define interoperability boundaries with downstream tools.

This matters because extraction value is realized downstream. If output shape and format are unstable, downstream systems become fragile. MarkQL keeps sink syntax explicit in the query so result shaping and export intent are reviewable together.

This may feel unfamiliar if you are used to handling output entirely in host-language code. In MarkQL, sink intent can be encoded at the query level, which reduces glue code and keeps extraction semantics close to serialization semantics.

Note: Sink choice is part of contract design

Rules

Tables

TO TABLE() output is rectangular by default: extracted rows and cells are preserved as-is unless you opt into trimming or sparse output options.

Defaults:

Tiny before/after (same fixture shape as tests/fixtures/tables/trailing_empty_rows_and_cols.html):

Before (defaults):

SELECT table FROM doc TO TABLE();

Keeps trailing padding rows and trailing empty columns.

After (trimming enabled):

SELECT table FROM doc
TO TABLE(TRIM_EMPTY_ROWS=ON, TRIM_EMPTY_COLS=TRAILING);

Drops padding rows and trims only right-edge empty columns.
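To make the trimming behavior concrete, here is a minimal Python sketch of trailing-edge trimming on a rectangular grid. It is an illustration, not MarkQL's implementation: the helper names `is_empty` and `trim_trailing` are hypothetical, and the sketch assumes "empty" means a blank or whitespace-only cell. Whether TRIM_EMPTY_ROWS=ON also drops interior empty rows is not stated here, so the sketch only trims rows at the bottom edge.

```python
def is_empty(cell):
    # Assumption: a cell counts as empty when it is None or whitespace-only.
    return cell is None or str(cell).strip() == ""


def trim_trailing(grid):
    """Drop all-empty rows at the bottom and all-empty columns at the right edge."""
    rows = [list(row) for row in grid]

    # Remove padding rows from the bottom until a row with content is found.
    while rows and all(is_empty(cell) for cell in rows[-1]):
        rows.pop()

    # The kept width ends at the right-most column that has any content.
    width = 0
    for row in rows:
        for index, cell in enumerate(row):
            if not is_empty(cell):
                width = max(width, index + 1)

    return [row[:width] for row in rows]


grid = [
    ["name", "price", "", ""],
    ["Alpha", "10", "", ""],
    ["", "", "", ""],
    ["", "", "", ""],
]
trimmed = trim_trailing(grid)
```

Interior empty columns survive because only the right edge is trimmed, which mirrors the TRAILING wording in the query above.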

Sparse formats:

SELECT table FROM doc
TO TABLE(FORMAT=SPARSE, SPARSE_SHAPE=LONG, TRIM_EMPTY_ROWS=ON, TRIM_EMPTY_COLS=TRAILING, HEADER=ON);

Returns one record per non-empty cell (row_index, col_index, optional header, value). Use this for pipelines and append-style processing.
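The long shape can be pictured with a short Python sketch that turns a small grid into one record per non-empty cell. The field names come from the description above; everything else is an assumption made for illustration, in particular the zero-based indices, the choice to count rows from the first data row, and the hypothetical helper name `to_long`.

```python
def to_long(grid):
    """Emit one record per non-empty data cell, using the first row as headers."""
    headers, data = grid[0], grid[1:]
    records = []
    for row_index, row in enumerate(data):
        for col_index, value in enumerate(row):
            if value == "":
                continue  # sparse output skips empty cells entirely
            records.append({
                "row_index": row_index,  # assumption: zero-based, data rows only
                "col_index": col_index,
                "header": headers[col_index] if col_index < len(headers) else "",
                "value": value,
            })
    return records


grid = [
    ["name", "note"],
    ["Alpha", "Fast and light"],
    ["Beta", ""],
]
long_records = to_long(grid)
```

Because every record has the same four keys, long output appends cleanly to an existing stream or table without schema changes.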

SELECT table FROM doc
TO TABLE(FORMAT=SPARSE, SPARSE_SHAPE=WIDE, TRIM_EMPTY_ROWS=ON, TRIM_EMPTY_COLS=TRAILING, HEADER=ON);

Returns one object per data row with only non-empty keys. Use this for per-row object payloads.
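A comparable sketch for the wide shape, again in Python and again only an illustration of the idea: the helper name `to_wide` is hypothetical, and the sketch assumes the first grid row supplies the keys.

```python
def to_wide(grid):
    """Emit one object per data row, keeping only keys whose cell is non-empty."""
    headers, data = grid[0], grid[1:]
    return [
        {header: value for header, value in zip(headers, row) if value != ""}
        for row in data
    ]


grid = [
    ["name", "note"],
    ["Alpha", "Fast and light"],
    ["Beta", ""],
]
wide_objects = to_wide(grid)

# Keys can be absent in wide output, so consumers should read defensively.
notes = [obj.get("note", "n/a") for obj in wide_objects]
```

The trade-off against the long shape is that objects no longer share a fixed key set, so downstream code needs a default for missing keys.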

Determinism and compatibility:

Scope

query result rows
  -> sink serializer
  -> file or stdout
same row semantics, different wire format
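The diagram can be mirrored in a few lines of Python that serialize one set of rows three ways. This is a sketch of the idea using only the standard library, not MarkQL's serializer; the function names are hypothetical, and the rows are typed as strings to match the observed output in Listing 11-1.

```python
import csv
import io
import json

# One set of result rows, shared by every serializer below.
rows = [
    {"node_id": "3", "name": "Alpha"},
    {"node_id": "8", "name": "Beta"},
    {"node_id": "11", "name": "Gamma"},
]


def to_json(rows):
    # A single JSON array holding every row.
    return json.dumps(rows, separators=(",", ":"))


def to_ndjson(rows):
    # One JSON object per line, with no enclosing array.
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)


def to_csv(rows):
    # Header line taken from the first row's keys, then one line per row.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Parsing any of the three outputs gives back the same rows, which is the property the diagram describes.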

Listing 11-1: JSON array to stdout

./build/markql --mode plain --color=disabled \
  --query "SELECT li.node_id, PROJECT(li) AS (name: TEXT(h2)) FROM doc WHERE tag = 'li' ORDER BY node_id TO JSON();" \
  --input docs/fixtures/products.html

Observed output:

[{"node_id":"3","name":"Alpha"},{"node_id":"8","name":"Beta"},{"node_id":"11","name":"Gamma"}]

Listing 11-2: NDJSON to stdout

./build/markql --mode plain --color=disabled \
  --query "SELECT li.node_id, PROJECT(li) AS (name: TEXT(h2), note: COALESCE(TEXT(p), 'n/a')) FROM doc WHERE tag = 'li' ORDER BY node_id TO NDJSON();" \
  --input docs/fixtures/products.html

Observed output:

{"node_id":"3","name":"Alpha","note":"Fast and light"}
{"node_id":"8","name":"Beta","note":"n/a"}
{"node_id":"11","name":"Gamma","note":"Budget"}
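On the consuming side, NDJSON can be processed one line at a time without loading the whole result. The following Python sketch parses the observed output above; the helper name `iter_records` is hypothetical, and the output is embedded as a string so the example is self-contained.

```python
import json

# The observed output from Listing 11-2, embedded for a self-contained example.
ndjson_text = """\
{"node_id":"3","name":"Alpha","note":"Fast and light"}
{"node_id":"8","name":"Beta","note":"n/a"}
{"node_id":"11","name":"Gamma","note":"Budget"}
"""


def iter_records(lines):
    """Yield one parsed object per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(iter_records(ndjson_text.splitlines()))

# Rows where the query's COALESCE fallback was used.
names_without_note = [r["name"] for r in records if r["note"] == "n/a"]
```

In a pipeline, `iter_records(sys.stdin)` would work the same way, since the function only needs an iterable of lines.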

Listing 11-3: CSV to file

./build/markql --mode plain --color=disabled \
  --query "SELECT li.node_id, PROJECT(li) AS (name: TEXT(h2), note: COALESCE(TEXT(p), 'n/a')) FROM doc WHERE tag = 'li' ORDER BY node_id TO CSV('/tmp/markql_products.csv');" \
  --input docs/fixtures/products.html

Observed file /tmp/markql_products.csv:

node_id,name,note
3,Alpha,Fast and light
8,Beta,n/a
11,Gamma,Budget
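CSV carries no type information, so a consumer has to decide how to interpret each column. This Python sketch reads the observed file content with the standard `csv` module and casts `node_id` to an integer. The cast is a downstream choice made for illustration, not something MarkQL does.

```python
import csv
import io

# The observed file content from Listing 11-3, embedded for a self-contained example.
csv_text = (
    "node_id,name,note\n"
    "3,Alpha,Fast and light\n"
    "8,Beta,n/a\n"
    "11,Gamma,Budget\n"
)

reader = csv.DictReader(io.StringIO(csv_text))

# Every CSV field arrives as a string; convert node_id explicitly.
products = [{**row, "node_id": int(row["node_id"])} for row in reader]
```

To read the real file, replace the `io.StringIO(...)` wrapper with `open('/tmp/markql_products.csv', newline='')`.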

Listing 11-4: Deliberate failure (TO LIST shape)

# EXPECT_FAIL: TO LIST() requires a single projected column
./build/markql --mode plain --color=disabled \
  --query "SELECT a.href, a.tag FROM doc WHERE href IS NOT NULL TO LIST();" \
  --input docs/fixtures/basic.html

Observed error:

Error: TO LIST() requires a single projected column

Fix: project a single column for TO LIST() (for example, only a.href), or switch to a multi-column sink such as TO CSV(...), TO JSON(...), or TO NDJSON(...).

Before/after diagrams

Before
  extract -> custom serializer script
After
  extract + sink in one query contract

Common mistakes

Chapter takeaway

Output is part of the extraction contract, not an afterthought.