
Chapter 2: Mental Model

TL;DR

MarkQL evaluates a query in two stages: the outer WHERE decides which rows exist, and field expressions decide which values those rows carry. If you keep those two decisions separate while reading a query, most confusion disappears.

What is the MarkQL mental model?

The MarkQL mental model is: “A query is a two-stage computation over a DOM row stream.” Stage 1 filters which nodes survive as output rows. Stage 2 computes values for each surviving row using scoped field expressions. This two-stage model is the semantic center of the language.

This matters because it prevents scope confusion. In many query systems, one expression simultaneously picks rows and values, which hides evaluation order. MarkQL keeps those concerns separated. You can reason about row inclusion first, then reason about value sourcing. That separation reduces accidental data loss and makes null behavior understandable rather than mysterious.

This may feel unfamiliar if you are used to one-shot selector APIs. It is common to expect TEXT(span WHERE ...) to filter rows directly. In MarkQL it does not: it only picks a supplier node for one field on a row that already exists. That distinction seems strict at first and then becomes liberating.
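The split between row survival and value sourcing can be mimicked in a few lines of Python. This is only a toy model for intuition, not MarkQL's implementation: the row dictionaries, the `two_stage` function, and the lambda helpers are hypothetical names invented for this sketch.

```python
# Toy model of two-stage evaluation (illustrative only, not MarkQL internals).
# Rows are plain dicts; every name here is hypothetical.

def two_stage(rows, keep, fields):
    """Stage 1: keep(row) decides survival. Stage 2: each field computes a value."""
    out = []
    for row in rows:
        if not keep(row):  # stage 1: row inclusion
            continue
        # stage 2: a field that finds no supplier yields None; the row still exists
        out.append({name: fn(row) for name, fn in fields.items()})
    return out


rows = [
    {"tag": "section", "h3": "A", "spans": ["1 stop"]},
    {"tag": "section", "h3": "B", "spans": []},
    {"tag": "div", "h3": None, "spans": ["stop"]},
]

result = two_stage(
    rows,
    keep=lambda r: r["tag"] == "section",
    fields={
        "title": lambda r: r["h3"],
        # field-level predicate: picks a supplier, never drops the row
        "stop_text": lambda r: next((s for s in r["spans"] if "stop" in s), None),
    },
)
print(result)
```

The second section has no matching span, so its `stop_text` is `None`, yet it still appears in the result. Only `keep` can remove a row.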

Note: “Two-stage evaluation” is MarkQL’s core concept

MarkQL asks you to know which stage owns the decision at each point in a query.

Rules

Scope

Primary scope names used in this book:

Outer loop:
  doc rows -> row node R
            -> outer WHERE decides keep/drop
            -> if keep: evaluate fields for R
Inside a field:
  supplier search space = R (+ scoped descendants/axes)
  choose first/last/indexed match
  return value or NULL
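The "inside a field" half of the outline above can be sketched as a small Python function. It is a minimal illustration under stated assumptions, not MarkQL's engine: the `Node` class, `descendants`, and `field_value` are hypothetical, and the integer index is an ordinary 0-based Python index, since this chapter does not state MarkQL's own indexing convention.

```python
# Toy model of supplier selection inside one field (not MarkQL internals).
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def descendants(node):
    """Yield descendants of node in document order (depth-first, pre-order)."""
    for child in node.children:
        yield child
        yield from descendants(child)


def field_value(row, predicate, pick="first"):
    """Search only under row R; return the chosen match's text, or None."""
    matches = [n for n in descendants(row) if predicate(n)]
    if not matches:
        return None  # no supplier: the field is NULL, the row is untouched
    if pick == "first":
        return matches[0].text
    if pick == "last":
        return matches[-1].text
    # integer pick: 0-based Python index (an assumption of this sketch)
    return matches[pick].text if 0 <= pick < len(matches) else None


row = Node("section", children=[
    Node("h3", "Title"),
    Node("div", children=[Node("span", "a"), Node("span", "b")]),
])

def is_span(n):
    return n.tag == "span"

print(field_value(row, is_span))           # first matching span
print(field_value(row, is_span, "last"))   # last matching span
print(field_value(row, lambda n: n.tag == "em"))  # no match -> None
```

The search space is bounded by the row node, so a matching span elsewhere in the document can never supply a value for this row.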

Listing 2-1: Observe stage 1 only

Query:

SELECT section.node_id
FROM doc
WHERE tag = 'section'
  AND EXISTS(child WHERE tag = 'h3')
ORDER BY node_id;

Command:

./build/markql --mode plain --color=disabled \
  --query "SELECT section.node_id FROM doc WHERE tag = 'section' AND EXISTS(child WHERE tag = 'h3') ORDER BY node_id;" \
  --input docs/fixtures/basic.html

Observed output:

[
  {"node_id":6},
  {"node_id":11},
  {"node_id":16}
]

This listing deliberately avoids field extraction to isolate stage 1. The output tells you only which rows survive. When debugging, that isolation is powerful: first make row inclusion correct, then move to fields.

Listing 2-2: Add stage 2 explicitly

Query:

SELECT section.node_id,
PROJECT(section) AS (
  title: TEXT(h3),
  stop_text: TEXT(span WHERE DIRECT_TEXT(span) LIKE '%stop%')
)
FROM doc
WHERE tag = 'section'
ORDER BY node_id;

Command:

./build/markql --mode plain --color=disabled \
  --query "SELECT section.node_id, PROJECT(section) AS (title: TEXT(h3), stop_text: TEXT(span WHERE DIRECT_TEXT(span) LIKE '%stop%')) FROM doc WHERE tag = 'section' ORDER BY node_id;" \
  --input docs/fixtures/basic.html

Observed output:

[
  {"node_id":6,"title":"Tokyo","stop_text":"1 stop"},
  {"node_id":11,"title":"Osaka","stop_text":"nonstop"},
  {"node_id":16,"title":"Kyoto Stay","stop_text":null}
]

Notice row 16 remains even though stop_text is null. This is the two-stage model in one line of output: stage 1 kept the row, stage 2 could not find a supplier for that one field.

Listing 2-3: Deliberate failure (invalid axis)

Naive query:

SELECT section.node_id FROM doc WHERE EXISTS(foo WHERE tag = 'h3');

Command:

# EXPECT_FAIL: Expected axis name
./build/markql --mode plain --color=disabled \
  --query "SELECT section.node_id FROM doc WHERE EXISTS(foo WHERE tag = 'h3');" \
  --input docs/fixtures/basic.html

Observed error:

Error: Query parse error: Expected axis name (self, parent, child, ancestor, descendant)

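The parser is protecting the scope model. EXISTS must name one of the supported axes (self, parent, child, ancestor, descendant) so that its search space is explicit. Because foo is not an axis, the query is rejected at parse time instead of being evaluated against an ambiguous scope.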
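To see why an explicit axis matters, here is a rough Python sketch of how each axis name could map to a different set of candidate nodes. It is a toy model, not MarkQL's parser or evaluator: the `CHILDREN` tree, `axis_nodes`, and `exists` are hypothetical, and only the five axis names are taken from the error message above.

```python
# Toy model of axis resolution (names are hypothetical, not MarkQL internals).
VALID_AXES = ("self", "parent", "child", "ancestor", "descendant")

# Hypothetical toy tree: node name -> list of child names.
CHILDREN = {"section": ["h3", "div"], "h3": [], "div": ["span"], "span": []}
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}


def axis_nodes(node, axis):
    """Return the candidate nodes for one axis, or fail on an unknown axis."""
    if axis not in VALID_AXES:
        raise ValueError("Expected axis name (" + ", ".join(VALID_AXES) + ")")
    if axis == "self":
        return [node]
    if axis == "child":
        return list(CHILDREN[node])
    if axis == "parent":
        return [PARENT[node]] if node in PARENT else []
    if axis == "ancestor":
        out = []
        while node in PARENT:  # walk upward until the root
            node = PARENT[node]
            out.append(node)
        return out
    # descendant: each child, followed by that child's own descendants
    out = []
    for c in CHILDREN[node]:
        out.append(c)
        out.extend(axis_nodes(c, "descendant"))
    return out


def exists(node, axis, predicate):
    """True if any node on the given axis satisfies the predicate."""
    return any(predicate(n) for n in axis_nodes(node, axis))


print(exists("section", "child", lambda n: n == "span"))       # span is not a direct child
print(exists("section", "descendant", lambda n: n == "span"))  # span is nested deeper
```

The same predicate gives different answers on `child` and `descendant`, which is why an unrecognized axis cannot be given a sensible default.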

Listing 2-4: Correct axis-based filter

./build/markql --mode plain --color=disabled \
  --query "SELECT section.node_id FROM doc WHERE tag='section' AND EXISTS(descendant WHERE tag='span' AND attributes.role = 'text') ORDER BY node_id;" \
  --input docs/fixtures/basic.html

Observed output:

[
  {"node_id":6},
  {"node_id":11}
]

Before/after scope diagrams

Before (common confusion)
  field predicate controls row survival
After (actual semantics)
  outer WHERE -> row survival
  field WHERE -> supplier choice

Keep this chapter’s model active as you read the rest of the book. It is not one chapter’s concept; it is the language’s operating system.

Common mistakes

Chapter takeaway

When output looks wrong, ask two questions in order: “Did the right rows survive?” and “Did each field pick the right supplier?”