Chapter 6: Structural Row Filtering with EXISTS
TL;DR
Axes describe structural scope, and EXISTS turns structural facts into row-filter decisions. Use them when class names are unstable.
What are axes and EXISTS?
Axes (parent, child, ancestor, descendant) define structural traversal relative to the current row. EXISTS(axis WHERE ...) asks whether at least one node in that axis scope satisfies a predicate.
This matters because structural invariants survive class churn better than cosmetic attributes. If a row “must have a heading and a price node,” that is a structural claim. MarkQL lets you encode that claim directly instead of writing brittle selector chains.
This may feel unfamiliar if you have mostly used selector APIs that hide traversal decisions. Axes make traversal visible. That visibility adds a little syntax, but it buys explicit control and debuggability.
Note: Axes are scoped to the current row
In outer WHERE, the current row is each candidate node from stage 1. So EXISTS(descendant WHERE ...) means descendants of that row, not global descendants of the whole document. This scoping rule explains why the same EXISTS clause can be true for one row and false for another in the same query.
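The scoping rule can be illustrated with a toy model in Python. This is a sketch of the semantics only, not MarkQL's implementation: the Node class and the descendants and exists_descendant helpers are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML node (not MarkQL's internal type)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def descendants(row: Node) -> Iterator[Node]:
    """Yield every node strictly below `row`, depth first."""
    for child in row.children:
        yield child
        yield from descendants(child)


def exists_descendant(row: Node, pred: Callable[[Node], bool]) -> bool:
    """Toy EXISTS(descendant WHERE pred), evaluated relative to `row` only."""
    return any(pred(n) for n in descendants(row))


# Two candidate rows in one document: only the first contains a span.
with_span = Node("section", [Node("div", [Node("span")])])
without_span = Node("section", [Node("p")])
doc = Node("html", [with_span, without_span])

is_span = lambda n: n.tag == "span"

# "Stage 1": collect candidate rows, then evaluate the gate once per row.
rows = [n for n in descendants(doc) if n.tag == "section"]
results = [exists_descendant(r, is_span) for r in rows]
print(results)  # [True, False]
```

The document as a whole does contain a span, yet the second row still evaluates to false, because the check only looks below that row.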
Rules
- Use EXISTS when row inclusion depends on presence/absence of structure.
- Read axis predicates as sentences: “this row has a descendant where …”.
- Prefer descendant for broad checks, child for strict depth.
- Keep axis predicates small and composable.
- Validate axis assumptions with tiny row projections before full extraction.
Scope
row R
  parent scope     -> at most 1 node
  child scope      -> direct children of R
  ancestor scope   -> chain up from R
  descendant scope -> all nodes below R

EXISTS(axis WHERE P)
  true if any node in axis scope satisfies P
  false otherwise
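The scope rules above can be written out as a small Python sketch. It is a toy model under stated assumptions, not how MarkQL is implemented: Node, the four *_scope functions, the AXES table, and exists are all hypothetical names, and only the four axes from the diagram are modeled.

```python
from typing import Callable, Dict, List, Optional


class Node:
    """Hypothetical tree node; a stand-in, not MarkQL's internal representation."""

    def __init__(self, tag: str, parent: Optional["Node"] = None) -> None:
        self.tag = tag
        self.parent = parent
        self.children: List["Node"] = []
        if parent is not None:
            parent.children.append(self)


def parent_scope(r: Node) -> List[Node]:
    # At most 1 node: empty for the root.
    return [r.parent] if r.parent is not None else []


def child_scope(r: Node) -> List[Node]:
    # Direct children of R only.
    return list(r.children)


def ancestor_scope(r: Node) -> List[Node]:
    # Chain up from R, nearest ancestor first.
    chain, cur = [], r.parent
    while cur is not None:
        chain.append(cur)
        cur = cur.parent
    return chain


def descendant_scope(r: Node) -> List[Node]:
    # All nodes below R, at any depth.
    found: List[Node] = []
    for c in r.children:
        found.append(c)
        found.extend(descendant_scope(c))
    return found


AXES: Dict[str, Callable[[Node], List[Node]]] = {
    "parent": parent_scope,
    "child": child_scope,
    "ancestor": ancestor_scope,
    "descendant": descendant_scope,
}


def exists(axis: str, r: Node, pred: Callable[[Node], bool]) -> bool:
    """True if any node in the axis scope of r satisfies pred, false otherwise."""
    return any(pred(n) for n in AXES[axis](r))


body = Node("body")
section = Node("section", body)
div = Node("div", section)
h3 = Node("h3", div)

is_h3 = lambda n: n.tag == "h3"
print(exists("child", section, is_h3))                       # False
print(exists("descendant", section, is_h3))                  # True
print(exists("ancestor", h3, lambda n: n.tag == "section"))  # True
```

Keeping each axis as a separate function that returns a list makes the scope inspectable on its own, which mirrors the idea of checking a scope before relying on it in a filter.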
Listing 6-1: Child existence gate
./build/markql --mode plain --color=disabled \
--query "SELECT section.node_id FROM doc WHERE tag = 'section' AND EXISTS(child WHERE tag = 'h3') ORDER BY node_id;" \
--input docs/fixtures/basic.html
Observed output:
[
{"node_id":6},
{"node_id":11},
{"node_id":16}
]
Listing 6-2: Descendant existence gate with predicate
./build/markql --mode plain --color=disabled \
--query "SELECT section.node_id FROM doc WHERE tag = 'section' AND EXISTS(descendant WHERE tag = 'span' AND attributes.role = 'text') ORDER BY node_id;" \
--input docs/fixtures/basic.html
Observed output:
[
{"node_id":6},
{"node_id":11}
]
Row 16 is excluded because it does not have a matching descendant span with role='text'.
Listing 6-3: Deliberate failure (invalid axis symbol)
# EXPECT_FAIL: Expected axis name
./build/markql --mode plain --color=disabled \
--query "SELECT section.node_id FROM doc WHERE EXISTS(foo WHERE tag = 'h3');" \
--input docs/fixtures/basic.html
Observed error:
Error: Query parse error: Expected axis name (self, parent, child, ancestor, descendant)
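Fix by replacing foo with one of the axis names listed in the error message. For this query, choose child if the h3 must be a direct child of the row, or descendant if it may be nested deeper.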
Before/after diagrams
Before
EXISTS(unknown_scope WHERE ...)
After
EXISTS(descendant WHERE tag='span' AND ...)
Common mistakes
- Reading EXISTS(descendant ...) as global document search.
  Fix: interpret it relative to the current row.
- Using child when the target may be nested deeper.
  Fix: switch to descendant when depth can vary.
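One way to reason about the child versus descendant choice is to look at how deep the target actually sits below each row. The Python sketch below is a toy model for that reasoning, not a MarkQL feature: Node and match_depths are hypothetical names, and the two sample rows are made up for illustration.

```python
class Node:
    """Hypothetical tree node used only for this illustration."""

    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = children or []


def match_depths(row, pred, depth=1):
    """Depths (1 = direct child) at which nodes below `row` satisfy pred."""
    depths = []
    for c in row.children:
        if pred(c):
            depths.append(depth)
        depths.extend(match_depths(c, pred, depth + 1))
    return depths


# One row keeps its heading as a direct child, the other wraps it.
flat = Node("section", [Node("h3")])
nested = Node("section", [Node("header", [Node("h3")])])
is_h3 = lambda n: n.tag == "h3"

depth_report = [match_depths(r, is_h3) for r in (flat, nested)]
print(depth_report)  # [[1], [2]]

# A child-only check is safe only if every match sits at depth 1.
child_is_enough = all(d == 1 for ds in depth_report for d in ds)
print(child_is_enough)  # False
```

Here the second row would be silently dropped by a child check, which is the situation where descendant is the safer axis.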
Chapter takeaway
Axes plus EXISTS are how you encode robust structural rules instead of brittle selector trivia.