Chapter 12: Troubleshooting
TL;DR
Most MarkQL failures map cleanly to one stage: parse, row filter, field extraction, or sink. Diagnose by stage, not by random query edits.
What is troubleshooting in MarkQL?
Troubleshooting in MarkQL is a staged diagnosis process: isolate parse errors, then row-filter errors, then field-extraction (supplier-selection) errors, and finally validate output sink constraints. Because MarkQL reports parser and validation failures explicitly, its error messages usually point to the stage boundary where evaluation stopped.
This matters because most difficult extraction incidents are not random. They are usually one of a small number of recurring mistakes: grammar mismatch, scope confusion, unsupported sort key, or sink misuse. A disciplined troubleshooting flow turns those incidents from panic into checklists.
This can feel unfamiliar if you are used to silently failing selector APIs. MarkQL tends to fail loudly with explicit parser/validation messages. That loudness is useful once you map each error family to a stage of evaluation.
Note: Error messages are stage hints
- Parse errors hint grammar boundary issues.
- Validation/runtime errors hint semantic constraints.
- Empty rows hint row-scope mismatch.
Treat error messages as signposts for where to inspect next, not just as strings to suppress.
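The mapping in the note above can be sketched as a small shell helper. This is a minimal sketch, not part of MarkQL: `classify_stage` is a hypothetical name, and the patterns assume that parse errors contain "Query parse error", that semantic errors start with an `ERROR[MQL-SEM-` code as in the listings in this chapter, and that an empty result prints as `[]` or as nothing.

```shell
# classify_stage: hypothetical helper, not part of MarkQL.
# Takes captured markql output and prints the stage to inspect first.
classify_stage() {
  output="$1"
  case "$output" in
    # Assumption: parse failures contain this phrase (seen in Listing 12-1).
    *"Query parse error"*) echo "parse" ;;
    # Assumption: semantic failures carry an MQL-SEM code (seen in Listing 12-2).
    *"ERROR[MQL-SEM-"*)    echo "semantic" ;;
    # Assumption: an empty result prints as [] or as an empty string.
    "[]"|"")               echo "row-scope" ;;
    *)                     echo "unclassified" ;;
  esac
}

classify_stage 'Error: Query parse error: Expected FROM'   # prints: parse
```

Anything that lands in `unclassified` still needs to be read by hand; the helper only routes the recurring families to the right rung of the ladder.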
Rules
- Reproduce with smallest fixture and smallest query.
- Verify row scope before field scope.
- Keep one deliberate failure query per incident for future regression notes.
- Record exact command and exact observed error.
- Use SHOW FUNCTIONS, SHOW AXES, and SHOW OPERATORS to confirm capabilities.
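One way to follow the "record exact command and exact observed error" rule is a thin wrapper that logs both. The sketch below is an illustration only: `record_incident` and the log file name are hypothetical, and it assumes the error text may arrive on either stdout or stderr, so it captures both.

```shell
# record_incident: hypothetical helper, not part of MarkQL.
# Usage: record_incident LOGFILE COMMAND [ARGS...]
record_incident() {
  log="$1"; shift
  # Capture stdout and stderr together so the exact error text is kept.
  output="$("$@" 2>&1)"
  status=$?
  {
    # Note: "$*" flattens the arguments, so original quoting is not preserved.
    printf 'command: %s\n' "$*"
    printf 'exit: %s\n' "$status"
    printf 'output: %s\n\n' "$output"
  } >> "$log"
  # Pass the wrapped command's exit status through to the caller.
  return "$status"
}
```

A call might look like `record_incident incidents.log ./build/markql --mode plain --color=disabled --query "SHOW AXES;"`, which appends one entry per run to `incidents.log`.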
Scope
Troubleshooting ladder
parser -> row stage -> field stage -> sink stage
If row stage fails:
fix WHERE / EXISTS / axis logic first
If field stage fails:
fix TEXT/ATTR supplier logic
Listing 12-1: Parse boundary failure (Expected FROM)
# EXPECT_FAIL: Expected FROM
./build/markql --mode plain --color=disabled \
--query "select div(data-id) as data_id from doc;" \
--input docs/fixtures/basic.html
Observed error:
Error: Query parse error: Expected FROM
Listing 12-2: Semantic boundary failure (ORDER BY expression)
# EXPECT_FAIL: ORDER BY supports node_id, tag, text, parent_id, sibling_pos, max_depth, or doc_order
./build/markql --mode plain --color=disabled \
--query "SELECT section.node_id FROM doc WHERE tag='section' ORDER BY attributes.data-kind;" \
--input docs/fixtures/basic.html
Observed error:
ERROR[MQL-SEM-0401]: ORDER BY supports node_id, tag, text, parent_id, sibling_pos, max_depth, or doc_order
In this build, ORDER BY supports only the core row fields listed in the error message. Sorting by expression-like paths such as attributes.data-kind is not supported yet.
Listing 12-3: Capability introspection
./build/markql --mode plain --color=disabled --query "SHOW FUNCTIONS;"
./build/markql --mode plain --color=disabled --query "SHOW AXES;"
./build/markql --mode plain --color=disabled --query "SHOW OPERATORS;"
Observed outputs (trimmed):
- SHOW FUNCTIONS includes project(tag), flatten(tag[, depth]), coalesce, case.
- SHOW AXES includes parent, child, ancestor, descendant.
- SHOW OPERATORS includes LIKE, IN, IS NULL, HAS_DIRECT_TEXT (legacy shorthand), and logical operators.
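To check for a single capability before rewriting a query, the introspection output can be piped through a filter. This is a rough sketch under assumptions: `has_capability` is a hypothetical helper, and it assumes the name appears as a whole word somewhere in the SHOW output, whose exact layout is not specified here.

```shell
# has_capability: hypothetical helper, not part of MarkQL.
# Reads SHOW output on stdin; succeeds if NAME appears as a whole word
# (case-insensitive, matched as a fixed string rather than a regex).
has_capability() {
  grep -qiwF -- "$1"
}

# Example: check the axis list before relying on the descendant axis.
# ./build/markql --mode plain --color=disabled --query "SHOW AXES;" \
#   | has_capability descendant && echo "descendant axis available"
```

Whole-word matching keeps a short name from matching inside a longer one, at the cost of missing names that the build prints in a decorated form.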
Listing 12-4: Corrective rewrite pattern
./build/markql --mode plain --color=disabled \
--query "SELECT section.node_id, PROJECT(section) AS (title: TEXT(h3), stop_text: TEXT(span WHERE DIRECT_TEXT(span) LIKE '%stop%')) FROM doc WHERE tag='section' AND EXISTS(descendant WHERE tag='span' AND text LIKE '%stop%') ORDER BY node_id;" \
--input docs/fixtures/basic.html
Observed output:
[
{"node_id":6,"title":"Tokyo","stop_text":"1 stop"},
{"node_id":11,"title":"Osaka","stop_text":"nonstop"}
]
Before/after diagrams
Before
unclear error -> random query edits
After
identify stage -> apply stage-specific fix -> verify
Common mistakes
- Editing multiple clauses before confirming the failing stage.
Fix: isolate one stage and rerun with a minimal query.
- Treating error text as noise.
Fix: read it as a boundary hint (grammar vs semantics vs capability).
Chapter takeaway
MarkQL errors are most useful when you read them as stage-level diagnostics.