# Architecture
NLQL is designed around a three-stage execution model that balances flexibility, performance, and extensibility.
## Overview
```
NLQL Query String
        ↓
    [Parsing]
        ↓
       AST
        ↓
    [Routing]
        ↓
Query Plan (Push-down + In-memory)
        ↓
   [Execution]
        ↓
  Raw Results
        ↓
   [Reshaping]
        ↓
 Final Results
```
## Stage 1: Parsing
### Grammar
NLQL uses Lark for parsing. The grammar is defined in `parser/grammar.lark` and supports:
- `SELECT` with granularity control (`DOCUMENT`, `CHUNK`, `SENTENCE`, `SPAN`)
- `WHERE` with boolean logic (`AND`, `OR`, `NOT`)
- Operators (`MATCH`, `SIMILAR_TO`, `CONTAINS`, `IS`, `META`)
- Functions (`LENGTH`, `NOW`, `COUNT`, custom functions)
- `ORDER BY` with multiple fields
- `LIMIT` clause
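For illustration only, a query exercising most of these constructs might look like the following (the authoritative grammar lives in the Query Syntax guide, so treat the exact spelling as an assumption):

```sql
SELECT CHUNK
WHERE SIMILAR_TO("vector database indexing")
  AND META(author) IS "Alice"
ORDER BY score DESC
LIMIT 10
```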
### AST Structure
The parser produces an Abstract Syntax Tree (AST) with nodes defined in `ast/nodes.py`:
- `SelectStatement` - Root node
- `WhereClause` - Filter conditions
- `LogicalExpr` - AND/OR/NOT expressions
- `ComparisonExpr` - Comparison operations
- `OperatorCall` - Custom operator invocations
- `FunctionCall` - Function invocations
- `Literal`, `Identifier` - Leaf nodes
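As a rough sketch, a query such as `SELECT CHUNK WHERE LENGTH() > 100` might parse into a tree like this (constructor fields are illustrative assumptions, not the exact API in `ast/nodes.py`):

```python
# Hypothetical AST shape; field names are illustrative only.
SelectStatement(
    granularity="CHUNK",
    where=WhereClause(
        condition=ComparisonExpr(
            left=FunctionCall(name="LENGTH", args=[]),
            op=">",
            right=Literal(100),
        )
    ),
)
```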
### Error Handling
Parse errors include:
- Line and column numbers
- Context lines showing the error location
- Helpful error messages
## Stage 2: Routing
### Push-down vs In-memory
The routing stage analyzes the `WHERE` clause and determines what can be executed where:
**Push-down** (executed by the data source):

- Semantic similarity (`SIMILAR_TO`) - if the adapter supports it
- Metadata filters (`META`) - if the adapter supports it
- Simple comparisons on indexed fields
**In-memory** (executed after retrieval):

- Complex boolean logic
- Custom operators
- Text pattern matching (`MATCH`, `CONTAINS`)
- Functions that require full text access
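The split can be sketched like this (helper names are hypothetical, not the actual implementation):

```python
# Hypothetical sketch of routing: walk the top-level AND conditions and
# decide, per condition, whether the adapter can evaluate it.
def route(condition, adapter):
    pushdown, residual = [], []
    # flatten_conjunction is an assumed helper; OR/NOT subtrees would
    # typically be kept in memory wholesale.
    for leaf in flatten_conjunction(condition):
        if is_similar_to(leaf) and adapter.supports_semantic_search():
            pushdown.append(leaf)
        elif is_meta_filter(leaf) and adapter.supports_metadata_filter():
            pushdown.append(leaf)
        else:
            residual.append(leaf)  # evaluated in memory after retrieval
    return pushdown, residual
```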
### Adapter Capabilities
Each adapter declares its capabilities:
```python
class BaseAdapter:
    def supports_semantic_search(self) -> bool: ...
    def supports_metadata_filter(self) -> bool: ...
```
The router uses these capabilities to create an optimal query plan.
### Query Plan
The routing stage produces a `QueryPlan`:
```python
from dataclasses import dataclass
from typing import Any

@dataclass
class QueryPlan:
    filters: dict[str, Any] | None   # Push-down filters
    query_text: str | None           # Semantic search text
    limit: int | None                # Result limit
    metadata: dict[str, Any]         # Adapter-specific params
```
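For instance, a query like `SELECT CHUNK WHERE SIMILAR_TO("billing errors") AND META(team) IS "support" LIMIT 5` might, on an adapter that supports both capabilities, be routed into a plan like this (the filter encoding shown is illustrative; the real one is adapter-specific):

```python
# Illustrative only; real filter encodings are adapter-specific.
plan = QueryPlan(
    filters={"team": "support"},   # META filter pushed down
    query_text="billing errors",   # SIMILAR_TO pushed down
    limit=5,
    metadata={},
)
```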
## Stage 3: Execution
### Adapter Execution
The executor sends the query plan to the adapter:
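In outline, using the `query` method from the adapter interface shown under Custom Adapters below:

```python
# The executor hands the push-down portion of the plan to the adapter.
raw_results = adapter.query(plan)
```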
Adapters return `TextUnit` objects (typically `Chunk` instances from vector databases).
### In-memory Filtering
After retrieval, the executor applies in-memory filters:
- Evaluate `WHERE` clause conditions that couldn't be pushed down
- Apply custom operators and functions
- Filter results based on evaluation
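Conceptually, this step reduces to a predicate filter (hypothetical names; the actual evaluator is internal):

```python
# Hypothetical sketch: keep only units that satisfy the residual conditions.
def apply_in_memory(units, residual_expr, evaluator):
    return [u for u in units if evaluator.evaluate(residual_expr, u)]
```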
### Granularity Transformation
Based on the `SELECT` clause, results are transformed:
- `DOCUMENT` - Group chunks back into documents
- `CHUNK` - Return as-is (default from vector DBs)
- `SENTENCE` - Split chunks into sentences
- `SPAN(unit, window=N)` - Create sliding windows with context
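The `SPAN` windowing can be pictured as follows, a minimal sketch assuming sentence-level units and a symmetric context window:

```python
# Minimal sketch of SPAN(SENTENCE, window=N): emit each sentence together
# with up to N neighbouring sentences on each side as context.
def sliding_spans(sentences: list[str], window: int) -> list[str]:
    return [
        " ".join(sentences[max(0, i - window):i + window + 1])
        for i in range(len(sentences))
    ]
```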
### Ordering and Limiting
Finally:
- Apply `ORDER BY` (similarity score or metadata fields)
- Apply `LIMIT` to get the top-N results
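In effect (assuming results carry a `score` attribute; illustrative only), this amounts to:

```python
# Sort by similarity score, then truncate to the requested limit.
results.sort(key=lambda u: u.score, reverse=True)        # ORDER BY score DESC
final = results[:limit] if limit is not None else results  # LIMIT
```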
## Extensibility Points
### 1. Custom Adapters
Implement `BaseAdapter` to support new data sources:
```python
class MyAdapter(BaseAdapter):
    def query(self, plan: QueryPlan) -> list[TextUnit]: ...
    def supports_semantic_search(self) -> bool: ...
    def supports_metadata_filter(self) -> bool: ...
```
### 2. Custom Operators
Register operators for domain-specific logic:
```python
import re

@nlql.register_operator("HAS_EMAIL")
def has_email_operator(text: str) -> bool:
    # Match a simple email pattern anywhere in the text.
    return bool(re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text))
```
### 3. Custom Functions
Add query functions:
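For example (the `register_function` decorator name is an assumption, patterned on the operator and splitter registries shown elsewhere on this page):

```python
# Hypothetical registration; the exact decorator name is assumed.
@nlql.register_function("WORD_COUNT")
def word_count(text: str) -> int:
    # Could then be used in queries, e.g. WHERE WORD_COUNT() > 100.
    return len(text.split())
```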
### 4. Custom Types
Define metadata field types for type-safe comparisons:
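A sketch of what this might look like (the `register_type` name and the `compare` protocol are assumptions; the built-in types are listed under Type System below):

```python
# Hypothetical custom type; the registration API and protocol are assumed.
@nlql.register_type("VERSION")
class VersionType:
    def compare(self, left: str, right: str) -> int:
        # Compare dotted version strings numerically, so "1.10" > "1.9".
        a = [int(p) for p in left.split(".")]
        b = [int(p) for p in right.split(".")]
        return (a > b) - (a < b)
```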
### 5. Custom Splitters
Implement language-specific text splitting:
```python
import nltk

@nlql.register_splitter("SENTENCE")
def german_sentence_splitter(text: str) -> list[str]:
    # Requires the NLTK "punkt" tokenizer data to be downloaded.
    return nltk.sent_tokenize(text, language='german')
```
### 6. Custom Embedding
Use your own embedding model; for example (an illustrative choice, not a requirement), a sentence-transformers model:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

@nlql.register_embedding_provider
def my_embedding(texts: list[str]) -> list[list[float]]:
    # Embed all texts in one batch and return plain Python lists of floats.
    return model.encode(texts).tolist()
```
## Type System
NLQL uses an implicit type system for metadata fields:
- `NumberType` - Numeric comparisons
- `TextType` - String comparisons
- `DateType` - Date/time comparisons
Types are registered per field and used during `WHERE` clause evaluation to ensure type-safe comparisons.
## Performance Considerations
- Push-down optimization: Maximize what's pushed to the data source
- Lazy evaluation: Only compute what's needed
- Lazy imports: Optional dependencies loaded on-demand
- Batch processing: Embeddings computed in batches
## Next Steps
- Explore Query Syntax details
- Learn about Data Sources
- Dive into Extensibility