
Architecture

NLQL is designed around a three-stage execution model (parsing, routing, and execution) that balances flexibility, performance, and extensibility.

Overview

NLQL Query String
    |  [Parsing]
    v
AST
    |  [Routing]
    v
Query Plan (Push-down + In-memory)
    |  [Execution]
    v
Raw Results
    |  [Reshaping]
    v
Final Results

Stage 1: Parsing

Grammar

NLQL uses Lark for parsing. The grammar is defined in parser/grammar.lark and supports:

  • SELECT with granularity control (DOCUMENT, CHUNK, SENTENCE, SPAN)
  • WHERE with boolean logic (AND, OR, NOT)
  • Operators (MATCH, SIMILAR_TO, CONTAINS, IS, META)
  • Functions (LENGTH, NOW, COUNT, custom functions)
  • ORDER BY with multiple fields
  • LIMIT clause
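
For example, a single query can combine several of these constructs. The snippet below is illustrative (the score field and exact keyword spellings are not guaranteed; parser/grammar.lark is the authoritative reference):

SELECT SENTENCE
WHERE SIMILAR_TO("refund policy") AND NOT CONTAINS("draft")
ORDER BY score DESC
LIMIT 5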

AST Structure

The parser produces an Abstract Syntax Tree (AST) with nodes defined in ast/nodes.py:

  • SelectStatement - Root node
  • WhereClause - Filter conditions
  • LogicalExpr - AND/OR/NOT expressions
  • ComparisonExpr - Comparison operations
  • OperatorCall - Custom operator invocations
  • FunctionCall - Function invocations
  • Literal, Identifier - Leaf nodes
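
As a rough illustration (constructor and field names here are assumptions, not the exact definitions in ast/nodes.py), a query like SELECT CHUNK WHERE LENGTH() > 100 LIMIT 5 might parse into:

SelectStatement(
    granularity="CHUNK",
    where=WhereClause(
        ComparisonExpr(
            left=FunctionCall("LENGTH", args=[]),
            op=">",
            right=Literal(100),
        )
    ),
    limit=5,
)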

Error Handling

Parse errors include:

  • Line and column numbers
  • Context lines showing the error location
  • Helpful error messages

Stage 2: Routing

Push-down vs In-memory

The routing stage analyzes the WHERE clause and determines what can be executed where:

Push-down (executed by the data source):

  • Semantic similarity (SIMILAR_TO), if the adapter supports it
  • Metadata filters (META), if the adapter supports it
  • Simple comparisons on indexed fields

In-memory (executed after retrieval):

  • Complex boolean logic
  • Custom operators
  • Text pattern matching (MATCH, CONTAINS)
  • Functions that require full text access

Adapter Capabilities

Each adapter declares its capabilities:

class BaseAdapter:
    def supports_semantic_search(self) -> bool: ...
    def supports_metadata_filter(self) -> bool: ...

The router uses these capabilities to create an optimal query plan.
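
Conceptually, the router walks the top-level AND conjuncts of the WHERE clause and sorts each predicate into one of the two buckets; anything it cannot prove the adapter handles stays in memory. A simplified sketch (predicate and function names are hypothetical, not the actual router API):

def split_predicates(conjuncts, adapter):
    """Hypothetical push-down routing over top-level AND conjuncts."""
    pushdown, in_memory = [], []
    for pred in conjuncts:
        if pred.kind == "SIMILAR_TO" and adapter.supports_semantic_search():
            pushdown.append(pred)
        elif pred.kind == "META" and adapter.supports_metadata_filter():
            pushdown.append(pred)
        else:
            # OR/NOT subtrees, custom operators, MATCH/CONTAINS, functions
            in_memory.append(pred)
    return pushdown, in_memory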

Query Plan

The routing stage produces a QueryPlan:

from dataclasses import dataclass
from typing import Any

@dataclass
class QueryPlan:
    filters: dict[str, Any] | None  # Push-down filters
    query_text: str | None          # Semantic search text
    limit: int | None               # Result limit
    metadata: dict[str, Any]        # Adapter-specific params

Stage 3: Execution

Adapter Execution

The executor sends the query plan to the adapter:

units = adapter.query(plan)

Adapters return TextUnit objects (typically Chunk instances from vector databases).

In-memory Filtering

After retrieval, the executor applies in-memory filters:

  1. Evaluate WHERE clause conditions that couldn't be pushed down
  2. Apply custom operators and functions
  3. Filter results based on evaluation
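
A simplified view of this loop, assuming the residual predicates have been compiled into callables over the unit's text (the real evaluator walks the AST rather than plain callables):

def filter_in_memory(units, residual_predicates):
    # Keep only units for which every non-pushed-down predicate holds
    return [u for u in units if all(pred(u.text) for pred in residual_predicates)]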

Granularity Transformation

Based on the SELECT clause, results are transformed:

  • DOCUMENT - Group chunks back into documents
  • CHUNK - Return as-is (default from vector DBs)
  • SENTENCE - Split chunks into sentences
  • SPAN(unit, window=N) - Create sliding windows with context
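
The SPAN case can be pictured as a sliding window over smaller units. A sketch of the idea (not the actual implementation):

def spans(sentences: list[str], window: int = 1) -> list[str]:
    # Each sentence joined with `window` neighbors on each side for context
    return [
        " ".join(sentences[max(0, i - window): i + window + 1])
        for i in range(len(sentences))
    ]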

Ordering and Limiting

Finally:

  1. Apply ORDER BY (similarity score or metadata fields)
  2. Apply LIMIT to get top-N results
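
A minimal sketch of this final step (the score attribute name is an assumption):

def order_and_limit(units, limit=None):
    # ORDER BY similarity score, descending
    ordered = sorted(units, key=lambda u: u.score, reverse=True)
    return ordered if limit is None else ordered[:limit]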

Extensibility Points

1. Custom Adapters

Implement BaseAdapter to support new data sources:

class MyAdapter(BaseAdapter):
    def query(self, plan: QueryPlan) -> list[TextUnit]: ...
    def supports_semantic_search(self) -> bool: ...
    def supports_metadata_filter(self) -> bool: ...
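
For instance, a toy adapter serving a pre-loaded Python list might look like this (import paths and TextUnit details are assumptions; only the three methods above come from the interface):

from nlql import BaseAdapter, QueryPlan, TextUnit  # import path assumed

class ListAdapter(BaseAdapter):
    """Hypothetical adapter over an in-memory list of TextUnits."""

    def __init__(self, units: list[TextUnit]):
        self._units = units

    def supports_semantic_search(self) -> bool:
        return False  # no vector index, so SIMILAR_TO cannot be pushed down

    def supports_metadata_filter(self) -> bool:
        return False  # META filters fall back to in-memory evaluation

    def query(self, plan: QueryPlan) -> list[TextUnit]:
        units = self._units
        if plan.limit is not None:
            units = units[: plan.limit]
        return units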

2. Custom Operators

Register operators for domain-specific logic:

import re

@nlql.register_operator("HAS_EMAIL")
def has_email_operator(text: str) -> bool:
    return bool(re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text))
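
Once registered, the operator can presumably be invoked inside a WHERE clause, e.g. SELECT CHUNK WHERE HAS_EMAIL() (the exact invocation syntax is governed by the grammar's operator rule).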

3. Custom Functions

Add query functions:

@nlql.register_function("word_count")
def word_count(text: str) -> int:
    return len(text.split())
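
The function could then appear wherever built-ins like LENGTH are allowed, e.g. WHERE word_count() > 100 (illustrative syntax).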

4. Custom Types

Define metadata field types for type-safe comparisons:

from nlql import register_meta_field, NumberType

register_meta_field("priority", NumberType)

5. Custom Splitters

Implement language-specific text splitting:

@nlql.register_splitter("SENTENCE")
def german_sentence_splitter(text: str) -> list[str]:
    import nltk  # lazy import; requires the Punkt data: nltk.download("punkt")
    return nltk.sent_tokenize(text, language='german')

6. Custom Embedding

Use your own embedding model:

@nlql.register_embedding_provider
def my_embedding(texts: list[str]) -> list[list[float]]:
    # Your embedding logic
    return embeddings
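
As a concrete example, a provider backed by sentence-transformers could look like this (the package and model choice are assumptions, not NLQL requirements):

from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

@nlql.register_embedding_provider
def minilm_embedding(texts: list[str]) -> list[list[float]]:
    # encode() returns a numpy array; convert to plain Python lists
    return _model.encode(texts).tolist()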

Type System

NLQL uses an implicit type system for metadata fields:

  • NumberType - Numeric comparisons
  • TextType - String comparisons
  • DateType - Date/time comparisons

Types are registered per field and used during WHERE clause evaluation to ensure type-safe comparisons.

Performance Considerations

  1. Push-down optimization: Maximize what's pushed to the data source
  2. Lazy evaluation: Only compute what's needed
  3. Lazy imports: Optional dependencies loaded on-demand
  4. Batch processing: Embeddings computed in batches
