Query Syntax Reference

Complete reference for NLQL query syntax.

SELECT Clause

Granularity Levels

-- Full documents
SELECT DOCUMENT

-- Chunks (the default unit returned by vector databases)
SELECT CHUNK

-- Individual sentences
SELECT SENTENCE

-- Sliding window with context
SELECT SPAN(SENTENCE, window=3)
SELECT SPAN(CHUNK, window=2)
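The exact meaning of `window` is not spelled out above. The sketch below assumes one plausible reading: `window=N` yields each unit plus up to N neighbouring units on each side, clamped at the start and end of the text. The function `sentence_spans` is hypothetical and is not part of NLQL.

```python
from typing import List


def sentence_spans(sentences: List[str], window: int) -> List[str]:
    """Hypothetical sketch of SPAN(SENTENCE, window=N).

    Assumption: each span is one centre sentence plus up to `window`
    sentences before and after it, clamped at the text boundaries.
    """
    spans = []
    for i in range(len(sentences)):
        start = max(0, i - window)                 # don't run past the first sentence
        end = min(len(sentences), i + window + 1)  # don't run past the last sentence
        spans.append(" ".join(sentences[start:end]))
    return spans


for span in sentence_spans(["S1.", "S2.", "S3.", "S4."], window=1):
    print(span)
# S1. S2.
# S1. S2. S3.
# S2. S3. S4.
# S3. S4.
```

If NLQL instead treats `window` as the total span size, only the `start`/`end` arithmetic would change.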

WHERE Clause

Semantic Operators

SIMILAR_TO - Semantic Similarity

The SIMILAR_TO operator performs vector-based semantic search:

-- Basic semantic search
WHERE SIMILAR_TO("query text") > 0.8

-- Combine with other conditions
WHERE SIMILAR_TO("AI agents") > 0.7 AND META("status") == "published"

How it works:

Automatic Vectorization: NLQL automatically embeds both the query text and all document chunks using the configured embedding provider (default: all-MiniLM-L6-v2).
Similarity Computation: NLQL computes the cosine similarity between the query vector and each chunk vector. For typical text embeddings the score falls between 0 and 1 (raw cosine similarity can dip slightly below 0 for unrelated text).
Score Storage: The similarity score is stored in metadata["similarity"] for each result, making it accessible in:
    WHERE clause: WHERE SIMILAR_TO("query") > threshold
    ORDER BY clause: ORDER BY SIMILARITY DESC
    Result metadata: result.metadata['similarity']
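The steps above can be sketched in plain Python. Toy three-dimensional vectors stand in for the embeddings the configured provider would produce, and dictionaries stand in for result objects; this illustrates the arithmetic and the score storage, not NLQL's internal code.

```python
import math
from typing import Any, Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption)
query_vec = [1.0, 0.0, 1.0]
chunks: List[Dict[str, Any]] = [
    {"content": "chunk A", "vector": [1.0, 0.0, 1.0], "metadata": {}},
    {"content": "chunk B", "vector": [0.0, 1.0, 0.0], "metadata": {}},
    {"content": "chunk C", "vector": [1.0, 1.0, 0.0], "metadata": {}},
]

# Score storage: keep each score in the chunk's metadata
for chunk in chunks:
    chunk["metadata"]["similarity"] = cosine_similarity(query_vec, chunk["vector"])

# WHERE SIMILAR_TO(...) > 0.4 ORDER BY SIMILARITY DESC
hits = sorted(
    (c for c in chunks if c["metadata"]["similarity"] > 0.4),
    key=lambda c: c["metadata"]["similarity"],
    reverse=True,
)
for c in hits:
    print(f"[{c['metadata']['similarity']:.3f}] {c['content']}")
# [1.000] chunk A
# [0.500] chunk C
```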

Important Notes:

Execution Order: Similarity is computed on the original chunks BEFORE granularity transformation (SENTENCE/SPAN)
Score Inheritance: When transforming to SENTENCE/SPAN, the similarity score is inherited from the parent chunk
Threshold Selection: Typical thresholds (rough guides; suitable values depend on the embedding model):
    > 0.8: Very high similarity (near-duplicates)
    > 0.6: High similarity (related content)
    > 0.4: Moderate similarity (loosely related)
    > 0.2: Low similarity (may include noise)
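A practical consequence of score inheritance is that a threshold filters whole chunks, not individual sentences: every sentence split from a matching chunk carries the chunk's score, even an off-topic one. The sketch below shows this with a naive regex sentence splitter (an assumption; NLQL's real splitter is not documented here) and dictionaries standing in for results.

```python
import re
from typing import Any, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_to_sentences(chunk: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split a chunk into sentences; each sentence gets a copy of the parent's
    metadata, including the similarity score computed at chunk level."""
    return [
        {"content": sentence, "metadata": dict(chunk["metadata"])}
        for sentence in split_sentences(chunk["content"])
    ]


chunk = {
    "content": "Agents plan tasks. They call tools. The weather was nice.",
    "metadata": {"similarity": 0.72, "source": "notes.md"},
}
for sentence in chunk_to_sentences(chunk):
    print(sentence["metadata"]["similarity"], sentence["content"])
# 0.72 Agents plan tasks.
# 0.72 They call tools.
# 0.72 The weather was nice.
```

Under the behaviour described above, a `SELECT SENTENCE ... WHERE SIMILAR_TO(...) > 0.6` query would also return the off-topic third sentence, because its score is the chunk's.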

Example:

# Assumes `nlql` is an already-configured NLQL instance (setup not shown here)
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("machine learning applications") > 0.6
    ORDER BY SIMILARITY DESC
    LIMIT 5
""")

for result in results:
    print(f"[{result.metadata['similarity']:.3f}] {result.content}")

MATCH - Exact Text Match

Exact phrase matching (case-sensitive):

-- Match exact phrase
WHERE MATCH("exact phrase")

-- Combine with OR
WHERE MATCH("error") OR MATCH("warning")

CONTAINS - Substring Match

Case-insensitive substring matching:

-- Contains keyword (case-insensitive)
WHERE CONTAINS("keyword")

-- Combine with AND
WHERE CONTAINS("machine") AND CONTAINS("learning")
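The difference between MATCH and CONTAINS comes down to case handling. Assuming both are plain substring checks (the text above does not say whether MATCH also enforces word boundaries), they behave like these two hypothetical helpers:

```python
def match(text: str, phrase: str) -> bool:
    """MATCH sketch: exact, case-sensitive substring check."""
    return phrase in text


def contains(text: str, keyword: str) -> bool:
    """CONTAINS sketch: case-insensitive substring check."""
    # casefold() is a more thorough lower() intended for caseless comparison
    return keyword.casefold() in text.casefold()


text = "Machine Learning models need labelled data."
print(match(text, "machine learning"))     # False: the capitalisation differs
print(match(text, "Machine Learning"))     # True
print(contains(text, "machine learning"))  # True
print(contains(text, "machine") and contains(text, "learning"))  # True
```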

Metadata Operators

-- Access metadata fields
WHERE META("field_name") == "value"
WHERE META("date") > "2024-01-01"
WHERE META("score") >= 0.5

Comparison Operators

-- Numeric comparisons
WHERE META("score") > 0.8
WHERE META("count") <= 100
WHERE META("value") >= 50

-- Text comparisons
WHERE META("status") == "active"
WHERE META("category") != "archived"

-- Date comparisons (requires DateType registration)
WHERE META("created_at") > "2024-01-01"
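Each comparison has the form `META(field) <operator> value`. A minimal sketch of how such an expression could be evaluated maps each operator token onto Python's `operator` module. The helper `meta_compare` and its rule that a missing field never matches are assumptions for illustration, not documented NLQL behaviour.

```python
import operator
from typing import Any, Callable, Dict

# Map comparison tokens to Python's built-in comparison functions
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def meta_compare(metadata: Dict[str, Any], field: str, op: str, value: Any) -> bool:
    """Evaluate META(field) <op> value against one result's metadata.

    Assumption: a missing field never matches, whatever the operator.
    """
    if field not in metadata:
        return False
    return COMPARATORS[op](metadata[field], value)


doc_meta = {"score": 0.9, "count": 42, "status": "active", "category": "news"}
print(meta_compare(doc_meta, "score", ">", 0.8))                # True
print(meta_compare(doc_meta, "count", "<=", 100))               # True
print(meta_compare(doc_meta, "category", "!=", "archived"))     # True
print(meta_compare(doc_meta, "created_at", ">", "2024-01-01"))  # False: field missing
```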

Boolean Logic

-- AND
WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"

-- OR
WHERE CONTAINS("machine learning") OR CONTAINS("deep learning")

-- NOT
WHERE NOT CONTAINS("deprecated")

-- Complex combinations
WHERE (SIMILAR_TO("AI") > 0.8 OR CONTAINS("artificial intelligence"))
  AND META("date") > "2024-01-01"
  AND NOT (META("archived") == true)
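Boolean operators compose conditions the same way predicate combinators do. The sketch below rebuilds the complex combination above from plain Python functions; `all_of`, `any_of` and `negate` are hypothetical names, and the dictionaries stand in for results that already carry a similarity score.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Predicate = Callable[[Doc], bool]


def all_of(*preds: Predicate) -> Predicate:
    """AND: true only if every predicate is true (short-circuits like `and`)."""
    return lambda d: all(p(d) for p in preds)


def any_of(*preds: Predicate) -> Predicate:
    """OR: true if at least one predicate is true."""
    return lambda d: any(p(d) for p in preds)


def negate(pred: Predicate) -> Predicate:
    """NOT: inverts a predicate."""
    return lambda d: not pred(d)


# Mirrors the complex combination above
where = all_of(
    any_of(
        lambda d: d["similarity"] > 0.8,
        lambda d: "artificial intelligence" in d["content"].lower(),  # CONTAINS is case-insensitive
    ),
    lambda d: d["meta"]["date"] > "2024-01-01",  # ISO dates compare correctly as text
    negate(lambda d: d["meta"]["archived"] is True),
)

docs: List[Doc] = [
    {"id": "d1", "similarity": 0.5, "content": "Artificial intelligence in 2024",
     "meta": {"date": "2024-03-01", "archived": False}},
    {"id": "d2", "similarity": 0.9, "content": "Neural networks",
     "meta": {"date": "2023-12-31", "archived": False}},
    {"id": "d3", "similarity": 0.85, "content": "LLM agents",
     "meta": {"date": "2024-06-01", "archived": True}},
]
print([d["id"] for d in docs if where(d)])  # ['d1']
```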

Functions

-- Built-in functions
WHERE LENGTH(content) > 100
WHERE COUNT("AI") > 3

-- Custom functions (after registration)
WHERE word_count(content) > 50
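Conceptually, built-in and custom functions are named callables that the query engine looks up and applies to each result's content. The registry below is a hypothetical stand-in: NLQL's actual registration API for custom functions is not shown in this reference, so `register_function` and `FUNCTIONS` are illustrative names only. In the sketch `COUNT` receives the content explicitly, whereas the query syntax passes only the search term.

```python
from typing import Callable, Dict

IntFn = Callable[..., int]

# Hypothetical registry: NLQL's real registration API is not shown here
FUNCTIONS: Dict[str, IntFn] = {}


def register_function(name: str) -> Callable[[IntFn], IntFn]:
    """Decorator that stores a function under the name used in queries."""
    def decorator(fn: IntFn) -> IntFn:
        FUNCTIONS[name] = fn
        return fn
    return decorator


@register_function("LENGTH")
def length(content: str) -> int:
    return len(content)  # assumed unit: characters


@register_function("COUNT")
def count(content: str, term: str) -> int:
    return content.count(term)  # assumed: case-sensitive, non-overlapping


@register_function("word_count")
def word_count(content: str) -> int:
    return len(content.split())  # whitespace-separated tokens


text = "AI helps AI researchers build better AI tools."
print(FUNCTIONS["LENGTH"](text) > 100)     # False
print(FUNCTIONS["COUNT"](text, "AI") > 3)  # False: exactly 3 occurrences
print(FUNCTIONS["word_count"](text))       # 8
```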

ORDER BY Clause

-- Order by similarity score (for SIMILAR_TO queries)
ORDER BY SIMILARITY DESC

-- Order by metadata fields
ORDER BY META("date") DESC
ORDER BY META("score") ASC

-- Multiple fields
ORDER BY META("priority") DESC, META("date") DESC
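Multi-key ordering such as `ORDER BY META("priority") DESC, META("date") DESC` can be reproduced with Python's stable sort by applying the keys from least to most significant. This is a sketch over plain dictionaries, not NLQL's implementation; ISO-formatted date strings are used so that string order matches chronological order.

```python
from operator import itemgetter
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def order_by(rows: List[Row], keys: List[Tuple[str, str]]) -> List[Row]:
    """Sort rows by several (field, "ASC" | "DESC") keys, most significant first.

    Python's sort is stable, so applying the keys from least to most
    significant produces the combined ordering.
    """
    result = list(rows)
    for field, direction in reversed(keys):
        result.sort(key=itemgetter(field), reverse=(direction == "DESC"))
    return result


rows = [
    {"id": "a", "priority": 1, "date": "2024-02-01"},
    {"id": "b", "priority": 2, "date": "2024-01-15"},
    {"id": "c", "priority": 2, "date": "2024-03-10"},
    {"id": "d", "priority": 1, "date": "2024-05-20"},
]
# ORDER BY META("priority") DESC, META("date") DESC
ordered = order_by(rows, [("priority", "DESC"), ("date", "DESC")])
print([r["id"] for r in ordered])  # ['c', 'b', 'd', 'a']
```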

LIMIT Clause

-- Limit number of results
LIMIT 10
LIMIT 100

Complete Examples

Basic Retrieval

SELECT CHUNK
WHERE CONTAINS("machine learning")
LIMIT 10

Semantic Search

SELECT SENTENCE
WHERE SIMILAR_TO("AI agents and autonomous systems") > 0.75
ORDER BY SIMILARITY DESC
LIMIT 5

Metadata Filtering

SELECT DOCUMENT
WHERE META("author") == "Alice"
  AND META("date") > "2024-01-01"
ORDER BY META("date") DESC

Hybrid Query

SELECT SENTENCE
WHERE SIMILAR_TO("neural networks") > 0.7
  AND META("topic") == "deep learning"
  AND LENGTH(content) > 50
ORDER BY SIMILARITY DESC
LIMIT 20

Context Windows

SELECT SPAN(SENTENCE, window=2)
WHERE SIMILAR_TO("transformer architecture") > 0.7
ORDER BY SIMILARITY DESC
LIMIT 5

Operator Reference

Operator	Description	Returns	Example
`SIMILAR_TO("text")`	Semantic similarity (vector-based)	Float (typically 0-1)	`SIMILAR_TO("query") > 0.8`
`MATCH("text")`	Exact text match (case-sensitive)	Boolean	`MATCH("exact phrase")`
`CONTAINS("text")`	Substring match (case-insensitive)	Boolean	`CONTAINS("keyword")`
`META("field")`	Metadata field access	Any	`META("field") == "value"`

Function Reference

Function	Description	Example
`LENGTH`	Text length	`LENGTH(content) > 100`
`COUNT`	Count occurrences	`COUNT("word") > 3`
`NOW`	Current timestamp	`META("date") < NOW()`

Type System

Register metadata field types for type-safe comparisons:

from nlql import register_meta_field, NumberType, DateType, TextType

# Register field types
register_meta_field("score", NumberType)
register_meta_field("created_at", DateType)
register_meta_field("status", TextType)

Then use them in queries:

WHERE META("score") > 0.8
WHERE META("created_at") > "2024-01-01"
WHERE META("status") == "active"
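To illustrate why typed comparisons matter: if values arrive as strings, they compare character by character, so "10" sorts below "9". The sketch below converts raw values according to a field-type table before comparing. The `FIELD_TYPES` table and `coerce` helper are hypothetical and only mirror the `register_meta_field` calls above; they are not NLQL's implementation.

```python
from datetime import date
from typing import Any, Callable, Dict

# Hypothetical field-type table mirroring the register_meta_field calls above
FIELD_TYPES: Dict[str, Callable[[str], Any]] = {
    "score": float,                    # NumberType
    "created_at": date.fromisoformat,  # DateType (ISO 8601 dates assumed)
    "status": str,                     # TextType
}


def coerce(field: str, raw: str) -> Any:
    """Convert a raw string to the registered type; unknown fields stay text."""
    return FIELD_TYPES.get(field, str)(raw)


print("10" > "9")                                    # False: compared as text
print(coerce("score", "10") > coerce("score", "9"))  # True: compared as numbers
print(coerce("created_at", "2024-03-05") > coerce("created_at", "2024-01-01"))  # True
```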