Query Syntax Reference¶
Complete reference for NLQL query syntax.
SELECT Clause¶
Granularity Levels¶
-- Full documents
SELECT DOCUMENT
-- Chunks (default from vector databases)
SELECT CHUNK
-- Individual sentences
SELECT SENTENCE
-- Sliding window with context
SELECT SPAN(SENTENCE, window=3)
SELECT SPAN(CHUNK, window=2)
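The SPAN granularity can be pictured as a window that pulls in neighbouring units around each hit. The sketch below is plain Python, not NLQL's implementation. It assumes `window=N` means up to N neighbouring units on each side of the matching one, which this reference does not confirm, and `span_around` is a hypothetical helper introduced only for illustration.

```python
def span_around(units, index, window):
    """Return units[index] plus up to `window` neighbours on each side.

    Assumption: `window` counts neighbours per side; NLQL may define it differently.
    """
    start = max(0, index - window)
    end = min(len(units), index + window + 1)
    return units[start:end]


sentences = ["S0.", "S1.", "S2.", "S3.", "S4."]
middle = span_around(sentences, 2, 1)  # neighbours on both sides
edge = span_around(sentences, 0, 2)    # clipped at the start of the document
print(middle)
print(edge)
```

Under this reading, a hit near the start or end of a document simply yields a shorter span rather than an error.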
WHERE Clause¶
Semantic Operators¶
SIMILAR_TO - Semantic Similarity¶
The SIMILAR_TO operator performs vector-based semantic search:
-- Basic semantic search
WHERE SIMILAR_TO("query text") > 0.8
-- Combine with other conditions
WHERE SIMILAR_TO("AI agents") > 0.7 AND META("status") == "published"
How it works:
- Automatic Vectorization: NLQL automatically embeds both the query text and all document chunks using the configured embedding provider (default: all-MiniLM-L6-v2)
- Similarity Computation: Computes cosine similarity between query and document vectors, returning a score between 0 and 1
- Score Storage: The similarity score is stored in metadata["similarity"] for each result, making it accessible in:
  - WHERE clause: WHERE SIMILAR_TO("query") > threshold
  - ORDER BY clause: ORDER BY SIMILARITY DESC
  - Result metadata: result.metadata['similarity']
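The similarity computation and score storage steps can be sketched in plain Python. This is an illustration, not NLQL's code: the three-dimensional vectors stand in for real embeddings, and `cosine_similarity`, `chunk_vecs`, and `hits` are hypothetical names. Only the `metadata["similarity"]` slot comes from the description above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of the query and of three chunks.
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk-a": [1.0, 0.0, 1.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [1.0, 1.0, 0.0],
}

threshold = 0.4
hits = []
for name, vec in chunk_vecs.items():
    metadata = {"similarity": cosine_similarity(query_vec, vec)}
    if metadata["similarity"] > threshold:  # the WHERE SIMILAR_TO(...) > threshold step
        hits.append(name)

print(hits)
```

The threshold comparison is applied to the stored score, which is why the same value can later be reused for ordering and for display.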
Important Notes:
- Execution Order: Similarity is computed on the original chunks BEFORE granularity transformation (SENTENCE/SPAN)
- Score Inheritance: When transforming to SENTENCE/SPAN, the similarity score is inherited from the parent chunk
- Threshold Selection: Typical thresholds:
  - > 0.8: Very high similarity (near-duplicates)
  - > 0.6: High similarity (related content)
  - > 0.4: Moderate similarity (loosely related)
  - > 0.2: Low similarity (may include noise)
Example:
results = nlql.execute("""
SELECT CHUNK
WHERE SIMILAR_TO("machine learning applications") > 0.6
ORDER BY SIMILARITY DESC
LIMIT 5
""")
for result in results:
    print(f"[{result.metadata['similarity']:.3f}] {result.content}")
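The execution-order and score-inheritance notes can be illustrated with a small sketch. It is not NLQL's implementation: the `Unit` dataclass and `to_sentences` helper are hypothetical, and the sentence splitting on full stops is deliberately naive.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    content: str
    metadata: dict = field(default_factory=dict)


def to_sentences(chunk):
    """Split an already-scored chunk into sentences, copying the parent's metadata."""
    parts = [p.strip() for p in chunk.content.split(".") if p.strip()]
    return [Unit(content=p + ".", metadata=dict(chunk.metadata)) for p in parts]


# The chunk is scored first; the granularity transformation happens afterwards.
chunk = Unit("Transformers use attention. They scale well.", {"similarity": 0.72})
sentences = to_sentences(chunk)
for s in sentences:
    print(f"[{s.metadata['similarity']:.3f}] {s.content}")
```

If inheritance works as described, every sentence from the same chunk carries the same score, so ORDER BY SIMILARITY DESC cannot rank sentences within one chunk against each other.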
MATCH - Exact Text Match¶
Exact phrase matching (case-sensitive):
-- Match exact phrase
WHERE MATCH("exact phrase")
-- Combine with OR
WHERE MATCH("error") OR MATCH("warning")
CONTAINS - Substring Match¶
Case-insensitive substring matching:
-- Contains keyword (case-insensitive)
WHERE CONTAINS("keyword")
-- Combine with AND
WHERE CONTAINS("machine") AND CONTAINS("learning")
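The difference between the two text operators comes down to case handling. The following plain-Python sketch shows the distinction. It assumes MATCH tests whether the phrase occurs in the text rather than whether the whole text equals the phrase, which the reference does not spell out, and `match` and `contains` are hypothetical stand-ins, not NLQL functions.

```python
def match(text, phrase):
    """Case-sensitive phrase test (assumed to be an occurrence test)."""
    return phrase in text


def contains(text, keyword):
    """Case-insensitive substring test."""
    return keyword.casefold() in text.casefold()


text = "Machine Learning pipelines emit a Warning on timeout."

exact_hit = match(text, "Warning")    # same casing as the text
exact_miss = match(text, "warning")   # casing differs, so no match
both_words = contains(text, "machine") and contains(text, "learning")
print(exact_hit, exact_miss, both_words)
```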
Metadata Operators¶
-- Access metadata fields
WHERE META("field_name") == "value"
WHERE META("date") > "2024-01-01"
WHERE META("score") >= 0.5
Comparison Operators¶
-- Numeric comparisons
WHERE META("score") > 0.8
WHERE META("count") <= 100
WHERE META("value") >= 50
-- Text comparisons
WHERE META("status") == "active"
WHERE META("category") != "archived"
-- Date comparisons (requires DateType registration)
WHERE META("created_at") > "2024-01-01"
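One way to see why date comparisons call for a registered type: values compared as raw strings are ordered character by character. The reference does not say how NLQL treats unregistered fields, so the sketch below only assumes a string fallback to show what typed comparison protects against.

```python
from datetime import date

# Zero-padded ISO dates happen to sort correctly as strings.
iso_ok = "2024-03-15" > "2024-01-01"

# Other formats, and numbers stored as text, do not.
unpadded_as_text = "2024-3-15" > "2024-10-01"  # "3" > "1", so March "beats" October
number_as_text = "100" > "50"                  # "1" < "5", so 100 "loses" to 50

# Parsing into typed values gives the intended ordering.
unpadded_as_date = date(2024, 3, 15) > date(2024, 10, 1)
number_as_float = float("100") > float("50")

print(iso_ok, unpadded_as_text, number_as_text, unpadded_as_date, number_as_float)
```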
Boolean Logic¶
-- AND
WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"
-- OR
WHERE CONTAINS("machine learning") OR CONTAINS("deep learning")
-- NOT
WHERE NOT CONTAINS("deprecated")
-- Complex combinations
WHERE (SIMILAR_TO("AI") > 0.8 OR CONTAINS("artificial intelligence"))
AND META("date") > "2024-01-01"
AND NOT META("archived") == true
Functions¶
-- Built-in functions
WHERE LENGTH(content) > 100
WHERE COUNT("AI") > 3
-- Custom functions (after registration)
WHERE word_count(content) > 50
ORDER BY Clause¶
-- Order by similarity score (for SIMILAR_TO queries)
ORDER BY SIMILARITY DESC
-- Order by metadata fields
ORDER BY META("date") DESC
ORDER BY META("score") ASC
-- Multiple fields
ORDER BY META("priority") DESC, META("date") DESC
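Multi-field ordering can be pictured as a sort on a tuple of keys. The sketch below assumes the first field takes precedence and later fields only break ties, as in SQL; this reference does not state it explicitly. The `rows` data is made up for illustration.

```python
rows = [
    {"id": "a", "priority": 1, "date": "2024-05-01"},
    {"id": "b", "priority": 2, "date": "2024-01-10"},
    {"id": "c", "priority": 2, "date": "2024-03-20"},
]

# priority DESC, date DESC: compare priority first, then date as a tie-breaker.
both_desc = sorted(rows, key=lambda r: (r["priority"], r["date"]), reverse=True)

# priority DESC, date ASC: sort by the secondary key first, then rely on
# the stability of sorted() when sorting by the primary key.
by_date = sorted(rows, key=lambda r: r["date"])
mixed = sorted(by_date, key=lambda r: r["priority"], reverse=True)

print([r["id"] for r in both_desc])
print([r["id"] for r in mixed])
```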
LIMIT Clause¶
Complete Examples¶
Basic Retrieval¶
Semantic Search¶
SELECT SENTENCE
WHERE SIMILAR_TO("AI agents and autonomous systems") > 0.75
ORDER BY SIMILARITY DESC
LIMIT 5
Metadata Filtering¶
SELECT DOCUMENT
WHERE META("author") == "Alice"
AND META("date") > "2024-01-01"
ORDER BY META("date") DESC
Hybrid Query¶
SELECT SENTENCE
WHERE SIMILAR_TO("neural networks") > 0.7
AND META("topic") == "deep learning"
AND LENGTH(content) > 50
ORDER BY SIMILARITY DESC
LIMIT 20
Context Windows¶
SELECT SPAN(SENTENCE, window=2)
WHERE SIMILAR_TO("transformer architecture")
ORDER BY SIMILARITY DESC
LIMIT 5
Operator Reference¶
| Operator | Description | Returns | Example |
|---|---|---|---|
| `SIMILAR_TO("text")` | Semantic similarity (vector-based) | Float (0-1) | `SIMILAR_TO("query") > 0.8` |
| `MATCH("text")` | Exact text match (case-sensitive) | Boolean | `MATCH("exact phrase")` |
| `CONTAINS("text")` | Substring match (case-insensitive) | Boolean | `CONTAINS("keyword")` |
| `META("field")` | Metadata field access | Any | `META("field") == value` |
Function Reference¶
| Function | Description | Example |
|---|---|---|
| `LENGTH` | Text length | `LENGTH(content) > 100` |
| `COUNT` | Count occurrences | `COUNT("word") > 3` |
| `NOW` | Current timestamp | `META("date") < NOW()` |
Type System¶
Register metadata field types for type-safe comparisons:
from nlql import register_meta_field, NumberType, DateType, TextType
# Register field types
register_meta_field("score", NumberType)
register_meta_field("created_at", DateType)
register_meta_field("status", TextType)
Then use in queries: