
Query Syntax Reference

Complete reference for NLQL query syntax.

SELECT Clause

Granularity Levels

-- Full documents
SELECT DOCUMENT

-- Chunks (default from vector databases)
SELECT CHUNK

-- Individual sentences
SELECT SENTENCE

-- Sliding window with context
SELECT SPAN(SENTENCE, window=3)
SELECT SPAN(CHUNK, window=2)
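SPAN returns each unit together with neighbouring units for context. This reference does not spell out exactly what `window` counts. The sketch below assumes it is the number of neighbours taken on each side of the centre unit, clipped at the document boundaries. `sentence_spans` is a hypothetical helper for illustration, not part of NLQL.

```python
def sentence_spans(sentences: list[str], window: int) -> list[str]:
    """Return one span per sentence: the sentence plus up to `window`
    neighbours on each side (assumed semantics, clipped at the edges)."""
    if window < 0:
        raise ValueError("window must be non-negative")
    spans = []
    for i in range(len(sentences)):
        start = max(0, i - window)                   # clip at the first sentence
        end = min(len(sentences), i + window + 1)    # clip at the last sentence
        spans.append(" ".join(sentences[start:end]))
    return spans


for span in sentence_spans(["A.", "B.", "C.", "D."], window=1):
    print(span)
```

If `window` instead means the total span size, only the slice bounds would change; the per-unit structure of the result stays the same.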

WHERE Clause

Semantic Operators

SIMILAR_TO - Semantic Similarity

The SIMILAR_TO operator performs vector-based semantic search:

-- Basic semantic search
WHERE SIMILAR_TO("query text") > 0.8

-- Combine with other conditions
WHERE SIMILAR_TO("AI agents") > 0.7 AND META("status") == "published"

How it works:

  1. Automatic Vectorization: NLQL automatically embeds both the query text and all document chunks using the configured embedding provider (default: all-MiniLM-L6-v2)

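  2. Similarity Computation: Computes the cosine similarity between the query vector and each chunk vector and uses it as the score. Cosine similarity is mathematically bounded by -1 and 1; for sentence embeddings of ordinary text it usually falls between 0 and 1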

  3. Score Storage: The similarity score is stored in metadata["similarity"] for each result, making it accessible in:
      • the WHERE clause: WHERE SIMILAR_TO("query") > threshold
      • the ORDER BY clause: ORDER BY SIMILARITY DESC
      • result metadata: result.metadata['similarity']
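The three steps can be sketched in plain Python. A real setup uses the configured embedding provider (all-MiniLM-L6-v2 by default); here a toy bag-of-words vector, built by the hypothetical `toy_embed` helper, stands in for it so the example runs without a model. Because word counts are never negative, scores in this toy version always land between 0 and 1. The scoring and the metadata["similarity"] bookkeeping follow the description above, but this is an illustration, not NLQL's implementation.

```python
import math
from collections import Counter


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    # Stand-in for a real embedding model: word counts over a fixed vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


vocab = ["machine", "learning", "cooking", "recipes"]
chunks = [
    {"content": "machine learning basics", "metadata": {}},
    {"content": "cooking recipes", "metadata": {}},
]
query_vec = toy_embed("machine learning", vocab)  # step 1: vectorize the query

for chunk in chunks:
    # Steps 1-3: vectorize the chunk, score it, store the score on its metadata.
    chunk["metadata"]["similarity"] = cosine_similarity(
        query_vec, toy_embed(chunk["content"], vocab)
    )

# Rough equivalent of: WHERE SIMILAR_TO(...) > 0.6 ORDER BY SIMILARITY DESC
hits = sorted(
    (c for c in chunks if c["metadata"]["similarity"] > 0.6),
    key=lambda c: c["metadata"]["similarity"],
    reverse=True,
)
for hit in hits:
    print(f"[{hit['metadata']['similarity']:.3f}] {hit['content']}")
```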

Important Notes:

  • Execution Order: Similarity is computed on the original chunks BEFORE granularity transformation (SENTENCE/SPAN)
  • Score Inheritance: When transforming to SENTENCE/SPAN, the similarity score is inherited from the parent chunk
  • Threshold Selection: Typical thresholds are listed below. Treat them as rough guides; suitable values depend on the embedding model and the corpus, so tune them on your own data:
      • > 0.8: Very high similarity (near-duplicates)
      • > 0.6: High similarity (related content)
      • > 0.4: Moderate similarity (loosely related)
      • > 0.2: Low similarity (may include noise)
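Taken together, the execution-order and inheritance rules mean a sentence never receives its own similarity score: it carries the score of the chunk it came from. A minimal sketch, assuming naive splitting on ". " (NLQL's actual sentence splitter is not described in this reference) and a hypothetical `to_sentences` helper:

```python
def to_sentences(chunk: dict) -> list[dict]:
    # Naive splitter for illustration only: split on ". " and restore the period.
    parts = [p.strip() for p in chunk["content"].split(". ") if p.strip()]
    sentences = []
    for part in parts:
        text = part if part.endswith(".") else part + "."
        # Each sentence gets a copy of the parent's metadata, including "similarity".
        sentences.append({"content": text, "metadata": dict(chunk["metadata"])})
    return sentences


chunk = {
    "content": "Transformers use attention. They scale well.",
    "metadata": {"similarity": 0.82},
}
for sentence in to_sentences(chunk):
    print(sentence["metadata"]["similarity"], sentence["content"])
```

One practical consequence: a sentence that is off-topic on its own still passes a threshold if its parent chunk did, so SENTENCE results can include filler sentences from highly relevant chunks.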

Example:

results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("machine learning applications") > 0.6
    ORDER BY SIMILARITY DESC
    LIMIT 5
""")

# Each result carries its similarity score in metadata['similarity']
for result in results:
    print(f"[{result.metadata['similarity']:.3f}] {result.content}")

MATCH - Exact Text Match

Exact phrase matching (case-sensitive):

-- Match exact phrase
WHERE MATCH("exact phrase")

-- Combine with OR
WHERE MATCH("error") OR MATCH("warning")

CONTAINS - Substring Match

Case-insensitive substring matching:

-- Contains keyword (case-insensitive)
WHERE CONTAINS("keyword")

-- Combine with AND
WHERE CONTAINS("machine") AND CONTAINS("learning")
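The practical difference between MATCH and CONTAINS is case handling. The sketch below expresses their documented semantics in plain Python, assuming MATCH tests whether the phrase occurs anywhere in the content (rather than requiring the whole content to equal it). `match` and `contains` are illustrative helpers, not NLQL functions.

```python
def match(content: str, phrase: str) -> bool:
    # MATCH: case-sensitive occurrence of the exact phrase.
    return phrase in content


def contains(content: str, substring: str) -> bool:
    # CONTAINS: case-insensitive substring test; casefold() also normalizes
    # characters that lower() leaves alone (e.g. German "ß").
    return substring.casefold() in content.casefold()


text = "Machine Learning in production"
print(match(text, "machine learning"))     # False: case differs
print(contains(text, "machine learning"))  # True
```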

Metadata Operators

-- Access metadata fields
WHERE META("field_name") == "value"
WHERE META("date") > "2024-01-01"
WHERE META("score") >= 0.5

Comparison Operators

-- Numeric comparisons
WHERE META("score") > 0.8
WHERE META("count") <= 100
WHERE META("value") >= 50

-- Text comparisons
WHERE META("status") == "active"
WHERE META("category") != "archived"

-- Date comparisons (requires DateType registration)
WHERE META("created_at") > "2024-01-01"
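Each comparison looks up a metadata field and applies an operator to its value. The sketch below maps the documented operators onto Python's `operator` module. How NLQL treats a field that is absent from a result is not specified in this reference; the sketch assumes the condition simply fails. `compare_meta` is a hypothetical helper.

```python
import operator

# Documented comparison operators mapped to Python's built-in comparisons.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def compare_meta(metadata: dict, field: str, op: str, value) -> bool:
    if field not in metadata:
        return False  # assumption: a missing field never satisfies a comparison
    return OPS[op](metadata[field], value)


meta = {"score": 0.9, "count": 120, "status": "active"}
print(compare_meta(meta, "score", ">", 0.8))             # True
print(compare_meta(meta, "count", "<=", 100))            # False
print(compare_meta(meta, "category", "!=", "archived"))  # False: field missing
```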

Boolean Logic

-- AND
WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"

-- OR
WHERE CONTAINS("machine learning") OR CONTAINS("deep learning")

-- NOT
WHERE NOT CONTAINS("deprecated")

-- Complex combinations
WHERE (SIMILAR_TO("AI") > 0.8 OR CONTAINS("artificial intelligence"))
  AND META("date") > "2024-01-01"
  AND NOT (META("archived") == true)
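Conceptually, a WHERE clause is a tree of AND/OR/NOT nodes evaluated once per result unit. The minimal evaluator below works over nested tuples and is written for illustration only; the node format is hypothetical and says nothing about how NLQL parses queries. It encodes the complex combination above, with the similarity score assumed to be precomputed in metadata.

```python
def evaluate(node, unit: dict) -> bool:
    kind = node[0]
    if kind == "AND":
        return all(evaluate(child, unit) for child in node[1:])
    if kind == "OR":
        return any(evaluate(child, unit) for child in node[1:])
    if kind == "NOT":
        return not evaluate(node[1], unit)
    if kind == "LEAF":
        return node[1](unit)  # a predicate over the unit
    raise ValueError(f"unknown node type: {kind}")


# (SIMILAR_TO("AI") > 0.8 OR CONTAINS("artificial intelligence"))
#   AND META("date") > "2024-01-01" AND NOT (META("archived") == true)
where = (
    "AND",
    ("OR",
     ("LEAF", lambda u: u["metadata"]["similarity"] > 0.8),
     ("LEAF", lambda u: "artificial intelligence" in u["content"].casefold())),
    # ISO-8601 date strings compare correctly as plain text.
    ("LEAF", lambda u: u["metadata"]["date"] > "2024-01-01"),
    ("NOT", ("LEAF", lambda u: u["metadata"].get("archived") is True)),
)

unit = {
    "content": "Notes on Artificial Intelligence safety.",
    "metadata": {"similarity": 0.55, "date": "2024-03-10", "archived": False},
}
print(evaluate(where, unit))  # True: the CONTAINS branch compensates for the low score
```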

Functions

-- Built-in functions
WHERE LENGTH(content) > 100
WHERE COUNT("AI") > 3

-- Custom functions (after registration)
WHERE word_count(content) > 50
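The plain-Python approximations below show what these functions might compute, under stated assumptions: LENGTH counts characters, COUNT counts non-overlapping, case-sensitive occurrences, and `word_count` is a hypothetical custom function that splits on whitespace. The registration API for custom functions is not shown in this reference, so the sketch does not attempt it.

```python
def length(content: str) -> int:
    # Assumption: LENGTH counts characters, not tokens or words.
    return len(content)


def count(content: str, needle: str) -> int:
    # Assumption: non-overlapping, case-sensitive occurrences (str.count semantics).
    return content.count(needle)


def word_count(content: str) -> int:
    # Hypothetical custom function: number of whitespace-delimited words.
    return len(content.split())


text = "AI helps AI researchers build better AI tools"
print(length(text), count(text, "AI"), word_count(text))
```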

ORDER BY Clause

-- Order by similarity score (for SIMILAR_TO queries)
ORDER BY SIMILARITY DESC

-- Order by metadata fields
ORDER BY META("date") DESC
ORDER BY META("score") ASC

-- Multiple fields
ORDER BY META("priority") DESC, META("date") DESC
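With several ORDER BY keys, earlier keys take precedence and later ones only break ties. One way to reproduce that ordering in Python, shown purely to illustrate the semantics, relies on `sorted` being stable: sort by the last key first, then by each earlier key in turn.

```python
rows = [
    {"id": "a", "priority": 1, "date": "2024-05-01"},
    {"id": "b", "priority": 3, "date": "2024-01-15"},
    {"id": "c", "priority": 3, "date": "2024-06-30"},
    {"id": "d", "priority": 1, "date": "2024-07-04"},
]

# ORDER BY META("priority") DESC, META("date") DESC
# sorted() is stable even with reverse=True, so the date order survives
# among rows that share the same priority.
ordered = sorted(rows, key=lambda r: r["date"], reverse=True)      # secondary key
ordered = sorted(ordered, key=lambda r: r["priority"], reverse=True)  # primary key
print([r["id"] for r in ordered])
```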

LIMIT Clause

-- Limit number of results
LIMIT 10
LIMIT 100

Complete Examples

Basic Retrieval

SELECT CHUNK
WHERE CONTAINS("machine learning")
LIMIT 10

Semantic Search

SELECT SENTENCE
WHERE SIMILAR_TO("AI agents and autonomous systems") > 0.75
ORDER BY SIMILARITY DESC
LIMIT 5

Metadata Filtering

SELECT DOCUMENT
WHERE META("author") == "Alice"
  AND META("date") > "2024-01-01"
ORDER BY META("date") DESC

Hybrid Query

SELECT SENTENCE
WHERE SIMILAR_TO("neural networks") > 0.7
  AND META("topic") == "deep learning"
  AND LENGTH(content) > 50
ORDER BY SIMILARITY DESC
LIMIT 20

Context Windows

SELECT SPAN(SENTENCE, window=2)
WHERE SIMILAR_TO("transformer architecture") > 0.6
ORDER BY SIMILARITY DESC
LIMIT 5

Operator Reference

| Operator | Description | Returns | Example |
|---|---|---|---|
| SIMILAR_TO("text") | Semantic similarity (vector-based) | Float (0-1) | SIMILAR_TO("query") > 0.8 |
| MATCH("text") | Exact text match (case-sensitive) | Boolean | MATCH("exact phrase") |
| CONTAINS("text") | Substring match (case-insensitive) | Boolean | CONTAINS("keyword") |
| META("field") | Metadata field access | Any | META("field") == value |

Function Reference

| Function | Description | Example |
|---|---|---|
| LENGTH | Text length | LENGTH(content) > 100 |
| COUNT | Count occurrences | COUNT("word") > 3 |
| NOW | Current timestamp | META("date") < NOW() |

Type System

Register metadata field types for type-safe comparisons:

from nlql import register_meta_field, NumberType, DateType, TextType

# Register field types
register_meta_field("score", NumberType)
register_meta_field("created_at", DateType)
register_meta_field("status", TextType)

Then use in queries:

WHERE META("score") > 0.8
WHERE META("created_at") > "2024-01-01"
WHERE META("status") == "active"
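Type registration matters because raw strings compare character by character: as text, "9" > "10" is true. The sketch below is a hypothetical registry, not NLQL's internals. It maps each field to a conversion function and converts both sides of a comparison before applying it. The field names mirror the registration example above; `FIELD_TYPES` and `typed_gt` are illustrative names.

```python
from datetime import date

# Hypothetical registry: field name -> function that converts a raw value.
FIELD_TYPES = {
    "score": float,
    "created_at": date.fromisoformat,
    "status": str,
}


def typed_gt(field: str, stored, literal) -> bool:
    # Convert both sides with the registered converter; unregistered fields
    # fall back to comparing the raw values.
    convert = FIELD_TYPES.get(field, lambda v: v)
    return convert(stored) > convert(literal)


print("9" > "10")                                          # True: string comparison
print(typed_gt("score", "9", "10"))                        # False: compared as numbers
print(typed_gt("created_at", "2024-03-01", "2024-01-01"))  # True: compared as dates
```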