Getting Started with NLQL

This guide will help you get started with NLQL, from installation to executing your first queries.

Installation

Basic Installation

Install NLQL with pip:

pip install python-nlql

This installs the core NLQL engine with the Lark parser. For semantic search capabilities, you'll also need an embedding provider.

With Embedding Support

For semantic similarity operations (SIMILAR_TO), install with text support:

pip install python-nlql[text]

This includes sentence-transformers for default embedding functionality.

With Vector Database Adapters

Install with specific vector database support:

# ChromaDB
pip install python-nlql[chroma]

# FAISS
pip install python-nlql[faiss]

# Qdrant
pip install python-nlql[qdrant]

# All adapters
pip install python-nlql[all]

Your First Query

Using In-Memory Data

The simplest way to get started is with the built-in MemoryAdapter:

from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter and add data
adapter = MemoryAdapter()
adapter.add_text("AI agents are autonomous systems", {"topic": "AI"})
adapter.add_text("Machine learning powers modern AI", {"topic": "ML"})
adapter.add_text("Natural language processing", {"topic": "NLP"})

# Initialize NLQL with explicit adapter
nlql = NLQL(adapter=adapter)

# Execute a simple query
results = nlql.execute("SELECT CHUNK LIMIT 2")

# Print results
for result in results:
    print(result.content)

Using a Vector Database

With ChromaDB (requires ChromaAdapter - coming soon):

import chromadb
from nlql import NLQL
from nlql.adapters import ChromaAdapter  # Coming soon

# Create ChromaDB client and collection
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents
collection.add(
    documents=["AI agents are autonomous", "ML powers modern AI"],
    ids=["doc1", "doc2"],
)

# Create adapter and initialize NLQL
adapter = ChromaAdapter(collection)
nlql = NLQL(adapter=adapter)

# Query with semantic search
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("artificial intelligence")
    LIMIT 5
""")

Basic Query Syntax

SELECT Clause

Choose the granularity of results:

-- Full documents
SELECT DOCUMENT

-- Chunks (default from vector DBs)
SELECT CHUNK

-- Individual sentences
SELECT SENTENCE

-- Sliding window with context
SELECT SPAN(SENTENCE, window=3)
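
The SPAN granularity is easiest to picture as a sliding window over sentences. Here is a minimal plain-Python sketch of what SPAN(SENTENCE, window=3) conceptually yields — an illustration only, not NLQL's internals, and the sliding_spans helper is invented for this example:

```python
# Conceptual sketch of SPAN(SENTENCE, window=3): each result is a
# window of 3 consecutive sentences, advancing one sentence at a time.
def sliding_spans(sentences, window=3):
    """Yield overlapping windows of `window` consecutive sentences."""
    for start in range(len(sentences) - window + 1):
        yield sentences[start:start + window]

sentences = ["S1.", "S2.", "S3.", "S4."]
for span in sliding_spans(sentences, window=3):
    print(" ".join(span))
# Two spans: "S1. S2. S3." and "S2. S3. S4."
```

Overlapping spans let a match keep its surrounding context instead of returning an isolated sentence.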

WHERE Clause

Filter results with various operators:

-- Semantic similarity
WHERE SIMILAR_TO("AI agents") > 0.8

-- Text matching
WHERE CONTAINS("machine learning")

-- Metadata filtering
WHERE META("date") > "2024-01-01"

-- Combine conditions
WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"
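
Conceptually, a WHERE clause is a predicate applied to each candidate chunk. A rough plain-Python analogue of the combined condition above — the chunks and their similarity scores here are invented for illustration:

```python
# Illustrative only: how a combined WHERE clause filters chunks.
# Each chunk carries a precomputed similarity score and metadata.
chunks = [
    {"content": "ML powers modern AI", "similarity": 0.82, "meta": {"topic": "ML"}},
    {"content": "Cooking with cast iron", "similarity": 0.11, "meta": {"topic": "food"}},
    {"content": "Gradient descent basics", "similarity": 0.74, "meta": {"topic": "ML"}},
]

# WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"
matches = [
    c for c in chunks
    if c["similarity"] > 0.7 and c["meta"]["topic"] == "ML"
]
print([c["content"] for c in matches])
# ['ML powers modern AI', 'Gradient descent basics']
```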

ORDER BY and LIMIT

-- Order by similarity score
ORDER BY SIMILARITY DESC

-- Order by metadata field
ORDER BY META("date") DESC

-- Limit results
LIMIT 10
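
ORDER BY followed by LIMIT behaves like a sort followed by a slice. A plain-Python analogue of ORDER BY META("date") DESC LIMIT 2 — the sample results are invented for illustration:

```python
# Illustrative only: ORDER BY META("date") DESC LIMIT 2 as sort + slice.
results = [
    {"content": "A", "meta": {"date": "2024-01-15"}},
    {"content": "B", "meta": {"date": "2024-01-25"}},
    {"content": "C", "meta": {"date": "2024-01-20"}},
]

# ISO-format dates sort correctly as strings, so a reverse sort works here.
ordered = sorted(results, key=lambda r: r["meta"]["date"], reverse=True)
top_two = ordered[:2]  # LIMIT 2
print([r["content"] for r in top_two])
# ['B', 'C']
```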

Complete Example

from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter
adapter = MemoryAdapter()

# Add documents with metadata
adapter.add_text(
    "AI agents can perceive their environment and take actions.",
    {"date": "2024-01-15", "author": "Alice", "topic": "AI"}
)
adapter.add_text(
    "Machine learning models learn from data without explicit programming.",
    {"date": "2024-01-20", "author": "Bob", "topic": "ML"}
)
adapter.add_text(
    "Natural language processing enables computers to understand human language.",
    {"date": "2024-01-25", "author": "Alice", "topic": "NLP"}
)

# Or use batch add
texts = [
    "AI agents can perceive their environment and take actions.",
    "Machine learning models learn from data without explicit programming.",
    "Natural language processing enables computers to understand human language.",
]
metadatas = [
    {"date": "2024-01-15", "author": "Alice", "topic": "AI"},
    {"date": "2024-01-20", "author": "Bob", "topic": "ML"},
    {"date": "2024-01-25", "author": "Alice", "topic": "NLP"},
]
adapter.add_texts(texts, metadatas)

# Initialize NLQL
nlql = NLQL(adapter=adapter)

# Execute a complex query
results = nlql.execute("""
    SELECT CHUNK
    WHERE META("author") == "Alice"
    LIMIT 5
""")

# Process results
for i, result in enumerate(results, 1):
    print(f"\n--- Result {i} ---")
    print(f"Content: {result.content}")
    print(f"Metadata: {result.metadata}")

Semantic Search Example

NLQL supports semantic search using the SIMILAR_TO operator, which uses vector embeddings to find semantically similar content:

from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter with AI-related content
adapter = MemoryAdapter()

adapter.add_text(
    "Artificial intelligence and machine learning are revolutionizing technology.",
    {"category": "AI", "author": "Alice", "year": 2024}
)

adapter.add_text(
    "Neural networks form the foundation of modern deep learning systems.",
    {"category": "ML", "author": "Bob", "year": 2024}
)

adapter.add_text(
    "Natural language processing enables computers to understand human language.",
    {"category": "NLP", "author": "Alice", "year": 2023}
)

nlql = NLQL(adapter=adapter)

# Semantic search query
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("deep learning and neural networks") > 0.5
    ORDER BY SIMILARITY DESC
    LIMIT 3
""")

# Results are ordered by semantic similarity
for i, result in enumerate(results, 1):
    similarity = result.metadata['similarity']
    print(f"{i}. [{similarity:.3f}] {result.content}")
    print(f"   Category: {result.metadata['category']}\n")

Output:

1. [0.814] Neural networks form the foundation of modern deep learning systems.
   Category: ML

2. [0.777] Artificial intelligence and machine learning are revolutionizing technology.
   Category: AI

3. [0.609] Natural language processing enables computers to understand human language.
   Category: NLP

How Semantic Search Works

1. Automatic Vectorization: When you use SIMILAR_TO("query"), NLQL automatically:
   - Embeds the query text using the default model (all-MiniLM-L6-v2)
   - Embeds all text chunks in your data
   - Computes cosine similarity scores

2. Similarity Scores: The similarity score (0-1) is stored in metadata["similarity"] and can be:
   - Used in a WHERE clause: WHERE SIMILAR_TO("query") > 0.8
   - Used in ORDER BY: ORDER BY SIMILARITY DESC
   - Accessed in results: result.metadata['similarity']

3. Hybrid Queries: Combine semantic search with metadata filtering:

results = nlql.execute("""
    SELECT CHUNK
    WHERE
        SIMILAR_TO("AI technology") > 0.6
        AND META("year") == 2024
        AND META("author") == "Alice"
    ORDER BY SIMILARITY DESC
""")

To use semantic search, install with text support:

pip install python-nlql[text]

This installs sentence-transformers for the default embedding provider.

Extensibility (Optional)

NLQL is highly extensible. You can customize functions, operators, and embedding providers to fit your specific needs:

from nlql import register_function, register_operator

# Add custom function for WHERE/ORDER BY
@register_function("word_count")
def word_count(text: str) -> int:
    return len(text.split())

# Add domain-specific operator
@register_operator("HAS_EMAIL")
def has_email(text: str) -> bool:
    import re
    return bool(re.search(r'[\w\.-]+@[\w\.-]+', text))

# Use in queries
results = nlql.execute("""
    SELECT CHUNK
    WHERE word_count(content) > 50 AND HAS_EMAIL(content)
""")

What you can extend:

- 🔧 Custom Functions: Add reusable logic for WHERE and ORDER BY clauses
- 🎯 Custom Operators: Create domain-specific operators (e.g., HAS_EMAIL, REGEX)
- 🤖 Embedding Providers: Use your own embedding models (OpenAI, Cohere, etc.)
- 🏢 Instance-Level Registration: Different NLQL instances can have different implementations

📚 Learn More: See the Extensibility Guide for complete documentation and examples. Check the examples/ directory in the repository for runnable code samples.

Next Steps