Getting Started with NLQL

This guide will help you get started with NLQL, from installation to executing your first queries.

Installation

Basic Installation

Install NLQL with pip:

pip install python-nlql

This installs the core NLQL engine with the Lark parser. For semantic search capabilities, you'll also need an embedding provider.

With Embedding Support

For semantic similarity operations (SIMILAR_TO), install with text support:

pip install python-nlql[text]

This includes sentence-transformers for default embedding functionality.

With Vector Database Adapters

Install with specific vector database support:

# ChromaDB
pip install python-nlql[chroma]

# FAISS
pip install python-nlql[faiss]

# Qdrant
pip install python-nlql[qdrant]

# All adapters
pip install python-nlql[all]

Your First Query

Using In-Memory Data

The simplest way to get started is with the built-in MemoryAdapter:

from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter and add data
adapter = MemoryAdapter()
adapter.add_text("AI agents are autonomous systems", {"topic": "AI"})
adapter.add_text("Machine learning powers modern AI", {"topic": "ML"})
adapter.add_text("Natural language processing", {"topic": "NLP"})

# Initialize NLQL with explicit adapter
nlql = NLQL(adapter=adapter)

# Execute a simple query
results = nlql.execute("SELECT CHUNK LIMIT 2")

# Print results
for result in results:
    print(result.content)

Using a Vector Database

With ChromaDB (requires ChromaAdapter - coming soon):

import chromadb
from nlql import NLQL
from nlql.adapters import ChromaAdapter  # Coming soon

# Create ChromaDB client and collection
client = chromadb.Client()
collection = client.create_collection("my_docs")

# Add documents
collection.add(
    documents=["AI agents are autonomous", "ML powers modern AI"],
    ids=["doc1", "doc2"],
)

# Create adapter and initialize NLQL
adapter = ChromaAdapter(collection)
nlql = NLQL(adapter=adapter)

# Query with semantic search
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("artificial intelligence")
    LIMIT 5
""")

Basic Query Syntax

SELECT Clause

Choose the granularity of results:

-- Full documents
SELECT DOCUMENT

-- Chunks (default from vector DBs)
SELECT CHUNK

-- Individual sentences
SELECT SENTENCE

-- Sliding window with context
SELECT SPAN(SENTENCE, window=3)
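
The SPAN granularity is easiest to picture as a sliding window over sentences. Here is a minimal plain-Python sketch of what SPAN(SENTENCE, window=3) conceptually yields — an illustration only, not NLQL's internals, and the sliding_spans helper is invented for this example:

```python
# Conceptual sketch of SPAN(SENTENCE, window=3): each result is a
# window of 3 consecutive sentences, advancing one sentence at a time.
def sliding_spans(sentences, window=3):
    """Yield overlapping windows of `window` consecutive sentences."""
    for start in range(len(sentences) - window + 1):
        yield sentences[start:start + window]

sentences = ["S1.", "S2.", "S3.", "S4."]
for span in sliding_spans(sentences, window=3):
    print(" ".join(span))
# Two spans: "S1. S2. S3." and "S2. S3. S4."
```

Overlapping spans let a match keep its surrounding context instead of returning an isolated sentence.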

WHERE Clause

Filter results with various operators:

-- Semantic similarity
WHERE SIMILAR_TO("AI agents") > 0.8

-- Text matching
WHERE CONTAINS("machine learning")

-- Metadata filtering
WHERE META("date") > "2024-01-01"

-- Combine conditions
WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"
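
Conceptually, a WHERE clause is a predicate applied to each candidate chunk. A rough plain-Python analogue of the combined condition above — the chunks and their similarity scores here are invented for illustration:

```python
# Illustrative only: how a combined WHERE clause filters chunks.
# Each chunk carries a precomputed similarity score and metadata.
chunks = [
    {"content": "ML powers modern AI", "similarity": 0.82, "meta": {"topic": "ML"}},
    {"content": "Cooking with cast iron", "similarity": 0.11, "meta": {"topic": "food"}},
    {"content": "Gradient descent basics", "similarity": 0.74, "meta": {"topic": "ML"}},
]

# WHERE SIMILAR_TO("AI") > 0.7 AND META("topic") == "ML"
matches = [
    c for c in chunks
    if c["similarity"] > 0.7 and c["meta"]["topic"] == "ML"
]
print([c["content"] for c in matches])
# ['ML powers modern AI', 'Gradient descent basics']
```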

ORDER BY and LIMIT

-- Order by similarity score
ORDER BY SIMILARITY DESC

-- Order by metadata field
ORDER BY META("date") DESC

-- Limit results
LIMIT 10
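
ORDER BY followed by LIMIT behaves like a sort followed by a slice. A plain-Python analogue of ORDER BY META("date") DESC LIMIT 2 — the sample results are invented for illustration:

```python
# Illustrative only: ORDER BY META("date") DESC LIMIT 2 as sort + slice.
results = [
    {"content": "A", "meta": {"date": "2024-01-15"}},
    {"content": "B", "meta": {"date": "2024-01-25"}},
    {"content": "C", "meta": {"date": "2024-01-20"}},
]

# ISO-format dates sort correctly as strings, so a reverse sort works here.
ordered = sorted(results, key=lambda r: r["meta"]["date"], reverse=True)
top_two = ordered[:2]  # LIMIT 2
print([r["content"] for r in top_two])
# ['B', 'C']
```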

Complete Example

from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter
adapter = MemoryAdapter()

# Add documents with metadata
adapter.add_text(
    "AI agents can perceive their environment and take actions.",
    {"date": "2024-01-15", "author": "Alice", "topic": "AI"}
)
adapter.add_text(
    "Machine learning models learn from data without explicit programming.",
    {"date": "2024-01-20", "author": "Bob", "topic": "ML"}
)
adapter.add_text(
    "Natural language processing enables computers to understand human language.",
    {"date": "2024-01-25", "author": "Alice", "topic": "NLP"}
)

# Or use batch add
texts = [
    "AI agents can perceive their environment and take actions.",
    "Machine learning models learn from data without explicit programming.",
    "Natural language processing enables computers to understand human language.",
]
metadatas = [
    {"date": "2024-01-15", "author": "Alice", "topic": "AI"},
    {"date": "2024-01-20", "author": "Bob", "topic": "ML"},
    {"date": "2024-01-25", "author": "Alice", "topic": "NLP"},
]
adapter.add_texts(texts, metadatas)

# Initialize NLQL
nlql = NLQL(adapter=adapter)

# Execute a complex query
results = nlql.execute("""
    SELECT CHUNK
    WHERE META("author") == "Alice"
    LIMIT 5
""")

# Process results
for i, result in enumerate(results, 1):
    print(f"\n--- Result {i} ---")
    print(f"Content: {result.content}")
    print(f"Metadata: {result.metadata}")

Semantic Search Example

NLQL supports semantic search using the SIMILAR_TO operator, which uses vector embeddings to find semantically similar content:

from nlql import NLQL
from nlql.adapters import MemoryAdapter

# Create adapter with AI-related content
adapter = MemoryAdapter()

adapter.add_text(
    "Artificial intelligence and machine learning are revolutionizing technology.",
    {"category": "AI", "author": "Alice", "year": 2024}
)

adapter.add_text(
    "Neural networks form the foundation of modern deep learning systems.",
    {"category": "ML", "author": "Bob", "year": 2024}
)

adapter.add_text(
    "Natural language processing enables computers to understand human language.",
    {"category": "NLP", "author": "Alice", "year": 2023}
)

nlql = NLQL(adapter=adapter)

# Semantic search query
results = nlql.execute("""
    SELECT CHUNK
    WHERE SIMILAR_TO("deep learning and neural networks") > 0.5
    ORDER BY SIMILARITY DESC
    LIMIT 3
""")

# Results are ordered by semantic similarity
for i, result in enumerate(results, 1):
    similarity = result.metadata['similarity']
    print(f"{i}. [{similarity:.3f}] {result.content}")
    print(f"   Category: {result.metadata['category']}\n")

Output:

1. [0.814] Neural networks form the foundation of modern deep learning systems.
   Category: ML

2. [0.777] Artificial intelligence and machine learning are revolutionizing technology.
   Category: AI

3. [0.609] Natural language processing enables computers to understand human language.
   Category: NLP

How Semantic Search Works

1. Automatic Vectorization: When you use SIMILAR_TO("query"), NLQL automatically:
   - Embeds the query text using the default model (all-MiniLM-L6-v2)
   - Embeds all text chunks in your data
   - Computes cosine similarity scores

2. Similarity Scores: The similarity score (0-1) is stored in metadata["similarity"] and can be:
   - Used in a WHERE clause: WHERE SIMILAR_TO("query") > 0.8
   - Used in ORDER BY: ORDER BY SIMILARITY DESC
   - Accessed in results: result.metadata['similarity']

3. Hybrid Queries: Combine semantic search with metadata filtering:

results = nlql.execute("""
    SELECT CHUNK
    WHERE
        SIMILAR_TO("AI technology") > 0.6
        AND META("year") == 2024
        AND META("author") == "Alice"
    ORDER BY SIMILARITY DESC
""")

To use semantic search, install with text support:

pip install python-nlql[text]

This installs sentence-transformers for the default embedding provider.

Extensibility (Optional)

NLQL is highly extensible. You can customize functions, operators, and embedding providers to fit your specific needs:

from nlql import register_function, register_operator

# Add custom function for WHERE/ORDER BY
@register_function("word_count")
def word_count(text: str) -> int:
    return len(text.split())

# Add domain-specific operator
@register_operator("HAS_EMAIL")
def has_email(text: str) -> bool:
    import re
    return bool(re.search(r'[\w\.-]+@[\w\.-]+', text))

# Use in queries
results = nlql.execute("""
    SELECT CHUNK
    WHERE word_count(content) > 50 AND HAS_EMAIL(content)
""")

What you can extend:

- 🔧 Custom Functions: Add reusable logic for WHERE and ORDER BY clauses
- 🎯 Custom Operators: Create domain-specific operators (e.g., HAS_EMAIL, REGEX)
- 🤖 Embedding Providers: Use your own embedding models (OpenAI, Cohere, etc.)
- 🏢 Instance-Level Registration: Different NLQL instances can have different implementations

📚 Learn More: See the Extensibility Guide for complete documentation and examples. Check the examples/ directory in the repository for runnable code samples.

Next Steps