Data Sources and Adapters¶
NLQL uses an explicit adapter pattern for data sources. This design ensures:
- Clear separation of concerns: Data source logic is separate from query execution
- High extensibility: Easy to add support for new data sources
- Type safety: Users know exactly which adapter they're using
- No magic: No auto-detection or hidden behavior
Users must explicitly create an adapter for their data source and pass it to NLQL.
Built-in Adapters¶
MemoryAdapter¶
Simple in-memory storage for testing, prototyping, and small datasets.
The MemoryAdapter provides several convenient methods for adding data:
from nlql import NLQL
from nlql.adapters import MemoryAdapter
# Create adapter
adapter = MemoryAdapter()
# Method 1: Add single text
adapter.add_text("AI agents are autonomous systems", {"topic": "AI"})
# Method 2: Add multiple texts at once
texts = [
    "Machine learning powers modern AI",
    "Natural language processing enables text understanding"
]
metadatas = [{"topic": "ML"}, {"topic": "NLP"}]
adapter.add_texts(texts, metadatas)
# Method 3: Add a long document with automatic chunking
long_document = """
This is a very long document that will be automatically split into chunks.
Each chunk will be around 500 characters by default.
You can customize the chunk size and overlap to suit your needs.
"""
adapter.add_document(
    long_document,
    metadata={"source": "paper.pdf"},
    chunk_size=500,
    chunk_overlap=50
)
# Method 4: Add individual chunks with full control
adapter.add_chunk("Custom chunk", {"custom": "metadata"}, chunk_id="my_id")
# Use with NLQL
nlql = NLQL(adapter=adapter)
results = nlql.execute("SELECT CHUNK LIMIT 10")
# Check adapter size
print(f"Total chunks: {len(adapter)}")
# Clear all data
adapter.clear()
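The effect of `chunk_size` and `chunk_overlap` in `add_document` can be pictured with a few lines of plain Python. This is an illustrative sliding-window sketch, not the library's exact chunking algorithm:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# 1200 characters with 500-char windows stepping by 450 -> 3 chunks
chunks = chunk_text("a" * 1200, chunk_size=500, chunk_overlap=50)
print(len(chunks))      # 3
print(len(chunks[0]))   # 500
```

Larger overlaps keep more shared context between adjacent chunks at the cost of storing more duplicated text.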
Capabilities:
- ✅ Metadata filtering
- ✅ Batch operations
- ✅ Automatic document chunking
- ❌ Semantic search (no embeddings by default)
Vector Database Adapters¶
ChromaDB (Coming Soon)¶
import chromadb
from nlql import NLQL
from nlql.adapters import ChromaAdapter # Coming soon
# Create ChromaDB collection
client = chromadb.Client()
collection = client.create_collection("docs")
# Add documents to ChromaDB
collection.add(
    documents=["AI agents are autonomous", "ML powers AI"],
    ids=["1", "2"],
    metadatas=[{"topic": "AI"}, {"topic": "ML"}],
)
# Create adapter and use with NLQL
adapter = ChromaAdapter(collection)
nlql = NLQL(adapter=adapter)
results = nlql.execute("""
SELECT CHUNK
WHERE SIMILAR_TO("artificial intelligence") > 0.7
LIMIT 5
""")
Capabilities:
- ✅ Semantic search
- ✅ Metadata filtering
- ✅ Hybrid queries
FAISS (Coming Soon)¶
import faiss
from nlql import NLQL
from nlql.adapters import FAISSAdapter # Coming soon
# Create FAISS index
index = faiss.IndexFlatL2(384) # dimension
# Create adapter and use with NLQL
adapter = FAISSAdapter(index)
nlql = NLQL(adapter=adapter)
Qdrant (Coming Soon)¶
from qdrant_client import QdrantClient
from nlql import NLQL
from nlql.adapters import QdrantAdapter # Coming soon
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(":memory:")
# Create the collection before fetching it, or get_collection will fail
client.create_collection(
    collection_name="my_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
collection = client.get_collection("my_collection")
# Create adapter and use with NLQL
adapter = QdrantAdapter(collection)
nlql = NLQL(adapter=adapter)
Custom Adapters¶
Create custom adapters for your data sources:
from nlql.adapters import BaseAdapter, QueryPlan
from nlql.text.units import TextUnit, Chunk
class MyCustomAdapter(BaseAdapter):
    def __init__(self, my_data_source):
        self.source = my_data_source

    def query(self, plan: QueryPlan) -> list[TextUnit]:
        # Fetch raw items from the data source
        items = []
        # Apply filters from plan
        if plan.filters:
            # Filter by metadata
            pass
        if plan.query_text:
            # Perform semantic search
            pass
        # Convert raw items to TextUnit objects
        units = []
        for item in items:
            units.append(Chunk(
                content=item["text"],
                metadata=item.get("metadata", {}),
                chunk_id=item["id"],
                position=0,
            ))
        # Apply limit
        if plan.limit:
            units = units[:plan.limit]
        return units

    def supports_semantic_search(self) -> bool:
        return True  # If your source supports it

    def supports_metadata_filter(self) -> bool:
        return True  # If your source supports it
Use your custom adapter:
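To see the adapter contract in isolation, here is a self-contained sketch that stubs out `QueryPlan` and `Chunk` and exercises `query()` directly. The stub classes mirror the names used above but are simplified stand-ins for illustration, not the real `nlql` classes:

```python
from dataclasses import dataclass, field

# Minimal stand-ins for nlql's QueryPlan and Chunk (illustration only)
@dataclass
class QueryPlan:
    filters: dict = field(default_factory=dict)
    query_text: str = ""
    limit: int = 0

@dataclass
class Chunk:
    content: str
    metadata: dict
    chunk_id: str
    position: int = 0

class MyCustomAdapter:
    def __init__(self, my_data_source):
        self.source = my_data_source

    def query(self, plan: QueryPlan) -> list[Chunk]:
        items = self.source
        # Apply metadata filters from the plan
        if plan.filters:
            items = [i for i in items
                     if all(i.get("metadata", {}).get(k) == v
                            for k, v in plan.filters.items())]
        # Convert raw items to Chunk objects
        units = [Chunk(content=i["text"],
                       metadata=i.get("metadata", {}),
                       chunk_id=i["id"])
                 for i in items]
        # Apply limit
        if plan.limit:
            units = units[:plan.limit]
        return units

data = [
    {"id": "1", "text": "AI agents", "metadata": {"topic": "AI"}},
    {"id": "2", "text": "ML basics", "metadata": {"topic": "ML"}},
]
adapter = MyCustomAdapter(data)
hits = adapter.query(QueryPlan(filters={"topic": "AI"}, limit=5))
print([c.content for c in hits])  # ['AI agents']
```

In the real library the adapter is then passed to `NLQL(adapter=...)` exactly as with the built-in `MemoryAdapter` shown earlier.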
Adapter Capabilities¶
Adapters declare their capabilities to enable query optimization:
| Capability | Description | Impact |
|---|---|---|
| `semantic_search` | Supports SIMILAR_TO | Enables push-down of similarity queries |
| `metadata_filter` | Supports META filters | Enables push-down of metadata conditions |
NLQL automatically routes queries based on these capabilities:
- Supported operations → Pushed down to data source (fast)
- Unsupported operations → Executed in-memory (slower, but flexible)
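The routing decision can be pictured as a simple capability check. This is a hypothetical sketch of the idea, not NLQL's actual query planner:

```python
def route(adapter, plan):
    """Split a plan into operations pushed to the source vs. run in memory."""
    pushed, in_memory = [], []
    for op in plan:
        if op["kind"] == "similar_to" and adapter.supports_semantic_search():
            pushed.append(op)
        elif op["kind"] == "meta" and adapter.supports_metadata_filter():
            pushed.append(op)
        else:
            in_memory.append(op)  # fall back to slower in-memory evaluation
    return pushed, in_memory

class VectorDB:
    def supports_semantic_search(self): return True
    def supports_metadata_filter(self): return True

class Memory:
    def supports_semantic_search(self): return False
    def supports_metadata_filter(self): return True

plan = [{"kind": "similar_to"}, {"kind": "meta"}, {"kind": "length"}]
print(route(VectorDB(), plan))  # only the LENGTH check stays in memory
print(route(Memory(), plan))    # SIMILAR_TO also falls back to in-memory
```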
Best Practices¶
1. Choose the Right Adapter¶
- MemoryAdapter: Testing, small datasets, no semantic search needed
- Vector DB adapters: Production, large datasets, semantic search required
2. Leverage Push-down¶
Structure queries to maximize push-down:
-- Good: Filters pushed to vector DB
SELECT CHUNK
WHERE SIMILAR_TO("AI") > 0.8 AND META("topic") == "ML"
-- Less optimal: Complex logic requires in-memory execution
SELECT CHUNK
WHERE (SIMILAR_TO("AI") > 0.8 OR CONTAINS("machine learning"))
AND LENGTH(content) > 100
3. Use Appropriate Limits¶
Always use LIMIT to avoid retrieving too much data:
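For example, in the same query style as above:

```sql
-- Bound the result set instead of scanning everything
SELECT CHUNK
WHERE META("topic") == "AI"
LIMIT 20
```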
4. Index Metadata Fields¶
For vector databases, ensure metadata fields used in queries are indexed.
Next Steps¶
- Learn about Extensibility
- Check API Reference