Qdrant - Vector Similarity Search Engine

High-performance vector database written in Rust for production RAG and semantic search.

When to use Qdrant

Use Qdrant when:

Building production RAG systems requiring low latency Need hybrid search (vectors + metadata filtering) Require horizontal scaling with sharding/replication Want on-premise deployment with full data control Need multi-vector storage per record (dense + sparse) Building real-time recommendation systems

Key features:

Rust-powered: Memory-safe, high performance Rich filtering: Filter by any payload field during search Multiple vectors: Dense, sparse, multi-dense per point Quantization: Scalar, product, binary for memory efficiency Distributed: Raft consensus, sharding, replication REST + gRPC: Both APIs with full feature parity

Use alternatives instead:

Chroma: Simpler setup, embedded use cases FAISS: Maximum raw speed, research/batch processing Pinecone: Fully managed, zero ops preferred Weaviate: GraphQL preference, built-in vectorizers Quick start Installation

Python client

pip install qdrant-client

Docker (recommended for development)

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

Docker with persistent storage

docker run -p 6333:6333 -p 6334:6334 \ -v $(pwd)/qdrant_storage:/qdrant/storage \ qdrant/qdrant

Basic usage from qdrant_client import QdrantClient from qdrant_client.models import Distance, VectorParams, PointStruct

Connect to Qdrant

client = QdrantClient(host="localhost", port=6333)

Create collection

client.create_collection( collection_name="documents", vectors_config=VectorParams(size=384, distance=Distance.COSINE) )

Insert vectors with payload

client.upsert( collection_name="documents", points=[ PointStruct( id=1, vector=[0.1, 0.2, ...], # 384-dim vector payload={"title": "Doc 1", "category": "tech"} ), PointStruct( id=2, vector=[0.3, 0.4, ...], payload={"title": "Doc 2", "category": "science"} ) ] )

Search with filtering

results = client.search( collection_name="documents", query_vector=[0.15, 0.25, ...], query_filter={ "must": [{"key": "category", "match": {"value": "tech"}}] }, limit=10 )

for point in results: print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")

Core concepts Points - Basic data unit from qdrant_client.models import PointStruct

Point = ID + Vector(s) + Payload

point = PointStruct( id=123, # Integer or UUID string vector=[0.1, 0.2, 0.3, ...], # Dense vector payload={ # Arbitrary JSON metadata "title": "Document title", "category": "tech", "timestamp": 1699900000, "tags": ["python", "ml"] } )

Batch upsert (recommended)

client.upsert( collection_name="documents", points=[point1, point2, point3], wait=True # Wait for indexing )

Collections - Vector containers from qdrant_client.models import VectorParams, Distance, HnswConfigDiff

Create with HNSW configuration

client.create_collection( collection_name="documents", vectors_config=VectorParams( size=384, # Vector dimensions distance=Distance.COSINE # COSINE, EUCLID, DOT, MANHATTAN ), hnsw_config=HnswConfigDiff( m=16, # Connections per node (default 16) ef_construct=100, # Build-time accuracy (default 100) full_scan_threshold=10000 # Switch to brute force below this ), on_disk_payload=True # Store payload on disk )

Collection info

info = client.get_collection("documents") print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")

Distance metrics Metric Use Case Range COSINE Text embeddings, normalized vectors 0 to 2 EUCLID Spatial data, image features 0 to ∞ DOT Recommendations, unnormalized -∞ to ∞ MANHATTAN Sparse features, discrete data 0 to ∞ Search operations Basic search

Simple nearest neighbor search

results = client.search( collection_name="documents", query_vector=[0.1, 0.2, ...], limit=10, with_payload=True, with_vectors=False # Don't return vectors (faster) )

Filtered search from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

Complex filtering

results = client.search( collection_name="documents", query_vector=query_embedding, query_filter=Filter( must=[ FieldCondition(key="category", match=MatchValue(value="tech")), FieldCondition(key="timestamp", range=Range(gte=1699000000)) ], must_not=[ FieldCondition(key="status", match=MatchValue(value="archived")) ] ), limit=10 )

Shorthand filter syntax

results = client.search( collection_name="documents", query_vector=query_embedding, query_filter={ "must": [ {"key": "category", "match": {"value": "tech"}}, {"key": "price", "range": {"gte": 10, "lte": 100}} ] }, limit=10 )

Batch search from qdrant_client.models import SearchRequest

Multiple queries in one request

results = client.search_batch( collection_name="documents", requests=[ SearchRequest(vector=[0.1, ...], limit=5), SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}), SearchRequest(vector=[0.3, ...], limit=10) ] )

RAG integration With sentence-transformers from sentence_transformers import SentenceTransformer from qdrant_client import QdrantClient from qdrant_client.models import VectorParams, Distance, PointStruct

Initialize

encoder = SentenceTransformer("all-MiniLM-L6-v2") client = QdrantClient(host="localhost", port=6333)

Create collection

client.create_collection( collection_name="knowledge_base", vectors_config=VectorParams(size=384, distance=Distance.COSINE) )

Index documents

documents = [ {"id": 1, "text": "Python is a programming language", "source": "wiki"}, {"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"}, ]

points = [ PointStruct( id=doc["id"], vector=encoder.encode(doc["text"]).tolist(), payload={"text": doc["text"], "source": doc["source"]} ) for doc in documents ] client.upsert(collection_name="knowledge_base", points=points)

RAG retrieval

def retrieve(query: str, top_k: int = 5) -> list[dict]: query_vector = encoder.encode(query).tolist() results = client.search( collection_name="knowledge_base", query_vector=query_vector, limit=top_k ) return [{"text": r.payload["text"], "score": r.score} for r in results]

Use in RAG pipeline

context = retrieve("What is Python?") prompt = f"Context: {context}\n\nQuestion: What is Python?"

With LangChain from langchain_community.vectorstores import Qdrant from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2") vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs") retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

With LlamaIndex from llama_index.vector_stores.qdrant import QdrantVectorStore from llama_index.core import VectorStoreIndex, StorageContext

vector_store = QdrantVectorStore(client=client, collection_name="llama_docs") storage_context = StorageContext.from_defaults(vector_store=vector_store) index = VectorStoreIndex.from_documents(documents, storage_context=storage_context) query_engine = index.as_query_engine()

Multi-vector support Named vectors (different embedding models) from qdrant_client.models import VectorParams, Distance

Collection with multiple vector types

client.create_collection( collection_name="hybrid_search", vectors_config={ "dense": VectorParams(size=384, distance=Distance.COSINE), "sparse": VectorParams(size=30000, distance=Distance.DOT) } )

Insert with named vectors

client.upsert( collection_name="hybrid_search", points=[ PointStruct( id=1, vector={ "dense": dense_embedding, "sparse": sparse_embedding }, payload={"text": "document text"} ) ] )

Search specific vector

results = client.search( collection_name="hybrid_search", query_vector=("dense", query_dense), # Specify which vector limit=10 )

Sparse vectors (BM25, SPLADE) from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector

Collection with sparse vectors

client.create_collection( collection_name="sparse_search", vectors_config={}, sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))} )

Insert sparse vector

client.upsert( collection_name="sparse_search", points=[PointStruct(id=1, vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])}, payload={"text": "document"})] )

Quantization (memory optimization) from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

Scalar quantization (4x memory reduction)

client.create_collection( collection_name="quantized", vectors_config=VectorParams(size=384, distance=Distance.COSINE), quantization_config=ScalarQuantization( scalar=ScalarQuantizationConfig( type=ScalarType.INT8, quantile=0.99, # Clip outliers always_ram=True # Keep quantized in RAM ) ) )

Search with rescoring

results = client.search( collection_name="quantized", query_vector=query, search_params={"quantization": {"rescore": True}}, # Rescore top results limit=10 )

Payload indexing from qdrant_client.models import PayloadSchemaType

Create payload index for faster filtering

client.create_payload_index( collection_name="documents", field_name="category", field_schema=PayloadSchemaType.KEYWORD )

client.create_payload_index( collection_name="documents", field_name="timestamp", field_schema=PayloadSchemaType.INTEGER )

Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL

Production deployment Qdrant Cloud from qdrant_client import QdrantClient

Connect to Qdrant Cloud

client = QdrantClient( url="https://your-cluster.cloud.qdrant.io", api_key="your-api-key" )

Performance tuning

Optimize for search speed (higher recall)

client.update_collection( collection_name="documents", hnsw_config=HnswConfigDiff(ef_construct=200, m=32) )

Optimize for indexing speed (bulk loads)

client.update_collection( collection_name="documents", optimizer_config={"indexing_threshold": 20000} )

Best practices Batch operations - Use batch upsert/search for efficiency Payload indexing - Index fields used in filters Quantization - Enable for large collections (>1M vectors) Sharding - Use for collections >10M vectors On-disk storage - Enable on_disk_payload for large payloads Connection pooling - Reuse client instances Common issues

Slow search with filters:

Create payload index for filtered fields

client.create_payload_index( collection_name="docs", field_name="category", field_schema=PayloadSchemaType.KEYWORD )

Out of memory:

Enable quantization and on-disk storage

client.create_collection( collection_name="large_collection", vectors_config=VectorParams(size=384, distance=Distance.COSINE), quantization_config=ScalarQuantization(...), on_disk_payload=True )

Connection issues:

Use timeout and retry

client = QdrantClient( host="localhost", port=6333, timeout=30, prefer_grpc=True # gRPC for better performance )

References Advanced Usage - Distributed mode, hybrid search, recommendations Troubleshooting - Common issues, debugging, performance tuning Resources GitHub: https://github.com/qdrant/qdrant (22k+ stars) Docs: https://qdrant.tech/documentation/ Python Client: https://github.com/qdrant/qdrant-client Cloud: https://cloud.qdrant.io Version: 1.12.0+ License: Apache 2.0

qdrant-vector-search

安装

Python client

Docker (recommended for development)

Docker with persistent storage

Connect to Qdrant

Create collection

Insert vectors with payload

Search with filtering

Point = ID + Vector(s) + Payload

Batch upsert (recommended)

Create with HNSW configuration

Collection info

Simple nearest neighbor search

Complex filtering

Shorthand filter syntax

Multiple queries in one request

Initialize

Create collection

Index documents

RAG retrieval

Use in RAG pipeline

Collection with multiple vector types

Insert with named vectors

Search specific vector

Collection with sparse vectors

Insert sparse vector

Scalar quantization (4x memory reduction)

Search with rescoring

Create payload index for faster filtering

Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL

Connect to Qdrant Cloud

Optimize for search speed (higher recall)

Optimize for indexing speed (bulk loads)

Create payload index for filtered fields

Enable quantization and on-disk storage

Use timeout and retry