# Embedding Models
Configure the embedding model used for dense vector search.
## Available Models
| Model | Dimensions | Languages | Speed |
|---|---|---|---|
| BGE-M3 (default) | 1024 | 100+ | Fast |
| E5-Large | 1024 | 100+ | Medium |
| OpenAI Ada-002 | 1536 | 50+ | Fast |
| Custom | Variable | Variable | Variable |
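The dimension column matters at index time: vectors of the wrong length cannot be stored alongside an index built for another model. A minimal sketch of a dimension check based on the table above; `e5-large` is the identifier used elsewhere on this page, while the other identifier strings are illustrative assumptions, not confirmed API names:

```python
# Dimensions from the table above. Identifiers other than "e5-large"
# are illustrative assumptions, not confirmed API names.
MODEL_DIMENSIONS = {
    "bge-m3": 1024,
    "e5-large": 1024,
    "openai-ada-002": 1536,
}

def validate_embedding(model: str, vector: list[float]) -> None:
    """Raise if a vector's length does not match the model's dimension."""
    expected = MODEL_DIMENSIONS[model]
    if len(vector) != expected:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, got {len(vector)}"
        )
```

A check like this is most useful when pairing a custom model (see below) with an existing index, where a dimension mismatch would otherwise surface only at query time.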
## Changing the Model
```python
# Per-request
results = client.search(
    query="...",
    options={"embedding_model": "e5-large"}
)

# Organization default
client.settings.update({
    "default_embedding_model": "e5-large"
})
```

## Using Custom Models
Bring your own embedding model:
```python
client = LakehouseClient(
    api_key="...",
    embedding_endpoint="https://your-model.com/embed"
)
```

Your endpoint must accept:
```json
{
  "texts": ["text1", "text2"],
  "model": "your-model-name"
}
```

And return:
```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]
}
```
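The request/response contract above can be exercised with a toy handler. The hash-based embedder below is a stand-in for a real model (real embeddings come from a neural network, not a hash), and in production the handler would sit behind the HTTPS URL passed as `embedding_endpoint`:

```python
import hashlib

EMBED_DIM = 8  # toy size; a real model would emit e.g. 1024 dimensions

def embed_text(text: str) -> list[float]:
    """Deterministic stand-in embedding: hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]

def handle_embed_request(payload: dict) -> dict:
    """Map {"texts": [...], "model": "..."} to {"embeddings": [...]}."""
    return {"embeddings": [embed_text(t) for t in payload["texts"]]}
```

A request carrying two texts yields two vectors of equal length, matching the JSON shapes shown above; the order of `embeddings` must follow the order of `texts`.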
## Model Comparison

### BGE-M3 (Recommended)
- Best overall quality
- Excellent multi-language support
- Balanced speed/quality
### E5-Large
- Strong English performance
- Good for domain-specific fine-tuning
### OpenAI Ada-002
- Easy integration
- Good general performance
- Higher latency
## Re-indexing
After changing models, re-index existing documents so that stored vectors all come from the same model:
```python
client.documents.reindex(
    model="new-model",
    batch_size=100
)
```
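One way to reason about `batch_size` is as fixed-size chunking of the document list. A minimal sketch of that chunking logic, where `batches` is a hypothetical helper, not part of the client API:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batches(items: Sequence[T], batch_size: int = 100) -> Iterator[Sequence[T]]:
    """Yield successive fixed-size chunks; the last chunk may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Smaller batches bound the cost of retrying a failed chunk during a long re-index; larger batches reduce per-request overhead.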