# Embedding Models
Configure the embedding model used for dense vector search.
## Available Models
| Model | Dimensions | Languages | Speed |
|---|---|---|---|
| BGE-M3 (default) | 1024 | 100+ | Fast |
| E5-Large | 1024 | 100+ | Medium |
| OpenAI Ada-002 | 1536 | 50+ | Fast |
| Custom | Variable | Variable | Variable |
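The dimension column matters at index time: vectors of the wrong length cannot be stored alongside an index built for another model. A minimal sketch of a dimension check based on the table above; `e5-large` is the identifier used elsewhere on this page, while the other identifier strings are illustrative assumptions, not confirmed API names:

```python
# Dimensions from the table above. Identifiers other than "e5-large"
# are illustrative assumptions, not confirmed API names.
MODEL_DIMENSIONS = {
    "bge-m3": 1024,
    "e5-large": 1024,
    "openai-ada-002": 1536,
}

def validate_embedding(model: str, vector: list[float]) -> None:
    """Raise if a vector's length does not match the model's dimension."""
    expected = MODEL_DIMENSIONS[model]
    if len(vector) != expected:
        raise ValueError(
            f"{model} produces {expected}-dim vectors, got {len(vector)}"
        )
```

A check like this is most useful when pairing a custom model (see below) with an existing index, where a dimension mismatch would otherwise surface only at query time.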
## Changing the Model
```python
# Per-request
results = client.search(
    query="...",
    options={"embedding_model": "e5-large"}
)

# Organization default
client.settings.update({
    "default_embedding_model": "e5-large"
})
```

## Using Custom Models
Bring your own embedding model:
```python
client = LakehouseClient(
    api_key="...",
    embedding_endpoint="https://your-model.com/embed"
)
```

Your endpoint must accept:
```json
{
  "texts": ["text1", "text2"],
  "model": "your-model-name"
}
```

And return:
```json
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]]
}
```
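The request/response contract above can be exercised with a toy handler. The hash-based embedder below is a stand-in for a real model (real embeddings come from a neural network, not a hash), and in production the handler would sit behind the HTTPS URL passed as `embedding_endpoint`:

```python
import hashlib

EMBED_DIM = 8  # toy size; a real model would emit e.g. 1024 dimensions

def embed_text(text: str) -> list[float]:
    """Deterministic stand-in embedding: hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]

def handle_embed_request(payload: dict) -> dict:
    """Map {"texts": [...], "model": "..."} to {"embeddings": [...]}."""
    return {"embeddings": [embed_text(t) for t in payload["texts"]]}
```

A request carrying two texts yields two vectors of equal length, matching the JSON shapes shown above; the order of `embeddings` must follow the order of `texts`.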
## Model Comparison

### BGE-M3 (Recommended)
- Best overall quality
- Excellent multi-language support
- Balanced speed/quality
### E5-Large
- Strong English performance
- Good for domain-specific fine-tuning
### OpenAI Ada-002
- Easy integration
- Good general performance
- Higher latency
## Re-indexing
After changing models, re-index existing documents so that stored vectors all come from the same model:
```python
client.documents.reindex(
    model="new-model",
    batch_size=100
)
```
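One way to reason about `batch_size` is as fixed-size chunking of the document list. A minimal sketch of that chunking logic, where `batches` is a hypothetical helper, not part of the client API:

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

def batches(items: Sequence[T], batch_size: int = 100) -> Iterator[Sequence[T]]:
    """Yield successive fixed-size chunks; the last chunk may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Smaller batches bound the cost of retrying a failed chunk during a long re-index; larger batches reduce per-request overhead.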