pyseekdb.utils.embedding_functions
Embedding function implementations for pyseekdb.
This module provides various embedding function implementations that can be used with pyseekdb collections.
Classes
- AmazonBedrockEmbeddingFunction: A convenient embedding function for Amazon Bedrock embedding models using boto3.
- CohereEmbeddingFunction: A convenient embedding function for Cohere embedding models using LiteLLM.
- GoogleVertexEmbeddingFunction: A convenient embedding function for Google Vertex AI embedding models.
- JinaEmbeddingFunction: A convenient embedding function for Jina AI embedding models.
- LiteLLMBaseEmbeddingFunction: A custom embedding function using LiteLLM to access various embedding models.
- MistralEmbeddingFunction: A convenient embedding function for Mistral text embedding models.
- MorphEmbeddingFunction: A convenient embedding function for Morph embedding models.
- OllamaEmbeddingFunction: A convenient embedding function for Ollama embedding models.
- OnnxEmbeddingFunction: Generic ONNX runtime embedding function.
- OpenAIBaseEmbeddingFunction: Base embedding function for OpenAI-compatible embedding APIs.
- OpenAIEmbeddingFunction: A convenient embedding function for OpenAI embedding models.
- QwenEmbeddingFunction: A convenient embedding function for Qwen (Alibaba Cloud) embedding models.
- SentenceTransformerEmbeddingFunction: An embedding function using sentence-transformers with a specific model.
- SiliconflowEmbeddingFunction: A convenient embedding function for SiliconFlow embedding models.
- TencentHunyuanEmbeddingFunction: A convenient embedding function for Tencent Hunyuan embedding models.
- VoyageaiEmbeddingFunction: A convenient embedding function for Voyage AI embedding models.
- class pyseekdb.utils.embedding_functions.AmazonBedrockEmbeddingFunction(session: Any, model_name: str = 'amazon.titan-embed-text-v2', **kwargs: Any)[source]
Bases: EmbeddingFunction[str | list[str]]
A convenient embedding function for Amazon Bedrock embedding models using boto3.
For more information about Amazon Bedrock models, see https://docs.aws.amazon.com/bedrock/
This embedding function runs remotely on Amazon Bedrock’s servers, and requires AWS credentials configured via boto3.
Example
pip install pyseekdb boto3
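For context, a sketch of the request and response shapes involved when embedding text with an Amazon Titan model through Bedrock's InvokeModel API. The field names ("inputText" in the request, "embedding" in the response) follow the Titan text-embeddings contract; whether pyseekdb builds its payloads exactly this way is an assumption, and no network call is made here.

```python
# Sketch of the Titan text-embedding request/response bodies (assumed shapes).
import json

def build_titan_request(text: str) -> str:
    # Titan text-embedding models take a single input string per invocation.
    return json.dumps({"inputText": text})

def parse_titan_response(body: str) -> list[float]:
    # The response body carries the vector under "embedding".
    return json.loads(body)["embedding"]

request = build_titan_request("hello world")
fake_response = json.dumps({"embedding": [0.1, 0.2, 0.3], "inputTextTokenCount": 2})
vector = parse_titan_response(fake_response)
print(len(vector))  # 3
```

A real invocation would pass the request body to boto3's bedrock-runtime invoke_model call using the session supplied to the constructor.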
- static build_from_config(config: dict[str, Any]) → AmazonBedrockEmbeddingFunction[source]
Build an AmazonBedrockEmbeddingFunction from its configuration dictionary.
- Parameters:
config – Dictionary containing the embedding function’s configuration. Note: AWS credentials are NOT stored in config for security reasons. Credentials should be provided via environment variables, IAM roles, or passed as additional parameters.
- Returns:
Restored AmazonBedrockEmbeddingFunction instance
- Raises:
ValueError – If the configuration is invalid or missing required fields
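The get_config()/build_from_config() round-trip described above can be sketched with a stand-in class (not the real pyseekdb implementation). The point illustrated is the security note: credentials live on the session object and are never serialized, so they must be re-supplied at restore time.

```python
# Illustrative sketch of the config round-trip pattern; class and field
# names here are stand-ins, not pyseekdb internals.

class StandInBedrockEF:
    def __init__(self, session, model_name="amazon.titan-embed-text-v2"):
        self._session = session          # credentials live here, never serialized
        self._model_name = model_name

    def get_config(self):
        # Only non-sensitive settings are persisted.
        return {"model_name": self._model_name}

    @staticmethod
    def build_from_config(config, session=None):
        if "model_name" not in config:
            raise ValueError("config is missing required field 'model_name'")
        # The session (credentials) must be re-supplied at restore time.
        return StandInBedrockEF(session=session, model_name=config["model_name"])

ef = StandInBedrockEF(session="fake-session")
restored = StandInBedrockEF.build_from_config(ef.get_config(), session="fake-session")
print(restored._model_name)  # amazon.titan-embed-text-v2
```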
- property dimension: int
Get the dimension of embeddings produced by this function.
If the model is in the known-dimensions list, that value is returned without making an API call.
Otherwise, this falls back to requesting an embedding from the API and inferring the dimension from the result.
- Returns:
The dimension of embeddings for this model.
- Return type:
int
- get_config() → dict[str, Any][source]
Get the configuration dictionary for the AmazonBedrockEmbeddingFunction.
- Returns:
Dictionary containing configuration needed to restore this embedding function. Note: AWS credentials are NOT stored in the config for security reasons. Credentials should be provided via environment variables, IAM roles, or passed as parameters when restoring.
- class pyseekdb.utils.embedding_functions.CohereEmbeddingFunction(model_name: str = 'embed-english-v3.0', api_key_env: str | None = None, input_type: str | None = None, **kwargs: Any)[source]
Bases: LiteLLMBaseEmbeddingFunction
A convenient embedding function for Cohere embedding models using LiteLLM.
For more information about Cohere models, see https://docs.cohere.com/docs/cohere-embed
For LiteLLM documentation, see https://docs.litellm.ai/docs/embedding/supported_embedding
Example
pip install pyseekdb litellm
- static build_from_config(config: dict[str, Any]) → CohereEmbeddingFunction[source]
Build a CohereEmbeddingFunction from its configuration dictionary.
- Parameters:
config – Dictionary containing the embedding function’s configuration
- Returns:
Restored CohereEmbeddingFunction instance
- Raises:
ValueError – If the configuration is invalid or missing required fields
- property dimension: int
Get the dimension of embeddings produced by this function.
If the model is in the known-dimensions list, that value is returned without making an API call.
Otherwise, this falls back to requesting an embedding from the API and inferring the dimension from the result.
- Returns:
The dimension of embeddings for this model.
- Return type:
int
- class pyseekdb.utils.embedding_functions.GoogleVertexEmbeddingFunction(model_name: str = 'textembedding-gecko', project_id: str = 'cloud-large-language-models', region: str = 'us-central1', api_key_env: str | None = 'GOOGLE_VERTEX_API_KEY')[source]
Bases: EmbeddingFunction[str | list[str]]
A convenient embedding function for Google Vertex AI embedding models.
For more information about Google Vertex AI models, see https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api
Example
pip install pyseekdb google-cloud-aiplatform
- get_config() → dict[str, Any][source]
Get the configuration dictionary for the embedding function.
This method should return a dictionary that contains all the information needed to restore the embedding function after restart.
- Returns:
Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.
- class pyseekdb.utils.embedding_functions.JinaEmbeddingFunction(model_name: str = 'jina-embeddings-v3', api_key_env: str | None = None, **kwargs: Any)[source]
Bases: LiteLLMBaseEmbeddingFunction
A convenient embedding function for Jina AI embedding models.
This class provides a simplified interface to Jina AI embedding models using LiteLLM.
For more information about Jina AI models, see https://jina.ai/embeddings
For LiteLLM documentation, see https://docs.litellm.ai/docs/embedding/supported_embedding
Example
pip install pyseekdb litellm
- static build_from_config(config: dict[str, Any]) → JinaEmbeddingFunction[source]
Build a JinaEmbeddingFunction from its configuration dictionary.
- Parameters:
config – Dictionary containing the embedding function’s configuration
- Returns:
Restored JinaEmbeddingFunction instance
- Raises:
ValueError – If the configuration is invalid or missing required fields
- property dimension: int
Get the dimension of embeddings produced by this function.
If the model is in the known-dimensions list, that value is returned without making an API call.
Otherwise, this falls back to requesting an embedding from the API and inferring the dimension from the result.
- Returns:
The dimension of embeddings for this model.
- Return type:
int
- class pyseekdb.utils.embedding_functions.LiteLLMBaseEmbeddingFunction(model_name: str, api_key_env: str | None = None, **kwargs: Any)[source]
Bases: EmbeddingFunction[str | list[str]]
A custom embedding function using LiteLLM to access various embedding models.
LiteLLM provides a unified interface to access embedding models from multiple providers including OpenAI, Hugging Face, Cohere, and many others.
You can extend this class to create your own embedding function by overriding the __call__ method. See https://docs.litellm.ai/docs/embedding/supported_embedding for more information.
Example
pip install pyseekdb litellm
- class pyseekdb.utils.embedding_functions.MistralEmbeddingFunction(model_name: str = 'mistral-embed', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for Mistral text embedding models.
This class provides a simplified interface to Mistral text embeddings using the OpenAI-compatible API.
Note: The embeddings API only accepts the model name and input texts.
For more information about Mistral embeddings, see: https://docs.mistral.ai/capabilities/embeddings/text_embeddings
Example
pip install pyseekdb openai
- class pyseekdb.utils.embedding_functions.MorphEmbeddingFunction(model_name: str, api_key_env: str | None = None, api_base: str | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for Morph embedding models.
This class provides a simplified interface to Morph embedding models using the OpenAI-compatible API.
Example
pip install pyseekdb openai
- class pyseekdb.utils.embedding_functions.OllamaEmbeddingFunction(model_name: str = 'nomic-embed-text', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for Ollama embedding models.
This class provides a simplified interface to Ollama embedding models using the OpenAI-compatible API. Ollama provides OpenAI-compatible API endpoints for embedding generation.
For more information about Ollama, see https://docs.ollama.com/
Note: Before using a model, you need to pull it locally using ollama pull <model_name>.
Example
pip install pyseekdb openai
- class pyseekdb.utils.embedding_functions.OnnxEmbeddingFunction(model_name: str, hf_model_id: str, dimension: int, download_path: Path | None = None, preferred_providers: list[str] | None = None)[source]
Bases: object
Generic ONNX runtime embedding function.
This class handles model download, tokenizer/model loading, and embedding generation using onnxruntime.
- property dimension: int
Get the dimension of embeddings produced by this function.
- property model: Any
Get the model.
- Returns:
The model.
- property tokenizer: Any
Get the tokenizer for the model.
- Returns:
The tokenizer for the model.
- class pyseekdb.utils.embedding_functions.OpenAIBaseEmbeddingFunction(model_name: str, api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: EmbeddingFunction[str | list[str]]
Base embedding function for OpenAI-compatible embedding APIs.
This class provides a common implementation for embedding functions that use OpenAI-compatible APIs. It uses the openai package to make API calls.
Subclasses should override:
- _get_default_api_base(): Return the default API base URL
- _get_default_api_key_env(): Return the default API key environment variable name
- _get_model_dimensions(): Return a dict mapping model names to their default dimensions
- Optionally, override __init__ to set model-specific defaults
Example:

```python
import pyseekdb
from pyseekdb.utils.embedding_functions import OpenAIBaseEmbeddingFunction

class MyEmbeddingFunction(OpenAIBaseEmbeddingFunction):
    def _get_default_api_base(self):
        return "https://api.example.com/v1"

    def _get_default_api_key_env(self):
        return "MY_API_KEY"

    def _get_model_dimensions(self):
        return {"model-v1": 1536, "model-v2": 1024}
```
- property dimension: int
Get the dimension of embeddings produced by this function.
If the dimensions parameter was specified, that value is returned; otherwise the default dimension for the model is returned. Neither case makes an API call.
If the model is not in the known-dimensions list, this falls back to the parent's dimension detection (which may make an API call).
- Returns:
The dimension of embeddings for this model.
- Return type:
int
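The resolution order described above (explicit dimensions argument, then a table of known model dimensions, then a probe API call only as a last resort) can be sketched in isolation. The class and dimensions table below are illustrative stand-ins, not pyseekdb's actual internals; the listed defaults for OpenAI's text-embedding-3 models are their published dimensions.

```python
# Sketch of the dimension-resolution strategy: explicit value wins, then the
# known-dimensions table, then one probe embedding to measure the vector.
KNOWN_DIMENSIONS = {"text-embedding-3-small": 1536, "text-embedding-3-large": 3072}

class DimensionResolver:
    def __init__(self, model_name, dimensions=None, embed_fn=None):
        self.model_name = model_name
        self.dimensions = dimensions
        self._embed_fn = embed_fn        # stands in for a real API call

    @property
    def dimension(self) -> int:
        if self.dimensions is not None:          # explicit override wins
            return self.dimensions
        if self.model_name in KNOWN_DIMENSIONS:  # known model: no API call
            return KNOWN_DIMENSIONS[self.model_name]
        return len(self._embed_fn("probe"))      # fallback: one API call

print(DimensionResolver("text-embedding-3-small").dimension)  # 1536
unknown = DimensionResolver("mystery-model", embed_fn=lambda t: [0.0] * 42)
print(unknown.dimension)  # 42
```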
- class pyseekdb.utils.embedding_functions.OpenAIEmbeddingFunction(model_name: str = 'text-embedding-3-small', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for OpenAI embedding models.
This class provides a simplified interface to OpenAI embedding models using the OpenAI API.
For more information about OpenAI models, see https://platform.openai.com/docs/guides/embeddings
Example
pip install pyseekdb openai
- class pyseekdb.utils.embedding_functions.QwenEmbeddingFunction(model_name: str, api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for Qwen (Alibaba Cloud) embedding models.
This class provides a simplified interface to Qwen embedding models using the OpenAI-compatible API. Qwen provides OpenAI-compatible API endpoints for embedding generation.
Example
pip install pyseekdb openai
- class pyseekdb.utils.embedding_functions.SentenceTransformerEmbeddingFunction(model_name: str = 'all-MiniLM-L6-v2', device: str = 'cpu', normalize_embeddings: bool = False, **kwargs: Any)[source]
Bases: EmbeddingFunction[str | list[str]]
An embedding function using sentence-transformers with a specific model.
Example
pip install pyseekdb sentence-transformers
- get_config() → dict[str, Any][source]
Get the configuration dictionary for the embedding function.
This method should return a dictionary that contains all the information needed to restore the embedding function after restart.
- Returns:
Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.
- class pyseekdb.utils.embedding_functions.SiliconflowEmbeddingFunction(model_name: str = 'BAAI/bge-large-zh-v1.5', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for SiliconFlow embedding models.
This class provides a simplified interface to SiliconFlow embedding models using the OpenAI-compatible API. SiliconFlow provides OpenAI-compatible API endpoints for embedding generation.
For more information about SiliconFlow models, see https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings
Example
pip install pyseekdb openai
- class pyseekdb.utils.embedding_functions.TencentHunyuanEmbeddingFunction(model_name: str = 'hunyuan-embedding', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]
Bases: OpenAIBaseEmbeddingFunction
A convenient embedding function for Tencent Hunyuan embedding models.
This class provides a simplified interface to Tencent Hunyuan embedding models using the OpenAI-compatible API. Tencent Hunyuan provides OpenAI-compatible API endpoints for embedding generation.
For more information about Tencent Hunyuan models, see https://cloud.tencent.com/document/product/1729/111007
Note: The embedding interface currently only supports input and model parameters. The model is fixed as hunyuan-embedding and dimensions are fixed at 1024.
Example
pip install pyseekdb openai
- property dimension: int
Get the dimension of embeddings produced by this function.
If the dimensions parameter was specified, that value is returned; otherwise the default dimension for the model is returned. Neither case makes an API call.
If the model is not in the known-dimensions list, this falls back to the parent's dimension detection (which may make an API call).
- Returns:
The dimension of embeddings for this model.
- Return type:
int
- class pyseekdb.utils.embedding_functions.VoyageaiEmbeddingFunction(model_name: str = 'voyage-4-large', api_key_env: str | None = None, input_type: str | None = None, truncation: bool | None = None, output_dimension: int | None = None, **kwargs: Any)[source]
Bases: EmbeddingFunction[str | list[str]]
A convenient embedding function for Voyage AI embedding models.
This class provides a simplified interface to Voyage AI embedding models using the voyageai package.
For more information about Voyage AI models, see https://docs.voyageai.com/docs/embeddings
Example
pip install pyseekdb voyageai
- get_config() → dict[str, Any][source]
Get the configuration dictionary for the embedding function.
This method should return a dictionary that contains all the information needed to restore the embedding function after restart.
- Returns:
Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.