pyseekdb.utils.embedding_functions

Embedding function implementations for pyseekdb.

This module provides various embedding function implementations that can be used with pyseekdb collections.

Classes

AmazonBedrockEmbeddingFunction(session[, ...])

A convenient embedding function for Amazon Bedrock embedding models using boto3.

CohereEmbeddingFunction([model_name, ...])

A convenient embedding function for Cohere embedding models using LiteLLM.

GoogleVertexEmbeddingFunction([model_name, ...])

A convenient embedding function for Google Vertex AI embedding models.

JinaEmbeddingFunction([model_name, api_key_env])

A convenient embedding function for Jina AI embedding models.

LiteLLMBaseEmbeddingFunction(model_name[, ...])

A custom embedding function using LiteLLM to access various embedding models.

MistralEmbeddingFunction([model_name, ...])

A convenient embedding function for Mistral text embedding models.

MorphEmbeddingFunction(model_name[, ...])

A convenient embedding function for Morph embedding models.

OllamaEmbeddingFunction([model_name, ...])

A convenient embedding function for Ollama embedding models.

OnnxEmbeddingFunction(model_name, ...[, ...])

Generic ONNX runtime embedding function.

OpenAIBaseEmbeddingFunction(model_name[, ...])

Base embedding function for OpenAI-compatible embedding APIs.

OpenAIEmbeddingFunction([model_name, ...])

A convenient embedding function for OpenAI embedding models.

QwenEmbeddingFunction(model_name[, ...])

A convenient embedding function for Qwen (Alibaba Cloud) embedding models.

SentenceTransformerEmbeddingFunction([...])

An embedding function using sentence-transformers with a specific model.

SiliconflowEmbeddingFunction([model_name, ...])

A convenient embedding function for SiliconFlow embedding models.

TencentHunyuanEmbeddingFunction([...])

A convenient embedding function for Tencent Hunyuan embedding models.

VoyageaiEmbeddingFunction([model_name, ...])

A convenient embedding function for Voyage AI embedding models.

class pyseekdb.utils.embedding_functions.AmazonBedrockEmbeddingFunction(session: Any, model_name: str = 'amazon.titan-embed-text-v2', **kwargs: Any)[source]

Bases: EmbeddingFunction[str | list[str]]

A convenient embedding function for Amazon Bedrock embedding models using boto3.

For more information about Amazon Bedrock models, see https://docs.aws.amazon.com/bedrock/

This embedding function runs remotely on Amazon Bedrock’s servers, and requires AWS credentials configured via boto3.

Example

pip install pyseekdb boto3
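
A minimal usage sketch, assuming AWS credentials are already configured for boto3 and that the instance is called directly on a list of texts (as the EmbeddingFunction[str | list[str]] base suggests); the region below is illustrative:

import boto3
from pyseekdb.utils.embedding_functions import AmazonBedrockEmbeddingFunction

# Credentials come from the environment, a shared profile, or an IAM role;
# they are never stored in the embedding function's config.
session = boto3.Session(region_name="us-east-1")  # region chosen for illustration
ef = AmazonBedrockEmbeddingFunction(session=session, model_name="amazon.titan-embed-text-v2")

embeddings = ef(["A first document.", "A second document."])  # assumed callable on list[str]
print(ef.dimension, len(embeddings))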

static build_from_config(config: dict[str, Any]) AmazonBedrockEmbeddingFunction[source]

Build an AmazonBedrockEmbeddingFunction from its configuration dictionary.

Parameters:

config – Dictionary containing the embedding function’s configuration. Note: AWS credentials are NOT stored in config for security reasons. Credentials should be provided via environment variables, IAM roles, or passed as additional parameters.

Returns:

Restored AmazonBedrockEmbeddingFunction instance

Raises:

ValueError – If the configuration is invalid or missing required fields

property dimension: int

Get the dimension of embeddings produced by this function.

Returns the known dimension without making an API call when the model appears in the known-dimensions list.

If the model is not in the list, falls back to making an API call and inferring the dimension from the returned embedding.

Returns:

The dimension of embeddings for this model.

Return type:

int

get_config() dict[str, Any][source]

Get the configuration dictionary for the AmazonBedrockEmbeddingFunction.

Returns:

Dictionary containing configuration needed to restore this embedding function. Note: AWS credentials are NOT stored in the config for security reasons. Credentials should be provided via environment variables, IAM roles, or passed as parameters when restoring.

class pyseekdb.utils.embedding_functions.CohereEmbeddingFunction(model_name: str = 'embed-english-v3.0', api_key_env: str | None = None, input_type: str | None = None, **kwargs: Any)[source]

Bases: LiteLLMBaseEmbeddingFunction

A convenient embedding function for Cohere embedding models using LiteLLM.

For more information about Cohere models, see https://docs.cohere.com/docs/cohere-embed

For LiteLLM documentation, see https://docs.litellm.ai/docs/embedding/supported_embedding

Example

pip install pyseekdb litellm
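
A minimal usage sketch, assuming a Cohere API key is exported in the environment variable named by api_key_env (COHERE_API_KEY and the input_type value are illustrative) and that the instance is called directly on texts:

from pyseekdb.utils.embedding_functions import CohereEmbeddingFunction

# Assumes COHERE_API_KEY holds a valid key; "search_document" is one of Cohere's input types.
ef = CohereEmbeddingFunction(
    model_name="embed-english-v3.0",
    api_key_env="COHERE_API_KEY",
    input_type="search_document",
)
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))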

static build_from_config(config: dict[str, Any]) CohereEmbeddingFunction[source]

Build a CohereEmbeddingFunction from its configuration dictionary.

Parameters:

config – Dictionary containing the embedding function’s configuration

Returns:

Restored CohereEmbeddingFunction instance

Raises:

ValueError – If the configuration is invalid or missing required fields

property dimension: int

Get the dimension of embeddings produced by this function.

Returns the known dimension without making an API call when the model appears in the known-dimensions list.

If the model is not in the list, falls back to making an API call and inferring the dimension from the returned embedding.

Returns:

The dimension of embeddings for this model.

Return type:

int

get_config() dict[str, Any][source]

Get the configuration dictionary for the CohereEmbeddingFunction.

Returns:

Dictionary containing configuration needed to restore this embedding function

static name() str[source]

Get the unique name identifier for CohereEmbeddingFunction.

Returns:

The name identifier for this embedding function type

class pyseekdb.utils.embedding_functions.GoogleVertexEmbeddingFunction(model_name: str = 'textembedding-gecko', project_id: str = 'cloud-large-language-models', region: str = 'us-central1', api_key_env: str | None = 'GOOGLE_VERTEX_API_KEY')[source]

Bases: EmbeddingFunction[str | list[str]]

A convenient embedding function for Google Vertex AI embedding models.

For more information about Google Vertex AI models, see https://docs.cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api

Example

pip install pyseekdb google-cloud-aiplatform
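
A minimal usage sketch, assuming a Vertex AI key is available in the environment variable named by api_key_env and substituting your own project id; the call pattern on raw texts is assumed from the EmbeddingFunction base:

from pyseekdb.utils.embedding_functions import GoogleVertexEmbeddingFunction

# "my-gcp-project" is an illustrative project id.
ef = GoogleVertexEmbeddingFunction(
    model_name="textembedding-gecko",
    project_id="my-gcp-project",
    region="us-central1",
    api_key_env="GOOGLE_VERTEX_API_KEY",
)
embeddings = ef(["A first document.", "A second document."])
print(len(embeddings))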

get_config() dict[str, Any][source]

Get the configuration dictionary for the embedding function.

This method should return a dictionary that contains all the information needed to restore the embedding function after restart.

Returns:

Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.

class pyseekdb.utils.embedding_functions.JinaEmbeddingFunction(model_name: str = 'jina-embeddings-v3', api_key_env: str | None = None, **kwargs: Any)[source]

Bases: LiteLLMBaseEmbeddingFunction

A convenient embedding function for Jina AI embedding models.

This class provides a simplified interface to Jina AI embedding models using LiteLLM.

For more information about Jina AI models, see https://jina.ai/embeddings

For LiteLLM documentation, see https://docs.litellm.ai/docs/embedding/supported_embedding

Example

pip install pyseekdb litellm
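
A minimal usage sketch, assuming a Jina AI key is exported (JINA_API_KEY is an illustrative variable name) and that the instance is called directly on texts:

from pyseekdb.utils.embedding_functions import JinaEmbeddingFunction

ef = JinaEmbeddingFunction(model_name="jina-embeddings-v3", api_key_env="JINA_API_KEY")
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))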

static build_from_config(config: dict[str, Any]) JinaEmbeddingFunction[source]

Build a JinaEmbeddingFunction from its configuration dictionary.

Parameters:

config – Dictionary containing the embedding function’s configuration

Returns:

Restored JinaEmbeddingFunction instance

Raises:

ValueError – If the configuration is invalid or missing required fields

property dimension: int

Get the dimension of embeddings produced by this function.

Returns the known dimension without making an API call when the model appears in the known-dimensions list.

If the model is not in the list, falls back to making an API call and inferring the dimension from the returned embedding.

Returns:

The dimension of embeddings for this model.

Return type:

int

get_config() dict[str, Any][source]

Get the configuration dictionary for the JinaEmbeddingFunction.

Returns:

Dictionary containing configuration needed to restore this embedding function

static name() str[source]

Get the unique name identifier for JinaEmbeddingFunction.

Returns:

The name identifier for this embedding function type

class pyseekdb.utils.embedding_functions.LiteLLMBaseEmbeddingFunction(model_name: str, api_key_env: str | None = None, **kwargs: Any)[source]

Bases: EmbeddingFunction[str | list[str]]

A custom embedding function using LiteLLM to access various embedding models.

LiteLLM provides a unified interface to access embedding models from multiple providers including OpenAI, Hugging Face, Cohere, and many others.

You can extend this class to create your own embedding function by overriding the __call__ method. See https://docs.litellm.ai/docs/embedding/supported_embedding for more information.

Example

pip install pyseekdb litellm
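
A minimal usage sketch, assuming the provider's key is exported in the environment variable named by api_key_env; the model identifier follows LiteLLM's provider/model convention and is illustrative:

from pyseekdb.utils.embedding_functions import LiteLLMBaseEmbeddingFunction

# "openai/text-embedding-3-small" is an illustrative LiteLLM model identifier.
ef = LiteLLMBaseEmbeddingFunction(
    model_name="openai/text-embedding-3-small",
    api_key_env="OPENAI_API_KEY",
)
embeddings = ef(["A first document.", "A second document."])
print(len(embeddings))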

class pyseekdb.utils.embedding_functions.MistralEmbeddingFunction(model_name: str = 'mistral-embed', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for Mistral text embedding models.

This class provides a simplified interface to Mistral text embeddings using the OpenAI-compatible API.

Note: The embeddings API only accepts the model name and input texts.

For more information about Mistral embeddings, see: https://docs.mistral.ai/capabilities/embeddings/text_embeddings

Example

pip install pyseekdb openai
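
A minimal usage sketch, assuming a Mistral key is exported (MISTRAL_API_KEY is an illustrative variable name) and that the instance is called directly on texts:

from pyseekdb.utils.embedding_functions import MistralEmbeddingFunction

ef = MistralEmbeddingFunction(model_name="mistral-embed", api_key_env="MISTRAL_API_KEY")
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))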

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

class pyseekdb.utils.embedding_functions.MorphEmbeddingFunction(model_name: str, api_key_env: str | None = None, api_base: str | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for Morph embedding models.

This class provides a simplified interface to Morph embedding models using the OpenAI-compatible API.

Example

pip install pyseekdb openai
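
A minimal usage sketch; the model name below is a placeholder (take current model names from Morph's documentation), and MORPH_API_KEY is an illustrative variable name:

from pyseekdb.utils.embedding_functions import MorphEmbeddingFunction

# "morph-embedding-v2" is a placeholder model name.
ef = MorphEmbeddingFunction(model_name="morph-embedding-v2", api_key_env="MORPH_API_KEY")
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))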

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

static name() str[source]

Get the unique name identifier for MorphEmbeddingFunction.

Returns:

The name identifier for this embedding function type

class pyseekdb.utils.embedding_functions.OllamaEmbeddingFunction(model_name: str = 'nomic-embed-text', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for Ollama embedding models.

This class provides a simplified interface to Ollama embedding models using the OpenAI-compatible API. Ollama provides OpenAI-compatible API endpoints for embedding generation.

For more information about Ollama, see https://docs.ollama.com/

Note: Before using a model, you need to pull it locally using ollama pull <model_name>.

Example

pip install pyseekdb openai
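
A minimal usage sketch against a locally running Ollama server, assuming the model was pulled with ollama pull nomic-embed-text; the api_base shown is Ollama's conventional OpenAI-compatible endpoint and is passed explicitly for clarity:

from pyseekdb.utils.embedding_functions import OllamaEmbeddingFunction

# Assumes a local Ollama server listening on its default port.
ef = OllamaEmbeddingFunction(
    model_name="nomic-embed-text",
    api_base="http://localhost:11434/v1",
)
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))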

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

static name() str[source]

Get the unique name identifier for OllamaEmbeddingFunction.

Returns:

The name identifier for this embedding function type

class pyseekdb.utils.embedding_functions.OnnxEmbeddingFunction(model_name: str, hf_model_id: str, dimension: int, download_path: Path | None = None, preferred_providers: list[str] | None = None)[source]

Bases: object

Generic ONNX runtime embedding function.

This class handles model download, tokenizer/model loading, and embedding generation using onnxruntime.
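
A minimal construction sketch; the Hugging Face model id and dimension below are illustrative (all-MiniLM-L6-v2 produces 384-dimensional embeddings), and whether a given repository ships a compatible ONNX export depends on the model:

from pathlib import Path
from pyseekdb.utils.embedding_functions import OnnxEmbeddingFunction

ef = OnnxEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    hf_model_id="sentence-transformers/all-MiniLM-L6-v2",
    dimension=384,
    download_path=Path("./onnx_models"),           # optional local cache directory
    preferred_providers=["CPUExecutionProvider"],  # onnxruntime execution providers
)
print(ef.dimension, ef.max_tokens())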

property dimension: int

Get the dimension of embeddings produced by this function.

max_tokens() int[source]

Get the maximum number of tokens supported by the model.

property model: Any

Get the model.

Returns:

The model.

property tokenizer: Any

Get the tokenizer for the model.

Returns:

The tokenizer for the model.

class pyseekdb.utils.embedding_functions.OpenAIBaseEmbeddingFunction(model_name: str, api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: EmbeddingFunction[str | list[str]]

Base embedding function for OpenAI-compatible embedding APIs.

This class provides a common implementation for embedding functions that use OpenAI-compatible APIs. It uses the openai package to make API calls.

Subclasses should override:

- _get_default_api_base(): Return the default API base URL
- _get_default_api_key_env(): Return the default API key environment variable name
- _get_model_dimensions(): Return a dict mapping model names to their default dimensions
- Optionally override __init__ to set model-specific defaults

Example

import pyseekdb
from pyseekdb.utils.embedding_functions import OpenAIBaseEmbeddingFunction

class MyEmbeddingFunction(OpenAIBaseEmbeddingFunction):
    def _get_default_api_base(self):
        return "https://api.example.com/v1"

    def _get_default_api_key_env(self):
        return "MY_API_KEY"

    def _get_model_dimensions(self):
        return {"model-v1": 1536, "model-v2": 1024}

property dimension: int

Get the dimension of embeddings produced by this function.

Returns the dimension without making an API call when possible: if the dimensions parameter was specified, that value is returned; otherwise the model’s default dimension is returned.

If the model is not in the known-dimensions list, falls back to the parent’s dimension detection, which may make an API call.

Returns:

The dimension of embeddings for this model.

Return type:

int

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

class pyseekdb.utils.embedding_functions.OpenAIEmbeddingFunction(model_name: str = 'text-embedding-3-small', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for OpenAI embedding models.

This class provides a simplified interface to OpenAI embedding models using the OpenAI API.

For more information about OpenAI models, see https://platform.openai.com/docs/guides/embeddings

Example

pip install pyseekdb openai
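
A minimal usage sketch, assuming an OpenAI API key is available in the conventional OPENAI_API_KEY environment variable; the reduced dimensions value is optional and shown for illustration:

from pyseekdb.utils.embedding_functions import OpenAIEmbeddingFunction

ef = OpenAIEmbeddingFunction(model_name="text-embedding-3-small", dimensions=512)
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))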

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

class pyseekdb.utils.embedding_functions.QwenEmbeddingFunction(model_name: str, api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for Qwen (Alibaba Cloud) embedding models.

This class provides a simplified interface to Qwen embedding models using the OpenAI-compatible API. Qwen provides OpenAI-compatible API endpoints for embedding generation.

Example

pip install pyseekdb openai
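
A minimal usage sketch; the model name and DASHSCOPE_API_KEY variable name are illustrative, so substitute the Qwen embedding model and key variable you actually use:

from pyseekdb.utils.embedding_functions import QwenEmbeddingFunction

# "text-embedding-v3" is an illustrative Qwen embedding model name.
ef = QwenEmbeddingFunction(model_name="text-embedding-v3", api_key_env="DASHSCOPE_API_KEY")
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension, len(embeddings))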

get_config() dict[str, Any][source]

Get the configuration dictionary for the QwenEmbeddingFunction.

Returns:

Dictionary containing configuration needed to restore this embedding function

static name() str[source]

Get the unique name identifier for QwenEmbeddingFunction.

Returns:

The name identifier for this embedding function type

class pyseekdb.utils.embedding_functions.SentenceTransformerEmbeddingFunction(model_name: str = 'all-MiniLM-L6-v2', device: str = 'cpu', normalize_embeddings: bool = False, **kwargs: Any)[source]

Bases: EmbeddingFunction[str | list[str]]

An embedding function using sentence-transformers with a specific model.

Example

pip install pyseekdb sentence-transformers
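
A minimal usage sketch; the model runs locally and is downloaded from the Hugging Face hub on first use, and the instance is assumed to be callable on a list of texts:

from pyseekdb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

ef = SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",
    device="cpu",
    normalize_embeddings=True,
)
embeddings = ef(["A first document.", "A second document."])
print(len(embeddings))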

get_config() dict[str, Any][source]

Get the configuration dictionary for the embedding function.

This method should return a dictionary that contains all the information needed to restore the embedding function after restart.

Returns:

Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.

class pyseekdb.utils.embedding_functions.SiliconflowEmbeddingFunction(model_name: str = 'BAAI/bge-large-zh-v1.5', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for SiliconFlow embedding models.

This class provides a simplified interface to SiliconFlow embedding models using the OpenAI-compatible API. SiliconFlow provides OpenAI-compatible API endpoints for embedding generation.

For more information about SiliconFlow models, see https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings

Example

pip install pyseekdb openai
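
A minimal usage sketch, assuming a SiliconFlow key is exported (SILICONFLOW_API_KEY is an illustrative variable name):

from pyseekdb.utils.embedding_functions import SiliconflowEmbeddingFunction

ef = SiliconflowEmbeddingFunction(
    model_name="BAAI/bge-large-zh-v1.5",
    api_key_env="SILICONFLOW_API_KEY",
)
embeddings = ef(["第一段文本。", "第二段文本。"])  # bge-large-zh-v1.5 targets Chinese text
print(ef.dimension, len(embeddings))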

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

class pyseekdb.utils.embedding_functions.TencentHunyuanEmbeddingFunction(model_name: str = 'hunyuan-embedding', api_key_env: str | None = None, api_base: str | None = None, dimensions: int | None = None, **kwargs: Any)[source]

Bases: OpenAIBaseEmbeddingFunction

A convenient embedding function for Tencent Hunyuan embedding models.

This class provides a simplified interface to Tencent Hunyuan embedding models using the OpenAI-compatible API. Tencent Hunyuan provides OpenAI-compatible API endpoints for embedding generation.

For more information about Tencent Hunyuan models, see https://cloud.tencent.com/document/product/1729/111007

Note: The embedding interface currently only supports the input and model parameters. The model is fixed to hunyuan-embedding and the embedding dimension is fixed at 1024.

Example

pip install pyseekdb openai
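
A minimal usage sketch; the model and dimension are fixed by the service, and HUNYUAN_API_KEY is an illustrative variable name:

from pyseekdb.utils.embedding_functions import TencentHunyuanEmbeddingFunction

ef = TencentHunyuanEmbeddingFunction(api_key_env="HUNYUAN_API_KEY")
embeddings = ef(["A first document.", "A second document."])
print(ef.dimension)  # fixed at 1024 per the note above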

property dimension: int

Get the dimension of embeddings produced by this function.

Returns the dimension without making an API call when possible: if the dimensions parameter was specified, that value is returned; otherwise the model’s default dimension is returned.

If the model is not in the known-dimensions list, falls back to the parent’s dimension detection, which may make an API call.

Returns:

The dimension of embeddings for this model.

Return type:

int

get_config() dict[str, Any][source]

Get the configuration dictionary for the OpenAIBaseEmbeddingFunction.

Subclasses should override the name() method to provide the correct name for routing.

Returns:

Dictionary containing configuration needed to restore this embedding function

class pyseekdb.utils.embedding_functions.VoyageaiEmbeddingFunction(model_name: str = 'voyage-4-large', api_key_env: str | None = None, input_type: str | None = None, truncation: bool | None = None, output_dimension: int | None = None, **kwargs: Any)[source]

Bases: EmbeddingFunction[str | list[str]]

A convenient embedding function for Voyage AI embedding models.

This class provides a simplified interface to Voyage AI embedding models using the voyageai package.

For more information about Voyage AI models, see https://docs.voyageai.com/docs/embeddings

Example

pip install pyseekdb voyageai
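
A minimal usage sketch, assuming a Voyage AI key is exported (VOYAGE_API_KEY, the input_type value, and the output_dimension are illustrative):

from pyseekdb.utils.embedding_functions import VoyageaiEmbeddingFunction

ef = VoyageaiEmbeddingFunction(
    model_name="voyage-4-large",
    api_key_env="VOYAGE_API_KEY",
    input_type="document",      # Voyage AI distinguishes "document" and "query" inputs
    output_dimension=1024,      # optional
)
embeddings = ef(["A first document.", "A second document."])
print(len(embeddings))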

get_config() dict[str, Any][source]

Get the configuration dictionary for the embedding function.

This method should return a dictionary that contains all the information needed to restore the embedding function after restart.

Returns:

Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.