pyseekdb.utils.embedding_functions.HuggingFaceSparseEmbeddingFunction

class pyseekdb.utils.embedding_functions.HuggingFaceSparseEmbeddingFunction(model_name: str = 'prithivida/Splade_PP_en_v1', device: str = 'cpu', task: Literal['document', 'query'] = 'document', **kwargs: Any)

Bases: SparseEmbeddingFunction

Sparse embedding function powered by HuggingFace SparseEncoder models.

Uses sentence_transformers.SparseEncoder to produce sparse vectors (e.g., SPLADE activations) for keyword-based retrieval.
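The pooling behind SPLADE-style sparse vectors, and the dot-product scoring usually applied to them, can be sketched in plain Python. This is a toy illustration, not pyseekdb or sentence-transformers code. The logits are made up, a sparse vector is assumed to be a dict from vocabulary id to weight, and `splade_pool` and `sparse_dot` are hypothetical helper names.

```python
import math


def splade_pool(token_logits: list[list[float]]) -> dict[int, float]:
    """Collapse per-position vocabulary logits into one sparse vector.

    token_logits[i][j] is the (toy) MLM logit of vocabulary id j at input
    position i. SPLADE-style pooling applies log(1 + ReLU(x)) and takes the
    max over input positions; zero weights are dropped to keep it sparse.
    """
    vocab_size = len(token_logits[0])
    sparse: dict[int, float] = {}
    for j in range(vocab_size):
        # Max over positions of the log-saturated ReLU activation
        weight = max(math.log1p(max(row[j], 0.0)) for row in token_logits)
        if weight > 0.0:
            sparse[j] = weight
    return sparse


def sparse_dot(query: dict[int, float], doc: dict[int, float]) -> float:
    """Score = dot product over the vocabulary ids both vectors share."""
    if len(query) > len(doc):
        query, doc = doc, query  # iterate over the smaller vector
    return sum(w * doc[j] for j, w in query.items() if j in doc)


doc_vec = splade_pool([[2.0, -1.0, 0.0, 0.5], [0.0, 3.0, -2.0, 0.0]])
score = sparse_dot({1: 1.0, 2: 5.0}, doc_vec)  # only id 1 overlaps
```

Only ids present in both vectors contribute to the score, so it behaves like weighted keyword matching, which is why such vectors suit keyword-based retrieval.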

The model is loaded lazily and cached at the class level, so multiple instances sharing the same model_name reuse one loaded model.
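A minimal sketch of lazy, class-level caching as described above, assuming a cache dict keyed by model_name. `CachedModelHolder`, `_models`, `load_count` and the `loader` callable are hypothetical; in the real class the loader would presumably construct a `SparseEncoder`.

```python
from typing import Any, Callable, ClassVar


class CachedModelHolder:
    """Hypothetical illustration of class-level lazy model caching."""

    # Shared by every instance: model_name -> loaded model object
    _models: ClassVar[dict[str, Any]] = {}
    # Counts actual loads so the caching behaviour is observable
    load_count: ClassVar[int] = 0

    def __init__(self, model_name: str, loader: Callable[[str], Any]) -> None:
        self.model_name = model_name
        self._loader = loader  # stand-in for the expensive model constructor

    @property
    def model(self) -> Any:
        # Nothing is loaded at construction time; the first access loads,
        # and later instances with the same model_name reuse the cached object.
        cls = type(self)
        if self.model_name not in cls._models:
            cls._models[self.model_name] = self._loader(self.model_name)
            cls.load_count += 1
        return cls._models[self.model_name]
```

Keying on model_name alone mirrors the description above. If instances could differ by device, a cache like this would need a (model_name, device) key; which key pyseekdb actually uses is not stated here.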

Parameters:
  • model_name – HuggingFace model identifier (e.g. "prithivida/Splade_PP_en_v1").

  • device – Compute device ("cpu", "cuda", "cuda:0", etc.).

  • task – Encoding mode: "document" for indexing, "query" for searching. Defaults to "document".

  • **kwargs – Extra keyword arguments forwarded to SparseEncoder().
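The `task` parameter selects between document-side and query-side encoding. The sketch below shows one way such a switch could dispatch, assuming an encoder with separate `encode_query` and `encode_document` methods (recent sentence-transformers `SparseEncoder` releases provide methods with these names; whether pyseekdb calls them is an assumption). `encode_for_task`, `QueryDocEncoder` and `FakeEncoder` are hypothetical, and the token-to-weight dict is an assumed vector representation.

```python
from typing import Literal, Protocol

SparseVector = dict[str, float]  # assumed representation: token -> weight


class QueryDocEncoder(Protocol):
    """Any object with separate query-side and document-side encode methods."""

    def encode_query(self, texts: list[str]) -> list[SparseVector]: ...

    def encode_document(self, texts: list[str]) -> list[SparseVector]: ...


def encode_for_task(
    encoder: QueryDocEncoder,
    texts: list[str],
    task: Literal["document", "query"] = "document",
) -> list[SparseVector]:
    """Hypothetical dispatch on the task parameter."""
    if task == "query":
        return encoder.encode_query(texts)  # search time
    if task == "document":
        return encoder.encode_document(texts)  # indexing time
    raise ValueError(f"task must be 'document' or 'query', got {task!r}")


class FakeEncoder:
    """Stand-in encoder that tags output with the side that was used."""

    def encode_query(self, texts: list[str]) -> list[SparseVector]:
        return [{"q:" + t: 1.0} for t in texts]

    def encode_document(self, texts: list[str]) -> list[SparseVector]:
        return [{"d:" + t: 1.0} for t in texts]
```

Because `task` is fixed at construction, indexing and searching would typically use two instances, `HuggingFaceSparseEmbeddingFunction(task="document")` and `HuggingFaceSparseEmbeddingFunction(task="query")`. With the same model_name they share one loaded model through the class-level cache.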

__init__(model_name: str = 'prithivida/Splade_PP_en_v1', device: str = 'cpu', task: Literal['document', 'query'] = 'document', **kwargs: Any)

Methods

  • __init__([model_name, device, task])

  • build_from_config(config) – Restore instance from configuration dictionary.

  • get_config() – Get configuration dictionary (for persistence).

  • name() – Return unique name identifier (for registration and routing).

  • support_persistence(sparse_embedding_function) – Check if the sparse embedding function supports persistence.

Attributes

models

static build_from_config(config: dict[str, Any]) → HuggingFaceSparseEmbeddingFunction

Restore instance from configuration dictionary.

get_config() → dict[str, Any]

Get configuration dictionary (for persistence).

Returns:

Configuration dictionary. It must not include the ‘name’ field, which is handled by the upper layer.
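The persistence contract above (a config without ‘name’, restored by the static `build_from_config`) can be illustrated with a hypothetical stand-in class. Here the config keys simply mirror the constructor parameters. The real `get_config` may store different keys, and `SketchSparseEF` and its `name()` value are invented for this example.

```python
from typing import Any


class SketchSparseEF:
    """Hypothetical class following the documented persistence pattern."""

    def __init__(
        self,
        model_name: str = "prithivida/Splade_PP_en_v1",
        device: str = "cpu",
        task: str = "document",
    ) -> None:
        self.model_name = model_name
        self.device = device
        self.task = task

    @staticmethod
    def name() -> str:
        return "sketch_sparse"  # hypothetical identifier

    def get_config(self) -> dict[str, Any]:
        # Deliberately no 'name' key: the upper layer records name() separately
        return {"model_name": self.model_name, "device": self.device, "task": self.task}

    @staticmethod
    def build_from_config(config: dict[str, Any]) -> "SketchSparseEF":
        # Rebuild an equivalent instance from the persisted dict
        return SketchSparseEF(**config)


original = SketchSparseEF(device="cuda:0", task="query")
restored = SketchSparseEF.build_from_config(original.get_config())
```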

static name() → str

Return unique name identifier (for registration and routing).
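One plausible way an upper layer could use `name()` for registration and routing is a registry that maps each name to its class's `build_from_config`, with the name stored next to the persisted config. This is a sketch under that assumption: `REGISTRY`, `register`, `save`, `load` and `ToySparseEF` are hypothetical and not part of pyseekdb.

```python
from typing import Any, Callable

# Hypothetical registry: name() -> factory that rebuilds from a config dict
REGISTRY: dict[str, Callable[[dict[str, Any]], Any]] = {}


def register(cls: type) -> type:
    """Class decorator: route records carrying cls.name() to cls.build_from_config."""
    REGISTRY[cls.name()] = cls.build_from_config
    return cls


@register
class ToySparseEF:
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    @staticmethod
    def name() -> str:
        return "toy_sparse"

    def get_config(self) -> dict[str, Any]:
        return {"model_name": self.model_name}

    @staticmethod
    def build_from_config(config: dict[str, Any]) -> "ToySparseEF":
        return ToySparseEF(config["model_name"])


def save(ef: Any) -> dict[str, Any]:
    # The upper layer, not get_config(), attaches the name
    return {"name": ef.name(), "config": ef.get_config()}


def load(record: dict[str, Any]) -> Any:
    # Route to the registered class by name, then restore from its config
    return REGISTRY[record["name"]](record["config"])
```

Keeping the name outside the config dict lets the registry pick the class before any class-specific keys are interpreted.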