pyseekdb.utils.embedding_functions.HuggingFaceSparseEmbeddingFunction

class pyseekdb.utils.embedding_functions.HuggingFaceSparseEmbeddingFunction(model_name: str = 'prithivida/Splade_PP_en_v1', device: str = 'cpu', task: Literal['document', 'query'] = 'document', **kwargs: Any)

Bases: SparseEmbeddingFunction

Sparse embedding function powered by HuggingFace SparseEncoder models.

Uses sentence_transformers.SparseEncoder to produce sparse vectors (e.g., SPLADE activations) for keyword-based retrieval.
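To make the retrieval use concrete, here is a minimal sketch of how two sparse vectors can be scored against each other with a dot product. The dict layout (vocabulary index mapped to weight), the helper `sparse_dot`, and the sample weights are assumptions for illustration; they are not pyseekdb's SparseVector type or its scoring code.

```python
# Sparse vectors modelled as {vocabulary index: weight} dicts. This layout
# is an assumption for illustration, not pyseekdb's SparseVector type.
def sparse_dot(query: dict[int, float], document: dict[int, float]) -> float:
    """Sum the weight products over indices that both vectors activate."""
    # Iterate over the smaller vector to keep the number of lookups low.
    if len(query) <= len(document):
        small, large = query, document
    else:
        small, large = document, query
    return sum(weight * large[idx] for idx, weight in small.items() if idx in large)


# Made-up activations: only a few vocabulary entries are non-zero.
query = {1012: 1.5, 2047: 0.5}
docs = {
    "doc_a": {1012: 2.0, 3001: 1.0},  # shares index 1012 with the query
    "doc_b": {4000: 3.0},             # no shared index, so the score is 0
}

scores = {name: sparse_dot(query, vec) for name, vec in docs.items()}
best = max(scores, key=scores.get)
```

Only indices present in both vectors contribute to the score, which is why sparse retrieval behaves like weighted keyword matching.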

The model is loaded lazily and cached at the class level, so multiple instances sharing the same model_name reuse one loaded model.
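The lazy, class-level caching described above can be sketched as follows. `CachedModelHolder`, `fake_loader`, and the `_models` dict are hypothetical stand-ins, not pyseekdb's implementation; in particular, whether the real cache key includes anything besides model_name is not stated here.

```python
from typing import Any, Callable, ClassVar


class CachedModelHolder:
    """Hypothetical stand-in showing a lazy, class-level model cache."""

    # One dict shared by every instance, keyed by model name.
    _models: ClassVar[dict[str, Any]] = {}

    def __init__(self, model_name: str, loader: Callable[[str], Any]) -> None:
        self.model_name = model_name
        self._loader = loader  # stands in for an expensive model load

    @property
    def model(self) -> Any:
        # Load on first access only; later accesses reuse the cached object.
        if self.model_name not in self._models:
            self._models[self.model_name] = self._loader(self.model_name)
        return self._models[self.model_name]


load_calls: list[str] = []


def fake_loader(name: str) -> object:
    """Record each load so the number of real loads can be inspected."""
    load_calls.append(name)
    return object()


first = CachedModelHolder("demo-model", fake_loader)
second = CachedModelHolder("demo-model", fake_loader)

# Nothing is loaded at construction time; the first access triggers the load
# and the second instance picks up the same cached object.
shared = first.model is second.model
```

Constructing an instance is cheap under this pattern; the cost is paid once, on the first call that actually needs the model.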

Parameters:
  • model_name – HuggingFace model identifier (e.g. "prithivida/Splade_PP_en_v1").

  • device – Compute device ("cpu", "cuda", "cuda:0", etc.).

  • task – Encoding mode: "document" for indexing, "query" for searching. Defaults to "document".

  • **kwargs – Extra keyword arguments forwarded to SparseEncoder().

__init__(model_name: str = 'prithivida/Splade_PP_en_v1', device: str = 'cpu', task: Literal['document', 'query'] = 'document', **kwargs: Any)

Methods

  • __init__([model_name, device, task])

  • build_from_config(config) – Restore instance from configuration dictionary.

  • embed_query(documents) – Encode queries into sparse vectors using encode_query.

  • get_config() – Get configuration dictionary (for persistence).

  • name() – Return unique name identifier (for registration and routing).

  • support_persistence(sparse_embedding_function) – Check if the sparse embedding function supports persistence.

Attributes

  • models

static build_from_config(config: dict[str, Any]) → HuggingFaceSparseEmbeddingFunction

Restore instance from configuration dictionary.

embed_query(documents: str | list[str]) → list[SparseVector]

Encode queries into sparse vectors using encode_query.

Regardless of the task setting, this method always uses the query encoding path, which is typically preferred at search time for asymmetric models (e.g., SPLADE).

Parameters:

documents – A single string or list of strings.

Returns:

List of SparseVector instances, one per input query.
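A minimal usage sketch for embed_query follows. It uses only the constructor arguments and method documented on this page; the wrapper function `embed_search_queries` is a hypothetical helper, and the sketch assumes pyseekdb and sentence-transformers are installed and that the model can be downloaded on first use.

```python
def embed_search_queries(queries: list[str], device: str = "cpu"):
    """Encode search queries with the default SPLADE model."""
    # Imported inside the function so the sketch can be read (and the module
    # imported) without pyseekdb installed.
    from pyseekdb.utils.embedding_functions import (
        HuggingFaceSparseEmbeddingFunction,
    )

    # task="document" is the constructor default. embed_query takes the query
    # encoding path regardless of this setting, per its description above.
    sparse_ef = HuggingFaceSparseEmbeddingFunction(
        model_name="prithivida/Splade_PP_en_v1",
        device=device,
        task="document",
    )
    # Returns one SparseVector per input string.
    return sparse_ef.embed_query(queries)
```

Calling `embed_search_queries(["what is sparse retrieval?"])` should then return a one-element list. Because the query path is used regardless of task, an instance configured for indexing can also serve search-time calls.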

get_config() → dict[str, Any]

Get configuration dictionary (for persistence).

Returns:

Configuration dictionary. It should not include a "name" field, which is handled by the upper layer.
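The persistence contract (get_config without a "name" key, build_from_config to restore) can be illustrated with a stand-in class. `DemoSparseFunction`, its config keys, and the shape of the `stored` record are assumptions made for this sketch; the keys that HuggingFaceSparseEmbeddingFunction actually emits, and how the upper layer stores the name, are not documented on this page.

```python
from typing import Any


class DemoSparseFunction:
    """Hypothetical stand-in that mirrors the persistence methods above."""

    def __init__(self, model_name: str = "demo-model", device: str = "cpu") -> None:
        self.model_name = model_name
        self.device = device

    @staticmethod
    def name() -> str:
        # Identifier an outer layer could use for registration and routing.
        return "demo_sparse"

    def get_config(self) -> dict[str, Any]:
        # Constructor arguments only; deliberately no "name" key.
        return {"model_name": self.model_name, "device": self.device}

    @staticmethod
    def build_from_config(config: dict[str, Any]) -> "DemoSparseFunction":
        # Rebuild an equivalent instance from the saved arguments.
        return DemoSparseFunction(**config)


original = DemoSparseFunction(device="cuda:0")

# An assumed outer layer stores the name next to the config...
stored = {"name": original.name(), "config": original.get_config()}

# ...and later hands only the config back to build_from_config.
restored = DemoSparseFunction.build_from_config(stored["config"])
```

Keeping the name out of the config lets the outer layer pick the right class by name first, then pass the remaining dictionary straight to that class's build_from_config.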

static name() → str

Return unique name identifier (for registration and routing).