API Reference

This page contains the auto-generated API reference for pyseekdb.

Main Package

pyseekdb - Unified vector database client wrapper

Based on seekdb and pymysql, providing a simple and unified API.

Supports two modes:

  • Embedded mode - using local seekdb

  • Remote server mode - connecting to a remote server via pymysql (supports both seekdb Server and OceanBase Server)

Examples:

Embedded mode - Collection management:

import pyseekdb
client = pyseekdb.Client(path="./seekdb.db", database="test")
collection = client.get_or_create_collection("my_collection")

Remote server mode (seekdb Server) - Collection management:

import pyseekdb
client = pyseekdb.Client(
    host='localhost',
    port=2881,
    tenant="sys",
    database="test",
    user="root",
    password="pass"
)
collection = client.get_or_create_collection("my_collection")

Remote server mode (OceanBase Server) - Collection management:

import pyseekdb
client = pyseekdb.Client(
    host='localhost',
    port=2881,
    tenant="test",
    database="test",
    user="root",
    password="pass"
)
collection = client.get_or_create_collection("my_collection")

Admin client - Database management:

import pyseekdb
admin = pyseekdb.AdminClient(path="./seekdb.db")
admin.create_database("new_db")
databases = admin.list_databases()

class pyseekdb.AdminAPI[source]

Bases: ABC

Abstract admin API interface for database management. Defines the contract for database operations.

abstractmethod create_database(name: str, tenant: str = 'test') None[source]

Create database

Parameters:
  • name – database name

  • tenant – tenant name (for OceanBase)

abstractmethod delete_database(name: str, tenant: str = 'test') None[source]

Delete database

Parameters:
  • name – database name

  • tenant – tenant name (for OceanBase)

abstractmethod get_database(name: str, tenant: str = 'test') Database[source]

Get database object

Parameters:
  • name – database name

  • tenant – tenant name (for OceanBase)

Returns:

Database object

abstractmethod list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]

List all databases

Parameters:
  • limit – maximum number of results to return

  • offset – number of results to skip

  • tenant – tenant name (for OceanBase)

Returns:

Sequence of Database objects
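The limit and offset parameters read like ordinary slice-style pagination. The sketch below illustrates that reading on a plain Python list. paginate is a hypothetical helper, not part of pyseekdb, and the assumption that offset is applied before limit is not confirmed by this reference.

```python
from typing import List, Optional, Sequence


def paginate(items: Sequence[str], limit: Optional[int] = None,
             offset: Optional[int] = None) -> List[str]:
    """Hypothetical helper: skip `offset` items, then keep at most `limit`."""
    start = offset or 0
    end = None if limit is None else start + limit
    return list(items[start:end])


# Illustrative database names only; a real server returns Database objects.
names = ["db_a", "db_b", "db_c", "db_d"]
page = paginate(names, limit=2, offset=1)
print(page)
```

With both parameters left as None the helper returns every item, which matches the "List all databases" description.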

pyseekdb.AdminClient(path: str | None = None, host: str | None = None, port: int | None = None, tenant: str = 'sys', user: str | None = None, password: str = '', **kwargs) _AdminClientProxy[source]

Smart admin client factory function (proxy pattern)

Automatically selects embedded or remote server mode based on parameters:

  • If path is provided, uses embedded mode

  • If host/port is provided, uses remote server mode (supports both seekdb Server and OceanBase Server)

Returns a lightweight AdminClient proxy that only exposes database operations. For collection management, use Client().

Parameters:
  • path – seekdb data directory path (embedded mode)

  • host – server address (remote server mode)

  • port – server port (remote server mode, default 2881)

  • tenant – tenant name (remote server mode, default “sys” for seekdb Server, “test” for OceanBase)

  • user – username (remote server mode, without tenant suffix)

  • password – password (remote server mode). If not provided, will be retrieved from SEEKDB_PASSWORD environment variable

  • **kwargs – other parameters

Returns:

A proxy that only exposes database operations

Return type:

_AdminClientProxy
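The password parameter falls back to the SEEKDB_PASSWORD environment variable when it is not provided. A minimal sketch of that lookup follows. resolve_password is a hypothetical name, and treating the empty-string default as "not provided" is an assumption based on the signature above, not a confirmed detail of the library.

```python
import os


def resolve_password(password: str = "") -> str:
    """Hypothetical helper: prefer the explicit argument, else the environment."""
    if password:
        return password
    # Assumption: an empty password means "not provided".
    return os.environ.get("SEEKDB_PASSWORD", "")
```

Keeping the secret in the environment avoids hard-coding it in scripts that construct the client.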

Examples

>>> # Embedded mode
>>> admin = AdminClient(path="/path/to/seekdb")
>>> admin.create_database("new_db")  # ✅ Available
>>> # admin.create_collection("coll")  # ❌ Not available
>>> # Remote server mode (seekdb Server)
>>> admin = AdminClient(
...     host='localhost',
...     port=2881,
...     tenant="sys",
...     user="root",
...     password="pass"
... )
>>> # Remote server mode (OceanBase Server)
>>> admin = AdminClient(
...     host='localhost',
...     port=2881,
...     tenant="test",
...     user="root",
...     password="pass"
... )

class pyseekdb.BaseClient[source]

Bases: BaseConnection, AdminAPI

Abstract base class for all clients.

Design Pattern:

  1. Provides public collection management methods (create_collection, get_collection, etc.)

  2. Defines internal operation interfaces (_collection_* methods) called by Collection objects

  3. Subclasses implement all abstract methods to provide specific business logic

Benefits of this design:

  • Collection object interface is unified regardless of which client created it

  • Different clients can have completely different underlying implementations (SQL/gRPC/REST)

  • Easy to extend with new client types

Inherits connection management from BaseConnection and database operations from AdminAPI.

count_collection() int[source]

Count the total number of collections.

Returns:

The number of collections.

Examples

>>> count = client.count_collection()
>>> print(f"Database has {count} collections")

create_collection(name: str, configuration: ~pyseekdb.client.configuration.Configuration | ~pyseekdb.client.configuration.HNSWConfiguration | None = <pyseekdb.client.client_base._NotProvided object>, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>, **kwargs) Collection[source]

Create a new collection.

Parameters:
  • name – The name of the collection to create. Must contain only alphanumeric characters or underscores.

  • configuration – Index configuration, given as a Configuration or HNSWConfiguration object. The default is HNSW with cosine distance and dimension 384. If set to None, the dimension is inferred from the embedding function.

  • embedding_function – The embedding function to use for this collection. Defaults to DefaultEmbeddingFunction (all-MiniLM-L6-v2). If set to None, no embedding function will be used (embeddings must be provided manually).

  • **kwargs – Additional parameters for collection creation.

Returns:

The created Collection object.

Raises:
  • ValueError – If the collection name is invalid, already exists, or if the configuration/embedding function combination is invalid (e.g., dimension mismatch).

  • TypeError – If the configuration object is of an invalid type.
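The naming rule above (alphanumeric characters or underscores only) can be checked before calling create_collection. The sketch below is a stand-alone illustration. is_valid_collection_name is a hypothetical helper, and restricting "alphanumeric" to ASCII letters and digits is an assumption. The library may also enforce limits, such as a maximum length, that are not listed here.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_collection_name(name: str) -> bool:
    """Hypothetical pre-check mirroring the documented naming rule."""
    return bool(_NAME_PATTERN.fullmatch(name))


for candidate in ["my_collection", "docs2024", "bad-name", ""]:
    print(candidate, is_valid_collection_name(candidate))
```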

Examples

Create a collection with default settings:

>>> client.create_collection("my_collection")

Create a collection with a custom embedding function:

>>> from pyseekdb import DefaultEmbeddingFunction
>>> ef = DefaultEmbeddingFunction(model_name="all-MiniLM-L6-v2")
>>> collection = client.create_collection("my_docs", embedding_function=ef)

Create a collection with specific configuration:

>>> from pyseekdb import HNSWConfiguration
>>> config = HNSWConfiguration(dimension=128, distance="l2")
>>> collection = client.create_collection(
...     "custom_config",
...     configuration=config,
...     embedding_function=None
... )

create_database(name: str, tenant: str = 'test') None[source]

Create database

Parameters:
  • name – database name

  • tenant – tenant name (for OceanBase)

delete_collection(name: str) None[source]

Delete a collection.

Parameters:

name – The name of the collection to delete.

Raises:

ValueError – If the collection does not exist.

Examples

>>> client.delete_collection("my_collection")

delete_database(name: str, tenant: str = 'test') None[source]

Delete database

Parameters:
  • name – database name

  • tenant – tenant name (for OceanBase)

detect_db_type_and_version() tuple[str, Version][source]

Detect database type and version.

Works for all three modes: seekdb-embedded, seekdb-server, and oceanbase. Version detection is case-insensitive for seekdb.

Returns:

("seekdb", Version("x.x.x.x")) or ("oceanbase", Version("x.x.x.x"))

Return type:

(db_type, version)

Raises:

ValueError – If unable to detect database type or version

Examples

>>> db_type, version = client.detect_db_type_and_version()
>>> version > Version("1.0.0.0")
True

get_collection(name: str, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>) Collection[source]

get_database(name: str, tenant: str = 'test') Database[source]

Get database object

Parameters:
  • name – database name

  • tenant – tenant name (for OceanBase)

get_or_create_collection(name: str, configuration: ~pyseekdb.client.configuration.Configuration | ~pyseekdb.client.configuration.HNSWConfiguration | None = <pyseekdb.client.client_base._NotProvided object>, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>, **kwargs) Collection[source]

Get a collection if it exists, otherwise create it.

Parameters:
  • name – The name of the collection.

  • configuration – Index configuration, given as a Configuration or HNSWConfiguration object. The default is HNSW with cosine distance and dimension 384. If set to None, the dimension is inferred from the embedding function.

  • embedding_function – The embedding function to use for this collection. Defaults to DefaultEmbeddingFunction (all-MiniLM-L6-v2). If set to None, no embedding function will be used (embeddings must be provided manually).

  • **kwargs – Additional parameters passed to create_collection if the collection is created.

Returns:

The existing or newly created Collection object.

Raises:

ValueError – If the configuration/embedding function combination is invalid (e.g., dimension mismatch).
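The dimension-mismatch error mentioned above can be pictured as a simple consistency check between the configured dimension and the embedding function's output dimension. This is a sketch under stated assumptions: resolve_dimension is a hypothetical helper, and the library's actual checks and error messages may differ.

```python
from typing import Optional


def resolve_dimension(configured: Optional[int],
                      embedding_dimension: Optional[int]) -> Optional[int]:
    """Hypothetical check: infer the dimension or reject a mismatch."""
    if configured is None:
        # No explicit dimension: fall back to the embedding function's.
        return embedding_dimension
    if embedding_dimension is not None and configured != embedding_dimension:
        raise ValueError(
            f"dimension mismatch: configuration has {configured}, "
            f"embedding function produces {embedding_dimension}"
        )
    return configured
```

For instance, pairing a 128-dimensional configuration with the 384-dimensional default embedding function would fail this check, which is why the create_collection example that uses dimension=128 also passes embedding_function=None.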

Examples

>>> collection = client.get_or_create_collection("my_collection")

has_collection(name: str) bool[source]

Check if a collection exists.

Parameters:

name – The name of the collection to check.

Returns:

True if the collection exists, False otherwise.

Examples

>>> if client.has_collection("my_collection"):
...     print("Collection exists!")

list_collections() list[Collection][source]

List all collections in the database.

Returns:

A list of Collection objects.

Examples

>>> collections = client.list_collections()
>>> for col in collections:
...     print(col.name)

list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]

List all databases

Parameters:
  • limit – maximum number of results to return

  • offset – number of results to skip

  • tenant – tenant name (for OceanBase)

class pyseekdb.BaseConnection[source]

Bases: ABC

Abstract base class for connection management. Defines unified connection interface for all clients.

abstractmethod get_raw_connection() Any[source]

Get raw connection object

abstractmethod is_connected() bool[source]

Check connection status

abstract property mode: str

Return client mode (e.g., ‘SeekdbEmbeddedClient’, ‘RemoteServerClient’)

pyseekdb.Client(path: str | None = None, host: str | None = None, port: int | None = None, tenant: str = 'sys', database: str = 'test', user: str | None = None, password: str = '', **kwargs) _ClientProxy[source]

Smart client factory function (returns ClientProxy for collection operations only)

Automatically selects embedded or remote server mode based on parameters:

  • If path is provided, uses embedded mode

  • If host/port is provided, uses remote server mode (supports both seekdb Server and OceanBase Server)

  • If neither path nor host is provided, defaults to embedded mode with current working directory as path

Returns a ClientProxy that only exposes collection operations. For database management, use AdminClient().

Parameters:
  • path – seekdb data directory path (embedded mode). If not provided and host is also not provided, defaults to current working directory

  • host – server address (remote server mode)

  • port – server port (remote server mode, default 2881)

  • tenant – tenant name (remote server mode, default “sys” for seekdb Server, “test” for OceanBase)

  • database – database name

  • user – username (remote server mode, without tenant suffix)

  • password – password (remote server mode). If not provided, will be retrieved from SEEKDB_PASSWORD environment variable

  • **kwargs – other parameters

Returns:

A proxy that only exposes collection operations

Return type:

_ClientProxy
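The mode-selection rules described for this factory can be summarized in a few lines. The sketch below is an illustration only: select_mode is a hypothetical helper, and giving path priority when both path and host are supplied is an assumption that this reference does not confirm. The fallback values 'localhost' and 2881 are taken from the defaults documented on this page.

```python
import os
from typing import Optional, Tuple


def select_mode(path: Optional[str] = None, host: Optional[str] = None,
                port: Optional[int] = None) -> Tuple[str, object]:
    """Hypothetical helper mirroring the documented selection rules."""
    if path is not None:
        return ("embedded", path)
    if host is not None or port is not None:
        return ("remote", (host or "localhost", port or 2881))
    # Neither path nor host given: embedded mode in the working directory.
    return ("embedded", os.getcwd())


print(select_mode(path="./seekdb.db"))
print(select_mode(host="localhost", port=2881))
```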

Examples

>>> # Embedded mode with explicit path
>>> client = Client(path="/path/to/seekdb", database="db1")
>>> client.create_collection("my_collection")  # ✅ Available
>>> # Embedded mode (default, uses current working directory)
>>> client = Client(database="db1")
>>> client.create_collection("my_collection")  # ✅ Available
>>> # Remote server mode (seekdb Server)
>>> client = Client(
...     host='localhost',
...     port=2881,
...     tenant="sys",
...     database="db1",
...     user="root",
...     password="pass"
... )
>>> # Remote server mode (OceanBase Server)
>>> client = Client(
...     host='localhost',
...     port=2881,
...     tenant="test",
...     database="db1",
...     user="root",
...     password="pass"
... )
class pyseekdb.ClientAPI[source]

Bases: ABC

Client API interface for collection operations only. This is what end users interact with through the Client proxy.

abstractmethod create_collection(name: str, configuration: ~pyseekdb.client.configuration.Configuration | ~pyseekdb.client.configuration.HNSWConfiguration | None = <pyseekdb.client.client_base._NotProvided object>, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>, **kwargs) Collection[source]

Create collection

Parameters:
  • name – Collection name

  • configuration – Index configuration (Configuration or HNSWConfiguration). For backward compatibility, HNSWConfiguration is still accepted. Configuration can include fulltext analyzer configuration (FulltextIndexConfig).

  • embedding_function – Embedding function to convert documents to embeddings. Defaults to DefaultEmbeddingFunction. If explicitly set to None, collection will not have an embedding function. If provided, the dimension in configuration should match the embedding function’s output dimension.

  • **kwargs – Additional parameters

abstractmethod delete_collection(name: str) None[source]

Delete collection

abstractmethod get_collection(name: str, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>) Collection[source]

Get an existing collection.

Parameters:
  • name – The name of the collection to retrieve.

  • embedding_function – The embedding function to use. If not provided, it will try to load the function used when creating the collection. If explicitly set to None, no embedding function will be used.

Returns:

The Collection object.

Raises:

ValueError – If the collection does not exist.
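The signature uses a _NotProvided sentinel as the default so that "argument omitted" can be told apart from "explicitly None". A minimal stand-alone sketch of that pattern follows. The class and function names here are hypothetical stand-ins, and the "stored" value merely represents whatever function was saved when the collection was created.

```python
from typing import Any


class _Missing:
    """Hypothetical sentinel type, standing in for the library's _NotProvided."""

    def __repr__(self) -> str:
        return "<not provided>"


_MISSING = _Missing()


def resolve_embedding_function(embedding_function: Any = _MISSING,
                               stored: Any = "stored-function") -> Any:
    """Omitted -> reuse the stored function; None -> no embedding function."""
    if embedding_function is _MISSING:
        return stored
    return embedding_function


print(resolve_embedding_function())      # falls back to the stored value
print(resolve_embedding_function(None))  # explicit None is respected
```

A plain None default could not express both cases, which is why a dedicated sentinel object is used.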

Examples

>>> collection = client.get_collection("my_collection")

abstractmethod has_collection(name: str) bool[source]

Check if collection exists

abstractmethod list_collections() list[Collection][source]

List all collections

class pyseekdb.Collection(client: Any, name: str, collection_id: str | None = None, dimension: int | None = None, embedding_function: EmbeddingFunction[EmbeddingDocuments] | None = None, distance: str | None = None, **metadata)[source]

Bases: object

Collection unified interface class

Design Principles:

  • Collection is a lightweight wrapper that only holds metadata

  • All operations delegate to the client via self._client._collection_*() methods

  • Different clients (OceanBase, Seekdb, Milvus, etc.) provide different implementations

  • Users see identical interface regardless of which client created the collection

add(ids: str | list[str], embeddings: list[float] | list[list[float]] | None = None, metadatas: dict | list[dict] | None = None, documents: str | list[str] | None = None, **kwargs) None[source]

Add data to collection

Parameters:
  • ids – Single ID or list of IDs

  • embeddings – Single embedding or list of embeddings (optional if documents provided and embedding_function is set)

  • metadatas – Single metadata dict or list of metadata dicts (optional)

  • documents – Single document or list of documents (optional). If provided without embeddings, embedding_function will be used to generate embeddings

  • **kwargs – Additional parameters
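Each argument accepts either a single value or a list. One way to picture this is a normalization step that wraps single values in a list and checks that all lists have the same length. The sketch below is illustrative: normalize_batch and its helpers are hypothetical names, and the library's own validation may differ.

```python
from typing import Any, Dict, List, Optional


def as_list(value: Any) -> List[Any]:
    """Wrap a single value in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


def as_embedding_list(embeddings: list) -> list:
    """A single flat vector becomes [vector]; a list of vectors is unchanged."""
    if embeddings and not isinstance(embeddings[0], list):
        return [embeddings]
    return embeddings


def normalize_batch(ids: Any, embeddings: Optional[list] = None,
                    metadatas: Any = None, documents: Any = None) -> Dict[str, list]:
    """Hypothetical normalization of add()-style arguments into equal-length lists."""
    batch: Dict[str, list] = {"ids": as_list(ids)}
    if embeddings is not None:
        batch["embeddings"] = as_embedding_list(embeddings)
    if metadatas is not None:
        batch["metadatas"] = as_list(metadatas)
    if documents is not None:
        batch["documents"] = as_list(documents)
    size = len(batch["ids"])
    for key, values in batch.items():
        if len(values) != size:
            raise ValueError(f"{key} has {len(values)} entries, expected {size}")
    return batch


print(normalize_batch("1", embeddings=[0.1, 0.2, 0.3], metadatas={"tag": "A"}))
```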

Examples

# Add single item with embeddings
collection.add(ids="1", embeddings=[0.1, 0.2, 0.3], metadatas={"tag": "A"})

# Add multiple items with embeddings
collection.add(
    ids=["1", "2", "3"],
    embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    metadatas=[{"tag": "A"}, {"tag": "B"}, {"tag": "C"}]
)

# Add items with documents (embeddings will be auto-generated if embedding_function is set)
collection.add(
    ids=["1", "2"],
    documents=["Hello world", "How are you?"],
    metadatas=[{"tag": "A"}, {"tag": "B"}]
)

property client: Any

Associated client

count() int[source]

Get the number of items in collection

Returns:

Item count

Examples

count = collection.count()
print(f"Collection has {count} items")

delete(ids: str | list[str] | None = None, where: dict[str, Any] | None = None, where_document: dict[str, Any] | None = None, **kwargs) None[source]

Delete data from collection

Parameters:
  • ids – Single ID or list of IDs to delete (optional)

  • where – Filter condition on metadata (optional)

  • where_document – Filter condition on documents (optional)

  • **kwargs – Additional parameters

Note

At least one of ids, where, or where_document must be provided

Examples

# Delete by IDs
collection.delete(ids=["1", "2", "3"])

# Delete by metadata filter
collection.delete(where={"tag": "A"})

# Delete by document filter
collection.delete(where_document={"$contains": "keyword"})

property dimension: int | None

Vector dimension

property distance: str | None

Distance metric used by the index (e.g., ‘l2’, ‘cosine’, ‘inner_product’)

property embedding_function: EmbeddingFunction[EmbeddingDocuments] | None

Embedding function for this collection

fork(forked_name: str) Collection[source]

Fork (duplicate) this collection to create a new collection with the same data.

The forked collection is independent - modifications to one collection do not affect the other. The original collection remains unchanged.

Parameters:

forked_name – Name for the new forked collection. Must be a valid collection name (letters, digits, and underscores only, not empty).

Returns:

The newly created forked collection.

Return type:

Collection

Raises:

ValueError – If fork is not enabled for this database, if the collection name is invalid, or if a collection with the given name already exists.

Note

  • Fork is only available for seekdb database version 1.1.0.0 or higher.
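The version requirement in the note can be tested with a four-part version comparison. The sketch below uses plain tuples so it runs without any extra packages. fork_available and parse_version are hypothetical helpers, and reading the note as "seekdb only" (so that an OceanBase server never qualifies) is an assumption.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'x.x.x.x' into a tuple of integers for ordered comparison."""
    return tuple(int(part) for part in text.split("."))


def fork_available(db_type: str, version: str) -> bool:
    """Hypothetical gate: seekdb at version 1.1.0.0 or higher."""
    return db_type == "seekdb" and parse_version(version) >= (1, 1, 0, 0)


print(fork_available("seekdb", "1.1.0.0"))
print(fork_available("seekdb", "1.0.9.9"))
```

Comparing integer tuples avoids the pitfall of string comparison, where "1.10.0.0" would sort before "1.9.0.0".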

Examples

# Fork a collection
original = client.get_collection("my_collection")
forked = original.fork("my_collection_backup")

# Verify both collections have the same data
assert original.count() == forked.count()

# Add data to forked collection (original is unaffected)
forked.add(ids="new_id", embeddings=[1.0, 2.0, 3.0], documents="New document")
assert original.count() == 3  # Original unchanged
assert forked.count() == 4  # Forked has new data

get(ids: str | list[str] | None = None, where: dict[str, Any] | None = None, where_document: dict[str, Any] | None = None, limit: int | None = None, offset: int | None = None, include: list[str] | None = None, **kwargs) dict[str, Any][source]

Get data from collection by IDs or filters

Parameters:
  • ids – Single ID or list of IDs to retrieve (optional)

  • where – Filter condition on metadata (optional)

  • where_document – Filter condition on documents (optional)

  • limit – Maximum number of results to return (optional)

  • offset – Number of results to skip (optional)

  • include – Fields to include in results, e.g., ["metadatas", "documents", "embeddings"] (optional)

  • **kwargs – Additional parameters

Returns:

  • ids: List[str] - List of IDs

  • documents: Optional[List[str]] - List of documents (if included)

  • metadatas: Optional[List[Dict]] - List of metadata dictionaries (if included)

  • embeddings: Optional[List[List[float]]] - List of embeddings (if included)

Return type:

Dict with keys (chromadb-compatible format)

Note

If no parameters provided, returns all data (up to limit)
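The returned dict holds parallel lists, so entry i of each list belongs to the same item. The sketch below shows one way to walk such a result row by row. The sample dict is made-up illustrative data shaped like the return value described above, and iter_rows is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def iter_rows(results: Dict[str, Any]) -> Iterator[Tuple[str, Optional[str], Optional[dict]]]:
    """Yield (id, document, metadata) tuples from parallel result lists."""
    ids = results["ids"]
    # Optional fields may be absent or None when not requested via include.
    documents = results.get("documents") or [None] * len(ids)
    metadatas = results.get("metadatas") or [None] * len(ids)
    for item_id, document, metadata in zip(ids, documents, metadatas):
        yield item_id, document, metadata


# Illustrative data only, not output from a real collection.
sample = {
    "ids": ["1", "2"],
    "documents": ["Hello world", "How are you?"],
    "metadatas": [{"tag": "A"}, {"tag": "B"}],
}
rows = list(iter_rows(sample))
for row in rows:
    print(row)
```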

Examples

# Get by single ID
results = collection.get(ids="1")
# results["ids"] contains ["1"]
# results["documents"] contains document for ID "1"

# Get by multiple IDs
results = collection.get(ids=["1", "2", "3"])
# results["ids"] contains ["1", "2", "3"]
# results["documents"] contains documents for all IDs

# Get by filter
results = collection.get(
    where={"tag": "A"},
    limit=10
)
# results["ids"] contains all matching IDs
# results["documents"] contains all matching documents

# Get all data
results = collection.get(limit=100)

Hybrid search (Collection.hybrid_search) combining full-text search and vector similarity search

Parameters:
  • query – Full-text search configuration dict with:

      - where_document: Document filter conditions (e.g., {"$contains": "text"})
      - where: Metadata filter conditions (e.g., {"page": {"$gte": 5}})
      - n_results: Number of results for full-text search (optional)

  • knn – Vector search configuration dict with:

      - query_texts: Query text(s) to be embedded (optional if query_embeddings provided)
      - query_embeddings: Query vector(s) (optional if query_texts provided)
      - where: Metadata filter conditions (optional)
      - n_results: Number of results for vector search (optional)

  • rank – Ranking configuration dict (e.g., {"rrf": {"rank_window_size": 60, "rank_constant": 60}})

  • n_results – Final number of results to return after ranking (default: 10)

  • include – Fields to include in results (e.g., ["documents", "metadatas", "embeddings"])

  • **kwargs – Additional parameters

Returns:

  • ids: List[List[str]] - List of ID lists (one list for hybrid search result)

  • documents: Optional[List[List[str]]] - List of document lists (if included)

  • metadatas: Optional[List[List[Dict]]] - List of metadata lists (if included)

  • embeddings: Optional[List[List[List[float]]]] - List of embedding lists (if included)

  • distances: Optional[List[List[float]]] - List of distance lists

Return type:

Dict with keys (query-compatible format)
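The rank parameter's "rrf" key presumably refers to reciprocal rank fusion, which merges several ranked lists by summing 1 / (rank_constant + rank) for each item. The sketch below implements the textbook formula on plain ID lists. It is not the server's implementation, and interpreting rank_window_size as "how many top entries of each list are considered" is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], rank_constant: int = 60,
                           rank_window_size: int = 60) -> List[str]:
    """Textbook RRF: score(d) = sum over lists of 1 / (rank_constant + rank(d))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; only the first rank_window_size entries count.
        for rank, item_id in enumerate(ranking[:rank_window_size], start=1):
            scores[item_id] += 1.0 / (rank_constant + rank)
    return sorted(scores, key=lambda item_id: scores[item_id], reverse=True)


fulltext_hits = ["a", "b", "c"]  # illustrative full-text ranking
vector_hits = ["b", "c", "d"]    # illustrative vector ranking
fused = reciprocal_rank_fusion([fulltext_hits, vector_hits])
print(fused)
```

Items that appear in both lists accumulate two contributions, so they tend to rise above items found by only one of the searches.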

Examples

# Hybrid search with both full-text and vector search
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)
# results["ids"][0] contains IDs for the hybrid search
# results["documents"][0] contains documents for the hybrid search
# results["distances"][0] contains distances for the hybrid search

property id: str | None

Collection ID

property metadata: dict[str, Any]

Collection metadata

property name: str

Collection name

peek(limit: int = 10) dict[str, Any][source]

Quickly preview the first few items in the collection

Parameters:

limit – Number of items to preview (default: 10)

Returns:

  • ids: List[str] - List of IDs

  • documents: List[str] - List of documents (always included)

  • metadatas: List[Dict] - List of metadata dictionaries (always included)

  • embeddings: List[List[float]] - List of embeddings (always included)

Return type:

Dict with keys (chromadb-compatible format)

Examples

# Preview first 5 items (returns all columns by default)
preview = collection.peek(limit=5)
for i in range(len(preview["ids"])):
    print(f"ID: {preview['ids'][i]}, Document: {preview['documents'][i]}")
    print(f"Metadata: {preview['metadatas'][i]}, Embedding: {preview['embeddings'][i]}")

query(query_embeddings: list[float] | list[list[float]] | None = None, query_texts: str | list[str] | None = None, n_results: int = 10, where: dict[str, Any] | None = None, where_document: dict[str, Any] | None = None, include: list[str] | None = None, **kwargs) dict[str, Any][source]

Query collection by vector similarity

Parameters:
  • query_embeddings – Query vector(s) (optional if query_texts provided)

  • query_texts – Query text(s) to be embedded (optional if query_embeddings provided)

  • n_results – Number of results to return (default: 10)

  • where – Filter condition on metadata supporting:

      - Comparison operators: $eq, $lt, $gt, $lte, $gte, $ne, $in, $nin
      - Logical operators: $or, $and, $not

  • where_document – Filter condition on documents supporting:

      - $contains: full-text search
      - $regex: regular expression matching
      - Logical operators: $or, $and

  • include – Fields to include in results, e.g., ["documents", "metadatas", "embeddings"] (optional). By default, returns "documents" and "metadatas". Always includes "_id".

  • **kwargs – Additional parameters

Returns:

  • ids: List[List[str]] - List of ID lists, one list per query

  • documents: Optional[List[List[str]]] - List of document lists, one list per query (if included)

  • metadatas: Optional[List[List[Dict]]] - List of metadata lists, one list per query (if included)

  • embeddings: Optional[List[List[List[float]]]] - List of embedding lists, one list per query (if included)

  • distances: Optional[List[List[float]]] - List of distance lists, one list per query

Return type:

Dict with keys (chromadb-compatible format)
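To make the where operators concrete, the sketch below evaluates a filter dict against a single metadata dict in plain Python. It only illustrates the intended semantics of the operators listed above. The real filtering happens inside the database, matches is a hypothetical helper, and the handling of missing keys is an assumption.

```python
import operator
from typing import Any, Dict

_COMPARE = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$lt": operator.lt, "$lte": operator.le,
    "$gt": operator.gt, "$gte": operator.ge,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies every condition in `where`."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$not":
            if matches(metadata, condition):
                return False
        elif isinstance(condition, dict):
            # Assumption: a missing field never satisfies an operator condition.
            if key not in metadata:
                return False
            value = metadata[key]
            if not all(_COMPARE[op](value, operand) for op, operand in condition.items()):
                return False
        elif metadata.get(key) != condition:
            # A bare value is shorthand for equality, as in {"tag": "A"}.
            return False
    return True


print(matches({"chapter": 4}, {"chapter": {"$gte": 3}}))
```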

Examples

# Query by single embedding
results = collection.query(
    query_embeddings=[0.1, 0.2, 0.3],
    n_results=5
)
# results["ids"][0] contains IDs for the query
# results["documents"][0] contains documents for the query
# results["distances"][0] contains distances for the query

# Query by multiple embeddings
results = collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]],
    n_results=5
)
# results["ids"][0] contains IDs for first query
# results["ids"][1] contains IDs for second query

# Query with filters
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    where={"chapter": {"$gte": 3}},
    where_document={"$contains": "machine learning"},
    include=["documents", "metadatas", "embeddings"]
)

# Query by texts (will be embedded automatically)
results = collection.query(
    query_texts=["my query text"],
    n_results=10
)

# Query by multiple texts
results = collection.query(
    query_texts=["text1", "text2"],
    n_results=10
)

update(ids: str | list[str], embeddings: list[float] | list[list[float]] | None = None, metadatas: dict | list[dict] | None = None, documents: str | list[str] | None = None, **kwargs) None[source]

Update existing data in collection

Parameters:
  • ids – Single ID or list of IDs to update

  • embeddings – New embeddings (optional)

  • metadatas – New metadata (optional)

  • documents – New documents (optional)

  • **kwargs – Additional parameters

Note

IDs must exist, otherwise an error will be raised

Examples

# Update single item
collection.update(ids="1", metadatas={"tag": "B"})

# Update multiple items
collection.update(
    ids=["1", "2"],
    embeddings=[[0.9, 0.8], [0.7, 0.6]]
)

upsert(ids: str | list[str], embeddings: list[float] | list[list[float]] | None = None, metadatas: dict | list[dict] | None = None, documents: str | list[str] | None = None, **kwargs) None[source]

Insert or update data in collection

Parameters:
  • ids – Single ID or list of IDs

  • embeddings – Embeddings (optional if documents are provided)

  • metadatas – Metadata (optional)

  • documents – Documents (optional)

  • **kwargs – Additional parameters

Note

If ID exists, update it; otherwise, insert new data
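The difference between update and upsert comes down to what happens when an ID is missing. The sketch below models both on a plain dict. The store and the two functions are hypothetical stand-ins, and raising ValueError for a missing ID in update is an assumption, since this reference only says "an error will be raised".

```python
from typing import Any, Dict


def update(store: Dict[str, Dict[str, Any]], item_id: str, **fields: Any) -> None:
    """Modify an existing item; a missing ID is an error."""
    if item_id not in store:
        raise ValueError(f"ID {item_id!r} does not exist")
    store[item_id].update(fields)


def upsert(store: Dict[str, Dict[str, Any]], item_id: str, **fields: Any) -> None:
    """Update the item if it exists, otherwise insert it."""
    store.setdefault(item_id, {}).update(fields)


store: Dict[str, Dict[str, Any]] = {}
upsert(store, "1", metadata={"tag": "A"})  # ID is new: inserted
upsert(store, "1", metadata={"tag": "B"})  # ID exists: updated in place
print(store)
```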

Examples

# Upsert single item
collection.upsert(ids="1", embeddings=[0.1, 0.2], metadatas={"tag": "A"})

# Upsert multiple items
collection.upsert(
    ids=["1", "2", "3"],
    embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
)

class pyseekdb.Configuration(hnsw: HNSWConfiguration | None = None, fulltext_config: FulltextIndexConfig | None = None)[source]

Bases: object

Configuration for collection creation

Parameters:
  • hnsw – HNSWConfiguration or None

  • fulltext_config – FulltextIndexConfig or None. If None, defaults to FulltextIndexConfig(analyzer='ik')

class pyseekdb.Database(name: str, tenant: str | None = None, charset: str | None = None, collation: str | None = None, **kwargs)[source]

Bases: object

Database object representing a database instance.

Note

  • tenant is None for embedded/server mode (no tenant concept)

  • tenant is set for OceanBase mode (multi-tenant architecture)

class pyseekdb.DefaultEmbeddingFunction(model_name: str = 'all-MiniLM-L6-v2', preferred_providers: list[str] | None = None)[source]

Bases: EmbeddingFunction[str | list[str]]

Default embedding function using ONNX runtime.

Uses the ‘all-MiniLM-L6-v2’ model via ONNX, which produces 384-dimensional embeddings. This is a lightweight, fast model suitable for general-purpose text embeddings.

Example

>>> ef = DefaultEmbeddingFunction()
>>> embeddings = ef(["Hello world", "How are you?"])
>>> print(len(embeddings[0]))  # 384

static build_from_config(_config: dict[str, Any]) Self[source]

property dimension: int

Get the dimension of embeddings produced by this function.

get_config() dict[str, Any][source]

Get the configuration dictionary for the embedding function.

This method should return a dictionary that contains all the information needed to restore the embedding function after restart.

Returns:

Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.

static name() str[source]

class pyseekdb.EmbeddingFunction(*args, **kwargs)[source]

Bases: Protocol[D]

Protocol for embedding functions that convert documents to vectors.

This is similar to Chroma’s EmbeddingFunction interface. Implementations should convert text documents to vector embeddings.

Implementations should also provide:

  • name(): Static method that returns a unique name identifier for routing (not persisted in config)

  • get_config(): Instance method that returns a configuration dictionary

  • build_from_config(config): Static method that restores an instance from config

Example

>>> class MyEmbeddingFunction(EmbeddingFunction[Documents]):
...     @staticmethod
...     def name() -> str:
...         return "my_embedding_function"
...     def __call__(self, documents: Documents) -> Embeddings:
...         # Convert documents to embeddings
...         return [[0.1, 0.2, ...], [0.3, 0.4, ...]]
...     def get_config(self) -> Dict[str, Any]:
...         return {...}  # Note: 'name' is not included
...     @staticmethod
...     def build_from_config(config: Dict[str, Any]) -> "MyEmbeddingFunction":
...         return MyEmbeddingFunction(...)
>>>
>>> ef = MyEmbeddingFunction()
>>> embeddings = ef(["Hello", "World"])
>>> config = ef.get_config()
>>> restored_ef = MyEmbeddingFunction.build_from_config(config)

abstractmethod get_config() dict[str, Any][source]

Get the configuration dictionary for the embedding function.

This method should return a dictionary that contains all the information needed to restore the embedding function after restart.

Returns:

Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.
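A complete, runnable toy implementation may help show how name(), __call__, get_config() and build_from_config() fit together. The class below is a sketch under stated assumptions: it deliberately does not import pyseekdb (so it does not subclass EmbeddingFunction), and its hash-based vectors carry no semantic meaning. They are deterministic placeholders used only to demonstrate the configuration round trip.

```python
import hashlib
from typing import Any, Dict, List, Union


class HashEmbeddingFunction:
    """Hypothetical toy embedding function following the protocol's shape."""

    def __init__(self, dimension: int = 8) -> None:
        self._dimension = dimension

    @staticmethod
    def name() -> str:
        return "hash_embedding_function"

    @property
    def dimension(self) -> int:
        return self._dimension

    def __call__(self, documents: Union[str, List[str]]) -> List[List[float]]:
        if isinstance(documents, str):
            documents = [documents]
        vectors = []
        for document in documents:
            digest = hashlib.sha256(document.encode("utf-8")).digest()
            # Map digest bytes to floats in [0, 1]; not a semantic embedding.
            vectors.append([digest[i % len(digest)] / 255.0
                            for i in range(self._dimension)])
        return vectors

    def get_config(self) -> Dict[str, Any]:
        # 'name' is intentionally left out, as described above.
        return {"dimension": self._dimension}

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "HashEmbeddingFunction":
        return HashEmbeddingFunction(dimension=config["dimension"])


ef = HashEmbeddingFunction()
restored = HashEmbeddingFunction.build_from_config(ef.get_config())
print(restored(["Hello"]) == ef(["Hello"]))
```

The round trip works because get_config() returns everything the constructor needs, which is the property the persistence mechanism relies on.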

static support_persistence(embedding_function: Any) bool[source]

Check if the embedding function supports persistence.

class pyseekdb.FulltextIndexConfig(analyzer: str = 'ik', properties: dict[str, str | int | float | bool] | None = None)[source]

Bases: object

Fulltext analyzer configuration for fulltext indexing.

Parameters:
  • analyzer – Analyzer name, can be ‘space’, ‘ngram’, ‘ngram2’, ‘beng’, ‘ik’ and so on (default: ‘ik’)

  • properties – Optional dictionary of parser-specific parameters (key: string, value: primitive type)

analyzer: str = 'ik'
properties: dict[str, str | int | float | bool] | None = None
class pyseekdb.HNSWConfiguration(dimension: int, distance: str = 'l2', properties: dict[str, str | int | float | bool] | None = None)[source]

Bases: object

HNSW (Hierarchical Navigable Small World) index configuration

Parameters:
  • dimension – Vector dimension (number of elements in each vector)

  • distance – Distance metric for similarity calculation (e.g., ‘l2’, ‘cosine’, ‘inner_product’)

  • properties – Optional dictionary of properties for the HNSW index (key: string, value: primitive type)

Please refer to the HNSW configuration documentation (https://en.oceanbase.com/docs/common-oceanbase-database-10000000003351043) for detailed information.

dimension: int
distance: str = 'l2'
properties: dict[str, str | int | float | bool] | None = None
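The distance names accepted above correspond to standard vector metrics. The pure-Python sketch below shows the textbook definitions only. Whether the server reports squared or unsquared L2, or how it orders results for inner product, is not stated here, so treat the exact values as an assumption. The function names are illustrative and are not part of pyseekdb.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity; 0 for parallel vectors, 1 for orthogonal ones."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

In each case both vectors must have the same number of elements, which is what the dimension parameter fixes for the whole index.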
class pyseekdb.RemoteServerClient(host: str = 'localhost', port: int = 2881, tenant: str = 'sys', database: str = 'test', user: str = 'root', password: str = '', charset: str = 'utf8mb4', **kwargs)[source]

Bases: BaseClient

Remote server mode client (connecting via pymysql, lazy loading)

Supports both seekdb Server and OceanBase Server. Uses user@tenant format for authentication.
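As a rough illustration of the user@tenant format, the sketch below assembles the keyword arguments one might pass to pymysql.connect(). The helper build_connect_kwargs is hypothetical and not part of pyseekdb; the client's internal connection logic may differ.

```python
from typing import Any, Dict


def build_connect_kwargs(host: str, port: int, user: str, tenant: str,
                         password: str, database: str,
                         charset: str = "utf8mb4") -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for pymysql.connect()."""
    return {
        "host": host,
        "port": port,
        # The login name combines the user and the tenant as 'user@tenant'.
        "user": f"{user}@{tenant}",
        "password": password,
        "database": database,
        "charset": charset,
    }


kwargs = build_connect_kwargs("localhost", 2881, "root", "sys", "", "test")
```

With the constructor defaults shown in the signature above, the resulting login name would be root@sys.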

create_database(name: str, tenant: str = 'test') None[source]

Create database (remote server has tenant concept, uses client’s tenant)

Parameters:
  • name – database name

  • tenant – tenant name (if it differs from the client’s tenant, the client’s tenant is used)

Note

The remote server has a multi-tenant architecture. The database is scoped to the client’s tenant.

delete_database(name: str, tenant: str = 'test') None[source]

Delete database (remote server has tenant concept, uses client’s tenant)

Parameters:
  • name – database name

  • tenant – tenant name (if it differs from the client’s tenant, the client’s tenant is used)

Note

The remote server has a multi-tenant architecture. The database is scoped to the client’s tenant.

get_database(name: str, tenant: str = 'test') Database[source]

Get database object (remote server has tenant concept, uses client’s tenant)

Parameters:
  • name – database name

  • tenant – tenant name (if it differs from the client’s tenant, the client’s tenant is used)

Returns:

Database object with tenant information

Note

The remote server has a multi-tenant architecture. The database is scoped to the client’s tenant.

get_raw_connection() pymysql.Connection[source]

Get raw connection object

is_connected() bool[source]

Check connection status

list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]

List all databases (remote server has tenant concept, uses client’s tenant)

Parameters:
  • limit – maximum number of results to return

  • offset – number of results to skip

  • tenant – tenant name (if it differs from the client’s tenant, the client’s tenant is used)

Returns:

Sequence of Database objects with tenant information

Note

The remote server has a multi-tenant architecture. This method lists the databases in the client’s tenant.
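The limit and offset parameters read like SQL-style pagination. The sketch below shows that interpretation on a plain list; it is an assumption about the semantics, not pyseekdb's implementation, and the paginate helper is hypothetical.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], limit: Optional[int] = None,
             offset: Optional[int] = None) -> List[T]:
    """Skip `offset` items, then return at most `limit` items (None means no bound)."""
    start = offset or 0
    end = None if limit is None else start + limit
    return list(items[start:end])


names = ["db_a", "db_b", "db_c", "db_d"]
page = paginate(names, limit=2, offset=1)
```

Under this reading, leaving both parameters as None returns every database in the tenant.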

property mode: str

Return client mode (e.g., ‘SeekdbEmbeddedClient’, ‘RemoteServerClient’)

class pyseekdb.SeekdbEmbeddedClient(path: str = './seekdb.db', database: str = 'test', **kwargs)[source]

Bases: BaseClient

Embedded seekdb client (lazy connection)

Note: Only available on Linux platforms, because the pylibseekdb dependency is Linux-only.

create_database(name: str, tenant: str = 'test') None[source]

Create database (tenant parameter ignored for embedded mode)

Parameters:
  • name – database name

  • tenant – ignored for embedded mode (no tenant concept)

delete_database(name: str, tenant: str = 'test') None[source]

Delete database (tenant parameter ignored for embedded mode)

Parameters:
  • name – database name

  • tenant – ignored for embedded mode (no tenant concept)

get_database(name: str, tenant: str = 'test') Database[source]

Get database object (tenant parameter ignored for embedded mode)

Parameters:
  • name – database name

  • tenant – ignored for embedded mode (no tenant concept)

get_raw_connection() Any[source]

Get raw connection object

is_connected() bool[source]

Check connection status

list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]

List all databases (tenant parameter ignored for embedded mode)

Parameters:
  • limit – maximum number of results to return

  • offset – number of results to skip

  • tenant – ignored for embedded mode (no tenant concept)

property mode: str

Return client mode (e.g., ‘SeekdbEmbeddedClient’, ‘RemoteServerClient’)

class pyseekdb.Version(version_str: str)[source]

Bases: object

Represents a version number with support for comparison operations.

Supports versions in format: x.x.x or x.x.x.x (3 or 4 numeric parts)

Examples

>>> v1 = Version("1.0.1.0")
>>> v2 = Version("1.0.0.1")
>>> v1 > v2
True
>>> v1 = Version("1.2.3")
>>> v2 = Version("1.2.4")
>>> v1 < v2
True
property build: int

Get build version number (0 if not specified)

property major: int

Get major version number

property minor: int

Get minor version number

property parts: tuple[int, int, int, int]

Get version parts as tuple

property patch: int

Get patch version number
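The comparison behavior and the properties above can be approximated by padding the version to four numeric parts and comparing tuples. SketchVersion below is a hypothetical stand-in written for illustration, not pyseekdb's actual Version class; in particular, treating "1.2.3" as equal to "1.2.3.0" is an assumption that follows from the build number defaulting to 0.

```python
from functools import total_ordering
from typing import Tuple


@total_ordering
class SketchVersion:
    """Illustrative version type supporting x.x.x and x.x.x.x strings."""

    def __init__(self, version_str: str) -> None:
        pieces = version_str.split(".")
        if len(pieces) not in (3, 4) or not all(p.isdecimal() for p in pieces):
            raise ValueError(f"expected x.x.x or x.x.x.x, got {version_str!r}")
        numbers = [int(p) for p in pieces]
        if len(numbers) == 3:
            numbers.append(0)  # build defaults to 0 when not specified
        self.parts: Tuple[int, int, int, int] = tuple(numbers)

    @property
    def major(self) -> int:
        return self.parts[0]

    @property
    def build(self) -> int:
        return self.parts[3]

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchVersion):
            return NotImplemented
        return self.parts == other.parts

    def __lt__(self, other: "SketchVersion") -> bool:
        if not isinstance(other, SketchVersion):
            return NotImplemented
        # Tuples compare element by element, which matches version ordering.
        return self.parts < other.parts
```

functools.total_ordering derives the remaining comparison operators (>, <=, >=) from __eq__ and __lt__, so only those two need to be written by hand.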

pyseekdb.get_default_embedding_function() DefaultEmbeddingFunction[source]

Get or create the default embedding function instance.

Returns:

DefaultEmbeddingFunction instance

pyseekdb.register_embedding_function(embedding_function_class: type[T]) type[T][source]

Decorator to automatically register an embedding function class.

This decorator can be used as a class decorator to automatically register an embedding function when the class is defined, eliminating the need to manually call EmbeddingFunctionRegistry.register().

Parameters:

embedding_function_class – The embedding function class to register. Must implement:

  • A static name() method that returns a unique identifier

  • A get_config() instance method that returns a configuration dict

  • A static build_from_config(config) method to restore instances

Returns:

The same class (for use as a decorator).

Raises:

ValueError – If the class doesn’t have the required methods or if the name is already registered to a different class.

Example

>>> from pyseekdb.client.embedding_function import (
...     EmbeddingFunction, Documents, Embeddings, register_embedding_function
... )
>>> from typing import Dict, Any
>>>
>>> @register_embedding_function
... class MyCustomEmbeddingFunction(EmbeddingFunction[Documents]):
...     def __init__(self, model_name: str = "my-model"):
...         self.model_name = model_name
...
...     def __call__(self, input: list[str]|str) -> list[list[float]]:
...         # Your embedding logic
...         return [[0.1, 0.2, 0.3] for _ in (input if isinstance(input, list) else [input])]
...
...     @staticmethod
...     def name() -> str:
...         return "my_custom_embedding"
...
...     def get_config(self) -> Dict[str, Any]:
...         return {"model_name": self.model_name}
...
...     @staticmethod
...     def build_from_config(config: Dict[str, Any]) -> "MyCustomEmbeddingFunction":
...         return MyCustomEmbeddingFunction(model_name=config.get("model_name", "my-model"))
>>>
>>> # The class is now automatically registered!
>>> # You can use it immediately when creating collections
>>> import pyseekdb
>>> client = pyseekdb.Client(path="./seekdb.db")
>>> ef = MyCustomEmbeddingFunction()
>>> collection = client.create_collection("my_collection", embedding_function=ef)

Utility Modules

Embedding Functions

The following embedding function classes are available in pyseekdb.utils.embedding_functions:

pyseekdb.utils.embedding_functions

Embedding function implementations for pyseekdb.