API Reference
This page contains the auto-generated API reference for pyseekdb.
Main Package
pyseekdb - Unified vector database client wrapper
Based on seekdb and pymysql, providing a simple and unified API.
Supports two modes:
Embedded mode - using local seekdb
Remote server mode - connecting to remote server via pymysql (supports both seekdb Server and OceanBase Server)
Examples:
Embedded mode - Collection management:
import pyseekdb
client = pyseekdb.Client(path="./seekdb.db", database="test")
collection = client.get_or_create_collection("my_collection")
Remote server mode (seekdb Server) - Collection management:
import pyseekdb
client = pyseekdb.Client(
host='localhost',
port=2881,
tenant="sys",
database="test",
user="root",
password="pass"
)
collection = client.get_or_create_collection("my_collection")
Remote server mode (OceanBase Server) - Collection management:
import pyseekdb
client = pyseekdb.Client(
host='localhost',
port=2881,
tenant="test",
database="test",
user="root",
password="pass"
)
collection = client.get_or_create_collection("my_collection")
Admin client - Database management:
import pyseekdb
admin = pyseekdb.AdminClient(path="./seekdb.db")
admin.create_database("new_db")
databases = admin.list_databases()
- class pyseekdb.AdminAPI[source]
Bases: ABC
Abstract admin API interface for database management. Defines the contract for database operations.
- abstractmethod create_database(name: str, tenant: str = 'test') None[source]
Create database
- Parameters:
name – database name
tenant – tenant name (for OceanBase)
- abstractmethod delete_database(name: str, tenant: str = 'test') None[source]
Delete database
- Parameters:
name – database name
tenant – tenant name (for OceanBase)
- abstractmethod get_database(name: str, tenant: str = 'test') Database[source]
Get database object
- Parameters:
name – database name
tenant – tenant name (for OceanBase)
- Returns:
Database object
- abstractmethod list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]
List all databases
- Parameters:
limit – maximum number of results to return
offset – number of results to skip
tenant – tenant name (for OceanBase)
- Returns:
Sequence of Database objects
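The limit and offset parameters follow the usual pagination pattern. The sketch below shows how a caller might page through results. It assumes offset is applied before limit (this page does not state the order). `paginate` is a hypothetical helper, not part of pyseekdb, and it works on plain names rather than Database objects.

```python
from typing import Optional, Sequence


def paginate(items: Sequence[str], limit: Optional[int] = None,
             offset: Optional[int] = None) -> list:
    """Hypothetical helper: skip `offset` items, then keep at most `limit`."""
    start = offset or 0
    end = None if limit is None else start + limit
    return list(items[start:end])


# Stand-in data; a real call would return Database objects.
names = ["db_a", "db_b", "db_c", "db_d"]
first_page = paginate(names, limit=2)
second_page = paginate(names, limit=2, offset=2)
```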
- pyseekdb.AdminClient(path: str | None = None, host: str | None = None, port: int | None = None, tenant: str = 'sys', user: str | None = None, password: str = '', **kwargs) _AdminClientProxy[source]
Smart admin client factory function (proxy pattern)
Automatically selects embedded or remote server mode based on parameters:
- If path is provided, uses embedded mode
- If host/port is provided, uses remote server mode (supports both seekdb Server and OceanBase Server)
Returns a lightweight AdminClient proxy that only exposes database operations. For collection management, use Client().
- Parameters:
path – seekdb data directory path (embedded mode)
host – server address (remote server mode)
port – server port (remote server mode, default 2881)
tenant – tenant name (remote server mode, default “sys” for seekdb Server, “test” for OceanBase)
user – username (remote server mode, without tenant suffix)
password – password (remote server mode). If not provided, will be retrieved from SEEKDB_PASSWORD environment variable
**kwargs – other parameters
- Returns:
A proxy that only exposes database operations
- Return type:
_AdminClientProxy
Examples
>>> # Embedded mode
>>> admin = AdminClient(path="/path/to/seekdb")
>>> admin.create_database("new_db")  # ✅ Available
>>> # admin.create_collection("coll")  # ❌ Not available
>>> # Remote server mode (seekdb Server)
>>> admin = AdminClient(
...     host='localhost',
...     port=2881,
...     tenant="sys",
...     user="root",
...     password="pass"
... )
>>> # Remote server mode (OceanBase Server)
>>> admin = AdminClient(
...     host='localhost',
...     port=2881,
...     tenant="test",
...     user="root",
...     password="pass"
... )
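The password parameter is documented as falling back to the SEEKDB_PASSWORD environment variable. The sketch below shows what such a fallback could look like. `resolve_password` is a hypothetical helper, and treating an unset variable as an empty password is an assumption. The library's actual lookup may differ.

```python
import os


def resolve_password(password: str = "") -> str:
    """Hypothetical helper: prefer the explicit value, else SEEKDB_PASSWORD."""
    if password:
        return password
    # Assumption: an unset variable is treated as an empty password.
    return os.environ.get("SEEKDB_PASSWORD", "")


os.environ["SEEKDB_PASSWORD"] = "from-env"  # set here only for the demonstration
explicit = resolve_password("pass")
fallback = resolve_password()
```

Keeping the secret in the environment avoids hard-coding it in calls such as AdminClient(..., password="pass").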
- class pyseekdb.BaseClient[source]
Bases: BaseConnection, AdminAPI
Abstract base class for all clients.
Design Pattern:
1. Provides public collection management methods (create_collection, get_collection, etc.)
2. Defines internal operation interfaces (_collection_* methods) called by Collection objects
3. Subclasses implement all abstract methods to provide specific business logic
Benefits of this design:
- Collection object interface is unified regardless of which client created it
- Different clients can have completely different underlying implementations (SQL/gRPC/REST)
- Easy to extend with new client types
Inherits connection management from BaseConnection and database operations from AdminAPI.
- count_collection() int[source]
Count the total number of collections.
- Returns:
The number of collections.
Examples
>>> count = client.count_collection()
>>> print(f"Database has {count} collections")
- create_collection(name: str, configuration: ~pyseekdb.client.configuration.Configuration | ~pyseekdb.client.configuration.HNSWConfiguration | None = <pyseekdb.client.client_base._NotProvided object>, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>, **kwargs) Collection[source]
Create a new collection.
- Parameters:
name – The name of the collection to create. Must contain only alphanumeric characters or underscores.
configuration – Index configuration. Defaults to None (uses HNSW with Cosine distance and dimension 384). Can be a Configuration or HNSWConfiguration object. If set to None, the dimension will be inferred from the embedding function.
embedding_function – The embedding function to use for this collection. Defaults to DefaultEmbeddingFunction (all-MiniLM-L6-v2). If set to None, no embedding function will be used (embeddings must be provided manually).
**kwargs – Additional parameters for collection creation.
- Returns:
The created Collection object.
- Raises:
ValueError – If the collection name is invalid, already exists, or if the configuration/embedding function combination is invalid (e.g., dimension mismatch).
TypeError – If the configuration object is of an invalid type.
Examples
Create a collection with default settings:
>>> client.create_collection("my_collection")
Create a collection with a custom embedding function:
>>> from pyseekdb import DefaultEmbeddingFunction
>>> ef = DefaultEmbeddingFunction(model_name="all-MiniLM-L6-v2")
>>> collection = client.create_collection("my_docs", embedding_function=ef)
Create a collection with specific configuration:
>>> from pyseekdb import HNSWConfiguration
>>> config = HNSWConfiguration(dimension=128, distance="l2")
>>> collection = client.create_collection(
...     "custom_config",
...     configuration=config,
...     embedding_function=None
... )
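Collection names must contain only alphanumeric characters or underscores, so it can help to check a name before calling create_collection. The sketch below uses a hypothetical `is_valid_collection_name` helper. It assumes "alphanumeric" means ASCII letters and digits, and it enforces no length limit because this page does not state one.

```python
import re

# Assumption: ASCII letters, digits, and underscores only; no length limit.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_collection_name(name: str) -> bool:
    """Hypothetical check mirroring the documented naming rule."""
    return isinstance(name, str) and _NAME_PATTERN.fullmatch(name) is not None


checks = {
    name: is_valid_collection_name(name)
    for name in ["my_collection", "docs2024", "bad-name", "has space", ""]
}
```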
- create_database(name: str, tenant: str = 'test') None[source]
Create database
- Parameters:
name – database name
tenant – tenant name (for OceanBase)
- delete_collection(name: str) None[source]
Delete a collection.
- Parameters:
name – The name of the collection to delete.
- Raises:
ValueError – If the collection does not exist.
Examples
>>> client.delete_collection("my_collection")
- delete_database(name: str, tenant: str = 'test') None[source]
Delete database
- Parameters:
name – database name
tenant – tenant name (for OceanBase)
- detect_db_type_and_version() tuple[str, Version][source]
Detect database type and version.
Works for all three modes: seekdb-embedded, seekdb-server, and oceanbase. Version detection is case-insensitive for seekdb.
- Returns:
("seekdb", Version("x.x.x.x")) or ("oceanbase", Version("x.x.x.x"))
- Return type:
(db_type, version)
- Raises:
ValueError – If unable to detect database type or version
Examples
>>> db_type, version = client.detect_db_type_and_version()
>>> version > Version("1.0.0.0")
True
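Four-part versions such as 1.1.0.0 compare correctly when treated as tuples of integers. The sketch below gates a feature on the detected type and version, using the fork requirement stated on this page (seekdb 1.1.0.0 or higher) as the example. `parse_version` and `supports_fork` are hypothetical helpers that work on plain strings rather than on the library's Version class, and returning False for OceanBase is an assumption.

```python
def parse_version(text: str) -> tuple:
    """Hypothetical parser: turn 'x.x.x.x' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_fork(db_type: str, version: str) -> bool:
    """Sketch of a feature gate for the documented fork requirement."""
    # Assumption: any database type other than seekdb does not support fork.
    return db_type.lower() == "seekdb" and parse_version(version) >= (1, 1, 0, 0)


new_enough = supports_fork("seekdb", "1.10.0.0")
too_old = supports_fork("seekdb", "1.0.9.9")
```

Comparing integer tuples avoids the pitfalls of comparing version strings character by character.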
- get_collection(name: str, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>) Collection[source]
- get_database(name: str, tenant: str = 'test') Database[source]
Get database object
- Parameters:
name – database name
tenant – tenant name (for OceanBase)
- get_or_create_collection(name: str, configuration: ~pyseekdb.client.configuration.Configuration | ~pyseekdb.client.configuration.HNSWConfiguration | None = <pyseekdb.client.client_base._NotProvided object>, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>, **kwargs) Collection[source]
Get a collection if it exists, otherwise create it.
- Parameters:
name – The name of the collection.
configuration – Index configuration. Defaults to None (uses HNSW with Cosine distance and dimension 384). Can be a Configuration or HNSWConfiguration object. If set to None, the dimension will be inferred from the embedding function.
embedding_function – The embedding function to use for this collection. Defaults to DefaultEmbeddingFunction (all-MiniLM-L6-v2). If set to None, no embedding function will be used (embeddings must be provided manually).
**kwargs – Additional parameters passed to create_collection if the collection is created.
- Returns:
The existing or newly created Collection object.
- Raises:
ValueError – If the configuration/embedding function combination is invalid (e.g., dimension mismatch).
Examples
>>> collection = client.get_or_create_collection("my_collection")
- has_collection(name: str) bool[source]
Check if a collection exists.
- Parameters:
name – The name of the collection to check.
- Returns:
True if the collection exists, False otherwise.
Examples
>>> if client.has_collection("my_collection"):
...     print("Collection exists!")
- list_collections() list[Collection][source]
List all collections in the database.
- Returns:
A list of Collection objects.
Examples
>>> collections = client.list_collections()
>>> for col in collections:
...     print(col.name)
- class pyseekdb.BaseConnection[source]
Bases: ABC
Abstract base class for connection management. Defines unified connection interface for all clients.
- abstract property mode: str
Return client mode (e.g., ‘SeekdbEmbeddedClient’, ‘RemoteServerClient’)
- pyseekdb.Client(path: str | None = None, host: str | None = None, port: int | None = None, tenant: str = 'sys', database: str = 'test', user: str | None = None, password: str = '', **kwargs) _ClientProxy[source]
Smart client factory function (returns ClientProxy for collection operations only)
Automatically selects embedded or remote server mode based on parameters:
- If path is provided, uses embedded mode
- If host/port is provided, uses remote server mode (supports both seekdb Server and OceanBase Server)
- If neither path nor host is provided, defaults to embedded mode with current working directory as path
Returns a ClientProxy that only exposes collection operations. For database management, use AdminClient().
- Parameters:
path – seekdb data directory path (embedded mode). If not provided and host is also not provided, defaults to current working directory
host – server address (remote server mode)
port – server port (remote server mode, default 2881)
tenant – tenant name (remote server mode, default “sys” for seekdb Server, “test” for OceanBase)
database – database name
user – username (remote server mode, without tenant suffix)
password – password (remote server mode). If not provided, will be retrieved from SEEKDB_PASSWORD environment variable
**kwargs – other parameters
- Returns:
A proxy that only exposes collection operations
- Return type:
_ClientProxy
Examples
>>> # Embedded mode with explicit path
>>> client = Client(path="/path/to/seekdb", database="db1")
>>> client.create_collection("my_collection")  # ✅ Available
>>> # Embedded mode (default, uses current working directory)
>>> client = Client(database="db1")
>>> client.create_collection("my_collection")  # ✅ Available
>>> # Remote server mode (seekdb Server)
>>> client = Client(
...     host='localhost',
...     port=2881,
...     tenant="sys",
...     database="db1",
...     user="root",
...     password="pass"
... )
>>> # Remote server mode (OceanBase Server)
>>> client = Client(
...     host='localhost',
...     port=2881,
...     tenant="test",
...     database="db1",
...     user="root",
...     password="pass"
... )
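The mode selection rules listed for Client can be read as a small decision function. The sketch below is an illustration only. `select_mode` is a hypothetical helper, and letting path win when both path and host are given is an assumption based on the order of the rules, not documented behavior.

```python
import os
from typing import Optional


def select_mode(path: Optional[str] = None, host: Optional[str] = None) -> tuple:
    """Hypothetical sketch of the documented selection rules."""
    if path is not None:
        # Assumption: path takes precedence if host is also supplied.
        return ("embedded", path)
    if host is not None:
        return ("remote", host)
    # Neither given: embedded mode in the current working directory.
    return ("embedded", os.getcwd())


embedded = select_mode(path="./seekdb.db")
remote = select_mode(host="localhost")
default = select_mode()
```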
- class pyseekdb.ClientAPI[source]
Bases: ABC
Client API interface for collection operations only. This is what end users interact with through the Client proxy.
- abstractmethod create_collection(name: str, configuration: ~pyseekdb.client.configuration.Configuration | ~pyseekdb.client.configuration.HNSWConfiguration | None = <pyseekdb.client.client_base._NotProvided object>, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>, **kwargs) Collection[source]
Create collection
- Parameters:
name – Collection name
configuration – Index configuration (Configuration or HNSWConfiguration). For backward compatibility, HNSWConfiguration is still accepted. Configuration can include fulltext analyzer configuration (FulltextIndexConfig).
embedding_function – Embedding function to convert documents to embeddings. Defaults to DefaultEmbeddingFunction. If explicitly set to None, collection will not have an embedding function. If provided, the dimension in configuration should match the embedding function’s output dimension.
**kwargs – Additional parameters
- abstractmethod get_collection(name: str, embedding_function: ~pyseekdb.client.embedding_function.EmbeddingFunction[str | list[str]] | None | ~typing.Any = <pyseekdb.client.client_base._NotProvided object>) Collection[source]
Get an existing collection.
- Parameters:
name – The name of the collection to retrieve.
embedding_function – The embedding function to use. If not provided, it will try to load the function used when creating the collection. If explicitly set to None, no embedding function will be used.
- Returns:
The Collection object.
- Raises:
ValueError – If the collection does not exist.
Examples
>>> collection = client.get_collection("my_collection")
- abstractmethod list_collections() list[Collection][source]
List all collections
- class pyseekdb.Collection(client: Any, name: str, collection_id: str | None = None, dimension: int | None = None, embedding_function: EmbeddingFunction[EmbeddingDocuments] | None = None, distance: str | None = None, **metadata)[source]
Bases: object
Collection unified interface class
Design Principles:
- Collection is a lightweight wrapper that only holds metadata
- All operations delegate to the client via self._client._collection_*() methods
- Different clients (OceanBase, Seekdb, Milvus, etc.) provide different implementations
- Users see identical interface regardless of which client created the collection
- add(ids: str | list[str], embeddings: list[float] | list[list[float]] | None = None, metadatas: dict | list[dict] | None = None, documents: str | list[str] | None = None, **kwargs) None[source]
Add data to collection
- Parameters:
ids – Single ID or list of IDs
embeddings – Single embedding or list of embeddings (optional if documents provided and embedding_function is set)
metadatas – Single metadata dict or list of metadata dicts (optional)
documents – Single document or list of documents (optional) If provided without embeddings, embedding_function will be used to generate embeddings
**kwargs – Additional parameters
Examples
# Add single item with embeddings
collection.add(ids="1", embeddings=[0.1, 0.2, 0.3], metadatas={"tag": "A"})
# Add multiple items with embeddings
collection.add(
    ids=["1", "2", "3"],
    embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    metadatas=[{"tag": "A"}, {"tag": "B"}, {"tag": "C"}]
)
# Add items with documents (embeddings will be auto-generated if embedding_function is set)
collection.add(
    ids=["1", "2"],
    documents=["Hello world", "How are you?"],
    metadatas=[{"tag": "A"}, {"tag": "B"}]
)
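Because add accepts either a single value or a list for each argument, callers sometimes normalize inputs before batching. The sketch below shows one way to do that. `normalize_batch` is a hypothetical helper, and the rule that every provided list must match the number of IDs is an assumption. This page does not describe how the library validates lengths.

```python
def normalize_batch(ids, embeddings=None, metadatas=None, documents=None):
    """Hypothetical helper: wrap single values in lists and check lengths agree."""
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(documents, str):
        documents = [documents]
    if isinstance(metadatas, dict):
        metadatas = [metadatas]
    # A flat list of numbers is a single embedding, so wrap it in a list.
    if embeddings and isinstance(embeddings[0], (int, float)):
        embeddings = [embeddings]
    for label, values in (("embeddings", embeddings),
                          ("metadatas", metadatas),
                          ("documents", documents)):
        # Assumption: each provided list needs one entry per ID.
        if values is not None and len(values) != len(ids):
            raise ValueError(f"{label} has {len(values)} entries, expected {len(ids)}")
    return {"ids": ids, "embeddings": embeddings,
            "metadatas": metadatas, "documents": documents}


single = normalize_batch("1", embeddings=[0.1, 0.2, 0.3], metadatas={"tag": "A"})
```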
- property client: Any
Associated client
- count() int[source]
Get the number of items in collection
- Returns:
Item count
Examples
count = collection.count()
print(f"Collection has {count} items")
- delete(ids: str | list[str] | None = None, where: dict[str, Any] | None = None, where_document: dict[str, Any] | None = None, **kwargs) None[source]
Delete data from collection
- Parameters:
ids – Single ID or list of IDs to delete (optional)
where – Filter condition on metadata (optional)
where_document – Filter condition on documents (optional)
**kwargs – Additional parameters
Note
At least one of ids, where, or where_document must be provided
Examples
# Delete by IDs
collection.delete(ids=["1", "2", "3"])
# Delete by metadata filter
collection.delete(where={"tag": "A"})
# Delete by document filter
collection.delete(where_document={"$contains": "keyword"})
- property dimension: int | None
Vector dimension
- property distance: str | None
Distance metric used by the index (e.g., ‘l2’, ‘cosine’, ‘inner_product’)
- property embedding_function: EmbeddingFunction[EmbeddingDocuments] | None
Embedding function for this collection
- fork(forked_name: str) Collection[source]
Fork (duplicate) this collection to create a new collection with the same data.
The forked collection is independent - modifications to one collection do not affect the other. The original collection remains unchanged.
- Parameters:
forked_name – Name for the new forked collection. Must be a valid collection name (letters, digits, and underscores only, not empty).
- Returns:
The newly created forked collection.
- Return type:
Collection
- Raises:
ValueError – If fork is not enabled for this database, if the collection name is invalid, or if a collection with the given name already exists.
Note
Fork is only available for seekdb database version 1.1.0.0 or higher.
Examples
# Fork a collection
original = client.get_collection("my_collection")
forked = original.fork("my_collection_backup")
# Verify both collections have the same data
assert original.count() == forked.count()
# Add data to forked collection (original is unaffected)
forked.add(ids="new_id", embeddings=[1.0, 2.0, 3.0], documents="New document")
assert original.count() == 3  # Original unchanged
assert forked.count() == 4  # Forked has new data
- get(ids: str | list[str] | None = None, where: dict[str, Any] | None = None, where_document: dict[str, Any] | None = None, limit: int | None = None, offset: int | None = None, include: list[str] | None = None, **kwargs) dict[str, Any][source]
Get data from collection by IDs or filters
- Parameters:
ids – Single ID or list of IDs to retrieve (optional)
where – Filter condition on metadata (optional)
where_document – Filter condition on documents (optional)
limit – Maximum number of results to return (optional)
offset – Number of results to skip (optional)
include – Fields to include in results, e.g., [“metadatas”, “documents”, “embeddings”] (optional)
**kwargs – Additional parameters
- Returns:
ids: List[str] - List of IDs
documents: Optional[List[str]] - List of documents (if included)
metadatas: Optional[List[Dict]] - List of metadata dictionaries (if included)
embeddings: Optional[List[List[float]]] - List of embeddings (if included)
- Return type:
Dict with keys (chromadb-compatible format)
Note
If no parameters provided, returns all data (up to limit)
Examples
# Get by single ID
results = collection.get(ids="1")
# results["ids"] contains ["1"]
# results["documents"] contains document for ID "1"
# Get by multiple IDs
results = collection.get(ids=["1", "2", "3"])
# results["ids"] contains ["1", "2", "3"]
# results["documents"] contains documents for all IDs
# Get by filter
results = collection.get(
    where={"tag": "A"},
    limit=10
)
# results["ids"] contains all matching IDs
# results["documents"] contains all matching documents
# Get all data
results = collection.get(limit=100)
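The result of get is column-oriented: parallel lists under the ids, documents, and metadatas keys. The sketch below turns such a result into one record per item. `iter_rows` is a hypothetical helper, and the sample dict is hand-written in the documented shape, not real output.

```python
def iter_rows(result: dict):
    """Hypothetical helper: yield one dict per item from a get()-shaped result."""
    ids = result["ids"]
    # Optional columns may be missing or None when not included.
    documents = result.get("documents") or [None] * len(ids)
    metadatas = result.get("metadatas") or [None] * len(ids)
    for item_id, document, metadata in zip(ids, documents, metadatas):
        yield {"id": item_id, "document": document, "metadata": metadata}


# Hand-written sample in the documented shape.
sample = {
    "ids": ["1", "2"],
    "documents": ["Hello world", "How are you?"],
    "metadatas": [{"tag": "A"}, {"tag": "B"}],
}
rows = list(iter_rows(sample))
```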
- hybrid_search(query: dict[str, Any] | None = None, knn: dict[str, Any] | None = None, rank: dict[str, Any] | None = None, n_results: int = 10, include: list[str] | None = None, **kwargs) dict[str, Any][source]
Hybrid search combining full-text search and vector similarity search
- Parameters:
query – Full-text search configuration dict with:
- where_document: Document filter conditions (e.g., {"$contains": "text"})
- where: Metadata filter conditions (e.g., {"page": {"$gte": 5}})
- n_results: Number of results for full-text search (optional)
knn – Vector search configuration dict with:
- query_texts: Query text(s) to be embedded (optional if query_embeddings provided)
- query_embeddings: Query vector(s) (optional if query_texts provided)
- where: Metadata filter conditions (optional)
- n_results: Number of results for vector search (optional)
rank – Ranking configuration dict (e.g., {"rrf": {"rank_window_size": 60, "rank_constant": 60}})
n_results – Final number of results to return after ranking (default: 10)
include – Fields to include in results (e.g., [“documents”, “metadatas”, “embeddings”])
**kwargs – Additional parameters
- Returns:
ids: List[List[str]] - List of ID lists (one list for hybrid search result)
documents: Optional[List[List[str]]] - List of document lists (if included)
metadatas: Optional[List[List[Dict]]] - List of metadata lists (if included)
embeddings: Optional[List[List[List[float]]]] - List of embedding lists (if included)
distances: Optional[List[List[float]]] - List of distance lists
- Return type:
Dict with keys (query-compatible format)
Examples
# Hybrid search with both full-text and vector search
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)
# results["ids"][0] contains IDs for the hybrid search
# results["documents"][0] contains documents for the hybrid search
# results["distances"][0] contains distances for the hybrid search
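To make the rank parameter more concrete, the sketch below implements textbook reciprocal rank fusion in pure Python. It assumes "rrf" refers to that technique and that rank_constant plays the role of the constant k in 1 / (k + rank). The fusion in hybrid_search runs inside the database and may differ in detail. rank_window_size is not modelled here, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant: int = 60, n_results: int = 10):
    """Textbook RRF: score(d) = sum over rankings of 1 / (rank_constant + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest score first; ties broken by ID to keep the order deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n_results]


fulltext_hits = ["a", "b", "c"]   # stand-in for the full-text ranking
vector_hits = ["b", "d", "a"]     # stand-in for the vector ranking
fused = rrf_fuse([fulltext_hits, vector_hits], rank_constant=60, n_results=3)
```

Documents that appear in both rankings ("a" and "b" here) accumulate two contributions, which is why they rank above documents found by only one search.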
- property id: str | None
Collection ID
- property metadata: dict[str, Any]
Collection metadata
- property name: str
Collection name
- peek(limit: int = 10) dict[str, Any][source]
Quickly preview the first few items in the collection
- Parameters:
limit – Number of items to preview (default: 10)
- Returns:
ids: List[str] - List of IDs
documents: List[str] - List of documents (always included)
metadatas: List[Dict] - List of metadata dictionaries (always included)
embeddings: List[List[float]] - List of embeddings (always included)
- Return type:
Dict with keys (chromadb-compatible format)
Examples
# Preview first 5 items (returns all columns by default)
preview = collection.peek(limit=5)
for i in range(len(preview["ids"])):
    print(f"ID: {preview['ids'][i]}, Document: {preview['documents'][i]}")
    print(f"Metadata: {preview['metadatas'][i]}, Embedding: {preview['embeddings'][i]}")
- query(query_embeddings: list[float] | list[list[float]] | None = None, query_texts: str | list[str] | None = None, n_results: int = 10, where: dict[str, Any] | None = None, where_document: dict[str, Any] | None = None, include: list[str] | None = None, **kwargs) dict[str, Any][source]
Query collection by vector similarity
- Parameters:
query_embeddings – Query vector(s) (optional if query_texts provided)
query_texts – Query text(s) to be embedded (optional if query_embeddings provided)
n_results – Number of results to return (default: 10)
where – Filter condition on metadata supporting:
- Comparison operators: $eq, $lt, $gt, $lte, $gte, $ne, $in, $nin
- Logical operators: $or, $and, $not
where_document – Filter condition on documents supporting:
- $contains: full-text search
- $regex: regular expression matching
- Logical operators: $or, $and
include – Fields to include in results, e.g., ["documents", "metadatas", "embeddings"] (optional). By default, returns "documents" and "metadatas". Always includes "_id".
**kwargs – Additional parameters
- Returns:
ids: List[List[str]] - List of ID lists, one list per query
documents: Optional[List[List[str]]] - List of document lists, one list per query (if included)
metadatas: Optional[List[List[Dict]]] - List of metadata lists, one list per query (if included)
embeddings: Optional[List[List[List[float]]]] - List of embedding lists, one list per query (if included)
distances: Optional[List[List[float]]] - List of distance lists, one list per query
- Return type:
Dict with keys (chromadb-compatible format)
Examples
# Query by single embedding
results = collection.query(
    query_embeddings=[0.1, 0.2, 0.3],
    n_results=5
)
# results["ids"][0] contains IDs for the query
# results["documents"][0] contains documents for the query
# results["distances"][0] contains distances for the query
# Query by multiple embeddings
results = collection.query(
    query_embeddings=[[11.1, 12.1, 13.1], [1.1, 2.3, 3.2]],
    n_results=5
)
# results["ids"][0] contains IDs for first query
# results["ids"][1] contains IDs for second query
# Query with filters
results = collection.query(
    query_embeddings=[[0.1, 0.2, 0.3]],
    where={"chapter": {"$gte": 3}},
    where_document={"$contains": "machine learning"},
    include=["documents", "metadatas", "embeddings"]
)
# Query by texts (will be embedded automatically)
results = collection.query(
    query_texts=["my query text"],
    n_results=10
)
# Query by multiple texts
results = collection.query(
    query_texts=["text1", "text2"],
    n_results=10
)
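The metadata filter operators accepted by where can be illustrated with a small in-memory evaluator. The sketch below is not how the library evaluates filters (those are presumably executed by the database). Its semantics follow the common Chroma-style convention, which is an assumption: $and and $or take lists of sub-filters, $not takes a single sub-filter, a bare value means equality, and ordered comparisons on a missing key are false. `matches` is a hypothetical name.

```python
from typing import Any

_COMPARATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$lt": lambda value, target: value is not None and value < target,
    "$lte": lambda value, target: value is not None and value <= target,
    "$gt": lambda value, target: value is not None and value > target,
    "$gte": lambda value, target: value is not None and value >= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches(metadata: dict, where: dict) -> bool:
    """Hypothetical evaluator: True if metadata satisfies every condition."""
    for key, condition in where.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key == "$not":
            ok = not matches(metadata, condition)
        elif isinstance(condition, dict):
            ok = all(_COMPARATORS[op](metadata.get(key), target)
                     for op, target in condition.items())
        else:
            # Bare value is shorthand for equality, as in where={"tag": "A"}.
            ok = metadata.get(key) == condition
        if not ok:
            return False
    return True


item = {"chapter": 4, "tag": "A"}
chapter_ok = matches(item, {"chapter": {"$gte": 3}})
either_ok = matches(item, {"$or": [{"tag": "B"}, {"chapter": {"$lt": 5}}]})
```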
- update(ids: str | list[str], embeddings: list[float] | list[list[float]] | None = None, metadatas: dict | list[dict] | None = None, documents: str | list[str] | None = None, **kwargs) None[source]
Update existing data in collection
- Parameters:
ids – Single ID or list of IDs to update
embeddings – New embeddings (optional)
metadatas – New metadata (optional)
documents – New documents (optional)
**kwargs – Additional parameters
Note
IDs must exist, otherwise an error will be raised
Examples
# Update single item
collection.update(ids="1", metadatas={"tag": "B"})
# Update multiple items
collection.update(
    ids=["1", "2"],
    embeddings=[[0.9, 0.8], [0.7, 0.6]]
)
- upsert(ids: str | list[str], embeddings: list[float] | list[list[float]] | None = None, metadatas: dict | list[dict] | None = None, documents: str | list[str] | None = None, **kwargs) None[source]
Insert or update data in collection
- Parameters:
ids – Single ID or list of IDs
embeddings – embeddings (optional if documents provided)
metadatas – Metadata (optional)
documents – Documents (optional)
**kwargs – Additional parameters
Note
If ID exists, update it; otherwise, insert new data
Examples
# Upsert single item
collection.upsert(ids="1", embeddings=[0.1, 0.2], metadatas={"tag": "A"})
# Upsert multiple items
collection.upsert(
    ids=["1", "2", "3"],
    embeddings=[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
)
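The difference between update (the ID must exist) and upsert (insert if missing, update otherwise) can be shown with a toy in-memory store. `ToyStore` is a hypothetical stand-in and not a pyseekdb class. The exception type and the choice to merge metadata instead of replacing it are assumptions, since this page only says that an error is raised.

```python
class ToyStore:
    """Hypothetical in-memory stand-in contrasting update and upsert."""

    def __init__(self) -> None:
        self.items = {}

    def update(self, item_id: str, metadata: dict) -> None:
        if item_id not in self.items:
            # Assumption: the real error type is not stated on this page.
            raise KeyError(f"ID {item_id!r} does not exist")
        self.items[item_id].update(metadata)

    def upsert(self, item_id: str, metadata: dict) -> None:
        if item_id in self.items:
            self.update(item_id, metadata)
        else:
            self.items[item_id] = dict(metadata)


store = ToyStore()
store.upsert("1", {"tag": "A"})  # ID is new, so this inserts
store.upsert("1", {"tag": "B"})  # ID exists, so this updates
```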
- class pyseekdb.Configuration(hnsw: HNSWConfiguration | None = None, fulltext_config: FulltextIndexConfig | None = None)[source]
Bases: object
Configuration for collection creation
- Parameters:
hnsw – HNSWConfiguration or None
fulltext_config – FulltextIndexConfig or None. If None, defaults to FulltextIndexConfig(analyzer='ik')
- class pyseekdb.Database(name: str, tenant: str | None = None, charset: str | None = None, collation: str | None = None, **kwargs)[source]
Bases: object
Database object representing a database instance.
Note
tenant is None for embedded/server mode (no tenant concept)
tenant is set for OceanBase mode (multi-tenant architecture)
- class pyseekdb.DefaultEmbeddingFunction(model_name: str = 'all-MiniLM-L6-v2', preferred_providers: list[str] | None = None)[source]
Bases: EmbeddingFunction[str | list[str]]
Default embedding function using ONNX runtime.
Uses the 'all-MiniLM-L6-v2' model via ONNX, which produces 384-dimensional embeddings. This is a lightweight, fast model suitable for general-purpose text embeddings.
Example
>>> ef = DefaultEmbeddingFunction()
>>> embeddings = ef(["Hello world", "How are you?"])
>>> print(len(embeddings[0]))  # 384
- property dimension: int
Get the dimension of embeddings produced by this function.
- get_config() dict[str, Any][source]
Get the configuration dictionary for the embedding function.
This method should return a dictionary that contains all the information needed to restore the embedding function after restart.
- Returns:
Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.
- class pyseekdb.EmbeddingFunction(*args, **kwargs)[source]
Bases: Protocol[D]
Protocol for embedding functions that convert documents to vectors.
This is similar to Chroma's EmbeddingFunction interface. Implementations should convert text documents to vector embeddings.
Implementations should also provide:
- name(): Static method that returns a unique name identifier for routing (not persisted in config)
- get_config(): Instance method that returns a configuration dictionary
- build_from_config(config): Static method that restores an instance from config
Example
>>> class MyEmbeddingFunction(EmbeddingFunction[Documents]):
...     @staticmethod
...     def name() -> str:
...         return "my_embedding_function"
...     def __call__(self, documents: Documents) -> Embeddings:
...         # Convert documents to embeddings
...         return [[0.1, 0.2, ...], [0.3, 0.4, ...]]
...     def get_config(self) -> Dict[str, Any]:
...         return {...}  # Note: 'name' is not included
...     @staticmethod
...     def build_from_config(config: Dict[str, Any]) -> "MyEmbeddingFunction":
...         return MyEmbeddingFunction(...)
>>>
>>> ef = MyEmbeddingFunction()
>>> embeddings = ef(["Hello", "World"])
>>> config = ef.get_config()
>>> restored_ef = MyEmbeddingFunction.build_from_config(config)
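For a version of this pattern that runs without any model, the sketch below implements the same four members (name, __call__, get_config, build_from_config) using a SHA-256 digest. `HashEmbeddingFunction` is a hypothetical toy. Its vectors carry no semantic meaning, it does not import or subclass anything from pyseekdb, and the dimension property is included only by analogy with DefaultEmbeddingFunction.

```python
import hashlib
from typing import Any


class HashEmbeddingFunction:
    """Toy embedding function for illustration; vectors are not semantic."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dimension must be between 1 and 32")
        self._dimension = dimension

    @staticmethod
    def name() -> str:
        return "hash_embedding_function"

    @property
    def dimension(self) -> int:
        return self._dimension

    def __call__(self, documents):
        if isinstance(documents, str):
            documents = [documents]
        vectors = []
        for text in documents:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Scale the first `dimension` bytes into the range [0, 1].
            vectors.append([byte / 255.0 for byte in digest[: self._dimension]])
        return vectors

    def get_config(self) -> dict:
        # 'name' is deliberately left out, matching the note on get_config().
        return {"dimension": self._dimension}

    @staticmethod
    def build_from_config(config: dict) -> "HashEmbeddingFunction":
        return HashEmbeddingFunction(dimension=config["dimension"])


ef = HashEmbeddingFunction(dimension=4)
restored = HashEmbeddingFunction.build_from_config(ef.get_config())
```

Because the output is deterministic, the restored instance produces the same vectors as the original, which is the property the get_config and build_from_config round trip is meant to preserve.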
- abstractmethod get_config() dict[str, Any][source]
Get the configuration dictionary for the embedding function.
This method should return a dictionary that contains all the information needed to restore the embedding function after restart.
- Returns:
Dictionary containing the embedding function’s configuration. Note: The ‘name’ field is not included as it’s handled by the upper layer for routing.
- class pyseekdb.FulltextIndexConfig(analyzer: str = 'ik', properties: dict[str, str | int | float | bool] | None = None)[source]
Bases: object
Fulltext analyzer configuration for fulltext indexing.
- Parameters:
analyzer – Analyzer name, can be ‘space’, ‘ngram’, ‘ngram2’, ‘beng’, ‘ik’ and so on (default: ‘ik’)
properties – Optional dictionary of parser-specific parameters (key: string, value: primitive type)
- analyzer: str = 'ik'
- properties: dict[str, str | int | float | bool] | None = None
- class pyseekdb.HNSWConfiguration(dimension: int, distance: str = 'l2', properties: dict[str, str | int | float | bool] | None = None)[source]
Bases: object
HNSW (Hierarchical Navigable Small World) index configuration
- Parameters:
dimension – Vector dimension (number of elements in each vector)
distance – Distance metric for similarity calculation (e.g., ‘l2’, ‘cosine’, ‘inner_product’)
properties – Optional dictionary of properties for the HNSW index (key: string, value: primitive type)
Please refer to the HNSW configuration documentation (https://en.oceanbase.com/docs/common-oceanbase-database-10000000003351043) for detailed information.
- dimension: int
- distance: str = 'l2'
- properties: dict[str, str | int | float | bool] | None = None
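For intuition about the three distance values, the sketch below gives their textbook definitions in pure Python. The values an index actually reports may use a different convention (for example squared L2, or a negated inner product), so the functions here are illustrative assumptions, not the database's formulas.

```python
import math


def l2(a, b) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b) -> float:
    """Dot product; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b) -> float:
    """One minus cosine similarity; 0 means the vectors point the same way."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norm


d_l2 = l2([0.0, 0.0], [3.0, 4.0])
d_ip = inner_product([1.0, 2.0], [3.0, 4.0])
d_cos = cosine_distance([1.0, 0.0], [0.0, 1.0])
```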
- class pyseekdb.RemoteServerClient(host: str = 'localhost', port: int = 2881, tenant: str = 'sys', database: str = 'test', user: str = 'root', password: str = '', charset: str = 'utf8mb4', **kwargs)[source]
Bases: BaseClient
Remote server mode client (connecting via pymysql, lazy loading)
Supports both seekdb Server and OceanBase Server. Uses user@tenant format for authentication.
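The user@tenant format means the user parameter is passed without a tenant suffix and the two are combined for authentication. The sketch below shows that combination. `login_user` is a hypothetical helper, and rejecting a user that already contains "@" is an assumption made here to catch double suffixes.

```python
def login_user(user: str, tenant: str) -> str:
    """Hypothetical helper: combine user and tenant as user@tenant."""
    if "@" in user:
        # Assumption: guard against a tenant suffix being supplied twice.
        raise ValueError("pass the user without a tenant suffix")
    return f"{user}@{tenant}"


seekdb_login = login_user("root", "sys")       # default tenant for seekdb Server
oceanbase_login = login_user("root", "test")   # default tenant for OceanBase
```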
- create_database(name: str, tenant: str = 'test') None[source]
Create database (remote server has tenant concept, uses client’s tenant)
- Parameters:
name – database name
tenant – tenant name (if different from client tenant, will use client tenant)
Note
Remote server has multi-tenant architecture. Database is scoped to client’s tenant.
- delete_database(name: str, tenant: str = 'test') None[source]
Delete database (remote server has tenant concept, uses client’s tenant)
- Parameters:
name – database name
tenant – tenant name (if different from client tenant, will use client tenant)
Note
Remote server has multi-tenant architecture. Database is scoped to client’s tenant.
- get_database(name: str, tenant: str = 'test') Database[source]
Get database object (remote server has tenant concept, uses client’s tenant)
- Parameters:
name – database name
tenant – tenant name (if different from client tenant, will use client tenant)
- Returns:
Database object with tenant information
Note
Remote server has multi-tenant architecture. Database is scoped to client’s tenant.
- list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]
List all databases (remote server has tenant concept, uses client’s tenant)
- Parameters:
limit – maximum number of results to return
offset – number of results to skip
tenant – tenant name (if different from client tenant, will use client tenant)
- Returns:
Sequence of Database objects with tenant information
Note
Remote server has multi-tenant architecture. Lists databases in client’s tenant.
- property mode: str
Return client mode (e.g., ‘SeekdbEmbeddedClient’, ‘RemoteServerClient’)
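The tenant handling described for RemoteServerClient can be summarized in a short sketch. Both helpers below are hypothetical and written only for illustration: login_user shows the user@tenant format, and effective_tenant mirrors the documented rule that the client's tenant is used even when a different tenant argument is passed. Whether the library logs a warning in that case is not stated here.

```python
def login_user(user: str, tenant: str) -> str:
    """Compose the user@tenant login name (hypothetical helper)."""
    return f"{user}@{tenant}"


def effective_tenant(client_tenant: str, requested_tenant: str) -> str:
    """Hypothetical: the client's tenant always takes precedence."""
    return client_tenant


print(login_user("root", "sys"))        # root@sys
print(effective_tenant("sys", "test"))  # sys
```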
- class pyseekdb.SeekdbEmbeddedClient(path: str = './seekdb.db', database: str = 'test', **kwargs)[source]
Bases: BaseClient
Embedded seekdb client (lazy connection)
Note: Only available on Linux platforms. pylibseekdb dependency is Linux-only.
- create_database(name: str, tenant: str = 'test') None[source]
Create database (tenant parameter ignored for embedded mode)
- Parameters:
name – database name
tenant – ignored for embedded mode (no tenant concept)
- delete_database(name: str, tenant: str = 'test') None[source]
Delete database (tenant parameter ignored for embedded mode)
- Parameters:
name – database name
tenant – ignored for embedded mode (no tenant concept)
- get_database(name: str, tenant: str = 'test') Database[source]
Get database object (tenant parameter ignored for embedded mode)
- Parameters:
name – database name
tenant – ignored for embedded mode (no tenant concept)
- list_databases(limit: int | None = None, offset: int | None = None, tenant: str = 'test') Sequence[Database][source]
List all databases (tenant parameter ignored for embedded mode)
- Parameters:
limit – maximum number of results to return
offset – number of results to skip
tenant – ignored for embedded mode (no tenant concept)
- property mode: str
Return client mode (e.g., ‘SeekdbEmbeddedClient’, ‘RemoteServerClient’)
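Because embedded mode is Linux-only, an application that runs on several platforms may want to pick its connection arguments at startup. The sketch below is one possible approach, not a pyseekdb feature: client_kwargs is a hypothetical helper, and the argument values are the defaults shown in the signatures on this page.

```python
import sys


def client_kwargs(platform: str = sys.platform) -> dict:
    """Hypothetical helper: choose embedded or remote arguments by platform."""
    if platform.startswith("linux"):
        # Embedded mode: local seekdb file.
        return {"path": "./seekdb.db", "database": "test"}
    # Any other platform: fall back to a remote server.
    return {
        "host": "localhost",
        "port": 2881,
        "tenant": "sys",
        "database": "test",
        "user": "root",
        "password": "",
    }


print(client_kwargs("linux"))
print(client_kwargs("darwin"))
```

The resulting dictionary could then be passed on as pyseekdb.Client(**client_kwargs()).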
- class pyseekdb.Version(version_str: str)[source]
Bases: object
Represents a version number with support for comparison operations.
Supports versions in format: x.x.x or x.x.x.x (3 or 4 numeric parts)
Examples
>>> v1 = Version("1.0.1.0")
>>> v2 = Version("1.0.0.1")
>>> v1 > v2
True
>>> v1 = Version("1.2.3")
>>> v2 = Version("1.2.4")
>>> v1 < v2
True
- property build: int
Get build version number (0 if not specified)
- property major: int
Get major version number
- property minor: int
Get minor version number
- property parts: tuple[int, int, int, int]
Get version parts as tuple
- property patch: int
Get patch version number
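The comparison behaviour documented for Version can be approximated with plain tuples. The stand-in below is a sketch under stated assumptions, not the library's implementation: parse_version is a hypothetical function, and raising ValueError for strings without 3 or 4 numeric parts is an assumption, since this page does not say how Version reacts to malformed input.

```python
def parse_version(version_str: str) -> tuple:
    """Hypothetical stand-in: return (major, minor, patch, build)."""
    parts = [int(p) for p in version_str.split(".")]
    if len(parts) not in (3, 4):
        # Assumed behaviour; the real class may handle this differently.
        raise ValueError(f"expected 3 or 4 numeric parts, got {version_str!r}")
    if len(parts) == 3:
        parts.append(0)  # build defaults to 0 when not specified
    return tuple(parts)


# Tuples compare element by element, matching the examples above.
print(parse_version("1.0.1.0") > parse_version("1.0.0.1"))  # True
print(parse_version("1.2.3") < parse_version("1.2.4"))      # True
print(parse_version("1.2.3"))                               # (1, 2, 3, 0)
```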
- pyseekdb.get_default_embedding_function() DefaultEmbeddingFunction[source]
Get or create the default embedding function instance.
- Returns:
DefaultEmbeddingFunction instance
- pyseekdb.register_embedding_function(embedding_function_class: type[T]) type[T][source]
Decorator to automatically register an embedding function class.
This decorator can be used as a class decorator to automatically register an embedding function when the class is defined, eliminating the need to manually call EmbeddingFunctionRegistry.register().
- Parameters:
embedding_function_class – The embedding function class to register. Must implement:
- A static name() method that returns a unique identifier
- A get_config() instance method that returns configuration dict
- A static build_from_config(config) method to restore instances
- Returns:
The same class (for use as a decorator).
- Raises:
ValueError – If the class doesn’t have the required methods or if the name is already registered to a different class.
Example
>>> from pyseekdb.client.embedding_function import (
...     EmbeddingFunction, Documents, Embeddings, register_embedding_function
... )
>>> from typing import Dict, Any
>>>
>>> @register_embedding_function
... class MyCustomEmbeddingFunction(EmbeddingFunction[Documents]):
...     def __init__(self, model_name: str = "my-model"):
...         self.model_name = model_name
...
...     def __call__(self, input: list[str]|str) -> list[list[float]]:
...         # Your embedding logic
...         return [[0.1, 0.2, 0.3] for _ in (input if isinstance(input, list) else [input])]
...
...     @staticmethod
...     def name() -> str:
...         return "my_custom_embedding"
...
...     def get_config(self) -> Dict[str, Any]:
...         return {"model_name": self.model_name}
...
...     @staticmethod
...     def build_from_config(config: Dict[str, Any]) -> "MyCustomEmbeddingFunction":
...         return MyCustomEmbeddingFunction(model_name=config.get("model_name", "my-model"))
>>>
>>> # The class is now automatically registered!
>>> # You can use it immediately when creating collections
>>> import pyseekdb
>>> client = pyseekdb.Client(path="./seekdb.db")
>>> ef = MyCustomEmbeddingFunction()
>>> collection = client.create_collection("my_collection", embedding_function=ef)
Utility Modules
Embedding Functions
The following embedding function classes are available in pyseekdb.utils.embedding_functions:
Embedding function implementations for pyseekdb.