3. Collection (Table) Management
Collections are the primary data structures in pyseekdb, similar to tables in traditional databases. Each collection stores documents together with their vector embeddings and metadata, and supports full-text search.
3.1 Creating a Collection
import pyseekdb
from pyseekdb import (
    DefaultEmbeddingFunction,
    HNSWConfiguration,
    Configuration,
    FulltextIndexConfig,
)
# Create a client
client = pyseekdb.Client(host="127.0.0.1", port=2881, database="test")
# Create a collection with default configuration
collection = client.create_collection(
    name="my_collection"
    # embedding_function defaults to DefaultEmbeddingFunction() (384 dimensions)
)
# Create a collection with a custom embedding function
# The dimension is calculated automatically from the embedding function
# UserDefinedEmbeddingFunction is a placeholder for your own EmbeddingFunction
# implementation; it is not imported above
ef = UserDefinedEmbeddingFunction(model_name='all-MiniLM-L6-v2')
collection = client.create_collection(
    name="my_collection",
    embedding_function=ef
)
# Recommended: create a collection with the Configuration wrapper
# Using the IK parser (the default parser, suited to Chinese text)
config = Configuration(
    hnsw=HNSWConfiguration(dimension=384, distance='cosine'),
    fulltext_config=FulltextIndexConfig(analyzer='ik')
)
collection = client.create_collection(
    name="my_collection",
    configuration=config,
    embedding_function=ef
)
# Recommended: create a collection with Configuration (HNSW config only; uses the default parser)
config = Configuration(
    hnsw=HNSWConfiguration(dimension=384, distance='cosine')
)
collection = client.create_collection(
    name="my_collection",
    configuration=config,
    embedding_function=ef
)
# Create a collection with the Space parser (for space-separated languages)
config = Configuration(
    hnsw=HNSWConfiguration(dimension=384, distance='cosine'),
    fulltext_config=FulltextIndexConfig(analyzer='space')
)
collection = client.create_collection(
    name="my_collection",
    configuration=config,
    embedding_function=ef
)
# Create a collection with the Ngram parser and custom parameters
config = Configuration(
    hnsw=HNSWConfiguration(dimension=384, distance='cosine'),
    fulltext_config=FulltextIndexConfig(analyzer='ngram', properties={'ngram_token_size': 3})
)
collection = client.create_collection(
    name="my_collection",
    configuration=config,
    embedding_function=ef
)
# Create a collection without an embedding function (embeddings must be provided manually)
# Recommended: use the Configuration wrapper
config = Configuration(
    hnsw=HNSWConfiguration(dimension=128, distance='cosine')
)
collection = client.create_collection(
    name="my_collection",
    configuration=config,
    embedding_function=None  # Explicitly disable the embedding function
)
# Get or create a collection (creates it if it doesn't exist)
collection = client.get_or_create_collection(
    name="my_collection",
)
Parameters:
- name (str): Collection name (required). Must be non-empty, use only letters, digits, and underscores ([a-zA-Z0-9_]), and be at most 512 characters.
- configuration (Configuration, HNSWConfiguration, or None, optional): Index configuration.
  - Configuration (recommended): wrapper class that can include both HNSWConfiguration and FulltextIndexConfig. Use Configuration(hnsw=HNSWConfiguration(...)) even when only the vector index configuration is needed; this makes it easy to add a fulltext index configuration later.
  - HNSWConfiguration: vector index configuration with dimension and distance metric (kept for backward compatibility).
  - If not provided, the default is used (dimension=384, distance='cosine', analyzer='ik').
  - If set to None, the dimension is calculated from embedding_function.
- embedding_function (EmbeddingFunction, optional): Function that converts documents to embeddings.
  - If not provided, DefaultEmbeddingFunction() (384 dimensions) is used.
  - If set to None, the collection has no embedding function.
  - If provided, the dimension is calculated automatically and validated against configuration.dimension.
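The naming rule for name can be checked on the client side before calling create_collection. The following is a minimal sketch in plain Python; validate_collection_name is a hypothetical helper written for illustration, not part of pyseekdb, and it encodes only the three constraints listed above (non-empty, ASCII letters/digits/underscore, at most 512 characters).

```python
import re

# Hypothetical helper mirroring the documented naming rule; not part of pyseekdb.
_NAME_PATTERN = re.compile(r"[a-zA-Z0-9_]+")
MAX_NAME_LENGTH = 512


def validate_collection_name(name: str) -> None:
    """Raise ValueError if `name` breaks the documented collection-name rule."""
    if not isinstance(name, str) or not name:
        raise ValueError("collection name must be a non-empty string")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"collection name exceeds {MAX_NAME_LENGTH} characters")
    # fullmatch ensures every character is in [a-zA-Z0-9_], not just a prefix.
    if not _NAME_PATTERN.fullmatch(name):
        raise ValueError("collection name may contain only letters, digits and underscores")


validate_collection_name("my_collection")  # valid: returns None silently
for bad in ["", "my-collection", "a" * 513]:
    try:
        validate_collection_name(bad)
    except ValueError as exc:
        print(f"{bad[:20]!r}: {exc}")
```

Checking locally gives an early, readable error; the server remains the authority on what it accepts.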
Fulltext Index Options (values for FulltextIndexConfig analyzer):
- 'ik' (default): IK parser for Chinese text segmentation
- 'space': space-separated tokenizer for languages such as English
- 'ngram': N-gram tokenizer
- 'ngram2': 2-gram tokenizer
- 'beng': Bengali text parser
For more information about the parsers, refer to tokenizer_option in the create_index section.
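To give a rough feel for how the 'space' and 'ngram' options differ, the sketch below implements simplified versions of both in plain Python. space_tokenize and ngram_tokenize are hypothetical helpers; the real seekdb parsers may also normalize case, drop punctuation, or handle word boundaries differently, so treat this as an illustration of the idea rather than the server's behavior. The ngram_token_size argument mirrors the property used in the Ngram example above.

```python
from typing import List


def space_tokenize(text: str) -> List[str]:
    # Split on runs of whitespace, as a space-separated tokenizer would.
    return text.split()


def ngram_tokenize(text: str, ngram_token_size: int = 2) -> List[str]:
    # Emit every contiguous window of `ngram_token_size` characters,
    # computed separately inside each whitespace-delimited word.
    if ngram_token_size < 1:
        raise ValueError("ngram_token_size must be at least 1")
    tokens: List[str] = []
    for word in text.split():
        if len(word) < ngram_token_size:
            continue  # simplification: fragments shorter than n are dropped
        tokens.extend(word[i:i + ngram_token_size]
                      for i in range(len(word) - ngram_token_size + 1))
    return tokens


print(space_tokenize("vector search engine"))       # ['vector', 'search', 'engine']
print(ngram_tokenize("seekdb", ngram_token_size=3))  # ['see', 'eek', 'ekd', 'kdb']
print(ngram_tokenize("向量数据库", ngram_token_size=2))  # ['向量', '量数', '数据', '据库']
```

The last line shows why n-gram tokenization is useful for text without spaces between words: it still yields searchable units, at the cost of a larger index.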
Note: When embedding_function is provided, the system will automatically calculate the vector dimension by calling the function. If configuration.dimension is also provided, it must match the embedding function’s dimension, otherwise a ValueError will be raised.
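The dimension check described in the note can be sketched as follows. check_dimension and toy_ef are hypothetical names; the sketch assumes an embedding function is a callable that takes a list of documents and returns one vector per document, and it probes the function once to learn the vector length. pyseekdb's actual implementation may differ.

```python
from typing import Callable, List, Optional, Sequence

# Assumption: an embedding function maps a list of documents to a list of vectors.
EmbeddingFn = Callable[[Sequence[str]], List[List[float]]]


def check_dimension(embedding_function: EmbeddingFn,
                    configured_dimension: Optional[int] = None) -> int:
    """Hypothetical sketch of the documented dimension rule (not pyseekdb code)."""
    # Call the function once on a dummy document to learn its output length.
    probe = embedding_function(["dimension probe"])
    actual = len(probe[0])
    # A configured dimension, if present, must agree with the function's output.
    if configured_dimension is not None and configured_dimension != actual:
        raise ValueError(
            f"configuration dimension {configured_dimension} does not match "
            f"embedding function dimension {actual}"
        )
    return actual


def toy_ef(docs: Sequence[str]) -> List[List[float]]:
    # Hypothetical 4-dimensional embedding, for illustration only.
    return [[float(len(d)), 0.0, 1.0, 0.5] for d in docs]


print(check_dimension(toy_ef))     # 4
print(check_dimension(toy_ef, 4))  # 4
try:
    check_dimension(toy_ef, 384)
except ValueError as exc:
    print(exc)
```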
3.2 Getting a Collection
# Get an existing collection (uses default embedding function if collection doesn't have one)
collection = client.get_collection("my_collection")
# Get collection with specific embedding function
ef = DefaultEmbeddingFunction(model_name='all-MiniLM-L6-v2')
collection = client.get_collection("my_collection", embedding_function=ef)
# Get collection without embedding function
collection = client.get_collection("my_collection", embedding_function=None)
# Check if the collection exists
if client.has_collection("my_collection"):
    collection = client.get_collection("my_collection")
Parameters:
- name (str): Collection name (required)
- embedding_function (EmbeddingFunction, optional): Embedding function to use for this collection.
  - If not provided, DefaultEmbeddingFunction() is used.
  - If set to None, the collection has no embedding function.

Important: The embedding function set here is used for all operations on this collection (add, upsert, update, query, hybrid_search) whenever documents or texts are provided without embeddings.
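Because omitting embedding_function and passing embedding_function=None mean different things, the API has to tell the two cases apart. A common Python way to do this is a private sentinel default, sketched below. resolve_embedding_function and DefaultEmbeddingStub are hypothetical stand-ins; this shows the general pattern, not how pyseekdb implements it internally.

```python
from typing import Any, Optional


class DefaultEmbeddingStub:
    """Hypothetical stand-in for DefaultEmbeddingFunction (384 dimensions)."""
    dimension = 384


# Private sentinel meaning "the caller did not pass this argument at all".
_NOT_PROVIDED: Any = object()


def resolve_embedding_function(embedding_function: Any = _NOT_PROVIDED) -> Optional[Any]:
    # Argument omitted: fall back to the default embedding function.
    if embedding_function is _NOT_PROVIDED:
        return DefaultEmbeddingStub()
    # Explicit None means "no embedding function"; any other object is used as given.
    return embedding_function


print(type(resolve_embedding_function()).__name__)  # DefaultEmbeddingStub
print(resolve_embedding_function(None))             # None
```

A plain `embedding_function=None` default could not distinguish these cases, which is why the sentinel is compared by identity.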
3.3 Listing Collections
# List all collections
collections = client.list_collections()
for coll in collections:
    print(f"Collection: {coll.name}, Dimension: {coll.dimension}")
# Count collections in database
collection_count = client.count_collection()
print(f"Database has {collection_count} collections")
3.4 Deleting a Collection
# Delete a collection
client.delete_collection("my_collection")
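The lifecycle calls shown in sections 3.1 to 3.4 can be summarized with a small in-memory model. InMemoryCollectionRegistry is a hypothetical toy that captures only the behavior documented here (get_or_create returns an existing collection instead of creating a second one, has_collection reports existence, count_collection counts, delete_collection removes); it does not talk to a server and is not the pyseekdb client.

```python
from typing import Dict


class InMemoryCollectionRegistry:
    """Hypothetical toy model of collection lifecycle semantics; not pyseekdb."""

    def __init__(self) -> None:
        self._collections: Dict[str, dict] = {}

    def has_collection(self, name: str) -> bool:
        return name in self._collections

    def get_or_create_collection(self, name: str) -> dict:
        # Return the existing entry, or create it if it does not exist yet.
        return self._collections.setdefault(name, {"name": name})

    def count_collection(self) -> int:
        return len(self._collections)

    def delete_collection(self, name: str) -> None:
        # Toy choice: deleting a missing name is a no-op here; the real
        # client's behavior in that case is not specified in this section.
        self._collections.pop(name, None)


registry = InMemoryCollectionRegistry()
first = registry.get_or_create_collection("my_collection")
second = registry.get_or_create_collection("my_collection")
print(first is second)                          # True
print(registry.count_collection())              # 1
registry.delete_collection("my_collection")
print(registry.has_collection("my_collection"))  # False
```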
3.5 Collection Properties
Each Collection object has the following properties:
- name (str): Collection name
- id (str, optional): Unique identifier of the collection
- dimension (int, optional): Vector dimension
- embedding_function (EmbeddingFunction, optional): Embedding function associated with this collection
- distance (str): Distance metric used by the index (e.g., 'l2', 'cosine', 'inner_product')
- metadata (dict): Collection metadata
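For intuition about the distance property, the sketch below computes the three listed metrics for two small vectors in plain Python. The formulas are the standard textbook definitions; whether seekdb reports cosine distance as 1 minus cosine similarity, or negates the inner product for ranking, is an assumption here and not confirmed by this section.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity; 0 when the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    # Raw dot product; larger values mean more similar.
    return sum(x * y for x, y in zip(a, b))


a, b = [1.0, 0.0], [1.0, 1.0]
print(round(l2_distance(a, b), 4))      # 1.0
print(round(cosine_distance(a, b), 4))  # 0.2929
print(inner_product(a, b))              # 1.0
```

Cosine ignores vector length, while l2 and inner product do not, which is why the metric should match how the embedding model was trained.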
Accessing Embedding Function:
collection = client.get_collection("my_collection")
if collection.embedding_function is not None:
    print(f"Collection uses embedding function: {collection.embedding_function}")
    print(f"Embedding dimension: {collection.embedding_function.dimension}")