4. DML Operations
DML (Data Manipulation Language) operations allow you to insert, update, and delete data in collections.
4.1 Add Data
The add() method inserts new records into a collection. If a record with the same ID already exists, an error will be raised.
Behavior with Embedding Function:
- If embeddings are provided: the embeddings are used directly, and embedding_function is NOT called (even if provided).
- If embeddings are NOT provided but documents are provided:
  - If the collection has an embedding_function (set during creation or retrieval), it automatically generates embeddings from the documents.
  - If the collection does NOT have an embedding_function, a ValueError is raised.
- If neither embeddings nor documents are provided: a ValueError is raised.
# Add single item with embeddings (embedding_function not used)
collection.add(
    ids="item1",
    embeddings=[0.1, 0.2, 0.3],
    documents="This is a document",
    metadatas={"category": "AI", "score": 95}
)

# Add multiple items with embeddings (embedding_function not used)
collection.add(
    ids=["item1", "item2", "item3"],
    embeddings=[
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9]
    ],
    documents=[
        "Document 1",
        "Document 2",
        "Document 3"
    ],
    metadatas=[
        {"category": "AI", "score": 95},
        {"category": "ML", "score": 88},
        {"category": "DL", "score": 92}
    ]
)

# Add with only embeddings (no documents)
collection.add(
    ids=["vec1", "vec2"],
    embeddings=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
)

# Add with only documents - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection.add(
    ids=["doc1", "doc2"],
    documents=["Text document 1", "Text document 2"],
    metadatas=[{"tag": "A"}, {"tag": "B"}]
)
# The collection's embedding_function will automatically convert documents to embeddings
Parameters:
- ids (str or List[str]): Single ID or list of IDs (required)
- embeddings (List[float] or List[List[float]], optional): Single embedding or list of embeddings
  - If provided, used directly (embedding_function is ignored)
  - If not provided, documents must be provided and the collection must have an embedding_function
- documents (str or List[str], optional): Single document or list of documents
  - If embeddings are not provided, documents are converted to embeddings using the collection's embedding_function
- metadatas (dict or List[dict], optional): Single metadata dict or list of metadata dicts
Note: The embedding_function used is the one associated with the collection (set during create_collection() or get_collection()). You cannot override it per-operation.
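The resolution rules for add() can be summarised as a small decision function. The following is a minimal sketch under stated assumptions, not the library's implementation: resolve_add_embeddings and toy_embed are hypothetical names introduced for illustration, and the error messages are invented.

```python
from typing import Callable, List, Optional

Embedding = List[float]
EmbeddingFunction = Callable[[List[str]], List[Embedding]]


def resolve_add_embeddings(
    embeddings: Optional[List[Embedding]],
    documents: Optional[List[str]],
    embedding_function: Optional[EmbeddingFunction],
) -> List[Embedding]:
    """Hypothetical helper mirroring the add() rules described above."""
    if embeddings is not None:
        # Explicit embeddings win; the embedding function is never called.
        return embeddings
    if documents is not None:
        if embedding_function is None:
            raise ValueError(
                "documents were given, but the collection has no embedding_function"
            )
        # Documents only: the collection's function generates the embeddings.
        return embedding_function(documents)
    # Neither embeddings nor documents: nothing to store as a vector.
    raise ValueError("add() needs either embeddings or documents")


def toy_embed(docs: List[str]) -> List[Embedding]:
    # Stand-in embedding function for illustration only:
    # each document becomes [character count, word count].
    return [[float(len(d)), float(len(d.split()))] for d in docs]
```

Passing both embeddings and documents returns the embeddings untouched, which matches the rule that an explicit vector always takes precedence over the collection's embedding_function.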
4.2 Update Data
The update() method updates existing records in a collection. Records must exist, otherwise an error will be raised.
Behavior with Embedding Function:
- If embeddings are provided: the embeddings are used directly, and embedding_function is NOT called.
- If embeddings are NOT provided but documents are provided:
  - If the collection has an embedding_function, it automatically generates embeddings from the documents.
  - If the collection does NOT have an embedding_function, a ValueError is raised.
- If neither embeddings nor documents are provided: only metadata is updated (metadata-only updates are allowed).
# Update single item - metadata only (embedding_function not used)
collection.update(
    ids="item1",
    metadatas={"category": "AI", "score": 98}  # Update metadata only
)

# Update multiple items with embeddings (embedding_function not used)
collection.update(
    ids=["item1", "item2"],
    embeddings=[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]],  # Update embeddings
    documents=["Updated document 1", "Updated document 2"]  # Update documents
)

# Update with documents only - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection.update(
    ids="item1",
    documents="New document text",  # Embeddings will be auto-generated
    metadatas={"category": "AI"}
)

# Update specific fields - only document (embeddings auto-generated)
collection.update(
    ids="item1",
    documents="New document text"  # Only update document, embeddings auto-generated
)
Parameters:
- ids (str or List[str]): Single ID or list of IDs to update (required)
- embeddings (List[float] or List[List[float]], optional): New embeddings
  - If provided, used directly (embedding_function is ignored)
  - If not provided, documents can be provided to auto-generate embeddings
- documents (str or List[str], optional): New documents
  - If embeddings are not provided, documents are converted to embeddings using the collection's embedding_function
- metadatas (dict or List[dict], optional): New metadata
Note: Metadata-only updates (no embeddings, no documents) are allowed. The embedding_function used is the one associated with the collection.
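To make the update semantics concrete, here is a toy in-memory model, offered as a sketch rather than a description of the library's internals. The names Store and toy_update are hypothetical, the KeyError for unknown IDs is an assumption (the text above only says an error is raised), and whether new metadata replaces or merges with the stored metadata is also an assumption; this sketch replaces it.

```python
from typing import Any, Callable, Dict, List, Optional

# Toy stand-in for a collection: record ID -> stored fields.
Store = Dict[str, Dict[str, Any]]
EmbeddingFunction = Callable[[List[str]], List[List[float]]]


def toy_update(
    store: Store,
    ids: List[str],
    embeddings: Optional[List[List[float]]] = None,
    documents: Optional[List[str]] = None,
    metadatas: Optional[List[Dict[str, Any]]] = None,
    embedding_function: Optional[EmbeddingFunction] = None,
) -> None:
    """Hypothetical model of update(): IDs must exist; only supplied fields change."""
    missing = [record_id for record_id in ids if record_id not in store]
    if missing:
        # Assumption: the real error type is not stated in the text.
        raise KeyError(f"cannot update unknown ids: {missing}")

    if embeddings is None and documents is not None:
        # Documents without embeddings: regenerate vectors from the new text.
        if embedding_function is None:
            raise ValueError("documents given, but no embedding_function is set")
        embeddings = embedding_function(documents)

    for pos, record_id in enumerate(ids):
        record = store[record_id]
        if embeddings is not None:
            record["embedding"] = embeddings[pos]
        if documents is not None:
            record["document"] = documents[pos]
        if metadatas is not None:
            # Assumption: new metadata replaces the stored dict wholesale.
            record["metadata"] = metadatas[pos]
```

A metadata-only call leaves the stored document and embedding untouched, while a documents-only call changes both the document and its embedding.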
4.3 Upsert Data
The upsert() method inserts new records or updates existing ones. If a record with the given ID exists, it will be updated; otherwise, a new record will be inserted.
Behavior with Embedding Function:
- If embeddings are provided: the embeddings are used directly, and embedding_function is NOT called.
- If embeddings are NOT provided but documents are provided:
  - If the collection has an embedding_function, it automatically generates embeddings from the documents.
  - If the collection does NOT have an embedding_function, a ValueError is raised.
- If neither embeddings nor documents are provided: only metadata is upserted (metadata-only upserts are allowed).
# Upsert single item with embeddings (embedding_function not used)
collection.upsert(
    ids="item1",
    embeddings=[0.1, 0.2, 0.3],
    documents="Document text",
    metadatas={"category": "AI", "score": 95}
)

# Upsert multiple items with embeddings (embedding_function not used)
collection.upsert(
    ids=["item1", "item2", "item3"],
    embeddings=[
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9]
    ],
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"category": "AI"},
        {"category": "ML"},
        {"category": "DL"}
    ]
)

# Upsert with documents only - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection.upsert(
    ids=["item1", "item2"],
    documents=["Document 1", "Document 2"],
    metadatas=[{"category": "AI"}, {"category": "ML"}]
)
# The collection's embedding_function will automatically convert documents to embeddings
Parameters:
- ids (str or List[str]): Single ID or list of IDs (required)
- embeddings (List[float] or List[List[float]], optional): Embeddings
  - If provided, used directly (embedding_function is ignored)
  - If not provided, documents can be provided to auto-generate embeddings
- documents (str or List[str], optional): Documents
  - If embeddings are not provided, documents are converted to embeddings using the collection's embedding_function
- metadatas (dict or List[dict], optional): Metadata
Note: Metadata-only upserts (no embeddings, no documents) are allowed. The embedding_function used is the one associated with the collection.
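The insert-or-update branching of upsert() can be illustrated with a toy in-memory store. This is only a sketch: toy_upsert and Store are hypothetical names, the returned outcome labels are invented for the example, and embedding generation from documents is deliberately left out to keep the branching visible.

```python
from typing import Any, Dict, List, Optional

# Toy stand-in for a collection: record ID -> stored fields.
Store = Dict[str, Dict[str, Any]]


def toy_upsert(
    store: Store,
    ids: List[str],
    embeddings: Optional[List[List[float]]] = None,
    documents: Optional[List[str]] = None,
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> List[str]:
    """Hypothetical model of upsert(); returns "inserted" or "updated" per ID."""
    outcomes = []
    for pos, record_id in enumerate(ids):
        outcomes.append("updated" if record_id in store else "inserted")
        # setdefault creates an empty record for new IDs and reuses existing ones.
        record = store.setdefault(record_id, {})
        if embeddings is not None:
            record["embedding"] = embeddings[pos]
        if documents is not None:
            record["document"] = documents[pos]
        if metadatas is not None:
            record["metadata"] = metadatas[pos]
    return outcomes
```

Because an existing ID is overwritten rather than rejected, repeating the same upsert leaves exactly one record per ID, which is what makes upsert() convenient for re-runnable ingestion jobs where add() would raise on the second run.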
4.4 Delete Data
The delete() method removes records from a collection. You can delete by IDs, metadata filters, or document filters.
# Delete by IDs
collection.delete(ids=["item1", "item2", "item3"])
# Delete by single ID
collection.delete(ids="item1")
# Delete by metadata filter
collection.delete(where={"category": {"$eq": "AI"}})
# Delete by comparison operator
collection.delete(where={"score": {"$lt": 50}})
# Delete by document filter
collection.delete(where_document={"$contains": "obsolete"})
# Delete with combined filters
collection.delete(
    where={"category": {"$eq": "AI"}},
    where_document={"$contains": "deprecated"}
)
Parameters:
- ids (str or List[str], optional): Single ID or list of IDs to delete
- where (dict, optional): Metadata filter conditions (see Filter Operators section)
- where_document (dict, optional): Document filter conditions
Note: At least one of ids, where, or where_document must be provided.
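The selection logic behind delete() can be sketched with a toy in-memory store. Everything here is illustrative: toy_delete and _metadata_matches are hypothetical names, only the $eq, $lt, and $contains operators from the examples above are modelled, and treating combined ids, where, and where_document as an AND is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List, Optional, Union

# Toy stand-in for a collection: record ID -> stored fields.
Store = Dict[str, Dict[str, Any]]


def _metadata_matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Check every {field: {operator: operand}} condition against one record."""
    for field, condition in where.items():
        value = metadata.get(field)
        for op, operand in condition.items():
            if op == "$eq":
                if value != operand:
                    return False
            elif op == "$lt":
                if value is None or not value < operand:
                    return False
            else:
                raise ValueError(f"operator {op} is not modelled in this sketch")
    return True


def toy_delete(
    store: Store,
    ids: Optional[Union[str, List[str]]] = None,
    where: Optional[Dict[str, Any]] = None,
    where_document: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical model of delete(); returns the IDs that were removed."""
    if ids is None and where is None and where_document is None:
        raise ValueError("provide at least one of ids, where, or where_document")
    if isinstance(ids, str):
        ids = [ids]  # accept a single ID as well as a list

    doomed = []
    for record_id, record in store.items():
        if ids is not None and record_id not in ids:
            continue
        if where is not None and not _metadata_matches(
            record.get("metadata") or {}, where
        ):
            continue
        if where_document is not None:
            # Assumption: all supplied filters must match (logical AND).
            if where_document["$contains"] not in (record.get("document") or ""):
                continue
        doomed.append(record_id)

    # Delete after the scan so the dict is not mutated while iterating.
    for record_id in doomed:
        del store[record_id]
    return doomed
```

Rejecting a call with no arguments up front mirrors the requirement that at least one selector be given, which guards against accidentally emptying a collection.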