4. DML Operations

DML (Data Manipulation Language) operations allow you to insert, update, upsert, and delete data in collections.

4.1 Add Data

The add() method inserts new records into a collection. If a record with the same ID already exists, an error will be raised.

Behavior with Embedding Function:

  1. If embeddings are provided: The embeddings are used directly, and embedding_function is NOT called (even if the collection has one)

  2. If embeddings are NOT provided but documents are provided:

    • If the collection has an embedding_function (set during creation or retrieval), it automatically generates embeddings from the documents

    • If the collection does NOT have an embedding_function, a ValueError is raised

  3. If neither embeddings nor documents are provided: A ValueError is raised
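The three rules above can be modeled as a small decision function. This is a hypothetical sketch, not the library's implementation: the name resolve_embeddings_for_add and the assumption that an embedding function is a plain callable mapping a list of strings to one vector per string are ours.

```python
from typing import Callable, List, Optional

# Assumed shape of an embedding function: a batch of texts in, one vector per text out.
EmbeddingFunction = Callable[[List[str]], List[List[float]]]


def resolve_embeddings_for_add(
    embeddings: Optional[List[List[float]]],
    documents: Optional[List[str]],
    embedding_function: Optional[EmbeddingFunction],
) -> List[List[float]]:
    """Hypothetical model of how add() picks the vectors it stores."""
    # Rule 1: explicit embeddings are used as-is; embedding_function is not called.
    if embeddings is not None:
        return embeddings
    # Rule 2: no embeddings, but documents are available.
    if documents is not None:
        if embedding_function is None:
            raise ValueError("documents were given without embeddings, "
                             "but the collection has no embedding_function")
        return embedding_function(documents)
    # Rule 3: there is nothing to derive a vector from.
    raise ValueError("add() needs embeddings or documents")


# Usage: a counting stand-in for a real embedding function.
calls: List[List[str]] = []


def counting_ef(docs: List[str]) -> List[List[float]]:
    calls.append(list(docs))
    return [[float(len(d)), 0.0, 0.0] for d in docs]


print(resolve_embeddings_for_add([[0.1, 0.2, 0.3]], ["doc"], counting_ef))  # [[0.1, 0.2, 0.3]]
print(calls)  # [] (rule 1: the function was never called)
print(resolve_embeddings_for_add(None, ["abc"], counting_ef))  # [[3.0, 0.0, 0.0]]
```

The counting function makes rule 1 observable: when vectors are supplied, the embedding function is never invoked, even though one is available.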

# Add single item with embeddings (embedding_function not used)
collection.add(
    ids="item1",
    embeddings=[0.1, 0.2, 0.3],
    documents="This is a document",
    metadatas={"category": "AI", "score": 95}
)

# Add multiple items with embeddings (embedding_function not used)
collection.add(
    ids=["item2", "item3", "item4"],  # "item1" already exists from the example above
    embeddings=[
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9]
    ],
    documents=[
        "Document 1",
        "Document 2",
        "Document 3"
    ],
    metadatas=[
        {"category": "AI", "score": 95},
        {"category": "ML", "score": 88},
        {"category": "DL", "score": 92}
    ]
)

# Add with only embeddings (no documents)
collection.add(
    ids=["vec1", "vec2"],
    embeddings=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
)

# Add with only documents - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection.add(
    ids=["doc1", "doc2"],
    documents=["Text document 1", "Text document 2"],
    metadatas=[{"tag": "A"}, {"tag": "B"}]
)
# The collection's embedding_function will automatically convert documents to embeddings
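The documents-only call above only works if the collection was given an embedding_function. The sketch below shows what such a function could look like, assuming (this section does not say) that it is a plain callable taking a list of strings and returning one vector per string, with the same dimension (3) as the vectors in these examples. The hashing scheme is a toy stand-in for a real embedding model, and the name toy_embedding_function is hypothetical.

```python
import hashlib
from typing import List


def toy_embedding_function(documents: List[str]) -> List[List[float]]:
    """Hypothetical 3-dimensional embedding function (a toy, not a real model).

    Each document is hashed with SHA-256 and the first three bytes are
    scaled into [0, 1], so identical texts always map to identical vectors.
    """
    vectors = []
    for doc in documents:
        digest = hashlib.sha256(doc.encode("utf-8")).digest()
        vectors.append([b / 255.0 for b in digest[:3]])
    return vectors


vecs = toy_embedding_function(["Text document 1", "Text document 2"])
print(len(vecs), len(vecs[0]))  # 2 3
```

A real embedding function would return semantically meaningful vectors; the only properties this toy preserves are determinism and a fixed output dimension.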

Parameters:

  • ids (str or List[str]): Single ID or list of IDs (required)

  • embeddings (List[float] or List[List[float]], optional): Single embedding or list of embeddings

    • If provided, used directly (embedding_function is ignored)

    • If not provided, documents must be provided and the collection must have an embedding_function

  • documents (str or List[str], optional): Single document or list of documents

    • If embeddings are not provided, documents are converted to embeddings using the collection’s embedding_function

  • metadatas (dict or List[dict], optional): Single metadata dict or list of metadata dicts

Note: The embedding_function used is the one associated with the collection (set during create_collection() or get_collection()). You cannot override it per-operation.
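Every parameter accepts either a single value or a list. One way to picture this, as a hypothetical sketch rather than the library's code, is to normalize a single-item call into a one-element batch. The helper names and the length check (one entry per ID) are our assumptions.

```python
from numbers import Number
from typing import Any, Dict, List, Optional, Union


def _as_list(value: Any, single_type: type) -> Optional[List[Any]]:
    """Wrap a single value of `single_type` in a list; pass lists through."""
    if value is None:
        return None
    return [value] if isinstance(value, single_type) else list(value)


def _as_embedding_batch(value: Optional[List[Any]]) -> Optional[List[List[float]]]:
    """A single embedding is a flat list of numbers; a batch is a list of lists."""
    if value is None:
        return None
    if value and isinstance(value[0], Number):
        return [list(value)]
    return [list(v) for v in value]


def normalize_batch(
    ids: Union[str, List[str]],
    embeddings: Optional[List[Any]] = None,
    documents: Optional[Union[str, List[str]]] = None,
    metadatas: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
) -> Dict[str, Optional[List[Any]]]:
    """Hypothetical helper: turn single-item or batch arguments into parallel lists."""
    batch = {
        "ids": _as_list(ids, str),
        "embeddings": _as_embedding_batch(embeddings),
        "documents": _as_list(documents, str),
        "metadatas": _as_list(metadatas, dict),
    }
    n = len(batch["ids"])
    # Assumption: each supplied field must have exactly one entry per ID.
    for name, values in batch.items():
        if values is not None and len(values) != n:
            raise ValueError(f"{name} has {len(values)} entries, expected {n}")
    return batch


single = normalize_batch("item1", embeddings=[0.1, 0.2, 0.3],
                         documents="This is a document",
                         metadatas={"category": "AI", "score": 95})
print(single["ids"], single["embeddings"])  # ['item1'] [[0.1, 0.2, 0.3]]
```

The single-embedding case is detected by checking whether the first element is a number, which is what distinguishes List[float] from List[List[float]] in the parameter types above.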

4.2 Update Data

The update() method updates existing records in a collection. The records must already exist; otherwise, an error is raised.

Behavior with Embedding Function:

  1. If embeddings are provided: The embeddings are used directly, and embedding_function is NOT called

  2. If embeddings are NOT provided but documents are provided:

    • If the collection has an embedding_function, it automatically generates embeddings from the documents

    • If the collection does NOT have an embedding_function, a ValueError is raised

  3. If neither embeddings nor documents are provided: Only the metadata is updated (metadata-only updates are allowed)
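The update rules can be sketched against a plain dictionary standing in for the collection. This is a hypothetical in-memory model, not the library's implementation: the store layout, the use of ValueError for missing IDs (this section only says "an error"), and replacing rather than merging metadata are all assumptions.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
EmbeddingFunction = Callable[[List[str]], List[List[float]]]


def update_records(
    store: Dict[str, Record],
    ids: List[str],
    embeddings: Optional[List[List[float]]] = None,
    documents: Optional[List[str]] = None,
    metadatas: Optional[List[Dict[str, Any]]] = None,
    embedding_function: Optional[EmbeddingFunction] = None,
) -> None:
    """Hypothetical in-memory model of update(); `store` maps id -> record dict."""
    # Every ID must already exist (the real exception type is not specified).
    missing = [i for i in ids if i not in store]
    if missing:
        raise ValueError(f"cannot update missing ids: {missing}")
    # Rules 1 and 2: regenerate vectors only when documents arrive without embeddings.
    if embeddings is None and documents is not None:
        if embedding_function is None:
            raise ValueError("documents were given without embeddings, "
                             "but the collection has no embedding_function")
        embeddings = embedding_function(documents)
    # Rule 3 falls out naturally: fields that were not passed are left untouched.
    for pos, record_id in enumerate(ids):
        record = store[record_id]
        if embeddings is not None:
            record["embedding"] = embeddings[pos]
        if documents is not None:
            record["document"] = documents[pos]
        if metadatas is not None:
            # Assumption: the new metadata dict replaces the old one (no merge).
            record["metadata"] = dict(metadatas[pos])


store = {"item1": {"embedding": [0.1, 0.2, 0.3], "document": "This is a document",
                   "metadata": {"category": "AI", "score": 95}}}
update_records(store, ["item1"], metadatas=[{"category": "AI", "score": 98}])
print(store["item1"]["metadata"])   # {'category': 'AI', 'score': 98}
print(store["item1"]["embedding"])  # [0.1, 0.2, 0.3]
```

The existence check runs before any field is written, so a call naming a missing ID leaves every record unchanged in this sketch.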

# Update single item - metadata only (embedding_function not used)
collection.update(
    ids="item1",
    metadatas={"category": "AI", "score": 98}  # Update metadata only
)

# Update multiple items with embeddings (embedding_function not used)
collection.update(
    ids=["item1", "item2"],
    embeddings=[[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]],  # Update embeddings
    documents=["Updated document 1", "Updated document 2"]  # Update documents
)

# Update with documents only - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection.update(
    ids="item1",
    documents="New document text",  # Embeddings will be auto-generated
    metadatas={"category": "AI"}
)

# Update only the document; its embedding is regenerated by embedding_function
# Requires: collection must have embedding_function set
collection.update(
    ids="item1",
    documents="New document text"  # Only the document is passed; embeddings auto-generated
)

Parameters:

  • ids (str or List[str]): Single ID or list of IDs to update (required)

  • embeddings (List[float] or List[List[float]], optional): New embeddings

    • If provided, used directly (embedding_function is ignored)

    • If not provided, documents can be supplied instead to auto-generate embeddings

  • documents (str or List[str], optional): New documents

    • If embeddings are not provided, documents are converted to embeddings using the collection’s embedding_function

  • metadatas (dict or List[dict], optional): New metadata

Note: Metadata-only updates (no embeddings, no documents) are allowed. The embedding_function used is the one associated with the collection.

4.3 Upsert Data

The upsert() method inserts new records or updates existing ones. If a record with the given ID exists, it will be updated; otherwise, a new record will be inserted.

Behavior with Embedding Function:

  1. If embeddings are provided: The embeddings are used directly, and embedding_function is NOT called

  2. If embeddings are NOT provided but documents are provided:

    • If the collection has an embedding_function, it automatically generates embeddings from the documents

    • If the collection does NOT have an embedding_function, a ValueError is raised

  3. If neither embeddings nor documents are provided: Only the metadata is upserted (metadata-only upserts are allowed)
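Upsert combines the two previous paths: existing IDs are updated and unknown IDs are inserted. The sketch below models this over a plain dictionary. It is hypothetical: the store layout is ours, the returned summary exists only to make the outcome visible, and what a metadata-only upsert does for an ID that does not exist yet is not specified in this section, so the sketch simply creates a record holding only that metadata.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
EmbeddingFunction = Callable[[List[str]], List[List[float]]]


def upsert_records(
    store: Dict[str, Record],
    ids: List[str],
    embeddings: Optional[List[List[float]]] = None,
    documents: Optional[List[str]] = None,
    metadatas: Optional[List[Dict[str, Any]]] = None,
    embedding_function: Optional[EmbeddingFunction] = None,
) -> Dict[str, List[str]]:
    """Hypothetical in-memory model of upsert(); reports inserted vs updated IDs."""
    # Same embedding rules as add() and update().
    if embeddings is None and documents is not None:
        if embedding_function is None:
            raise ValueError("documents were given without embeddings, "
                             "but the collection has no embedding_function")
        embeddings = embedding_function(documents)
    outcome: Dict[str, List[str]] = {"inserted": [], "updated": []}
    for pos, record_id in enumerate(ids):
        outcome["updated" if record_id in store else "inserted"].append(record_id)
        # Existing ID: modify in place. New ID: start from an empty record.
        # Assumption: a metadata-only upsert of a new ID creates a metadata-only record.
        record = store.setdefault(record_id, {})
        if embeddings is not None:
            record["embedding"] = embeddings[pos]
        if documents is not None:
            record["document"] = documents[pos]
        if metadatas is not None:
            record["metadata"] = dict(metadatas[pos])
    return outcome


store = {"item1": {"embedding": [0.0, 0.0, 0.0], "document": "old", "metadata": {}}}
result = upsert_records(store, ["item1", "item2"],
                        embeddings=[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
                        documents=["Doc 1", "Doc 2"],
                        metadatas=[{"category": "AI"}, {"category": "ML"}])
print(result)  # {'inserted': ['item2'], 'updated': ['item1']}
```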

# Upsert single item with embeddings (embedding_function not used)
collection.upsert(
    ids="item1",
    embeddings=[0.1, 0.2, 0.3],
    documents="Document text",
    metadatas={"category": "AI", "score": 95}
)

# Upsert multiple items with embeddings (embedding_function not used)
collection.upsert(
    ids=["item1", "item2", "item3"],
    embeddings=[
        [0.1, 0.2, 0.3],
        [0.4, 0.5, 0.6],
        [0.7, 0.8, 0.9]
    ],
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"category": "AI"},
        {"category": "ML"},
        {"category": "DL"}
    ]
)

# Upsert with documents only - embeddings auto-generated by embedding_function
# Requires: collection must have embedding_function set
collection.upsert(
    ids=["item1", "item2"],
    documents=["Document 1", "Document 2"],
    metadatas=[{"category": "AI"}, {"category": "ML"}]
)
# The collection's embedding_function will automatically convert documents to embeddings

Parameters:

  • ids (str or List[str]): Single ID or list of IDs (required)

  • embeddings (List[float] or List[List[float]], optional): Single embedding or list of embeddings

    • If provided, used directly (embedding_function is ignored)

    • If not provided, documents can be supplied instead to auto-generate embeddings

  • documents (str or List[str], optional): Single document or list of documents

    • If embeddings are not provided, documents are converted to embeddings using the collection’s embedding_function

  • metadatas (dict or List[dict], optional): Single metadata dict or list of metadata dicts

Note: Metadata-only upserts (no embeddings, no documents) are allowed. The embedding_function used is the one associated with the collection.

4.4 Delete Data

The delete() method removes records from a collection. You can delete by IDs, metadata filters, or document filters.

# Delete by IDs
collection.delete(ids=["item1", "item2", "item3"])

# Delete by single ID
collection.delete(ids="item1")

# Delete by metadata filter
collection.delete(where={"category": {"$eq": "AI"}})

# Delete by comparison operator
collection.delete(where={"score": {"$lt": 50}})

# Delete by document filter
collection.delete(where_document={"$contains": "obsolete"})

# Delete with combined filters
collection.delete(
    where={"category": {"$eq": "AI"}},
    where_document={"$contains": "deprecated"}
)

Parameters:

  • ids (str or List[str], optional): Single ID or list of IDs to delete

  • where (dict, optional): Metadata filter conditions (see Filter Operators section)

  • where_document (dict, optional): Document content filter conditions (e.g., {"$contains": "obsolete"})

Note: At least one of ids, where, or where_document must be provided.
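The delete examples above can be modeled with a small filter evaluator over a plain dictionary. This is a hypothetical sketch, not the library's implementation: it covers only the operators used in these examples ($eq, $lt, $contains), and it assumes that when several criteria are given, a record is deleted only if it matches all of them (logical AND).

```python
from typing import Any, Callable, Dict, List, Optional, Union

Record = Dict[str, Any]

# Only the operators used in the examples above; the full set is in the Filter Operators section.
_COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, operand: value == operand,
    "$lt": lambda value, operand: value < operand,
}


def matches_where(metadata: Dict[str, Any], where: Dict[str, Dict[str, Any]]) -> bool:
    """Check a {field: {operator: operand}} metadata filter."""
    for field, condition in where.items():
        if field not in metadata:
            return False
        for op, operand in condition.items():
            if op not in _COMPARATORS:
                raise ValueError(f"operator not modelled in this sketch: {op}")
            if not _COMPARATORS[op](metadata[field], operand):
                return False
    return True


def matches_where_document(document: Optional[str], where_document: Dict[str, str]) -> bool:
    """Check a {"$contains": substring} document filter."""
    for op, operand in where_document.items():
        if op != "$contains":
            raise ValueError(f"operator not modelled in this sketch: {op}")
        if document is None or operand not in document:
            return False
    return True


def delete_records(
    store: Dict[str, Record],
    ids: Optional[Union[str, List[str]]] = None,
    where: Optional[Dict[str, Dict[str, Any]]] = None,
    where_document: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Hypothetical in-memory model of delete(); returns the deleted IDs."""
    if ids is None and where is None and where_document is None:
        raise ValueError("provide at least one of ids, where, or where_document")
    id_set = {ids} if isinstance(ids, str) else (set(ids) if ids is not None else None)
    doomed = []
    for record_id, record in store.items():
        # Assumption: every supplied criterion must match (logical AND).
        if id_set is not None and record_id not in id_set:
            continue
        if where is not None and not matches_where(record.get("metadata") or {}, where):
            continue
        if where_document is not None and not matches_where_document(record.get("document"), where_document):
            continue
        doomed.append(record_id)
    # Delete after the scan so the dict is not mutated while iterating.
    for record_id in doomed:
        del store[record_id]
    return doomed


store = {
    "item1": {"document": "deprecated AI notes", "metadata": {"category": "AI", "score": 95}},
    "item2": {"document": "current ML guide", "metadata": {"category": "ML", "score": 40}},
    "item3": {"document": "fresh AI paper", "metadata": {"category": "AI", "score": 92}},
}
print(delete_records(store, where={"category": {"$eq": "AI"}},
                     where_document={"$contains": "deprecated"}))  # ['item1']
print(delete_records(store, where={"score": {"$lt": 50}}))  # ['item2']
print(sorted(store))  # ['item3']
```

The up-front check mirrors the note above: a call with no ids, where, or where_document is rejected rather than treated as "delete everything".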