5. DQL Operations

DQL (Data Query Language) operations allow you to retrieve data from collections using various query methods.

5.2 Get (Retrieve by IDs or Filters)

The get() method retrieves documents from a collection without vector similarity search. It supports filtering by IDs, metadata, and document content.

# Get by single ID
results = collection.get(ids="123")

# Get by multiple IDs
results = collection.get(ids=["1", "2", "3"])

# Get by metadata filter (simplified equality - both forms are supported)
results = collection.get(
    where={"category": "AI"},
    limit=10
)
# Or use explicit $eq operator:
# where={"category": {"$eq": "AI"}}

# Get by comparison operator
results = collection.get(
    where={"score": {"$gte": 90}},
    limit=10
)

# Get by $in operator
results = collection.get(
    where={"tag": {"$in": ["ml", "python"]}},
    limit=10
)

# Get by logical operators ($or) - simplified equality
results = collection.get(
    where={
        "$or": [
            {"category": "AI"},
            {"tag": "python"}
        ]
    },
    limit=10
)

# Get by document content filter
results = collection.get(
    where_document={"$contains": "machine learning"},
    limit=10
)

# Get with combined filters
results = collection.get(
    where={"category": {"$eq": "AI"}},
    where_document={"$contains": "machine"},
    limit=10
)

# Get with pagination (skip the first item, then return up to 2 items)
results = collection.get(limit=2, offset=1)

# Get with specific fields
results = collection.get(
    ids=["1", "2"],
    include=["documents", "metadatas", "embeddings"]
)

# Get all data (up to limit)
results = collection.get(limit=100)
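Filters are often assembled from application input rather than written by hand. The sketch below uses `build_where`, a hypothetical helper that is not part of the library. It turns keyword arguments into a `where` dict using only operators listed in this section (`$eq`, `$in`, `$and`). It assumes `$and` accepts a list of clauses in the same way as the `$or` example above.

```python
from typing import Any, Dict, Optional


def build_where(**conditions: Any) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: build a metadata filter from keyword arguments.

    Scalar values become $eq clauses and list/tuple values become $in clauses.
    Several clauses are combined with $and (assumed to take a list, like $or).
    Returns None when there are no conditions, so `where` can simply be omitted.
    """
    clauses = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            clauses.append({field: {"$in": list(value)}})
        else:
            clauses.append({field: {"$eq": value}})
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


where = build_where(category="AI", tag=["ml", "python"])
print(where)
# With a real collection (not executed here):
# results = collection.get(where=where, limit=10)
```

Returning a single clause unwrapped keeps simple filters identical to the hand-written forms shown above.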

Parameters:

  • ids (str or List[str], optional): Single ID or list of IDs to retrieve

  • where (dict, optional): Metadata filter conditions (see Filter Operators section)

  • where_document (dict, optional): Document content filter, such as {"$contains": "text"} for full-text matching (see Filter Operators section)

  • limit (int, optional): Maximum number of results to return

  • offset (int, optional): Number of results to skip for pagination

  • include (List[str], optional): Fields to return alongside the IDs, chosen from ["documents", "metadatas", "embeddings"]

Returns: Dict with keys (chromadb-compatible format):

  • ids: List[str] - List of IDs

  • documents: Optional[List[str]] - List of documents (if included)

  • metadatas: Optional[List[Dict]] - List of metadata dictionaries (if included)

  • embeddings: Optional[List[List[float]]] - List of embeddings (if included)
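The returned dict is column-oriented: each key holds a list, and the lists line up by position with `ids`. It is often easier to work with one record per item. Below is a minimal sketch using `to_records`, a hypothetical helper that is not a library function. It assumes the i-th entry of every list belongs to the i-th ID, which is what the parallel-list format implies. The sample dict is hand-written to match the Returns description and is not real output.

```python
from typing import Any, Dict, List

# Plural result keys mapped to singular per-record keys.
_COLUMNS = {"documents": "document", "metadatas": "metadata", "embeddings": "embedding"}


def to_records(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pivot a column-oriented get() result into one dict per item."""
    # Skip columns that were not requested (absent or None).
    # `is not None` also works if embeddings come back as an array type.
    present = [key for key in _COLUMNS if result.get(key) is not None]
    records = []
    for i, item_id in enumerate(result["ids"]):
        record: Dict[str, Any] = {"id": item_id}
        for key in present:
            record[_COLUMNS[key]] = result[key][i]
        records.append(record)
    return records


# Hand-written sample shaped like the Returns description (not real output).
sample = {
    "ids": ["1", "2"],
    "documents": ["Intro to machine learning", "SQL basics"],
    "metadatas": [{"category": "AI"}, {"category": "DB"}],
    "embeddings": None,
}
for record in to_records(sample):
    print(record)
```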

Usage:

# Get by single ID
results = collection.get(ids="123")
# results["ids"] contains ["123"]
# results["documents"] contains document for ID "123"

# Get by multiple IDs
results = collection.get(ids=["1", "2", "3"])
# results["ids"] contains ["1", "2", "3"]
# results["documents"] contains documents for all IDs

# Get by filter
results = collection.get(where={"category": {"$eq": "AI"}}, limit=10)
# results["ids"] contains all matching IDs
# results["documents"] contains all matching documents

Note: If no IDs or filters are provided, get() returns all items in the collection (up to limit).
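Because get() caps results with limit, reading a large collection means stepping offset forward one page at a time. This is a minimal sketch, assuming get() returns items in a stable order between calls. `iter_pages` is a hypothetical helper, not a library function, and it accepts any callable that behaves like `collection.get`. `FakeCollection` is an in-memory stub included only so the example runs without a database.

```python
from typing import Any, Callable, Dict, Iterator


def iter_pages(
    get: Callable[..., Dict[str, Any]], page_size: int = 100, **filters: Any
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: yield successive get() results via limit/offset.

    `filters` (e.g. where=..., include=...) are passed through unchanged.
    """
    offset = 0
    while True:
        page = get(limit=page_size, offset=offset, **filters)
        if not page["ids"]:
            break  # nothing left
        yield page
        if len(page["ids"]) < page_size:
            break  # a short page means this was the last one
        offset += page_size


class FakeCollection:
    """Hypothetical in-memory stand-in for a collection (testing only)."""

    def __init__(self, n: int) -> None:
        self._ids = [str(i) for i in range(n)]

    def get(self, limit: int = 10, offset: int = 0, **_: Any) -> Dict[str, Any]:
        return {"ids": self._ids[offset:offset + limit]}


for page in iter_pages(FakeCollection(250).get, page_size=100):
    print(len(page["ids"]))
# With a real collection (not executed here):
# for page in iter_pages(collection.get, page_size=100, where={"category": "AI"}):
#     ...
```

If the collection is modified between calls, offset-based pages may skip or repeat items, so this pattern fits best on data that is not being written concurrently.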

5.4 Filter Operators

Metadata Filters (where parameter)

  • $eq (or direct equality) / $ne / $gt / $gte / $lt / $lte for comparisons

  • $in / $nin for membership checks

  • $or / $and for logical composition

  • $not for negation

  • #id to filter by primary key (e.g., {"#id": {"$in": ["id1", "id2"]}})
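The sketch below shows how these operators compose by evaluating a `where` filter against a single item in plain Python. It is a local model for reasoning about filters, not the server's implementation, and `matches` is a hypothetical name. Three behaviours are assumptions rather than documented facts. Top-level keys are ANDed together. `$not` wraps a complete sub-filter. A field missing from an item's metadata never matches.

```python
from typing import Any, Callable, Dict, Mapping

# Comparison and membership operators from the list above.
_OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value > operand,
    "$gte": lambda value, operand: value >= operand,
    "$lt": lambda value, operand: value < operand,
    "$lte": lambda value, operand: value <= operand,
    "$in": lambda value, operand: value in operand,
    "$nin": lambda value, operand: value not in operand,
}


def matches(item_id: str, metadata: Mapping[str, Any], where: Mapping[str, Any]) -> bool:
    """Local model of a `where` filter: True if the item satisfies it."""
    for key, condition in where.items():  # assumption: top-level keys are ANDed
        if key == "$and":
            if not all(matches(item_id, metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(item_id, metadata, sub) for sub in condition):
                return False
        elif key == "$not":
            # Assumption: $not wraps a complete sub-filter.
            if matches(item_id, metadata, condition):
                return False
        else:
            if key == "#id":
                value = item_id  # primary-key filter
            elif key in metadata:
                value = metadata[key]
            else:
                return False  # assumption: a missing field never matches
            if isinstance(condition, Mapping):
                if not all(_OPERATORS[op](value, operand) for op, operand in condition.items()):
                    return False
            elif value != condition:  # direct equality, shorthand for $eq
                return False
    return True


items = {
    "1": {"category": "AI", "score": 95},
    "2": {"category": "DB", "score": 80},
    "3": {"tag": "python", "score": 70},
}
where = {"$or": [{"category": "AI"}, {"tag": "python"}]}
print([i for i, meta in items.items() if matches(i, meta, where)])
```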

Document Filters (where_document parameter)

  • $contains: full-text match

  • $not_contains: exclude documents that contain the given text

  • $or / $and: combine multiple $contains clauses
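Document filters can be modelled the same way. The sketch below uses `doc_matches`, a hypothetical name, and approximates `$contains` with a plain, case-sensitive substring test. The backend's full-text matching may treat case, tokenization and word boundaries differently, so treat this only as a guide to how the clauses combine.

```python
from typing import Any, Mapping


def doc_matches(document: str, where_document: Mapping[str, Any]) -> bool:
    """Local model of a `where_document` filter (substring approximation)."""
    for op, operand in where_document.items():
        if op == "$contains":
            ok = operand in document
        elif op == "$not_contains":
            ok = operand not in document
        elif op == "$and":
            ok = all(doc_matches(document, clause) for clause in operand)
        elif op == "$or":
            ok = any(doc_matches(document, clause) for clause in operand)
        else:
            raise ValueError(f"unsupported document operator: {op}")
        if not ok:
            return False
    return True


docs = {"1": "Intro to machine learning", "2": "Deep learning with Python", "3": "SQL basics"}
flt = {"$or": [{"$contains": "machine"}, {"$contains": "Python"}]}
print([i for i, text in docs.items() if doc_matches(text, flt)])
```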

5.5 Collection Information Methods

# Get item count
count = collection.count()
print(f"Collection has {count} items")

# Preview the first few items in the collection (returns all columns by default)
preview = collection.peek(limit=5)
for item_id, doc, meta, emb in zip(
    preview["ids"], preview["documents"], preview["metadatas"], preview["embeddings"]
):
    print(f"ID: {item_id}, Document: {doc}")
    print(f"Metadata: {meta}, Embedding: {emb}")

# Count collections in database
collection_count = client.count_collection()
print(f"Database has {collection_count} collections")

Methods:

  • collection.count() - Get the number of items in the collection

  • collection.peek(limit=10) - Quickly preview the first few items in the collection

  • client.count_collection() - Count the number of collections in the current database
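Since collection.count() reports the total up front, the (limit, offset) windows for a full scan can be computed before fetching anything, instead of probing until a short page comes back. `page_windows` is a hypothetical helper. This approach only fits unfiltered reads, because count() covers the whole collection while a filtered get() may return fewer items. If items are added or removed after count() is called, the windows may also be stale.

```python
from typing import List, Tuple


def page_windows(total: int, page_size: int) -> List[Tuple[int, int]]:
    """Return (limit, offset) pairs covering `total` items in pages of `page_size`."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page_count = (total + page_size - 1) // page_size  # ceiling division
    return [(page_size, page * page_size) for page in range(page_count)]


print(page_windows(250, 100))  # [(100, 0), (100, 100), (100, 200)]

# With a real collection (not executed here):
# for limit, offset in page_windows(collection.count(), 100):
#     batch = collection.get(limit=limit, offset=offset)
```

The last window still asks for a full page. This is harmless because get() returns at most limit items.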