The knowledge endpoint allows you to store information from text chunks, PDFs, and documents, making it available for retrieval in relevant segments. This is useful for building knowledge-aware assistants, providing memory for agents, or functioning as a datastore with enhanced semantic retrieval capabilities.

How it works

When you upload documents or data to the knowledge API, they are processed by chunking the content into manageable segments and generating vector embeddings for each chunk. This process may vary slightly depending on whether the content is a file or plain text.
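To make the chunking step concrete, here is a minimal sketch of word-based chunking with overlapping windows. It is an illustration only: the function name `chunk_text`, the word-based splitting, and the `max_words` and `overlap` parameters are assumptions for this example, not a description of how the knowledge API chunks content internally.

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some context in the next chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the window already reached the end of the text
    return chunks


# A synthetic 120-word document: w0 w1 ... w119
sample = " ".join(f"w{i}" for i in range(120))
chunks = chunk_text(sample, max_words=50, overlap=10)
```

In a real pipeline, each chunk would then be passed to an embedding model and stored together with its vector.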

When querying the knowledge store, the query is embedded using the same model as during the storage phase. The system then retrieves the chunks whose embeddings are most similar to the query embedding, that is, the chunks most semantically relevant to the query. Before returning the results, a reranking step reorders the retrieved chunks so that the most relevant appear first.
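The two stages of a query can be sketched with toy data. The example below ranks stored chunks by cosine similarity and then reranks the shortlist. The hand-written vectors stand in for real embeddings, and the word-overlap `rerank` function is a deliberately simple placeholder for a learned reranking model; none of these names reflect the service's actual implementation.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, top_k):
    """Return the top_k chunk texts ordered by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Toy reranker: order candidates by the number of words shared with the query."""
    query_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda text: len(query_words & set(text.lower().split())),
        reverse=True,
    )


# (chunk text, made-up embedding) pairs standing in for the knowledge store
store = [
    ("password reset fixed the login error", [0.9, 0.1, 0.0]),
    ("login fails after update", [0.8, 0.3, 0.1]),
    ("invoice was charged twice", [0.0, 0.2, 0.9]),
]

query_vec = [1.0, 0.2, 0.0]  # made-up embedding of the query "login fails"
candidates = retrieve(query_vec, store, top_k=2)
results = rerank("login fails", candidates)
```

Here the vector search shortlists the two login-related chunks, and the reranker then moves "login fails after update" ahead of the chunk that scored highest on vector similarity alone.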

Uploading information

You can upload files such as PDFs, or just plain text strings of arbitrary length.

Here is an example that indexes a support ticket and then queries the knowledge base:

from opperai import Opper
# Our SDK supports Pydantic to provide structured output
from pydantic import BaseModel
from typing import Literal
import os

opper = Opper(http_bearer=os.getenv("OPPER_API_KEY"))

# Define the support ticket structure
class SupportTicket(BaseModel):
    ticket_id: str
    issue_description: str
    issue_resolution: str
    status: Literal['open', 'in_progress', 'resolved', 'closed']

def main():

    kb = opper.knowledge.get_by_name(knowledge_base_name="Tickets")

    if not kb:
        kb = opper.knowledge.create(
            name="Tickets"
        )

    ticket = SupportTicket(
        ticket_id="123",
        issue_description="I'm having trouble accessing my account. Whenever I try to log in, I receive an error message stating that my credentials are incorrect. I have tried resetting my password multiple times, but the issue persists. Please assist in resolving this matter as soon as possible.",
        issue_resolution="The issue was resolved by verifying the user's identity and resetting the account credentials from the backend. The user was able to log in successfully after the credentials were reset.",
        status="resolved"
    )

    opper.knowledge.add(
        knowledge_base_id=kb.id,
        key=ticket.ticket_id, # unique key, will overwrite existing data with that key
        content=ticket.model_dump_json(),
        metadata={
            "source": "our_ticket_system",
            "status": ticket.status
        }
    )

    res = opper.knowledge.query(knowledge_base_id=kb.id, query="Can't login", top_k=3)

    print(res)

main()

This yields output along these lines:

{
    id='06ffa4f6-170c-4931-8675-9bc4f53c2a2d',
    key='06ffa4f6-170c-4931-8675-9bc4f53c2a2d',
    content='{"ticket_id":"123","issue_description":"I\'m having trouble accessing my account. Whenever I try to log in, I receive an error message stating that my credentials are incorrect. I have tried resetting my password multiple times, but the issue persists. Please assist in resolving this matter as soon as possible.","issue_resolution":"The issue was resolved by verifying the user\'s identity and resetting the account credentials from the backend. The user was able to log in successfully after the credentials were reset.","status":"resolved"}',
    metadata={'source': 'our_ticket_system', 'status': 'resolved', 'priority': 'high', 'customer_name': 'John Doe'},
    score=14.421875
}

To index files such as PDFs and other documents, refer to the API reference for detailed instructions.

Specifying filters

You can narrow a query by specifying filters on metadata fields, like this:

tickets = opper.knowledge.query(
    knowledge_base_id=kb.id, 
    query="Can't login", 
    top_k=3,
    filters=[
        {"field": "status", "operation": "=", "value": "resolved"},
        {"field": "source", "operation": "=", "value": "our_ticket_system"},
    ]
)
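To make the filter format concrete, here is a local sketch of how equality filters like the ones above could be evaluated against a chunk's metadata. It assumes that every filter must match (AND semantics) and handles only the "=" operation. Neither assumption is confirmed by this page, and `matches` is a hypothetical helper, not part of the SDK.

```python
def matches(metadata: dict, filters: list[dict]) -> bool:
    """Return True if the metadata satisfies every equality filter."""
    for f in filters:
        if f["operation"] != "=":
            # Other operations are out of scope for this sketch
            raise ValueError(f"unsupported operation: {f['operation']}")
        if metadata.get(f["field"]) != f["value"]:
            return False
    return True


filters = [
    {"field": "status", "operation": "=", "value": "resolved"},
    {"field": "source", "operation": "=", "value": "our_ticket_system"},
]

resolved_ticket = {"source": "our_ticket_system", "status": "resolved"}
open_ticket = {"source": "our_ticket_system", "status": "open"}

keep_resolved = matches(resolved_ticket, filters)
keep_open = matches(open_ticket, filters)
```

Under these assumptions only the resolved ticket passes, because the open ticket fails the `status` filter.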

Use in task completions

Retrieval results can be used as context for task completions, like this:

class SuggestResolution(BaseModel):
    thoughts: str
    message: str
    reference_ticket_ids: list[int]

completion = opper.call(
    name="suggest_resolution",
    instructions="Given a user question and a list of potentially relevant past tickets, provide a suggestion for a resolution to the support agent",
    input={
        "past_tickets": tickets,
        "user_issue": "Can't login"
    },  
    output_schema=SuggestResolution
)

print(completion.json_payload)

This yields

{
    'thoughts': "The user's issue seems similar to a previous ticket where the problem involved login errors and resetting credentials. It would be prudent to verify the user's identity and consider backend credential resetting, as was effective in the past case.", 
    'message': "Based on a similar past ticket, this issue might be resolved by verifying the user's identity and resetting their credentials at the backend. Once complete, guide the user to attempt logging in with the new credentials.", 
    'reference_ticket_ids': [123]
}

Here we pass the retrieval results directly as context to the task. The results may include metadata that the task does not need, so it is often better to pass only the relevant parts of each result.
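One way to pass only the relevant parts is to parse each result's content and keep a few fields. The sketch below assumes each result exposes a `content` attribute holding the JSON string that was stored, as in the printed output earlier. `to_context` is a hypothetical helper, and `SimpleNamespace` is used only as a stand-in for real query results.

```python
import json
from types import SimpleNamespace


def to_context(results, fields=("ticket_id", "issue_description", "issue_resolution")):
    """Keep only the selected ticket fields from each retrieval result."""
    context = []
    for result in results:
        ticket = json.loads(result.content)  # content was stored as a JSON string
        context.append({name: ticket[name] for name in fields if name in ticket})
    return context


# Stand-in for what a knowledge query might return (assumed shape)
fake_results = [
    SimpleNamespace(
        content=json.dumps({
            "ticket_id": "123",
            "issue_description": "Cannot log in",
            "issue_resolution": "Credentials reset from the backend",
            "status": "resolved",
        }),
        metadata={"source": "our_ticket_system", "status": "resolved"},
        score=14.42,
    )
]

past_tickets = to_context(fake_results)
```

The trimmed `past_tickets` list can then be supplied as the "past_tickets" input of the task instead of the raw results, which leaves out scores and metadata the task does not need.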

Inspecting Knowledge Bases

On the portal you can see all knowledge bases you have created. You can also use the portal to create, update and delete them.