This is an interactive notebook that you can also run locally.
## Prerequisites
Before you can run a Weave evaluation, complete the following prerequisites:

- Install the W&B Weave SDK and log in with your W&B API key.
- Install the OpenAI SDK and set your OpenAI API key.
- Initialize your W&B project.
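
A minimal setup sketch, assuming you install the SDKs with pip; the API-key placeholder and project name below are illustrative:

```python
# Setup sketch: run `pip install weave openai` first.
import os

import weave

# Set your OpenAI API key (or export it in your shell beforehand).
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder

# Initialize your W&B project; weave.init prompts for your W&B API key
# on first use if you aren't already logged in.
weave.init("fruit-eval-example")  # hypothetical project name
```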
## Run your first evaluation
The following code sample shows how to evaluate an LLM using Weave's Model and Evaluation APIs:

1. Define a Weave model by subclassing weave.Model, specifying the model name and prompt format, and tracking a predict method with @weave.op. The predict method sends a prompt to OpenAI and parses the response into a structured output using a Pydantic schema (FruitExtract).
2. Create a small evaluation dataset consisting of input sentences and expected targets.
3. Define a custom scoring function (also tracked using @weave.op) that compares the model's output to the target label.
4. Wrap everything in a weave.Evaluation, specifying your dataset and scorers, and call evaluate() to run the evaluation pipeline asynchronously.
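
A minimal sketch of that pipeline, assuming a structured-output-capable model such as gpt-4o and the OpenAI SDK's beta.chat.completions.parse helper; the sentences, targets, scorer, and project name below are illustrative placeholders, not Weave's shipped sample:

```python
import asyncio

import weave
from openai import OpenAI
from pydantic import BaseModel


# Pydantic schema the LLM response is parsed into.
class FruitExtract(BaseModel):
    fruit: str
    color: str
    flavor: str


class ExtractFruitsModel(weave.Model):
    model_name: str
    prompt_template: str

    @weave.op()
    def predict(self, sentence: str) -> dict:
        # Send the formatted prompt to OpenAI and parse the reply
        # directly into FruitExtract via structured outputs.
        client = OpenAI()
        response = client.beta.chat.completions.parse(
            model=self.model_name,
            messages=[
                {
                    "role": "user",
                    "content": self.prompt_template.format(sentence=sentence),
                }
            ],
            response_format=FruitExtract,
        )
        return response.choices[0].message.parsed.model_dump()


model = ExtractFruitsModel(
    model_name="gpt-4o",
    prompt_template="Extract the fruit, its color, and its flavor from: {sentence}",
)

# Small evaluation dataset: the "sentence" key matches predict()'s
# argument name, so Weave can route each row to the model.
examples = [
    {
        "sentence": "Neoskizzles grow on Goocrux; they are purple and taste like candy.",
        "target": {"fruit": "neoskizzles", "color": "purple", "flavor": "candy"},
    },
    {
        "sentence": "Pounits are bright green and more savory than sweet.",
        "target": {"fruit": "pounits", "color": "bright green", "flavor": "savory"},
    },
]


# Custom scorer: Weave passes the model's return value as `output`
# (older releases used `model_output`) and dataset columns by name.
@weave.op()
def fruit_name_score(target: dict, output: dict) -> dict:
    return {"correct": target["fruit"] == output["fruit"]}


evaluation = weave.Evaluation(
    name="fruit_eval",
    dataset=examples,
    scorers=[fruit_name_score],
)

weave.init("fruit-eval-example")  # same placeholder project as above
# evaluate() is async; in a notebook use: await evaluation.evaluate(model)
asyncio.run(evaluation.evaluate(model))
```

Each call to predict and to the scorer is traced under the evaluation run, so the Weave UI can show per-example inputs, outputs, and scores side by side.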
## Looking for more examples?
- Learn how to build an evaluation pipeline end-to-end.
- Learn how to evaluate a RAG application by building one.