Scorers are passed to a weave.Evaluation object during evaluation. There are two types of Scorers in Weave:
- Function-based Scorers: Simple Python functions decorated with @weave.op.
- Class-based Scorers: Python classes that inherit from weave.Scorer for more complex evaluations.
Create your own Scorers
Ready-to-Use Scorers
While this guide shows you how to create custom scorers, Weave comes with a variety of predefined scorers and local SLM scorers that you can use right away.
Function-based Scorers
These are functions decorated with @weave.op that return a dictionary. They’re great for simple evaluations. For example, a scorer named evaluate_uppercase checks, when the evaluation is run, whether the text is all uppercase.
Class-based Scorers
For more advanced evaluations, especially when you need to keep track of additional scorer metadata, try different prompts for your LLM-evaluators, or make multiple function calls, you can use the Scorer class. For example, a class-based scorer can evaluate how good a summary is by comparing it to the original text.
Requirements:
- Inherit from weave.Scorer.
- Define a score method decorated with @weave.op.
- The score method must return a dictionary.
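The two scorer styles described in this section can be sketched as follows. This is a plain-Python illustration of the scoring logic only: the @weave.op decorator and the weave.Scorer base class are left out so the sketch runs without Weave installed, and LengthRatioScorer and its max_ratio field are hypothetical names chosen for illustration.

```python
# In Weave, this function would be decorated with @weave.op.
def evaluate_uppercase(output: str) -> dict:
    """Function-based scorer: check whether the output is all uppercase."""
    return {"text_is_uppercase": output.isupper()}


# In Weave, this class would inherit from weave.Scorer and
# its score method would be decorated with @weave.op.
class LengthRatioScorer:
    """Hypothetical class-based scorer: compare summary length to the source text."""

    # Extra scorer metadata lives on the class as a field with a default.
    max_ratio: float = 0.5

    def score(self, output: str, text: str) -> dict:
        # Guard against an empty source text to avoid division by zero.
        ratio = len(output) / len(text) if text else 0.0
        return {"length_ratio": ratio, "is_concise": ratio <= self.max_ratio}


print(evaluate_uppercase("HELLO"))
print(LengthRatioScorer().score(output="short", text="a much longer original text"))
```

Both styles return a dictionary, which is the one requirement they share. The class-based form is worth the extra structure mainly when the scorer carries configuration such as a threshold or a prompt.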
How Scorers Work
Scorer Keyword Arguments
Scorers can access both the output from your AI system and the input data from the dataset row.
- Input: If you would like your scorer to use data from your dataset row, such as a “label” or “target” column, you can make this available to the scorer by adding a label or target keyword argument to your scorer definition. A scorer function (or score class method) that uses a “label” column would therefore take both output and label as parameters.
- Output: Include an output parameter in your scorer function’s signature to access the AI system’s output.
When a Weave Evaluation is run, the output of the AI system is passed to the output parameter. The Evaluation also automatically tries to match any additional scorer argument names to your dataset columns. If customizing your scorer arguments or dataset columns is not feasible, you can use column mapping; see Mapping Column Names with column_map below.
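The name-based matching between scorer arguments and dataset columns can be pictured with a small sketch. This is a simplified illustration and not Weave's actual implementation; build_scorer_kwargs and match_label are hypothetical helpers, and the scorer is left undecorated so the example runs without Weave installed.

```python
import inspect


def build_scorer_kwargs(scorer, output, row: dict) -> dict:
    """Collect what a scorer asks for: `output` plus same-named dataset columns."""
    kwargs = {}
    for name in inspect.signature(scorer).parameters:
        if name == "output":
            kwargs[name] = output      # the AI system's output
        elif name in row:
            kwargs[name] = row[name]   # dataset column with the same name
    return kwargs


# In Weave, this scorer would be decorated with @weave.op.
def match_label(output: str, label: str) -> dict:
    return {"correct": output == label}


row = {"question": "Capital of France?", "label": "Paris"}
kwargs = build_scorer_kwargs(match_label, output="Paris", row=row)
result = match_label(**kwargs)
print(kwargs, result)
```

Columns the scorer does not name, such as question here, are simply not passed to it.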
Mapping Column Names with column_map
Sometimes, the score method’s argument names don’t match the column names in your dataset. You can fix this using a column_map.
If you’re using a class-based scorer, pass a dictionary to the column_map attribute of Scorer when you initialize your scorer class. This dictionary maps your score method’s argument names to the dataset’s column names, in the order: {scorer_keyword_argument: dataset_column_name}.
For example, with a column_map of {"text": "news_article"}, the text argument in the score method will receive data from the news_article dataset column.
Notes:
- Another equivalent option to map your columns is to subclass the Scorer and override the score method, mapping the columns explicitly.
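The effect of a column_map can be sketched in plain Python. This is only an illustration of the renaming step under the {scorer_keyword_argument: dataset_column_name} convention, not Weave's internal code; apply_column_map and summary_length are hypothetical names.

```python
def apply_column_map(row: dict, column_map: dict) -> dict:
    """Rename dataset columns to scorer argument names.

    column_map has the form {scorer_keyword_argument: dataset_column_name}.
    """
    return {arg: row[column] for arg, column in column_map.items()}


# In Weave, this would be the score method of a weave.Scorer subclass.
def summary_length(output: str, text: str) -> dict:
    return {"shorter_than_source": len(output) < len(text)}


row = {"news_article": "A long article about the city council meeting."}
scorer_inputs = apply_column_map(row, {"text": "news_article"})
result = summary_length(output="Council met.", **scorer_inputs)
print(scorer_inputs, result)
```

The scorer keeps its own argument name (text) while the dataset keeps its column name (news_article), so neither has to be renamed.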
Final summarization of the scorer
During evaluation, the scorer is computed for each row of your dataset. To provide a final score for the evaluation, Weave applies auto_summarize, whose behavior depends on the return type of the output:
- Averages are computed for numerical columns
- Count and fraction for boolean columns
- Other column types are ignored
You can override the summarize method on the Scorer class and provide your own way of computing the final scores. The summarize function expects:
- A single parameter score_rows: This is a list of dictionaries, where each dictionary contains the scores returned by the score method for a single row of your dataset.
- It should return a dictionary containing the summarized scores.
For a scorer that returns a boolean for each row, the default auto_summarize returns the count and proportion of True.
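The difference between the default summary and a custom one can be sketched as below. This is an approximation of the rules listed in this section, not Weave's implementation: auto_summarize_sketch and summarize_all_correct are hypothetical functions, and the key names true_count, true_fraction, and mean are assumptions. In Weave, the custom version would be the summarize method on a Scorer subclass.

```python
from numbers import Number


def auto_summarize_sketch(score_rows: list) -> dict:
    """Mean for numeric columns, count and fraction of True for boolean columns."""
    summary = {}
    keys = {key for row in score_rows for key in row}
    for key in sorted(keys):
        values = [row[key] for row in score_rows if key in row]
        # bool is a subclass of int, so booleans must be checked first.
        if all(isinstance(v, bool) for v in values):
            true_count = sum(values)
            summary[key] = {
                "true_count": true_count,
                "true_fraction": true_count / len(values),
            }
        elif all(isinstance(v, Number) for v in values):
            summary[key] = {"mean": sum(values) / len(values)}
        # Other column types are ignored.
    return summary


def summarize_all_correct(score_rows: list) -> dict:
    """Custom summary: the evaluation passes only if every row is correct."""
    return {"all_correct": all(row["correct"] for row in score_rows)}


score_rows = [
    {"correct": True, "similarity": 0.5},
    {"correct": False, "similarity": 1.0},
]
default_summary = auto_summarize_sketch(score_rows)
custom_summary = summarize_all_correct(score_rows)
print(default_summary, custom_summary)
```

A custom summary like this is useful when the final score depends on all rows together rather than on a per-column average.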
If you want to learn more, check the implementation of CorrectnessLLMJudge.
Applying Scorers to a Call
To apply scorers to your Weave ops, you’ll need to use the .call() method, which provides access to both the operation’s result and its tracking information. This allows you to associate scorer results with specific calls in Weave’s database.
For more information on how to use the .call() method, see the Calling Ops guide.
You can apply a single scorer to a call, or apply multiple scorers to the same call.
Notes:
- Scorer results are automatically stored in Weave’s database
- Scorers run asynchronously after the main operation completes
- You can view scorer results in the UI or query them via the API
Use preprocess_model_input
You can use the preprocess_model_input parameter to modify dataset examples before they reach your model during evaluation.
For usage information and an example, see Using preprocess_model_input to format dataset rows before evaluating.
Score Analysis
In this section, we’ll show you how to analyze the scores for a single call, multiple calls, and all calls scored by a specific scorer.
Analyze a single Call’s Scores
Single Call API
To retrieve a single call along with its scores, you can use the get_call method.
Single Call UI

Analyze multiple Calls’ Scores
Multiple Calls API
To retrieve multiple calls along with their scores, you can use the get_calls method.
Multiple Calls UI

Analyze all Calls scored by a specific Scorer
All Calls by Scorer API
To retrieve all calls scored by a specific scorer, you can use the get_calls method.
All Calls by Scorer UI
Finally, if you would like to see all the calls scored by a Scorer, navigate to the Scorers Tab in the UI and select the “Programmatic Scorer” tab. Click your Scorer to open the Scorer details page.
Then click the View Traces button under Scores to view all the calls scored by your Scorer.

