[Evaluation] Normalize evaluator validation errors to EvaluationException with USER_ERROR blame by m7md7sien · Pull Request #47735 · Azure/azure-sdk-for-python

m7md7sien · 2026-06-28T19:49:36Z

Summary

Normalizes evaluation/validation error handling in azure-ai-evaluation so that user input and configuration errors are consistently raised as EvaluationException with blame=ErrorBlame.USER_ERROR (plus an appropriate category and target).

Previously, several evaluators raised bare ValueError/TypeError for input and threshold validation, and a few existing EvaluationException raises did not set blame. Those raises defaulted to an Unknown blame and surfaced as InternalError even though user input caused them.

Changes

Raw ValueError/TypeError → EvaluationException(USER_ERROR)

ContentSafetyEvaluator — threshold type check
QAEvaluator — threshold type check
RougeScoreEvaluator — threshold type check
DocumentRetrievalEvaluator — ground-truth label and input-record validation
Task navigation efficiency evaluator — matching_mode and ground_truth validation
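A minimal sketch of the threshold type-check pattern follows. It does not use the real azure-ai-evaluation classes. `ErrorBlame`, `ErrorCategory`, `ErrorTarget` and `EvaluationException` are small hypothetical stand-ins that carry only the fields this PR sets. The `validate_threshold` helper and its message text are also illustrative and are not the evaluators' actual code.

```python
from enum import Enum


class ErrorBlame(Enum):
    # Hypothetical stand-in for the package's ErrorBlame enum.
    USER_ERROR = "UserError"
    SYSTEM_ERROR = "SystemError"
    UNKNOWN = "Unknown"


class ErrorCategory(Enum):
    # Hypothetical stand-in; only the members used here.
    INVALID_VALUE = "INVALID VALUE"
    UNKNOWN = "UNKNOWN"


class ErrorTarget(Enum):
    # Values assumed to match class names, per the review note on this PR.
    ROUGE_EVALUATOR = "RougeScoreEvaluator"
    UNKNOWN = "Unknown"


class EvaluationException(Exception):
    # Hypothetical stand-in; deliberately NOT a subclass of ValueError/TypeError.
    def __init__(self, message, *, category=ErrorCategory.UNKNOWN,
                 blame=ErrorBlame.UNKNOWN, target=ErrorTarget.UNKNOWN):
        super().__init__(message)
        self.category = category
        self.blame = blame
        self.target = target


def validate_threshold(threshold, target=ErrorTarget.ROUGE_EVALUATOR):
    # Before this PR, the equivalent check raised a bare ValueError/TypeError.
    if not isinstance(threshold, (int, float)):
        raise EvaluationException(
            f"Threshold must be an int or float, got {type(threshold).__name__}.",
            category=ErrorCategory.INVALID_VALUE,
            blame=ErrorBlame.USER_ERROR,
            target=target,
        )
    return threshold
```

A caller that passes `"0.5"` now gets an exception whose `blame`, `category` and `target` say who caused the failure, what kind of failure it was, and which evaluator raised it.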

Existing EvaluationException missing USER_ERROR

Evaluator base (_base_eval.py) — conversation message mismatch, malformed tool-call parsing, and threshold-not-a-number checks now set blame=USER_ERROR (one category=UNKNOWN corrected to INVALID_VALUE)
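The user-visible effect of an omitted blame can be shown with a stripped-down stand-in for `EvaluationException`. The Copilot overview on this PR describes how the real `__str__` behaves: any blame other than USER_ERROR renders as "InternalError". The sketch below reproduces that rule. The "UserError" label for the USER_ERROR case and the default of `ErrorBlame.UNKNOWN` are assumptions of this sketch.

```python
from enum import Enum


class ErrorBlame(Enum):
    # Hypothetical stand-in for the package's ErrorBlame enum.
    USER_ERROR = "UserError"
    SYSTEM_ERROR = "SystemError"
    UNKNOWN = "Unknown"


class EvaluationException(Exception):
    """Hypothetical stand-in; the real class also carries category, target, etc."""

    def __init__(self, message, *, blame=ErrorBlame.UNKNOWN):
        super().__init__(message)
        self.blame = blame

    def __str__(self):
        # Any blame other than USER_ERROR is rendered as InternalError.
        error_type = "UserError" if self.blame == ErrorBlame.USER_ERROR else "InternalError"
        return f"({error_type}) {super().__str__()}"


# Omitting blame (the pre-PR situation in _base_eval.py) vs. setting it explicitly.
before = EvaluationException("Threshold must be a number.")
after = EvaluationException("Threshold must be a number.", blame=ErrorBlame.USER_ERROR)
print(before)  # (InternalError) Threshold must be a number.
print(after)   # (UserError) Threshold must be a number.
```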

Supporting

Added QA_EVALUATOR, ROUGE_EVALUATOR, and DOCUMENT_RETRIEVAL_EVALUATOR members to ErrorTarget
Updated the task navigation test to expect EvaluationException for an invalid matching_mode
CHANGELOG entry under 1.17.1 (Unreleased)
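The updated task navigation test checks that an invalid `matching_mode` raises `EvaluationException` and not `ValueError`. The following is a rough, self-contained sketch of that expectation. The stand-in classes, the `validate_matching_mode` helper and the set of valid mode names are assumptions, not the evaluator's real definitions. The test uses a plain try/except so it runs without pytest. The repository's own test most likely uses `pytest.raises` instead.

```python
from enum import Enum


class ErrorBlame(Enum):
    # Hypothetical stand-in for the package's ErrorBlame enum.
    USER_ERROR = "UserError"
    UNKNOWN = "Unknown"


class ErrorCategory(Enum):
    INVALID_VALUE = "INVALID VALUE"
    UNKNOWN = "UNKNOWN"


class EvaluationException(Exception):
    def __init__(self, message, *, category=ErrorCategory.UNKNOWN, blame=ErrorBlame.UNKNOWN):
        super().__init__(message)
        self.category = category
        self.blame = blame


# Assumed mode names; check the evaluator's own enum for the real set.
VALID_MATCHING_MODES = ("exact_match", "in_order_match", "any_order_match")


def validate_matching_mode(matching_mode):
    # Hypothetical helper: an unknown mode is a user configuration error.
    if matching_mode not in VALID_MATCHING_MODES:
        raise EvaluationException(
            f"matching_mode must be one of {VALID_MATCHING_MODES}, got {matching_mode!r}.",
            category=ErrorCategory.INVALID_VALUE,
            blame=ErrorBlame.USER_ERROR,
        )
    return matching_mode


def test_invalid_matching_mode_raises_evaluation_exception():
    # Previously the test expected ValueError; it now expects EvaluationException.
    try:
        validate_matching_mode("fuzzy")
    except EvaluationException as exc:
        assert exc.blame is ErrorBlame.USER_ERROR
        assert not isinstance(exc, ValueError)
    else:
        raise AssertionError("expected EvaluationException")
```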

Intentionally left unchanged (not user errors)

"Evaluator returned invalid output" / "Invalid score value" across the prompty and tool evaluators remain SYSTEM_ERROR (malformed LLM output, not user input).
Internal/defensive checks (_conversation_aggregators.py UNKNOWN, _base_rai_svc_eval.py "Not implemented") are unchanged.
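The following sketch shows the distinction this section draws between user errors and system errors. When a malformed score comes back from the model, the caller did nothing wrong, so the exception keeps a SYSTEM_ERROR blame. The stand-in classes and the `parse_llm_score` helper are hypothetical and only illustrate the classification rule, not the prompty evaluators' actual parsing code.

```python
from enum import Enum


class ErrorBlame(Enum):
    # Hypothetical stand-in for the package's ErrorBlame enum.
    USER_ERROR = "UserError"
    SYSTEM_ERROR = "SystemError"
    UNKNOWN = "Unknown"


class EvaluationException(Exception):
    def __init__(self, message, *, blame=ErrorBlame.UNKNOWN):
        super().__init__(message)
        self.blame = blame


def parse_llm_score(raw_output):
    """Hypothetical helper: parse a numeric score emitted by the judge model."""
    try:
        return float(raw_output)
    except (TypeError, ValueError):
        # The bad value came from the model, not from the caller's inputs,
        # so the failure stays attributed to the system.
        raise EvaluationException(
            f"Invalid score value: {raw_output!r}",
            blame=ErrorBlame.SYSTEM_ERROR,
        ) from None
```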

Validation

All affected unit tests pass (document retrieval, task navigation, threshold behavior, common validators, built-in & agent evaluators).
black (pinned 24.4.0, repo config) passes on all modified files.

…R_ERROR blame Convert raw ValueError/TypeError input and configuration validation failures in ContentSafety, QA, Rouge, DocumentRetrieval and TaskNavigationEfficiency evaluators to EvaluationException, and ensure user-validation errors across the evaluator base consistently set blame=ErrorBlame.USER_ERROR with appropriate category/target. Adds QA/Rouge/DocumentRetrieval ErrorTarget enum members and updates the task navigation test.

…ument_retrieval evaluator

- Added QA_EVALUATOR, ROUGE_EVALUATOR, DOCUMENT_RETRIEVAL_EVALUATOR to ErrorTarget enum
- Improved EvaluationException calls with blame/category/target in _base_eval.py, _content_safety.py, _qa.py, _rouge.py, _task_navigation_efficiency.py, _document_retrieval.py
- Fixed ordering: isinstance type checks now run BEFORE comparison in DocumentRetrievalEvaluator.__init__ (azureml-assets c45cc1a fix)
- Replaced bare ValueError/TypeError with EvaluationException in task_navigation_efficiency and document_retrieval evaluators
- Updated test to expect EvaluationException instead of ValueError

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Copilot

Pull request overview

This PR normalizes input/configuration validation error handling in the azure-ai-evaluation package so that user-caused validation failures are consistently raised as EvaluationException with blame=ErrorBlame.USER_ERROR (plus an appropriate category and target). Previously several evaluators raised bare ValueError/TypeError, and a few existing EvaluationException raises omitted blame, causing them to display as (InternalError) even when caused by user input (since EvaluationException.__str__ maps any non-USER_ERROR blame to "InternalError"). This change improves error classification surfaced to users and aligns with the package-wide validation convention already used in the common validators.

Changes:

Converted raw ValueError/TypeError validation raises to EvaluationException(USER_ERROR) in ContentSafetyEvaluator, QAEvaluator, RougeScoreEvaluator, DocumentRetrievalEvaluator, and the task navigation efficiency evaluator.
Added explicit blame=USER_ERROR (and corrected one category=UNKNOWN → INVALID_VALUE) to existing EvaluationException raises in the evaluator base (_base_eval.py), and added three new ErrorTarget members (QA_EVALUATOR, ROUGE_EVALUATOR, DOCUMENT_RETRIEVAL_EVALUATOR).
Updated the task navigation test to expect EvaluationException, plus a CHANGELOG entry under 1.17.1 (Unreleased).

I verified the new ErrorTarget values match their class names, no duplicate imports were introduced, ErrorBlame was already imported where used in _base_eval.py, the document-retrieval reordering (type checks now precede the >= comparison) is a correctness improvement and keeps existing test messages intact, and no existing tests still expect ValueError/TypeError for the changed evaluators. One note: changing these raises from ValueError/TypeError to EvaluationException (which is not a subclass of either) is a behavioral change for any downstream code catching those specific exception types — this is intentional and documented in the CHANGELOG.
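Downstream code that caught `TypeError` or `ValueError` around evaluator construction needs a new handler, because `EvaluationException` is not a subclass of either. The following is a minimal sketch of such a migration. `make_evaluator` and `build_or_report` are hypothetical names, and the stand-in classes only model the `blame` attribute.

```python
from enum import Enum


class ErrorBlame(Enum):
    # Hypothetical stand-in for the package's ErrorBlame enum.
    USER_ERROR = "UserError"
    SYSTEM_ERROR = "SystemError"
    UNKNOWN = "Unknown"


class EvaluationException(Exception):
    def __init__(self, message, *, blame=ErrorBlame.UNKNOWN):
        super().__init__(message)
        self.blame = blame


def make_evaluator(threshold):
    """Hypothetical evaluator factory that validates its threshold as described in this PR."""
    if not isinstance(threshold, (int, float)):
        raise EvaluationException("Threshold must be a number.", blame=ErrorBlame.USER_ERROR)
    return {"threshold": threshold}


def build_or_report(threshold):
    try:
        return make_evaluator(threshold)
    except TypeError:
        # Pre-PR handler: never reached now, since EvaluationException
        # does not inherit from TypeError or ValueError.
        return "caught TypeError"
    except EvaluationException as exc:
        # New handler: branch on blame instead of on the exception type.
        if exc.blame is ErrorBlame.USER_ERROR:
            return f"fix your configuration: {exc}"
        raise
```

Branching on `blame` also lets the caller re-raise system-side failures unchanged while still reporting configuration mistakes to the user.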

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.


File	Description
`_exceptions.py`	Adds `QA_EVALUATOR`, `ROUGE_EVALUATOR`, `DOCUMENT_RETRIEVAL_EVALUATOR` to `ErrorTarget`.
`_evaluators/_common/_base_eval.py`	Adds `USER_ERROR` blame to conversation/tool-call/threshold validation raises; one category corrected to `INVALID_VALUE`.
`_evaluators/_content_safety/_content_safety.py`	Threshold type check now raises `EvaluationException(USER_ERROR)` instead of `TypeError`.
`_evaluators/_qa/_qa.py`	Threshold type check now raises `EvaluationException(USER_ERROR)`.
`_evaluators/_rouge/_rouge.py`	Threshold type check now raises `EvaluationException(USER_ERROR)`.
`_evaluators/_document_retrieval/_document_retrieval.py`	Label-bound/input-record validation normalized to `EvaluationException`; type checks reordered before the bound comparison; internal missing-threshold kept as `SYSTEM_ERROR`.
`_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py`	`matching_mode` and `ground_truth` validation normalized to `EvaluationException(USER_ERROR)`.
`tests/unittests/test_task_navigation_efficiency_evaluators.py`	Updated to expect `EvaluationException` for invalid `matching_mode`.
`CHANGELOG.md`	Adds a Bugs Fixed entry under 1.17.1 (Unreleased).
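The following sketch shows why the document-retrieval reordering matters. If the `>=` bound comparison runs first, a mistyped bound such as `"0"` fails inside the comparison with Python's own `TypeError` and never reaches the intended validation message. Running the `isinstance` checks first guarantees the comparison only sees ints. The parameter names follow the evaluator's label-bound settings as far as this sketch assumes. The helper and the stand-in classes are hypothetical.

```python
from enum import Enum


class ErrorBlame(Enum):
    # Hypothetical stand-in for the package's ErrorBlame enum.
    USER_ERROR = "UserError"
    UNKNOWN = "Unknown"


class EvaluationException(Exception):
    def __init__(self, message, *, blame=ErrorBlame.UNKNOWN):
        super().__init__(message)
        self.blame = blame


def validate_label_bounds(ground_truth_label_min, ground_truth_label_max):
    """Hypothetical helper mirroring the reordered checks."""
    # 1. Type checks first, so a bad type is reported clearly instead of
    #    surfacing as a TypeError from the comparison (e.g. "0" >= 4).
    for name, value in (
        ("ground_truth_label_min", ground_truth_label_min),
        ("ground_truth_label_max", ground_truth_label_max),
    ):
        if not isinstance(value, int):
            raise EvaluationException(f"{name} must be an int.", blame=ErrorBlame.USER_ERROR)
    # 2. Only now is the >= comparison guaranteed to be between ints.
    if ground_truth_label_min >= ground_truth_label_max:
        raise EvaluationException(
            "ground_truth_label_min must be less than ground_truth_label_max.",
            blame=ErrorBlame.USER_ERROR,
        )
    return ground_truth_label_min, ground_truth_label_max
```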

…in task navigation evaluator (#1)

* Fix black formatting and ground_truth empty validation category

  Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

* Delete accidentally committed log file

  Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

github-actions bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) Jun 28, 2026

This was referenced Jun 28, 2026

[Evaluation] Normalize evaluator validation errors to EvaluationException with USER_ERROR blame Azure/azureml-assets#5181

Closed

[Evaluation] Normalize evaluator validation errors to EvaluationException with USER_ERROR blame Azure/azureml-assets#5182

Open

Copilot AI mentioned this pull request Jun 29, 2026

Fix type-check ordering before comparison in DocumentRetrievalEvaluator + standardize EvaluationException across evaluators #47746

Closed

Fix ground truth label validation checks

7204b7d

m7md7sien requested review from aprilk-ms and Copilot June 29, 2026 15:35

Copilot started reviewing on behalf of m7md7sien June 29, 2026 15:35

m7md7sien marked this pull request as ready for review June 29, 2026 15:36

m7md7sien requested a review from a team as a code owner June 29, 2026 15:36

m7md7sien enabled auto-merge (squash) June 29, 2026 15:36

Copilot AI reviewed Jun 29, 2026


mmkawale reviewed Jun 29, 2026


Comment thread sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py

Add unit tests

1de5cb3

mmkawale approved these changes Jun 29, 2026


m7md7sien self-assigned this Jun 29, 2026

m7md7sien removed the request for review from aprilk-ms July 1, 2026 03:49

m7md7sien wants to merge 4 commits into Azure:main from m7md7sien:mohessie/normalize-evaluator-exceptions
4 participants