Fix legacy entity protocol: propagate call_entity failures and stop double-encoding results#167
Open
andystaples wants to merge 4 commits into
Open
Conversation
call_entity failures delivered via the legacy entity protocol (used by the Azure Functions Durable extension) were silently completing instead of raising. _handle_entity_event_raised now detects failed ResponseMessage payloads (exceptionType/failureDetails) and fails the task with an EntityOperationFailedException, matching the current entity protocol and the .NET SDK.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes a correctness gap in the core durabletask SDK where entity operation failures were not propagated to orchestrations when using the legacy entity protocol (used by the Azure Functions Durable/WebJobs extension). It aligns legacy-protocol behavior with the current protocol (and .NET) by detecting failure indicators in legacy ResponseMessage payloads and failing the awaiting call_entity task accordingly.
Changes:
- Added legacy
ResponseMessagefailure detection/conversion helpers to produceTaskFailureDetailswhenfailureDetailsorexceptionTypeindicate an entity operation failure. - Updated legacy-protocol
eventRaisedhandling to failcall_entitytasks withEntityOperationFailedExceptionwhen failures are detected. - Added orchestration executor tests covering legacy-protocol entity call failure/success cases, and documented the fix in
CHANGELOG.md.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/durabletask/test_orchestration_executor.py | Adds regression tests for legacy-protocol entity call failure propagation (both failureDetails and exceptionType) and a success round-trip. |
| durabletask/worker.py | Detects legacy-protocol entity call failures in eventRaised handling and propagates them as task failures to orchestrations. |
| durabletask/internal/helpers.py | Adds helpers to extract/convert legacy ResponseMessage failure data into TaskFailureDetails. |
| CHANGELOG.md | Documents the user-visible fix under ## Unreleased → FIXED. |
The legacy entity protocol delivers the operation result as a serialized JSON string inside ResponseMessage.result. The success path used coerce, which does not parse JSON strings, so string/dict/list results arrived double-encoded and only numeric results with an explicit return_type worked. Deserialize the result string (type-directed) to match the current entity protocol. Adds parametrized legacy-protocol success round-trip tests.
Addresses PR review: _entity_task_id_map and _entity_lock_task_id_map entries were looked up with get() and never removed, so the maps could grow unbounded in long-running orchestrations. Use pop() to release each entry once its eventRaised is handled, matching the new-protocol handlers.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two related fixes to
OrchestrationContext.call_entityunder the legacy entity protocol — the protocol the Azure Functions Durable (WebJobs) extension uses when talking to out-of-proc Python workers. The current entity protocol (entityOperationCompleted/entityOperationFailed, used bydurabletask.azuremanaged/ DTS) already behaved correctly; only the legacyeventRaised-based path was affected.1. Entity operation failures were not propagated
A failed entity operation was silently completing (typically returning
None) instead of raising, so orchestrations could never observe or handle the failure.Legacy-protocol entity results arrive as an
eventRaisedhandled by_handle_entity_event_raised, which unconditionally calledentity_task.complete(...)and never inspected the response for a failure indicator. The wire contract is DurableTask.Core'sResponseMessage, whereIsErrorResult => exceptionType != null || failureDetails != null(exceptionTypecarries the error message;failureDetailscarries a structuredFailureDetails).Fix:
_handle_entity_event_raisednow detects a failed response and fails the task with anEntityOperationFailedException, so an awaitingcall_entityraisesTaskFailedError— matching the current entity protocol and the .NET SDK (CallEntityAsyncthrowsEntityOperationFailedExceptionwhen the resultIsError).2. Successful results were double-encoded
The
ResponseMessage.resultfield holds a serialized JSON string of the return value (just like the new protocol'sentityOperationCompleted.output). The success path usedcoerce, which applies a type to an already-parsed value and does not parse JSON strings. As a result, string/dict/list returns arrived double-encoded (e.g. a returned"done"surfaced as"\"done\""), and only numeric returns with an explicitreturn_typehappened to work.Fix: deserialize the
resultstring (type-directed) instead of coercing it, mirroring the current-protocol handler.Also addressed
eventRaiseddispatch topop()_entity_task_id_map/_entity_lock_task_id_mapentries once handled (they were looked up withget()and never removed), preventing unbounded map growth in long-running orchestrations — consistent with the new-protocol handlers. (From PR review.)Changes
durabletask/worker.py: failure detection + propagation in_handle_entity_event_raised; type-directed deserialization of the result;pop()map cleanup.durabletask/internal/helpers.py:entity_response_failure_details(...)plus a recursiveFailureDetailsconverter.tests/durabletask/test_orchestration_executor.py: legacy-protocol tests for failure viafailureDetails, failure viaexceptionType, and a parametrized success round-trip acrossstr/int/float/bool/dict/list(noreturn_typerequired).CHANGELOG.md: FIXED entries for both bugs.Validation
durabletask.azuremanageduses the current protocol and is unaffected.