[build] Add shared DownloadFileWithRetry target and use it for SDK downloads #11647

Merged
jonathanpeppers merged 2 commits into main from jonathanpeppers/download-file-with-retry
Jun 15, 2026

Conversation

@jonathanpeppers

Member

Fixes transient MSB3923 ... ResponseEnded build failures when downloading SDK components (emulator, build-tools, NDK, JDK, etc.) from dl.google.com and aka.ms. Example failure: dnceng-public build 1461495:

error MSB3923: Failed to download file "https://dl.google.com/android/repository/emulator-darwin_x64-15004761.zip". The response ended prematurely. (ResponseEnded)

Root cause

MSBuild's built-in <DownloadFile> task has Retries and RetryDelayMilliseconds, but its internal IsRetriable check only retries when an HttpRequestException wraps an inner IOException. The ResponseEnded failure that flaky CDNs produce is thrown as a top-level HttpIOException (added in .NET 8), so Retries does not cover it: the task errors out on the first mid-stream disconnect with no retry attempt. (Confirmed by reading the failing build's log: zero "Retrying" messages.)

Change

New build-tools/scripts/DownloadFileWithRetry.targets exposes:

  • $(DownloadFileWithRetryFile) — absolute path to itself
  • DownloadOneFileWithRetry — target that wraps <DownloadFile> in three outer attempts (first two with ContinueOnError="WarnAndContinue", third lets the error propagate) and optionally verifies SHA-256.
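The three-attempt structure described above could look roughly like the following. This is a minimal sketch, not the file from this PR: the parameter names `_DownloadUrl`, `_DownloadFolder`, and `_DownloadFileName` and the helper property `_DownloadPath` are hypothetical, and the `!Exists(...)` conditions assume that a failed attempt leaves no file behind at the destination.

```xml
<Project>
  <PropertyGroup>
    <!-- Absolute path to this file, so callers can hand it to the MSBuild task. -->
    <DownloadFileWithRetryFile>$(MSBuildThisFileFullPath)</DownloadFileWithRetryFile>
  </PropertyGroup>

  <!-- Hypothetical inputs: _DownloadUrl, _DownloadFolder, _DownloadFileName. -->
  <Target Name="DownloadOneFileWithRetry">
    <PropertyGroup>
      <_DownloadPath>$([System.IO.Path]::Combine('$(_DownloadFolder)', '$(_DownloadFileName)'))</_DownloadPath>
    </PropertyGroup>

    <!-- Attempt 1: a failure is logged as a warning and the target keeps going. -->
    <DownloadFile
        SourceUrl="$(_DownloadUrl)"
        DestinationFolder="$(_DownloadFolder)"
        DestinationFileName="$(_DownloadFileName)"
        ContinueOnError="WarnAndContinue" />

    <!-- Attempt 2: only when attempt 1 produced no file (assumption, see above). -->
    <DownloadFile
        Condition="!Exists('$(_DownloadPath)')"
        SourceUrl="$(_DownloadUrl)"
        DestinationFolder="$(_DownloadFolder)"
        DestinationFileName="$(_DownloadFileName)"
        ContinueOnError="WarnAndContinue" />

    <!-- Attempt 3: no ContinueOnError, so a failure here fails the build. -->
    <DownloadFile
        Condition="!Exists('$(_DownloadPath)')"
        SourceUrl="$(_DownloadUrl)"
        DestinationFolder="$(_DownloadFolder)"
        DestinationFileName="$(_DownloadFileName)" />
  </Target>
</Project>
```

Repeating the task at the target level sidesteps the task's internal retry filter entirely, because any error from the first two attempts is downgraded to a warning regardless of its exception type.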

When the caller supplies _DownloadSha256 and a cached file already exists with the wrong hash, the target deletes it (and any _DownloadCleanupOnMismatch siblings) up-front so the download attempts actually re-fetch in the same build, rather than failing at the post-download verify step. openjdk.targets uses this to invalidate a cached .sha256sum.txt if the archive ever fails verification.
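As an illustration of the up-front invalidation, a sketch using the built-in `GetFileHash`, `Delete`, and `VerifyFileHash` tasks might look like this. `_DownloadSha256` and `_DownloadCleanupOnMismatch` come from the description above; the target name, `_DownloadPath`, and `_ActualSha256` are hypothetical, and the real target may be organized differently.

```xml
<Project>
  <!-- Hypothetical input: _DownloadPath, the full path of the cached or downloaded file. -->
  <Target Name="_ExampleVerifyWithSelfHeal">
    <!-- Hash the cached copy, if one exists and the caller supplied an expected hash. -->
    <GetFileHash
        Condition="'$(_DownloadSha256)' != '' And Exists('$(_DownloadPath)')"
        Files="$(_DownloadPath)"
        Algorithm="SHA256"
        HashEncoding="hex">
      <Output TaskParameter="Hash" PropertyName="_ActualSha256" />
    </GetFileHash>

    <!-- On mismatch, delete the stale file and its siblings so the download re-fetches. -->
    <Delete
        Condition="'$(_ActualSha256)' != '' And '$(_ActualSha256)' != '$(_DownloadSha256)'"
        Files="$(_DownloadPath);$(_DownloadCleanupOnMismatch)" />

    <!-- The download attempts would run here. -->

    <!-- Post-download verification: fails the build if the hash still does not match. -->
    <VerifyFileHash
        Condition="'$(_DownloadSha256)' != ''"
        File="$(_DownloadPath)"
        Hash="$(_DownloadSha256)"
        Algorithm="SHA256" />
  </Target>
</Project>
```

MSBuild compares strings in conditions case-insensitively, so an upper-case expected hash should still match the hex output of `GetFileHash` in this sketch.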

Callers build a _DownloadFile item group in their target body with per-file params as AdditionalProperties metadata, then invoke:

<MSBuild Projects="@(_DownloadFile->'$(DownloadFileWithRetryFile)')"
    Targets="DownloadOneFileWithRetry"
    BuildInParallel="true" />

The projection turns each item into a distinct (project, properties) build request, so MSBuild fans them out across worker nodes. For multi-file callers (aapt2, androidsdk) this means downloads happen in parallel.
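A caller following this pattern might look like the sketch below. The target name, item names, URLs, and the `_DownloadUrl`, `_DownloadFolder`, and `_DownloadFileName` properties are hypothetical placeholders, and the sketch assumes `$(DownloadFileWithRetryFile)` is already defined (for example by importing the shared targets file).

```xml
<Project>
  <Target Name="_DownloadExampleFiles">
    <ItemGroup>
      <!-- One item per file; AdditionalProperties become global properties of that build request. -->
      <_DownloadFile Include="tool-a.zip">
        <AdditionalProperties>_DownloadUrl=https://example.invalid/tool-a.zip;_DownloadFolder=$(MSBuildThisFileDirectory)cache;_DownloadFileName=tool-a.zip</AdditionalProperties>
      </_DownloadFile>
      <_DownloadFile Include="tool-b.zip">
        <AdditionalProperties>_DownloadUrl=https://example.invalid/tool-b.zip;_DownloadFolder=$(MSBuildThisFileDirectory)cache;_DownloadFileName=tool-b.zip</AdditionalProperties>
      </_DownloadFile>
    </ItemGroup>

    <!-- The transform rewrites each item's identity to the shared file but keeps its metadata. -->
    <MSBuild
        Projects="@(_DownloadFile->'$(DownloadFileWithRetryFile)')"
        Targets="DownloadOneFileWithRetry"
        BuildInParallel="true" />
  </Target>
</Project>
```

Because every request points at the same project file, the differing global properties are what make the requests distinct. `BuildInParallel` generally only has an effect when MSBuild runs with more than one node (for example `-m`).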

Why <MSBuild> instead of <CallTarget>

Two MSBuild semantics forced this:

  1. Items added inside a target body are not visible to a target invoked via <CallTarget> from that same body, so the shared target could not see a collection assembled by the caller.
  2. A target executes at most once per build context, so a <CallTarget> could not be used for openjdk's two-phase hash-file-then-archive flow.
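Both limitations can be reproduced with a small standalone project. This is an illustrative sketch with hypothetical target and item names, not code from this PR.

```xml
<Project>
  <Target Name="Caller">
    <ItemGroup>
      <_ExampleFile Include="a.zip;b.zip" />
    </ItemGroup>

    <!-- Semantic 1: items created above are not yet visible to the called target. -->
    <CallTarget Targets="Shared" />

    <!-- Semantic 2: Shared has already run in this build, so this call is skipped. -->
    <CallTarget Targets="Shared" />
  </Target>

  <Target Name="Shared">
    <Message Importance="high" Text="Files seen by Shared: @(_ExampleFile)" />
  </Target>
</Project>
```

Building the `Caller` target is expected to print the message once, with an empty file list. Dispatching through the `<MSBuild>` task avoids both problems because each call is a separate build request identified by its own global properties.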

Migrated callers

  • src/binutils/binutils.targets
  • src/bundletool/bundletool.targets
  • src/aapt2/aapt2.targets
  • src/openjdk/openjdk.targets (two-phase: .sha256sum.txt then archive)
  • src/androidsdk/androidsdk.targets (~30 SDK packages, now in parallel)

Verified locally

| Scenario | Result |
| --- | --- |
| bundletool single file | Downloads successfully |
| aapt2 (3 files, ~200 MB) | Downloads in parallel (~3.6 s) |
| Cache-hit re-run | Target-skip (~0.4 s) |
| Corrupted cached file + correct expected hash | Detected → deleted → re-fetched in same build |
| Wrong expected hash | Errors and deletes both file and cleanup sibling |

…wnloads

Fixes transient `MSB3923 ... ResponseEnded` build failures when downloading
SDK components (emulator, build-tools, NDK, JDK, etc.) from `dl.google.com`
and `aka.ms`. Example failure:
https://dev.azure.com/dnceng-public/public/_build/results?buildId=1461495

Root cause: MSBuild's built-in `<DownloadFile>` task has `Retries` and
`RetryDelayMilliseconds`, but its internal `IsRetriable` check only retries
when an `HttpRequestException` wraps an inner `IOException`. The
`ResponseEnded` failure that flaky CDNs produce is thrown as a top-level
`HttpIOException` (added in .NET 8), so `Retries` does NOT cover it - the
task errors out on the first mid-stream disconnect with no retry attempt.

New `build-tools/scripts/DownloadFileWithRetry.targets` exposes:
  * `$(DownloadFileWithRetryFile)` - absolute path to itself
  * `DownloadOneFileWithRetry`     - target that wraps `<DownloadFile>` in
                                     three outer attempts (first two with
                                     `ContinueOnError="WarnAndContinue"`,
                                     third lets the error propagate) and
                                     optionally verifies SHA-256.

When the caller supplies `_DownloadSha256` and a cached file already
exists with the wrong hash, the target deletes it (and any
`_DownloadCleanupOnMismatch` siblings) up-front so the download attempts
actually re-fetch in the same build, rather than failing at the post-download
verify step. `openjdk.targets` uses this to invalidate a cached
`.sha256sum.txt` if the archive ever fails verification.

Callers build a `_DownloadFile` item group in their target body with
per-file params as `AdditionalProperties` metadata, then invoke:

  <MSBuild Projects="@(_DownloadFile->'$(DownloadFileWithRetryFile)')"
      Targets="DownloadOneFileWithRetry"
      BuildInParallel="true" />

The projection turns each item into a distinct `(project, properties)`
build request, so MSBuild fans them out across worker nodes. For multi-file
callers (aapt2, androidsdk) this means downloads happen in parallel
(verified locally: 3 build-tools zips, ~200 MB, in 3.6 s).

Two MSBuild semantics forced the `<MSBuild>`-task dispatch (instead of a
simpler `<CallTarget>`):
  1. Items added inside a target body are NOT visible to a target invoked
     via `<CallTarget>` from that same body, so the shared target could not
     see a `FilesToDownload` collection assembled by the caller.
  2. A target executes at most once per build context, so a `<CallTarget>`
     could not be used for openjdk's two-phase
     hash-file-then-archive flow.

Migrated all five SDK download targets to the shared helper:
  * src/binutils/binutils.targets
  * src/bundletool/bundletool.targets
  * src/aapt2/aapt2.targets
  * src/openjdk/openjdk.targets (two-phase: .sha256sum.txt then archive)
  * src/androidsdk/androidsdk.targets

Verified end-to-end with real downloads:
  * bundletool single file: downloads successfully
  * aapt2 (3 files): downloads in parallel (~3.6 s)
  * cache-hit re-run: target-skips fast (~0.4 s)
  * corrupted cache + correct expected hash: detected, deleted, re-fetched
  * wrong expected hash: errors and deletes both file and cleanup sibling

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 12, 2026 20:08

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces a shared MSBuild targets file to wrap <DownloadFile> with additional outer retry attempts (and optional SHA-256 verification/self-heal) to reduce transient CDN download failures during toolchain/SDK setup.

Changes:

  • Add build-tools/scripts/DownloadFileWithRetry.targets with DownloadOneFileWithRetry target (outer retries + optional SHA-256 verify and cleanup).
  • Migrate several tool/SDK download targets (binutils, bundletool, aapt2, openjdk, androidsdk) to dispatch downloads via <MSBuild> using per-item AdditionalProperties.
  • Enable parallel download dispatch for multi-file scenarios via BuildInParallel="true".

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| build-tools/scripts/DownloadFileWithRetry.targets | New shared retry + SHA-256 verification wrapper around <DownloadFile>. |
| src/openjdk/openjdk.targets | Switch OpenJDK download to two-phase hash-then-archive flow using the shared helper. |
| src/bundletool/bundletool.targets | Use shared helper for bundletool download + SHA-256 verification. |
| src/binutils/binutils.targets | Use shared helper for binutils archive download + SHA-256 verification. |
| src/androidsdk/androidsdk.targets | Use shared helper for SDK package downloads and attempt parallelization. |
| src/aapt2/aapt2.targets | Use shared helper for build-tools zip downloads and attempt parallelization. |

Comment thread src/androidsdk/androidsdk.targets
Comment thread src/aapt2/aapt2.targets
Reviewer caught that `_DownloadBuildTools` and `_DownloadAndroidSdkPackages` had `Outputs="...%(_X.Identity)"`, which triggers target batching: the target body runs once per item, so each <MSBuild BuildInParallel="true"/> call saw only one item to dispatch and the downloads were effectively serial.

Switch to item-transform syntax `Outputs="@(_X->'...')"` so the body runs exactly once and the inner <MSBuild> call fans out the full item set across worker nodes.
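The difference between the two `Outputs` forms can be sketched as follows. The target names, the `_Pkg` item, the `_CacheDir` property, and the `Inputs` value are hypothetical; only the shape of the `Outputs` attribute reflects the change described above.

```xml
<Project>
  <!-- Before: %(_Pkg.Identity) in Outputs batches the target, so the body runs
       once per item and each MSBuild call dispatches a single project. -->
  <Target Name="_DownloadBatched"
      Inputs="$(MSBuildThisFileFullPath)"
      Outputs="$(_CacheDir)%(_Pkg.Identity)">
    <MSBuild
        Projects="@(_Pkg->'$(DownloadFileWithRetryFile)')"
        Targets="DownloadOneFileWithRetry"
        BuildInParallel="true" />
  </Target>

  <!-- After: an item transform yields the same output paths without batching,
       so the body runs once and one MSBuild call receives every item. -->
  <Target Name="_DownloadAllAtOnce"
      Inputs="$(MSBuildThisFileFullPath)"
      Outputs="@(_Pkg->'$(_CacheDir)%(Identity)')">
    <MSBuild
        Projects="@(_Pkg->'$(DownloadFileWithRetryFile)')"
        Targets="DownloadOneFileWithRetry"
        BuildInParallel="true" />
  </Target>
</Project>
```

Inside a transform, `%(Identity)` is resolved per item as part of the expression itself, which is why it does not trigger batching the way a bare `%(_Pkg.Identity)` reference does.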

Verified parallelism is now real: worker nodes emit byte counts with different culture formatting (e.g. `58699878` from one node vs `58,699,878` from another), confirming separate-node execution.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jonathanpeppers jonathanpeppers added the ready-to-review This PR is ready to review/merge, I think any CI failures are just flaky (ignorable). label Jun 15, 2026
@jonathanpeppers jonathanpeppers merged commit 5f94934 into main Jun 15, 2026
40 checks passed
@jonathanpeppers jonathanpeppers deleted the jonathanpeppers/download-file-with-retry branch June 15, 2026 17:45
