
fix: patch Huffman OOB read and optimize encoders #11

Merged
LessUp merged 1 commit into master from fix/cpp-security-and-optimization-20260701
Jul 1, 2026

LessUp commented Jul 1, 2026

Summary

  • Security fix: Patch an out-of-bounds read in the Huffman decode-table build. When a corrupt or single-child tree caused `cur < 0`, the `e.next = cur` assignment at loop exit overwrote the earlier `e.next = root` safety reset, leaving a negative node index in the table. On crafted input with a single-symbol frequency table (e.g. empty input), this led to a `table[-1]` access (undefined behavior). The assignment is now guarded by a `corrupt` flag so the root reset survives.
  • Optimization: Eliminate an O(n) payload copy in arithmetic decoding. `BitReader` now accepts `(const uint8_t*, size_t)` in addition to `const vector&`, so the decoder reads directly from the input buffer at the payload offset instead of copying the payload into a temporary vector.
  • Optimization: Call `reserve()` in the arithmetic and RLE encoders to reduce reallocation overhead, matching the Huffman and range encoders.
  • Test coverage: Add an `all_same_byte.bin` corpus file (4096 zero bytes) to exercise the RLE single-run and Huffman single-symbol edge cases.

Test plan

  • `make build`: CMake build succeeds with `-Wall -Wextra -Werror`
  • `make test`: ctest lifecycle tests and the CLI smoke test (36 checks) all pass
  • `make lint`: clang-format dry run passes
  • The new `all_same_byte.bin` round-trips through all four algorithms

Generated with Devin

…ze encoders

Security fix:
- Fix out-of-bounds read in Huffman decode table build: when a corrupt
  or single-child tree caused cur < 0, the `e.next = cur` assignment at
  loop exit overwrote the earlier `e.next = root` safety reset, producing
  a negative node index. On crafted input with a single-symbol frequency
  table (e.g. empty input), this led to `table[-1]` access (undefined
  behavior). Now guarded with a `corrupt` flag so the root reset survives.
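
For reviewers, a minimal sketch of the guarded table build. It assumes a hypothetical layout: `Node`, `Entry`, `build_table`, and the 4-bit chunk width `kBits` are illustrative names, not the project's actual types. Each entry records the symbols emitted while walking one chunk of bits from a given node, and the `corrupt` flag keeps the `e.next = root` reset from being overwritten after the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tree node: child index per bit (-1 = missing, otherwise assumed
// < nodes.size()); symbol >= 0 marks a leaf.
struct Node {
    int child[2] = {-1, -1};
    int symbol = -1;
};

// Hypothetical decode-table entry for one (node, chunk) pair.
struct Entry {
    std::vector<uint8_t> symbols;  // symbols emitted while consuming the chunk
    int next = 0;                  // node to resume from; must be a valid index
    bool corrupt = false;          // the chunk walked into a missing child
};

constexpr int kBits = 4;             // bits consumed per table lookup
constexpr int kChunks = 1 << kBits;  // entries per node

std::vector<Entry> build_table(const std::vector<Node>& nodes, int root) {
    const int n = static_cast<int>(nodes.size());
    std::vector<Entry> table(static_cast<size_t>(n) * kChunks);
    for (int state = 0; state < n; ++state) {
        for (int chunk = 0; chunk < kChunks; ++chunk) {
            Entry& e = table[static_cast<size_t>(state) * kChunks + chunk];
            int cur = state;
            bool corrupt = false;
            for (int i = kBits - 1; i >= 0; --i) {  // MSB of the chunk first
                cur = nodes[cur].child[(chunk >> i) & 1];
                if (cur < 0) {      // corrupt or single-child tree
                    e.next = root;  // safety reset
                    corrupt = true;
                    break;
                }
                if (nodes[cur].symbol >= 0) {  // leaf: emit, restart at root
                    e.symbols.push_back(static_cast<uint8_t>(nodes[cur].symbol));
                    cur = root;
                }
            }
            e.corrupt = corrupt;
            // Before the fix this assignment ran unconditionally and stored -1.
            if (!corrupt) e.next = cur;
        }
    }
    return table;
}
```

With a single-symbol tree (a root whose only child is a leaf), every chunk containing a 1 bit reaches a missing child; its entry now resumes at `root` instead of storing -1.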

Optimization:
- Eliminate O(n) payload copy in arithmetic decode: BitReader now accepts
  (const uint8_t*, size_t) in addition to const vector&, letting the
  decoder read directly from the input buffer at the payload offset
  without allocating a temporary vector.
- Add Vec::reserve() to arithmetic and RLE encoders to reduce
  reallocation overhead (matching huffman/range encoders).
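
A minimal sketch of the two `BitReader` constructors. MSB-first bit order and zero-fill past the end of the range are both assumptions, and the project's reader may differ. The vector overload delegates to the pointer overload, so neither copies data. The deleted rvalue overload is an extra safeguard added in this sketch, not something this PR states: it rejects a temporary vector that would leave the reader holding a dangling pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit reader over a non-owning byte range.
class BitReader {
public:
    // Reads [data, data + size); the caller must keep the buffer alive.
    BitReader(const uint8_t* data, size_t size) : data_(data), size_(size) {}

    // Convenience overload: delegates to the pointer overload, no copy.
    explicit BitReader(const std::vector<uint8_t>& buf) : BitReader(buf.data(), buf.size()) {}

    // A temporary vector would be destroyed while the reader still points into it.
    explicit BitReader(std::vector<uint8_t>&&) = delete;

    // Next bit, or 0 once the range is exhausted (assumed zero-fill).
    int read_bit() {
        if (pos_ >= size_ * 8) return 0;
        const int bit = (data_[pos_ / 8] >> (7 - pos_ % 8)) & 1;
        ++pos_;
        return bit;
    }

    // Reads n bits (n <= 32), most significant first.
    uint32_t read_bits(int n) {
        uint32_t v = 0;
        for (int i = 0; i < n; ++i) v = (v << 1) | static_cast<uint32_t>(read_bit());
        return v;
    }

private:
    const uint8_t* data_;
    size_t size_;
    size_t pos_ = 0;  // bit position within the range
};
```

A decoder can then construct `BitReader(input.data() + payload_offset, input.size() - payload_offset)` to read the payload in place. Here `payload_offset` is a hypothetical name for the end of the header.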

Test coverage:
- Add all_same_byte.bin corpus file (4096 zero bytes) to exercise
  RLE single-run and Huffman single-symbol edge cases, included in
  the CLI smoke test default corpus.
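
A sketch of why 4096 identical bytes make a useful edge case. This byte-oriented RLE encoder calls `reserve()` up front and assumes a hypothetical `(count, value)` pair format with counts capped at 255. The project's RLE format may differ. Under this format, the single logical run splits into 16 full runs of 255 plus one run of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical RLE format: (count, value) byte pairs, count in 1..255.
std::vector<uint8_t> rle_encode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    out.reserve(in.size() * 2);  // worst case: every byte starts a new run
    size_t i = 0;
    while (i < in.size()) {
        const uint8_t value = in[i];
        size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255) ++run;
        out.push_back(static_cast<uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Expands each (count, value) pair; a trailing odd byte is ignored.
std::vector<uint8_t> rle_decode(const std::vector<uint8_t>& in) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i + 1 < in.size(); i += 2)
        out.insert(out.end(), static_cast<size_t>(in[i]), in[i + 1]);
    return out;
}
```

Reserving the worst case removes all reallocations but over-allocates for highly repetitive input such as this corpus file. A smaller initial estimate is an equally reasonable trade-off.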

Generated with [Devin](https://devin.ai)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>

LessUp merged commit f51d186 into master on Jul 1, 2026
2 checks passed
LessUp deleted the fix/cpp-security-and-optimization-20260701 branch on July 1, 2026 at 11:04