fix: reduce memory usage in diff by using linear-space LCS algorithm #1010

Merged: yxxhero merged 1 commit into master from fix/high-memory-usage-lcs-diff-996 on Jun 14, 2026

@yxxhero yxxhero commented Jun 14, 2026

Problem

Fixes #996

The aryann/difflib library uses an O(N×M) dynamic programming matrix (longestCommonSubsequenceMatrix) to compute diffs. For large manifests such as Kyverno CRDs with 10,000+ differing lines, this matrix alone consumes ~786 MB. With multiple resources and GC overhead, peak memory reaches 3.6 GB as reported in the issue, causing OOM kills on agents with 2 GB memory limits.
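As a rough sanity check on those numbers, the size of a full LCS table can be estimated with a few lines of Go. This is a back-of-the-envelope sketch, not code from this PR: it assumes the table holds one 8-byte Go int per cell (64-bit platform) and has (N+1) × (M+1) cells, and the function names are made up for illustration.

```go
package main

import "fmt"

// bytesPerCell is an assumption: one Go int per table cell on a 64-bit platform.
const bytesPerCell = 8

// fullMatrixBytes estimates the size of an (n+1) x (m+1) LCS length table.
func fullMatrixBytes(n, m int) int {
	return (n + 1) * (m + 1) * bytesPerCell
}

// twoRowBytes estimates the size of two rolling rows of length m+1,
// which is all a linear-space score computation needs to keep around.
func twoRowBytes(m int) int {
	return 2 * (m + 1) * bytesPerCell
}

func main() {
	for _, n := range []int{1000, 5000, 10000} {
		fmt.Printf("n=m=%d: full matrix ~%d MB, two rows ~%d KB\n",
			n, fullMatrixBytes(n, n)/1000000, twoRowBytes(n)/1000)
	}
}
```

Under these assumptions a 10,000 × 10,000 comparison needs roughly 800 MB for the table alone, which is in the same range as the ~786 MB quoted above, while two rolling rows need well under 1 MB.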

Solution

Replace the full-matrix LCS algorithm with Hirschberg's linear-space LCS algorithm, which finds an LCS of the same length (and produces identical diff output on all existing test cases) but requires only O(N+M) space instead of O(N×M).

The algorithm uses divide-and-conquer: it splits seq1 in half, finds the optimal split point in seq2 using forward and backward LCS score rows (each computed in O(len(seq2)) space), then recurses on each half.
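The divide-and-conquer step described above can be sketched as follows. This is a minimal illustration of Hirschberg's algorithm that returns only the common lines; it is not the diffLines() implementation from diff/lcs.go, and the names lcsLastRow, reversed, and hirschberg are hypothetical.

```go
package main

import "fmt"

// lcsLastRow returns the last row of the LCS length table for a vs b,
// keeping only two rows in memory (O(len(b)) space).
func lcsLastRow(a, b []string) []int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for _, x := range a {
		for j, y := range b {
			switch {
			case x == y:
				curr[j+1] = prev[j] + 1
			case prev[j+1] >= curr[j]:
				curr[j+1] = prev[j+1]
			default:
				curr[j+1] = curr[j]
			}
		}
		prev, curr = curr, prev
	}
	return prev
}

// reversed returns a reversed copy of s.
func reversed(s []string) []string {
	out := make([]string, len(s))
	for i, v := range s {
		out[len(s)-1-i] = v
	}
	return out
}

// hirschberg returns one longest common subsequence of a and b.
func hirschberg(a, b []string) []string {
	switch {
	case len(a) == 0 || len(b) == 0:
		return nil
	case len(a) == 1:
		for _, y := range b {
			if y == a[0] {
				return []string{a[0]}
			}
		}
		return nil
	}
	mid := len(a) / 2
	// Forward scores for the first half of a, backward scores for the second half.
	fwd := lcsLastRow(a[:mid], b)
	bwd := lcsLastRow(reversed(a[mid:]), reversed(b))
	// Pick the split point in b that maximises the combined score.
	split, best := 0, -1
	for k := 0; k <= len(b); k++ {
		if s := fwd[k] + bwd[len(b)-k]; s > best {
			split, best = k, s
		}
	}
	left := hirschberg(a[:mid], b[:split])
	right := hirschberg(a[mid:], b[split:])
	return append(left, right...)
}

func main() {
	a := []string{"a", "b", "c", "d"}
	b := []string{"a", "c", "e", "d"}
	fmt.Println(hirschberg(a, b)) // prints [a c d]
}
```

Taking the first maximising split point is one possible tie-breaking rule; a different rule can select a different, equally long common subsequence when several exist.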

Changes

  • diff/lcs.go (new): diffLines() implementing Hirschberg's algorithm, returning []difflib.DiffRecord for full compatibility.
  • diff/diff.go: diffStrings() now calls diffLines() instead of difflib.Diff().
  • diff/lcs_test.go (new): Tests verifying output parity for standard cases, semantic validity across 1,000 random inputs, and large input handling.

Measured Improvement

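| Diff size (differing lines) | Memory before | Memory after | Reduction |
|-----------------------------|---------------|--------------|-----------|
| 1,000                       | 8 MB          | 1 MB         | 7.6x      |
| 5,000                       | 197 MB        | 7 MB         | 30x       |
| 10,000                      | 786 MB        | 14 MB        | 57x       |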

Compatibility

All existing tests pass without modification, and the new implementation produces output identical to difflib.Diff for all existing test cases. For inputs with multiple valid LCS paths (e.g., highly repetitive content), the output is semantically equivalent (same LCS length, same number of added and removed lines) but may differ in tie-breaking, i.e. in which of the equally long common subsequences is kept. This does not make the rendered diff any less useful.
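One way to check this kind of semantic equivalence is to replay a diff against both inputs and count the common lines. The sketch below is an illustration only, not the test code from diff/lcs_test.go: Record and Delta are local stand-ins that mirror the general shape of a diff record (a payload plus a common/left-only/right-only marker), and replay is a hypothetical helper.

```go
package main

import "fmt"

// Delta and Record are local stand-ins for a diff record type (assumed shape).
type Delta int

const (
	Common Delta = iota
	LeftOnly
	RightOnly
)

type Record struct {
	Payload string
	Delta   Delta
}

// replay reports whether diff reconstructs both left and right exactly,
// and returns how many lines it marks as common.
func replay(diff []Record, left, right []string) (bool, int) {
	i, j, common := 0, 0, 0
	for _, r := range diff {
		switch r.Delta {
		case Common:
			if i >= len(left) || j >= len(right) ||
				left[i] != r.Payload || right[j] != r.Payload {
				return false, 0
			}
			i, j, common = i+1, j+1, common+1
		case LeftOnly:
			if i >= len(left) || left[i] != r.Payload {
				return false, 0
			}
			i++
		case RightOnly:
			if j >= len(right) || right[j] != r.Payload {
				return false, 0
			}
			j++
		default:
			return false, 0
		}
	}
	return i == len(left) && j == len(right), common
}

func main() {
	left := []string{"a", "b"}
	right := []string{"b", "a"}
	// Two valid diffs for the same inputs: one keeps "a", the other keeps "b".
	keepA := []Record{{"b", RightOnly}, {"a", Common}, {"b", LeftOnly}}
	keepB := []Record{{"a", LeftOnly}, {"b", Common}, {"a", RightOnly}}
	okA, nA := replay(keepA, left, right)
	okB, nB := replay(keepB, left, right)
	fmt.Println(okA, nA, okB, nB) // prints true 1 true 1
}
```

Both diffs replay cleanly and agree on the number of common lines, even though they keep a different common line. That is the kind of tie-breaking difference described above.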

Replace the O(N*M) space LCS dynamic programming from aryann/difflib
with Hirschberg's linear-space algorithm. For large manifests such as
Kyverno CRDs with 10,000+ differing lines, peak memory drops from
~786 MB to ~14 MB (57x reduction) while producing identical diff output.

Fixes #996

Signed-off-by: yxxhero <aiopsclub@163.com>
@yxxhero yxxhero merged commit 5546534 into master Jun 14, 2026
25 checks passed
@jim-barber-he


Thanks so much 🙂

yxxhero commented Jun 15, 2026


@jim-barber-he please try

@jim-barber-he


@yxxhero Sorry. I didn't get around to it before you released it.

Development

Successfully merging this pull request may close these issues.

High memory usage for some charts