C++: Support reasoning about whether a phi node overwrites the entire buffer#21836
C++: Support reasoning about whether a phi node overwrites the entire buffer#21836MathiasVP wants to merge 13 commits into
Conversation
…itialized instructions.
…ntire buffer is overwritten).
0df8ed9 to
07b8d7e
Compare
There was a problem hiding this comment.
Pull request overview
Updates the C++ IR-based SSA/dataflow implementation to better classify SSA definitions as “certain” (overwrites the entire buffer), including propagating certainty through PhiNodes, and adjusts/extends tests to cover the new behavior.
Changes:
- Refines write-certainty tracking by replacing a boolean “certain” flag with a
Certaintytype and updating write handling accordingly. - Changes
PhiNodecertainty so phi definitions can be “certain” when their inputs are certain, including SCC-based handling for phi cycles. - Updates existing dataflow test expectations and adds a new inline-expectations test suite for “certain/uncertain” SSA definitions.
Show a summary per file
| File | Description |
|---|---|
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImplCommon.qll | Introduces Certainty-typed write certainty and updates write classification logic. |
| cpp/ql/lib/semmle/code/cpp/ir/dataflow/internal/SsaImpl.qll | Adjusts SSA source-variable stringification and implements phi-node certainty propagation (incl. cycle handling). |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/test.cpp | Adds a regression test case exercising behavior around certainty/uninitialized flow in a loop. |
| cpp/ql/test/library-tests/dataflow/certain/test.ql | Adds an inline expectations harness to assert Ssa::Definition.isCertain() outcomes. |
| cpp/ql/test/library-tests/dataflow/certain/test.cpp | Adds C++ scenarios to validate “certain vs uncertain” SSA defs and phi behavior. |
| cpp/ql/test/library-tests/dataflow/certain/test.expected | Empty expected file for the new inline expectations test. |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/uninitialized.expected | Updates expected results for uninitialized/dataflow tests. |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/test-source-sink.expected | Updates expected source-to-sink flow results. |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/localFlow-ir.expected | Updates expected IR local flow output (notably SSA phi read formatting). |
| cpp/ql/test/library-tests/dataflow/dataflow-tests/dataflow-consistency.expected | Updates expected dataflow consistency outputs. |
| cpp/ql/test/experimental/query-tests/Security/CWE/CWE-193/constant-size/ConstantSizeArrayOffByOne.expected | Updates experimental query-test expectations to match new flow behavior. |
Copilot's findings
- Files reviewed: 10/11 changed files
- Comments generated: 3
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Please elaborate on this. As-is this makes no sense to me, since I would not consider "certainness" to be a meaningful property of phi nodes. |
| if ind = 0 | ||
| then result = "&" + base.toString() | ||
| else result = repeatStars(this.getIndirection() - 1) + base.toString() |
There was a problem hiding this comment.
Mixing direct field access ind with this.getIndirection() becomes very confusing to read. Same issue below.
| private predicate hasAnInput(PhiNode phi1, PhiNode phi2) { | ||
| definitionCycle(phi1) and | ||
| definitionCycle(phi2) and | ||
| getAnInput(phi1) = phi2 | ||
| } |
There was a problem hiding this comment.
This is wrong. You're attempting to define SCC-internal edges here, but if two SCCs are adjacent then this will accidentally merge them. You need this instead:
private predicate sccEdge(PhiNode phi1, PhiNode phi2) {
getAnInput(phi1) = phi2 and getAnInput+(phi2) = phi1
}
There was a problem hiding this comment.
Hm, yes I can see how these two definitions aren't equivalent and that yours captures what I intended to capture. Now I'm just trying to construct an example where the two definitions would cause a change in a test 🤔
If you have one at hand I'd love to know. Otherwise I'll spend some time trying to construct one, and if all else fail I'll just replace the code with your suggestion as it's obviously the right one
There was a problem hiding this comment.
Something like this perhaps:
while (..) { // phi_1
if (..) x = ..
// phi_2
}
while (..) { // phi_3
if (..) x = ..
// phi_4
}
Here phi_1 + phi_2 ought to form one SCC and phi_3 + phi_4 another, and your code would conflate them. I think if you make the assignment in the first loop certain, and the one in the second uncertain then you should end up with an observable difference.
| forex(PhiNode phi | phi = this.getAPhiNode() | | ||
| forall(PhiNode inp | phi.getAnInput() = inp and not this.hasPhiNode(inp) | inp.isCertain()) | ||
| ) |
There was a problem hiding this comment.
Factor out the range to make the recursion simpler. Add a member predicate
pragma[nomagic]
Definition getAnInput() {
result = this.getAPhiNode().getAnInput() and not this.hasPhiNode(result)
}
and use this to define the recursive forall:
forex(Definition inp | this.getAnInput() | inp.isCertain())
Note also that I've changed the type of inp from PhiNode to Definition - I believe that was also wrong in your version.
|
Thanks for the comments, Schack!
Forget the "certainness" naming for the moment and just consider it to be a property of whether an SSA definition overwrites the entire buffer pointed to. For example: void f(int* p) {
if(b) {
*(p + 1) = 1; // (1)
} else {
*(p + 1) = 2; // (2)
}
// (3)
}we don't know how large the buffer pointed to by Does that make things more clear? |
Sort of, yes. But I still think the concept is flawed. Firstly, a phi is not a write - and if you do think of it as a write in the sense |
That's also a perfectly good way one could model it, yes. In fact, I think that's exactly what is captured by a Chi instruction which we don't use in dataflow because it's based on the IR's way-too-imprecise-to-be-useful-in-dataflow memory SSA. Anyway, obviously this is not a direction I think we should go in for this PR, but I appreciate the comments. |
This PR is a combination of two fixes with the overarching theme of computing which SSA definitions are certain (which, in this context, means "overwrites the entire buffer").
A brief reminder of what we use "certain" vs. "uncertain" SSA writes for: In order to model flow through something like
(without knowing the concrete values for
i,j, andk) we model the write top[i]andp[j]as "uncertain" writes. That is, the write to the underlying SSA variable isn't overwritten. So whatever value was previously tracked by SSA is not killed by the SSA definition.The first improvement is in 8585bb6 which changes some of the SSA writes to be certain (whereas before they were only certain if the address they were writing to didn't involve pointer arithmetic).
The next change is slightly less well-motivated. This change is most likely necessary if we ever want to switch the instantiation of the shared range analysis library over from the sound IR-based SSA to dataflow-based SSA.
On
mainwe currently consider any definition from aPhiNodeto be an uncertain definition. This PR changes this so that we consider aPhiNodeto be a certain definition whenever its inputs are all certain definitions.There are some technical things that make this slightly more complicated than what you would expect: Since a
PhNodecan be its own input we need recursion over a graph with cycles, and because of QL's least fixed-point semantics we don't get the right result when doing recursion over a graph with cycles. So we instead recurse over a graph whose nodes are the the recursivePhiNodes.As always, recursion through
forallis also dangerous and should ideally be rewritten using explicit recursion over integers to reduce the risk of performance problems. However, I'll delay that until we actually see a performance problem on a database.(I expect this work to be necessary for switching the instantiation of the shared range analysis library over to dataflow-based SSA because I expect the relevant SSA variables for range analysis to be those for which any SSA definition is a certain SSA definition.)
DCA shows a couple of lost results. They're all FPs of the form added in 8e25240 (which this PR fixes) 🎉