Add normalized name and publisher to index and correlation telemetry #767
Merged
… normalization for best results
yao-msft
reviewed
Feb 25, 2021
yao-msft
reviewed
Feb 26, 2021
…nd fix another AV reason that I missed last time
Misspellings found, please review:
To accept these changes, run the following commands from this repository on this branch |
Fwiw, I'd suggest adding |
Change
This change adds the normalized name and publisher to the index, and uses them to collect telemetry on how well this normalization strategy correlates a manifest with the ARP (Add/Remove Programs) entry on the machine.
The normalized name and publisher tables are added to the index as 1:N entries because the new manifest schema introduces full localization of name and publisher. The two tables are not linked to each other, so a combination of name and publisher from any localization will match. This loss of precision seems a worthwhile tradeoff against writing more complex code to handle two values at a time; it also allows one of the fields to change while the other remains constant, and it consumes less space overall.
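The two unlinked 1:N tables can be sketched as follows. This is only an illustration: the use of SQLite through Python, the table and column names, and the helper functions `add_manifest` and `find` are all assumptions made for the example, not the schema used in this change.

```python
import sqlite3

# Hypothetical schema for illustration; the names are not taken from the real index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manifest (id INTEGER PRIMARY KEY, package_id TEXT NOT NULL);
CREATE TABLE norm_name (manifest_id INTEGER NOT NULL, value TEXT NOT NULL);
CREATE TABLE norm_publisher (manifest_id INTEGER NOT NULL, value TEXT NOT NULL);
""")


def add_manifest(package_id, names, publishers):
    """Store one manifest with all of its localized (already normalized) values."""
    cur = conn.execute("INSERT INTO manifest (package_id) VALUES (?)", (package_id,))
    manifest_id = cur.lastrowid
    # Each table is filled independently: no row records which name
    # belongs with which publisher.
    conn.executemany("INSERT INTO norm_name VALUES (?, ?)",
                     [(manifest_id, n) for n in set(names)])
    conn.executemany("INSERT INTO norm_publisher VALUES (?, ?)",
                     [(manifest_id, p) for p in set(publishers)])
    return manifest_id


def find(name, publisher):
    """Return package ids whose manifest has this name AND this publisher,
    regardless of which localization each value came from."""
    rows = conn.execute(
        """SELECT DISTINCT m.package_id FROM manifest m
           JOIN norm_name n ON n.manifest_id = m.id
           JOIN norm_publisher p ON p.manifest_id = m.id
           WHERE n.value = ? AND p.value = ?""",
        (name, publisher),
    )
    return [row[0] for row in rows]
```

With a manifest stored as `add_manifest("Example.App", ["exampleapp", "beispielapp"], ["examplecorp", "beispielgmbh"])`, a lookup that mixes localizations, such as `find("exampleapp", "beispielgmbh")`, still finds the package. That is the precision tradeoff described above.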
Searching is done by sending in the observed values for name and publisher; the source normalizes them before comparing with its internal store. This way, any type of normalization can happen inside the source without the workflow logic needing to be aware of it.
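A minimal sketch of that separation, assuming a hypothetical `Source` class and a deliberately naive placeholder normalization (case-fold, then drop everything except ASCII letters and digits). The real normalization rules are not described here and are certainly more involved.

```python
import re


def naive_normalize(value: str) -> str:
    """Placeholder only: case-fold and keep ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.casefold())


class Source:
    """Hypothetical source that owns its normalization strategy."""

    def __init__(self, normalize=naive_normalize):
        self._normalize = normalize
        # package id -> (set of normalized names, set of normalized publishers)
        self._entries = {}

    def add(self, package_id, names, publishers):
        self._entries[package_id] = (
            {self._normalize(n) for n in names},
            {self._normalize(p) for p in publishers},
        )

    def search(self, observed_name, observed_publisher):
        # The caller passes raw observed values; normalization happens here.
        name = self._normalize(observed_name)
        publisher = self._normalize(observed_publisher)
        return sorted(
            package_id
            for package_id, (names, publishers) in self._entries.items()
            if name in names and publisher in publishers
        )
```

Because the caller never sees normalized values, swapping `naive_normalize` for a different strategy changes nothing in the calling workflow.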
In order to collect telemetry, we first snapshot the set of packages in ARP before executing the install. Afterward, we compare the set with the snapshot to determine which ones are net new. We also search the source using the manifest used to install to determine if there are any matches. Finally, we intersect these two sets of packages.
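The steps above reduce to a few set operations. The sketch below assumes that ARP entries can be represented by hashable identifiers; the function name `correlate` and its parameters are hypothetical.

```python
def correlate(arp_before, arp_after, manifest_matches):
    """Compute the three package sets described above.

    arp_before:       ARP entries captured before the install
    arp_after:        ARP entries observed after the install
    manifest_matches: ARP entries returned by searching with the manifest's
                      name and publisher
    """
    changed = set(arp_after) - set(arp_before)   # net-new entries
    matching = set(manifest_matches)             # entries the manifest matches
    intersection = changed & matching            # new entries that also match
    return {"changed": changed, "matching": matching, "intersection": intersection}
```

An empty `intersection` alongside a non-empty `changed` set suggests that something was installed but the manifest's name and publisher did not correlate with it.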
The telemetry contains information about the package that was being installed {source, id, version, channel}, the number of packages in each state above {changed, matching, intersection}, and information from the ARP entry (if any) that we have decided is the one that represents the main package {name, version, publisher, language}. We will be able to leverage this telemetry to improve both our normalization code and the manifests in winget-pkgs, and so improve our ability to correlate installed packages with remote ones.

For instance, installing Notepad++ shows the following:
Notepad++ does not correlate with the manifest because its publisher does not match: the publisher in the ARP entry includes “Team”, while the one in the manifest does not.
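The telemetry fields listed above could be grouped into a record along these lines. The class name `CorrelationEvent` and the field names are assumptions made for illustration; only the grouping {source, id, version, channel}, {changed, matching, intersection}, and {name, version, publisher, language} comes from the description.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class CorrelationEvent:
    """Hypothetical shape of one correlation telemetry event."""

    # Package that was being installed
    source: str
    package_id: str
    version: str
    channel: str
    # Number of packages in each state
    changed_count: int
    matching_count: int
    intersection_count: int
    # ARP entry chosen to represent the main package, if any
    arp_name: Optional[str] = None
    arp_version: Optional[str] = None
    arp_publisher: Optional[str] = None
    arp_language: Optional[str] = None


# Placeholder values: one new ARP entry appeared, but nothing matched the manifest.
event = CorrelationEvent(
    source="example-source",
    package_id="Example.App",
    version="1.0",
    channel="",
    changed_count=1,
    matching_count=0,
    intersection_count=0,
    arp_name="Example App",
    arp_publisher="Example Team",
)
payload = asdict(event)
```

An event with `changed_count` above zero and `intersection_count` of zero, as here, is the kind of record that would flag a name or publisher mismatch worth investigating.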
Validation
New tests are added for the index changes and to verify that the ARP reporting behaves as expected.