
Add normalized name and publisher to index and correlation telemetry #767

Merged
JohnMcPMS merged 17 commits into microsoft:master from JohnMcPMS:normcorrtelem on Mar 2, 2021

Conversation

@JohnMcPMS JohnMcPMS (Member) commented Feb 24, 2021

Change

This change adds the normalized name and publisher to the index, and leverages them to collect telemetry on how well this normalization strategy correlates a manifest with the ARP (Add/Remove Programs) entry on the machine.

The normalized name and publisher tables are added to the index as 1:N entries because the new manifest schema fully localizes name and publisher. The two tables are not linked to each other, so a name from one localization combined with a publisher from any other localization will still match. This loss of precision seems a worthwhile tradeoff compared to writing more complex code that handles the two values as a pair; it also allows one field to change while the other remains constant, and it consumes less space overall.
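To make the "unlinked 1:N tables" tradeoff concrete, here is a minimal in-memory sketch. It is not the index's actual table layout: `NormalizedIndex`, `AddLocalization`, and `Matches` are hypothetical names, and the manifest is identified by a plain integer id for illustration.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the two 1:N tables: each manifest id maps to a set of
// normalized names and, separately, to a set of normalized publishers.
// Nothing records which name came from the same localization as which publisher.
struct NormalizedIndex
{
    std::map<int, std::set<std::string>> names;       // manifest id -> normalized names
    std::map<int, std::set<std::string>> publishers;  // manifest id -> normalized publishers

    // Each localization contributes one name row and one publisher row.
    void AddLocalization(int manifestId, const std::string& name, const std::string& publisher)
    {
        names[manifestId].insert(name);
        publishers[manifestId].insert(publisher);
    }

    // Matches if the name is in this manifest's name table AND the publisher is in its
    // publisher table, regardless of which localization each value came from.
    bool Matches(int manifestId, const std::string& name, const std::string& publisher) const
    {
        auto n = names.find(manifestId);
        auto p = publishers.find(manifestId);
        return n != names.end() && p != publishers.end() &&
               n->second.count(name) > 0 && p->second.count(publisher) > 0;
    }
};
```

With two localizations added for the same manifest, a name from the first paired with a publisher from the second still matches; that is the precision given up in exchange for simpler code and fewer rows.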

Searching is done by passing in the observed values for name and publisher; the source performs the normalization before comparing them with its internal store. This way, any type of normalization can happen inside the source without the workflow logic needing to be aware of it.
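A minimal sketch of this division of labor, assuming a deliberately simple normalizer (lowercase, keep only ASCII alphanumerics) in place of the PR's real normalization. `NormalizeField`, `HypotheticalSource`, and `SearchByObservedValues` are hypothetical names; the point is only that callers hand over raw ARP values and the source decides how to normalize them.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy normalizer (an assumption, not the PR's algorithm): lowercase and drop
// every character that is not an ASCII letter or digit.
std::string NormalizeField(const std::string& value)
{
    std::string result;
    for (unsigned char c : value)
    {
        if (std::isalnum(c))
        {
            result.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return result;
}

class HypotheticalSource
{
public:
    // Values are normalized once, when they enter the source's store.
    void Add(const std::string& id, const std::string& name, const std::string& publisher)
    {
        m_entries.push_back({ id, NormalizeField(name), NormalizeField(publisher) });
    }

    // Callers pass the raw observed values; normalization stays internal to the source.
    std::vector<std::string> SearchByObservedValues(const std::string& name, const std::string& publisher) const
    {
        const std::string n = NormalizeField(name);
        const std::string p = NormalizeField(publisher);
        std::vector<std::string> ids;
        for (const auto& entry : m_entries)
        {
            if (entry.normalizedName == n && entry.normalizedPublisher == p)
            {
                ids.push_back(entry.id);
            }
        }
        return ids;
    }

private:
    struct StoredEntry
    {
        std::string id;
        std::string normalizedName;
        std::string normalizedPublisher;
    };
    std::vector<StoredEntry> m_entries;
};
```

Under this toy normalizer "Notepad++ Team" becomes `notepadteam` and cannot match a manifest publisher of "Notepad++" (`notepad`). Dropping the `+` characters would also conflate distinct names, which is one reason a real normalizer needs more care than this sketch.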

To collect telemetry, we first snapshot the set of packages in ARP before executing the install. Afterward, we compare the current set against the snapshot to determine which entries are net new. We also search the source using the manifest that was installed to determine whether any packages match it. Finally, we intersect these two sets of packages.
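The snapshot, diff, and intersect steps can be sketched with ordered sets and the standard set algorithms. This assumes each ARP entry is identified by a string key (`ArpKey` is a hypothetical alias) and that the source search has already been resolved to the set of ARP keys it matched; `Correlate` and `CorrelationCounts` are illustrative names, not the PR's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>

// Hypothetical identifier for an ARP entry.
using ArpKey = std::string;

struct CorrelationCounts
{
    std::size_t changed = 0;       // entries present after the install but not before
    std::size_t matching = 0;      // entries the source search matched to the manifest
    std::size_t intersection = 0;  // entries that are both new and matched
};

CorrelationCounts Correlate(const std::set<ArpKey>& before,
                            const std::set<ArpKey>& after,
                            const std::set<ArpKey>& sourceMatches)
{
    // Net new entries: in the post-install set but not in the snapshot.
    std::set<ArpKey> changed;
    std::set_difference(after.begin(), after.end(), before.begin(), before.end(),
                        std::inserter(changed, changed.end()));

    // Entries that are both net new and matched by the source search.
    std::set<ArpKey> both;
    std::set_intersection(changed.begin(), changed.end(),
                          sourceMatches.begin(), sourceMatches.end(),
                          std::inserter(both, both.end()));

    return { changed.size(), sourceMatches.size(), both.size() };
}
```

A key-based difference only catches entries that did not exist before the install; an upgrade that rewrites an existing entry in place would additionally need a comparison of entry contents, such as the version.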

The telemetry contains information about the package being installed {source, id, version, channel}, the number of packages in each of the states above {changed, matching, intersection}, and, if one exists, the ARP entry that we have decided represents the main package {name, version, publisher, language}. We can use this telemetry to improve both our normalization code and the manifests in winget-pkgs, and thereby our ability to correlate installed packages with remote ones.
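As an illustration of the payload, the sketch below groups the fields listed above into hypothetical structs (the real event and field names may differ) and renders the counts and the chosen ARP entry in the same form as the Notepad++ log lines quoted in this description. How the "main" ARP entry is selected is not shown; it is simply an optional input here.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical groupings that mirror the field lists in the description.
struct InstalledPackageInfo { std::string source, id, version, channel; };
struct ArpEntryInfo { std::string name, version, publisher, language; };

struct CorrelationTelemetry
{
    InstalledPackageInfo package;
    std::size_t changed = 0;
    std::size_t matching = 0;
    std::size_t intersection = 0;
    std::optional<ArpEntryInfo> arpEntry;  // empty when no entry was chosen
};

// Formats the counts, plus the chosen ARP entry if there is one.
std::string FormatCorrelationLog(const CorrelationTelemetry& t)
{
    std::ostringstream out;
    out << "During package install, " << t.changed << " changes to ARP were observed, "
        << t.matching << " matches were found for the package, and "
        << t.intersection << " packages were in both";
    if (t.arpEntry)
    {
        out << "\nThe entry determined to be associated with the package is '"
            << t.arpEntry->name << "', with publisher '" << t.arpEntry->publisher << "'";
    }
    return out.str();
}
```

Keeping the ARP entry optional lets the same event be emitted when nothing was observed to change.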

For instance, installing Notepad++ shows the following:

[CLI ] During package install, 1 changes to ARP were observed, 0 matches were found for the package, and 0 packages were in both
[CLI ] The entry determined to be associated with the package is 'Notepad++ (64-bit x64)', with publisher 'Notepad++ Team'

Notepad++ does not correlate with the manifest because its publisher does not match (note the lack of “Team” in the manifest):

PS> wingetdev show Notepad++.Notepad++
Found Notepad++ [Notepad++.Notepad++]
Version: 7.9.3
Publisher: Notepad++

Validation

New tests are added for the index changes and to verify that the ARP reporting behaves as expected.


@JohnMcPMS JohnMcPMS requested a review from a team as a code owner February 24, 2021 23:58
Comment thread src/AppInstallerCLICore/ExecutionContext.h Outdated
Comment thread src/AppInstallerCLICore/Workflows/InstallFlow.cpp Outdated
Comment thread src/AppInstallerCLICore/Workflows/InstallFlow.cpp
Comment thread src/AppInstallerCLITests/CompositeSource.cpp
Comment thread src/AppInstallerCommonCore/Public/winget/ManifestLocalization.h
Comment thread src/AppInstallerCLITests/TestSource.cpp Outdated
Comment thread src/AppInstallerRepositoryCore/Public/AppInstallerRepositorySearch.h Outdated
Comment thread src/AppInstallerRepositoryCore/Public/AppInstallerRepositorySearch.h Outdated
Comment thread src/AppInstallerRepositoryCore/Public/AppInstallerRepositorySearch.h Outdated
Comment thread src/AppInstallerRepositoryCore/Microsoft/Schema/1_2/Interface.h Outdated
Comment thread src/AppInstallerRepositoryCore/Microsoft/ARPHelper.cpp
Comment thread src/AppInstallerRepositoryCore/CompositeSource.cpp
Comment thread src/AppInstallerCLICore/Workflows/InstallFlow.cpp

@yao-msft yao-msft (Contributor) left a comment

:shipit:

…nd fix another AV reason that I missed last time
@github-actions


Misspellings found, please review:

  • EPackaged
To accept these changes, run the following commands from this repository on this branch
pushd $(git rev-parse --show-toplevel)
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"validator valijson valueiterator "');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
  if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
  next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
make_path ".github/actions/spelling";
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"EPackaged "');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a) cmp lc($b)} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;'
popd

jsoref (Contributor) commented Feb 28, 2021

Fwiw, I'd suggest adding \bE2EPackaged\b on its own line to patterns.txt instead of adding the suggested item to expect.txt.

github-actions Bot commented Mar 1, 2021

Misspellings found, please review:

  • cend
To accept these changes, run the following commands from this repository on this branch
pushd $(git rev-parse --show-toplevel)
perl -e '
my @expect_files=qw('".github/actions/spelling/expect.txt"');
@ARGV=@expect_files;
my @stale=qw('"validator valijson valueiterator "');
my $re=join "|", @stale;
my $suffix=".".time();
my $previous="";
sub maybe_unlink { unlink($_[0]) if $_[0]; }
while (<>) {
  if ($ARGV ne $old_argv) { maybe_unlink($previous); $previous="$ARGV$suffix"; rename($ARGV, $previous); open(ARGV_OUT, ">$ARGV"); select(ARGV_OUT); $old_argv = $ARGV; }
  next if /^(?:$re)(?:(?:\r|\n)*$| .*)/; print;
}; maybe_unlink($previous);'
perl -e '
my $new_expect_file=".github/actions/spelling/expect.txt";
use File::Path qw(make_path);
make_path ".github/actions/spelling";
open FILE, q{<}, $new_expect_file; chomp(my @words = <FILE>); close FILE;
my @add=qw('"cend "');
my %items; @items{@words} = @words x (1); @items{@add} = @add x (1);
@words = sort {lc($a) cmp lc($b)} keys %items;
open FILE, q{>}, $new_expect_file; for my $word (@words) { print FILE "$word\n" if $word =~ /\w/; };
close FILE;'
popd

@JohnMcPMS JohnMcPMS merged commit f236fa2 into microsoft:master Mar 2, 2021
@JohnMcPMS JohnMcPMS deleted the normcorrtelem branch March 2, 2021 04:54

3 participants