Objective
Improve the relevance of code snippet search results in the presence of AppMap data.
The current algorithm selects code snippets in two phases:
1. Find the best-matching AppMap data files, and add the code snippets that are referred to by events
   in those AppMap files. Select enough events that the character count of the collected snippets
   reaches 3/4 of a threshold.
2. Perform a second code search, not boosted by AppMap data, to fill the remaining 1/4 of the
   threshold.
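To make the current budget split concrete, here is a hypothetical TypeScript sketch of the two phases. The `Snippet` type and the `fillBudget` and `allocateContext` functions are illustrative names, not the project's actual code, and the rule of stopping at the first snippet that would overflow the budget is an assumption.

```typescript
// Hypothetical sketch of the current fixed 3/4 + 1/4 character budget split.
interface Snippet {
  location: string; // e.g. "app/models/user.rb:10-24"
  content: string;
}

// Add snippets in rank order until the next one would exceed the budget.
function fillBudget(candidates: Snippet[], budget: number, seen: Set<string>): Snippet[] {
  const selected: Snippet[] = [];
  let used = 0;
  for (const snippet of candidates) {
    if (seen.has(snippet.location)) continue; // already collected earlier
    if (used + snippet.content.length > budget) break;
    selected.push(snippet);
    seen.add(snippet.location);
    used += snippet.content.length;
  }
  return selected;
}

function allocateContext(
  appmapSnippets: Snippet[], // referenced by events in the best-matching AppMaps
  searchSnippets: Snippet[], // results of the unboosted code search
  threshold: number
): Snippet[] {
  const seen = new Set<string>();
  // Phase 1: up to 3/4 of the budget, however weakly the AppMaps match the question.
  const fromAppMaps = fillBudget(appmapSnippets, Math.floor(threshold * 0.75), seen);
  // Phase 2: the unboosted search fills whatever remains (at least 1/4).
  const used = fromAppMaps.reduce((sum, s) => sum + s.content.length, 0);
  const fromSearch = fillBudget(searchSnippets, threshold - used, seen);
  return fromAppMaps.concat(fromSearch);
}
```

With a threshold of 100 characters, AppMap snippets of 40, 40 and 10 characters, and search snippets of 30 and 30, this sketch keeps only the first AppMap snippet (adding the second would exceed 75) and then both search snippets.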
```mermaid
sequenceDiagram
    participant S as SearchContextCollector
    participant F as FileIndex
    participant SI as SnippetIndex
    participant E as EventCollector
    participant A as AppMapIndex

    S->>A: search AppMaps with vectorTerms
    activate A
    A-->>S: AppMapSearchResponse
    deactivate A

    S->>F: buildIndex - files
    activate F
    F-->>S: fileIndex
    deactivate F

    S->>F: search fileIndex with vectorTerms
    activate F
    F-->>S: FileSearchResult[]
    deactivate F

    S->>SI: buildIndex - snippets
    activate SI
    SI-->>S: snippetIndex
    deactivate SI

    loop collect context with varying events
        S->>E: collectEvents
        activate E
        E-->>S: contextCandidate
        deactivate E

        S->>SI: collectSnippets
        activate SI
        SI-->>S: sourceContext
        deactivate SI

        S->>S: applyContext
        activate S
        S-->>S: appliedContext
        deactivate S
    end

    S-->>S: return searchResponse and context
```
Some problems with this approach include:
- The fixed 3/4 allocation of context to snippets that come directly from AppMaps. If AppMap data is
  available but only marginally relevant to the user's question, 3/4 of the search context is still
  populated by snippets referenced in that AppMap data.
- The search algorithm used to select events from the matching AppMap data does not index the full
  code of the functions that the events refer to; it only matches on certain keywords present in the
  AppMap data index (such as function names and parameter names).
Logically, if a user records AppMap data, then the functions referenced by that AppMap data are
likely to be more relevant to any user question than functions that are not referenced. Keyword
search (BM25) is still a factor, but a keyword search match that is referenced by AppMap data should
be considered more relevant than a keyword search match that is not referenced.
This technique can be extended to other kinds of references, such as stack traces and errors: non-code
files that are relevant to runtime execution and that contain embedded references to code object
names and file locations.
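As an illustration of how a stack trace could supply boost targets, the following hypothetical TypeScript sketch extracts `path:line` references and checks whether a snippet's line range contains one. The `extractReferences` and `isReferenced` names are invented for this example, and the regular expression is a simplification that handles Ruby-, Java- and Node-style frames only; other trace formats would need their own patterns.

```typescript
interface CodeReference {
  path: string;
  line: number;
}

interface SnippetLocation {
  filePath: string;
  startLine: number;
  endLine: number;
}

// Extract unique "path.ext:line" references from stack trace text.
function extractReferences(trace: string): CodeReference[] {
  const pattern = /([\w.\/-]+\.[A-Za-z]+):(\d+)/g; // local, so lastIndex never leaks between calls
  const refs: CodeReference[] = [];
  const seen = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(trace)) !== null) {
    const key = `${match[1]}:${match[2]}`;
    if (seen.has(key)) continue;
    seen.add(key);
    refs.push({ path: match[1], line: parseInt(match[2], 10) });
  }
  return refs;
}

// A snippet is referenced if some frame points at a line inside its range.
// Frame paths may be relative, so a suffix match on a "/" boundary is accepted.
function isReferenced(snippet: SnippetLocation, refs: CodeReference[]): boolean {
  return refs.some(
    (ref) =>
      (snippet.filePath === ref.path || snippet.filePath.endsWith("/" + ref.path)) &&
      snippet.startLine <= ref.line &&
      ref.line <= snippet.endLine
  );
}
```

The path-boundary check keeps `app/models/admin_user.rb` from matching a frame that points at `app/models/user.rb`.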
Overview
The first phase of searching is to select relevant content that will be used for boosting search
results.
The second phase of searching is to index all the possible snippets that may match the user's
question, and then apply boost factors from the results obtained in phase one.
A file index and a snippet index are similar. Both contain an identifier, directory, file path,
tokens, and words; the file path, tokens, and words are indexed. A snippet also includes the range
within the file from which it was taken.
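The shared shape of the two indexes might be modeled as follows. This TypeScript sketch is hypothetical: the field names, the identifier format `path:start-end`, and the reading of *tokens* as raw identifiers and *words* as their lowercase parts are all assumptions.

```typescript
interface FileRecord {
  identifier: string;
  directory: string;
  filePath: string; // indexed
  tokens: string[]; // indexed; assumed to be raw identifiers from the source
  words: string[]; // indexed; assumed to be tokens split into lowercase words
}

// A snippet record is a file record plus the line range it was taken from.
interface SnippetRecord extends FileRecord {
  startLine: number;
  endLine: number;
}

// Split an identifier such as "getUserById" or "find_user" into lowercase words.
function splitWords(token: string): string[] {
  return token
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[^A-Za-z0-9]+/)
    .filter((w) => w.length > 0)
    .map((w) => w.toLowerCase());
}

function makeSnippetRecord(
  directory: string,
  filePath: string,
  source: string,
  startLine: number,
  endLine: number
): SnippetRecord {
  // Distinct identifiers, in order of first appearance.
  const tokens = Array.from(new Set(source.match(/[A-Za-z_][A-Za-z0-9_]*/g) || []));
  const words: string[] = [];
  for (const token of tokens) {
    for (const word of splitWords(token)) {
      if (words.indexOf(word) < 0) words.push(word);
    }
  }
  return { identifier: `${filePath}:${startLine}-${endLine}`, directory, filePath, tokens, words, startLine, endLine };
}

// Only these fields feed the keyword index; identifier, directory and range are stored, not searched.
function indexedText(record: FileRecord): string {
  return [record.filePath, ...record.tokens, ...record.words].join(" ");
}
```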
Snippets can be boosted by applying boost factors to specific identifiers. A boost makes a snippet
more likely to be chosen, but the snippet must still be a BM25 match. Boosts are applied when some
external relevant data, such as a trace or a stack trace, refers to a snippet. In those cases, the
reference in the external data is a strong indication that the snippet is relevant to the search.
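One way to realize this rule is to multiply a snippet's BM25 score by its boost factor and to drop snippets with no keyword match, so a boost can reorder matches but never create one. The TypeScript sketch below assumes that multiplicative combination and uses the common BM25 parameters k1 = 1.2 and b = 0.75 with a non-negative IDF; `rankWithBoosts` is a hypothetical name, and the issue does not specify the actual combination rule or parameters.

```typescript
interface Doc {
  identifier: string;
  terms: string[]; // already tokenized and lowercased
}

interface Ranked {
  identifier: string;
  score: number;
}

function rankWithBoosts(docs: Doc[], query: string[], boosts: Map<string, number>, k1 = 1.2, b = 0.75): Ranked[] {
  const n = docs.length;
  const avgLength = docs.reduce((sum, d) => sum + d.terms.length, 0) / Math.max(n, 1);
  // Document frequency of each query term.
  const df = new Map<string, number>();
  for (const term of query) {
    df.set(term, docs.filter((d) => d.terms.indexOf(term) >= 0).length);
  }
  const results: Ranked[] = [];
  for (const doc of docs) {
    let score = 0;
    for (const term of query) {
      const tf = doc.terms.filter((t) => t === term).length;
      if (tf === 0) continue;
      const dfTerm = df.get(term) || 0;
      const idf = Math.log(1 + (n - dfTerm + 0.5) / (dfTerm + 0.5));
      const norm = tf + k1 * (1 - b + (b * doc.terms.length) / avgLength);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    if (score === 0) continue; // no BM25 match: a boost alone cannot select a snippet
    results.push({ identifier: doc.identifier, score: score * (boosts.get(doc.identifier) || 1) });
  }
  return results.sort((x, y) => y.score - x.score);
}
```

Because the boost multiplies rather than overrides the score, a strong unreferenced match can still outrank a weak referenced one; the size of the boost factor governs that trade-off.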
Task
These changes apply to the context search algorithm.
The first algorithm step is to choose the most relevant AppMap data files. This step should remain,
but it should be migrated to the new FileIndex implementation. AppMap data files should not be
indexed directly. Rather, there is an index directory for each AppMap data file, with the same name
as the AppMap file without the extension. Within this directory are files that contain keywords
which have been extracted from the AppMap data file. These keywords can be used to match the search
query.
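A hypothetical TypeScript sketch of this step maps an AppMap file to its index directory and ranks index directories by how many of their extracted keywords appear in the query. It assumes AppMap files use the `.appmap.json` extension and that the keyword files have already been read into memory; the simple overlap count stands in for whatever scoring FileIndex actually uses, and both function names are invented for illustration.

```typescript
// Map "dir/name.appmap.json" to its index directory "dir/name" (assumed layout).
function indexDirectoryFor(appmapFile: string): string {
  for (const ext of [".appmap.json", ".json"]) {
    if (appmapFile.endsWith(ext)) return appmapFile.slice(0, -ext.length);
  }
  return appmapFile;
}

// Rank index directories by the number of distinct query terms found among
// their keywords; directories with no overlap are dropped.
function rankAppMaps(keywordsByIndexDir: Map<string, string[]>, queryTerms: string[], limit: number): string[] {
  const query = new Set(queryTerms.map((t) => t.toLowerCase()));
  const scored: { dir: string; hits: number }[] = [];
  keywordsByIndexDir.forEach((keywords, dir) => {
    const distinct = new Set(keywords.map((k) => k.toLowerCase()));
    let hits = 0;
    distinct.forEach((k) => {
      if (query.has(k)) hits += 1;
    });
    if (hits > 0) scored.push({ dir, hits });
  });
  // Most hits first; ties broken alphabetically for a stable order.
  scored.sort((x, y) => y.hits - x.hits || x.dir.localeCompare(y.dir));
  return scored.slice(0, limit).map((s) => s.dir);
}
```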
The second step of the current algorithm is to select AppMap events and collect some number of them
into the output. This step will be removed. Rather than selecting AppMap events in one step and
collecting the relevant source code snippets in a second step, the two steps will be combined.
Snippets are added to a SnippetIndex. AppMap data elements that are not code snippets, such as HTTP
client and server requests and SQL queries, will also be added to the SnippetIndex. Then those
snippets that are referenced by an AppMap data file selected in the first step will be boosted, as
described in the Overview.
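The combined step might look like the following hypothetical TypeScript sketch. The `AppMapElement` shapes, the `http:` and `sql:` identifier formats, the function names, and the boost factor of 2 are illustrative assumptions. Every AppMap contributes its HTTP and SQL elements to the index, but only the AppMaps selected in the first step produce boosts; the resulting boost map is the kind of input a boost-aware BM25 ranker would consume.

```typescript
type AppMapElement =
  | { kind: "function"; path: string; line: number }
  | { kind: "http"; method: string; route: string; status: number }
  | { kind: "sql"; query: string };

interface IndexedSnippet {
  identifier: string;
  content: string;
}

interface CodeSnippet extends IndexedSnippet {
  filePath: string;
  startLine: number;
  endLine: number;
}

// Non-code elements become snippets of their own so they compete in the same keyword search.
function snippetForElement(el: AppMapElement): IndexedSnippet | null {
  if (el.kind === "http") {
    return { identifier: `http:${el.method} ${el.route}`, content: `${el.method} ${el.route} -> ${el.status}` };
  }
  if (el.kind === "sql") {
    return { identifier: `sql:${el.query}`, content: el.query };
  }
  return null; // function calls map to source snippets that are already indexed
}

function buildIndexAndBoosts(
  codeSnippets: CodeSnippet[],
  appmaps: Map<string, AppMapElement[]>, // AppMap name -> its elements
  selected: Set<string>, // names of the AppMaps chosen in the first step
  boostFactor = 2
): { snippets: IndexedSnippet[]; boosts: Map<string, number> } {
  const snippets: IndexedSnippet[] = codeSnippets.slice();
  const known = new Set(snippets.map((s) => s.identifier));
  const boosts = new Map<string, number>();
  appmaps.forEach((elements, name) => {
    const isSelected = selected.has(name);
    for (const el of elements) {
      if (el.kind === "function") {
        if (!isSelected) continue;
        // Boost every code snippet whose range contains the recorded call site.
        for (const s of codeSnippets) {
          if (s.filePath === el.path && s.startLine <= el.line && el.line <= s.endLine) {
            boosts.set(s.identifier, boostFactor);
          }
        }
        continue;
      }
      const snippet = snippetForElement(el);
      if (snippet === null) continue;
      if (!known.has(snippet.identifier)) {
        snippets.push(snippet); // indexed whether or not its AppMap was selected
        known.add(snippet.identifier);
      }
      if (isSelected) boosts.set(snippet.identifier, boostFactor);
    }
  });
  return { snippets, boosts };
}
```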