Duplicate symbols (same symbol name across different symbols) in generated index #160

Open
ofeki-neosec opened this issue Oct 21, 2024 · 5 comments

@ofeki-neosec

In some cases the indexer binary appears to assign the exact same symbol name to two different symbols.
My code has symbols that are defined identically in different files (due to coding conventions).
For example:

SCHEMA = "content" # In the first file
SCHEMA = "admin" # In the second file

or

router = APIRouter()

In both files.

For some reason, the generated index file contains a symbol block with an identical symbol for both files, with nothing to differentiate the two. This makes resolving references impossible, since there is no way to know which symbol is being referenced in the code.

Examples of the resolved symbols:

scip-python python 18 fab26367f82173242a708664ff78e5710a5f59c8 /SCHEMA.
scip-python python 23 4ca18dc98dea1287e17d506f1ca8f717225af20b /router.

I'm running Python 3.12.6, and I'm pointing the indexer to an empty scip-python.json file ([]), if that matters.
Is there any way this could be solved through #156?
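
To check how widespread the collisions are in a given index, a minimal sketch along these lines may help. It assumes the definition occurrences have already been extracted from the index into (document path, symbol) pairs; the function name find_colliding_symbols and the file paths are hypothetical, and only the two symbol strings come from the output above.

```python
from collections import defaultdict


def find_colliding_symbols(definitions):
    """Return symbols that are defined in more than one document.

    `definitions` is an iterable of (document_path, symbol) pairs.
    This input shape is an assumption made for illustration; it is
    not the on-disk layout of a SCIP index.
    """
    docs_by_symbol = defaultdict(set)
    for path, symbol in definitions:
        docs_by_symbol[symbol].add(path)
    # Keep only symbols whose definitions span several documents.
    return {
        symbol: sorted(paths)
        for symbol, paths in docs_by_symbol.items()
        if len(paths) > 1
    }


# Hypothetical file paths; the symbol strings are the ones reported above.
SCHEMA_SYMBOL = "scip-python python 18 fab26367f82173242a708664ff78e5710a5f59c8 /SCHEMA."
ROUTER_SYMBOL = "scip-python python 23 4ca18dc98dea1287e17d506f1ca8f717225af20b /router."

definitions = [
    ("content/settings.py", SCHEMA_SYMBOL),
    ("admin/settings.py", SCHEMA_SYMBOL),
    ("content/routes.py", ROUTER_SYMBOL),
    ("admin/routes.py", ROUTER_SYMBOL),
]

for symbol, paths in find_colliding_symbols(definitions).items():
    print(symbol, "is defined in", ", ".join(paths))
```

Any symbol reported here is one that a reference in another document cannot be resolved to a single definition.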

@ofeki-neosec
Author

@varungandhi-src is there a chance this is a bug in the indexer?

@varungandhi-src
Contributor

@ofeki-neosec it seems quite likely that this is a bug. I would expect at least the module name to be present in the symbol name, if not the fully qualified module path.
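
To illustrate that expectation, here is a rough sketch of how a module-qualified symbol string could be assembled. It is not scip-python's actual implementation: the helper names, the package name myproject, the version 0.1, and the module names are all hypothetical, and the escaping rule (backtick-quote any identifier containing characters other than letters, digits, _, +, - and $) reflects my reading of the SCIP symbol grammar.

```python
import re

# Characters allowed in an unescaped identifier (assumed from the SCIP grammar).
_SIMPLE_IDENTIFIER = re.compile(r"^[A-Za-z0-9_+\-$]+$")


def escape_identifier(name: str) -> str:
    """Backtick-quote a name unless it is a simple identifier."""
    if not name:
        return ""
    if _SIMPLE_IDENTIFIER.match(name):
        return name
    return "`" + name.replace("`", "``") + "`"


def term_symbol(package: str, version: str, module: str, name: str) -> str:
    """Build '<scheme> <manager> <package> <version> <module>/<name>.'.

    The module becomes a namespace descriptor (suffix '/') and the
    variable a term descriptor (suffix '.'). Package names containing
    spaces are not handled in this sketch.
    """
    return (
        f"scip-python python {package} {version} "
        f"{escape_identifier(module)}/{escape_identifier(name)}."
    )


# With the module present, the two SCHEMA variables get distinct symbols.
print(term_symbol("myproject", "0.1", "content.settings", "SCHEMA"))
print(term_symbol("myproject", "0.1", "admin.settings", "SCHEMA"))
# With an empty module, both collapse to the same '/SCHEMA.' suffix.
print(term_symbol("myproject", "0.1", "", "SCHEMA"))
```

The last call shows the shape reported in this issue: when the module part is empty, nothing but the bare variable name is left to tell the definitions apart.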

@ofeki-neosec
Author

@varungandhi-src would you recommend trying to upgrade the underlying Pyright package to see if it solves the issue?
I've tried comparing the changes using the sourcegraph link in the README but it seems like there are too many changes from the base branch to evaluate.
How would you approach fixing the issue?

Thanks in advance!

@ofeki-neosec
Author

After almost creating a PR with the merge of the latest Pyright version, I've noticed a critical function was missing in the new version of Pyright (indexWorkspace - the function that actually indexes the workspace). Upon further investigation, I've found out that Microsoft decided to split Pyright into two projects - Pyright and Pylance. Pyright is still open source but it doesn't have the indexing capability. Pylance has the indexing capability and other features, but it's closed-source and intended to be used within Microsoft's products only.

@varungandhi-src do you have any suggestions on how to troubleshoot the issue or any next steps?

@ofeki-neosec
Author

I just saw this happen in another project I indexed. All four instances of app = Flask(__name__) resolved to scip-python python . 640d0044014632bdc722c08bf49902b853e111ba /app., and since other documents reference these symbols, there is no way to tell which app instance is being used.
