Conduct research on given URLs without forgetting and add more research #734

Makesh-Srinivasan · 2024-08-07T05:19:01Z

Hi!

I am making this pull request to fix the issue with source_urls being reset inside the conduct_research() function, which causes GPTR to forget the user-input URLs.
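The bug pattern can be shown in a small, self-contained sketch. The class and method names below (`ResearcherSketch`, `conduct_research_buggy`, `conduct_research`) are hypothetical stand-ins, not GPT Researcher's actual code. The sketch only illustrates that the method must read the URLs stored on the instance instead of reassigning `source_urls` before using it.

```python
from typing import List, Optional


class ResearcherSketch:
    """Hypothetical stand-in for a researcher that should remember user-supplied URLs."""

    def __init__(self, query: str, source_urls: Optional[List[str]] = None) -> None:
        self.query = query
        # Copy the list so later changes by the caller do not leak in.
        self.source_urls: List[str] = list(source_urls or [])
        self.visited_urls: List[str] = []

    def conduct_research_buggy(self) -> List[str]:
        # Bug pattern: the attribute is reset inside the method,
        # so the user-supplied URLs are forgotten before scraping.
        self.source_urls = []
        self.visited_urls = list(self.source_urls)
        return self.visited_urls

    def conduct_research(self) -> List[str]:
        # Fixed pattern: use the URLs stored at construction time.
        self.visited_urls = list(self.source_urls)
        return self.visited_urls


urls = ["https://example.com/a", "https://example.com/b"]
print(ResearcherSketch("example query", urls).conduct_research_buggy())  # []
print(ResearcherSketch("example query", urls).conduct_research())
# ['https://example.com/a', 'https://example.com/b']
```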

Additionally, I am introducing a new parameter to the GPTResearcher class called add_additional_sources (bool). This parameter allows GPTR to gather more context from a default web search in addition to the user-input URLs, thereby increasing the overall scope of research for the query or sub-query.

HOW: If the parameter is set, GPTR scrapes both the user-provided URLs and the results of the default web search, so it researches the sources the user supplied as well as the sources it finds on its own. If it is unset, GPTR scours only the user-provided URLs and builds the answer from the context gathered there. If the query is unrelated to the contents of those URLs, a message is logged so the user knows the answer comes from the model's training data rather than from actual research.
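The selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: `gather_context` is a hypothetical function, `scrape` and `web_search` are hypothetical callables supplied by the caller, relevance is approximated by a naive keyword overlap, and the logger name is made up. None of these are GPT Researcher's real functions.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("research_sketch")  # hypothetical logger name


def gather_context(
    query: str,
    source_urls: List[str],
    add_additional_sources: bool,
    scrape: Callable[[str], str],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Collect passages from the user URLs and, optionally, from a web search."""
    urls = list(source_urls)
    if add_additional_sources:
        # Extend the user-provided URLs with ones found by the default search,
        # skipping duplicates while preserving order.
        for url in web_search(query):
            if url not in urls:
                urls.append(url)

    pages = [scrape(url) for url in urls]

    # Naive relevance check (assumption): keep pages sharing a word with the query.
    query_words = {w.lower() for w in query.split()}
    context = [p for p in pages if query_words & {w.lower() for w in p.split()}]

    if not context:
        # Tell the user the answer will rest on the model's training data.
        logger.warning(
            "No relevant content found for %r; the answer will come from the "
            "model's own knowledge rather than new research.",
            query,
        )
    return context


# Usage with fake pages standing in for real scraping and search.
PAGES: Dict[str, str] = {
    "https://example.com/cats": "cats are small felines",
    "https://example.com/found": "more facts about cats",
}


def fake_scrape(url: str) -> str:
    return PAGES.get(url, "")


def fake_search(query: str) -> List[str]:
    return ["https://example.com/found"]


print(gather_context("cats", ["https://example.com/cats"], False, fake_scrape, fake_search))
# ['cats are small felines']
print(gather_context("cats", ["https://example.com/cats"], True, fake_scrape, fake_search))
# ['cats are small felines', 'more facts about cats']
```

Injecting `scrape` and `web_search` as parameters keeps the sketch testable without network access; a real implementation would call its own retriever and scraper instead.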

WHY: The purpose of source_urls is to research the webpages the user provides. Because conduct_research was forgetting the URLs, those pages were never scraped; with this fix, they are. However, the user may not anticipate every case: a query can be unrelated to the hardcoded source_urls, in which case GPTR answers from its own knowledge instead of from new research. The new add_additional_sources parameter addresses this by letting GPTR search the web in addition to the user-provided sources, which widens the context it answers from. Even if the sources do not match the query, GPTR can still perform genuine research through the default web search, instead of generating the answer from the model's pre-trained weights as it did before. The feature is also useful when the user wants research drawn not only from the URLs they provide but also from related sources on the internet that would be impractical to add by hand every time, but that GPTR can find easily.

Other functions are unchanged. I have also cleaned up the parts of the code and comments related to these changes.

Thanks,
Makesh Srinivasan

assafelovic · 2024-08-07T05:42:14Z

@Makesh-Srinivasan this is great love it! Would you mind also adding a section in the documentation that explains how to use it? https://github.com/assafelovic/gpt-researcher/blob/master/docs/docs/gpt-researcher/tailored-research.md

Thank you!

Makesh-Srinivasan · 2024-08-07T06:21:31Z

Sure, I can do that :) Thanks!


ElishaKay · 2024-11-11T16:48:51Z

Resolved conflicts; the merge is pending here: #982

Welcome to the Git Tree @Makesh-Srinivasan! Feel free to ping me on Discord to get added to our Contributors wall of honor.

Makesh-Srinivasan added 2 commits August 7, 2024 00:51

Fix source_urls and add add_additional_sources

b94f5ba

add test case for the parameter add_additional_sources

814ee1d

update research on specific sources with source_urls and add_additional_sources

a9ed90e

ElishaKay mentioned this pull request Nov 12, 2024

fix: passing source_urls limits sources #982

Merged
