Built as a demo for the NYC NetCore Cloud office's content creators.
- Create a directory in the root folder.
- Within this directory, create `scrape_data.py`.
- Within this module, write a function called `scrape_data` that formats and returns your data as a dict like `{'title': [str], 'content': [str]}` (see the sketch below).
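A minimal sketch of what `scrape_data.py` might look like, assuming `requests` and `beautifulsoup4` are installed; the URL and the HTML selectors are placeholders for whatever source you actually scrape:

```python
# scrape_data.py -- hypothetical example; replace the URL and parsing
# logic with whatever source you actually want to scrape.
import requests
from bs4 import BeautifulSoup

def scrape_data() -> dict:
    """Scrape articles and return them as {'title': [str], 'content': [str]}."""
    titles, contents = [], []
    html = requests.get("https://example.com/blog").text  # placeholder URL
    soup = BeautifulSoup(html, "html.parser")
    for article in soup.find_all("article"):
        heading = article.find("h2")
        body = article.find("p")
        if heading and body:
            titles.append(heading.get_text(strip=True))
            contents.append(body.get_text(strip=True))
    return {"title": titles, "content": contents}
```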
- Now, in the root directory, open `create_vector_database`.
- Specify your directory name in `DIRECTORY`.
- Enter your OpenAI API key in `openai.api_key`.
- Run `create_vector_database` (a sketch of what this step does follows the list below). Since you are using OpenAI's API, you'll be charged for this!
- Afterwards, your directory will contain two files:
  - `chunks.pkl` contains the chunks of text from your corpus.
  - `embeddings.csv` is the embeddings database.
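For reference, here is a rough sketch of what the chunk-and-embed step could look like, assuming the pre-1.0 `openai` client (consistent with the `openai.api_key` setting above); the chunk size and embedding model are assumptions, and the repo's actual script may differ:

```python
# Hypothetical sketch of create_vector_database's core loop; the real
# script in this repo may differ in chunking strategy and model choice.
import pickle
import openai
import pandas as pd
from scrape_data import scrape_data

DIRECTORY = "my_corpus"    # the directory you created above
openai.api_key = "sk-..."  # your OpenAI API key

data = scrape_data()

# Naive fixed-size chunking; 1000 characters per chunk is an assumption.
chunks = []
for title, content in zip(data["title"], data["content"]):
    for i in range(0, len(content), 1000):
        chunks.append(f"{title}\n{content[i:i + 1000]}")

# Embed every chunk with OpenAI's embedding endpoint (this costs money!).
response = openai.Embedding.create(
    model="text-embedding-ada-002", input=chunks
)
embeddings = [item["embedding"] for item in response["data"]]

# Persist the two artifacts described above.
with open(f"{DIRECTORY}/chunks.pkl", "wb") as f:
    pickle.dump(chunks, f)
pd.DataFrame(embeddings).to_csv(f"{DIRECTORY}/embeddings.csv", index=False)
```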
- In `ui.py`, specify the `{DIRECTORY}`.
- Run `streamlit run ui.py`.
- Enter your API Key.
- You should be able to use the chatbot and view the retrieval context that was provided to it.
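To help interpret the retrieval context shown in the UI: a retrieval step in a setup like this typically embeds the query and ranks the stored chunks by cosine similarity, roughly as sketched here (an illustration, not the exact code in `ui.py`):

```python
# Hypothetical retrieval step, illustrating how ui.py might pick context;
# cosine similarity over the stored embeddings is an assumption.
# Assumes openai.api_key has already been set.
import pickle
import numpy as np
import openai
import pandas as pd

DIRECTORY = "my_corpus"

with open(f"{DIRECTORY}/chunks.pkl", "rb") as f:
    chunks = pickle.load(f)
embeddings = pd.read_csv(f"{DIRECTORY}/embeddings.csv").to_numpy()

def retrieve(query: str, k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    q = openai.Embedding.create(
        model="text-embedding-ada-002", input=[query]
    )["data"][0]["embedding"]
    q = np.asarray(q)
    scores = embeddings @ q / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q)
    )
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```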
Planned improvements:

- Users should be able to specify the number of retrieved articles, subject to cost constraints.
- Users should be able to see cost estimates (see the sketch below).
- Users should be able to choose an OpenAI model.
- Users should be able to specify the data path in the UI.
- Users should be able to create the data files through the UI, where they can write the `scrape_data` function.
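As a possible starting point for the cost-estimate item, token counts can be computed locally with `tiktoken`; the per-token price below is a placeholder you would look up for the chosen model:

```python
# Hypothetical cost estimator for the planned improvement; the price
# constant is a placeholder -- check OpenAI's current pricing page.
import tiktoken

PRICE_PER_1K_TOKENS = 0.0015  # placeholder; varies by model

def estimate_cost(prompt: str, model: str = "gpt-3.5-turbo") -> float:
    """Rough cost estimate for sending `prompt` to `model`."""
    encoding = tiktoken.encoding_for_model(model)
    n_tokens = len(encoding.encode(prompt))
    return n_tokens / 1000 * PRICE_PER_1K_TOKENS
```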