bigdatatransformer/wordlit

 
 

Wordlit.net

Wordlit.net aims to demystify the decisions and behavior of Natural Language Processing (NLP) algorithms. It visualizes the entities and relationships extracted from text, showing how NLP algorithms interpret and process language. The application is intended for NLP researchers, data scientists, and enthusiasts who want to understand how computational linguistics works. Wordlit.net currently accepts PDF, Word, and TXT files as input.

Key Features

  • Entity Extraction: Leverages spaCy's NLP capabilities to identify entities in the text.
  • Knowledge Graph Construction: Builds a graph using NetworkX, linking entities based on their relationships.
  • Interactive Visualization: Utilizes Plotly and Streamlit for dynamic graph visualization.
  • Customizable Graph Parameters: Offers options to adjust layout spacing, color scheme, node size, and more.
  • Graph Analytics: Provides statistics such as node and edge counts, graph density, and centrality measures.
  • Text Analytics: Calculates various text statistics such as token counts, sentence lengths, and unique tokens.
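
The entity extraction and graph construction features can be sketched as follows. The README does not say how Wordlit.net decides that two entities are related, so this sketch assumes a simple rule: two entities are linked when they appear in the same sentence. The function names are hypothetical and are not taken from wordlit.py.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(entities_per_sentence):
    """Count how often each unordered pair of entities shares a sentence."""
    edges = Counter()
    for entities in entities_per_sentence:
        # Sorting the de-duplicated names gives every pair one canonical order.
        unique = sorted(set(entities))
        edges.update(combinations(unique, 2))
    return edges


def entities_by_sentence(text, model="en_core_web_sm"):
    """Return one list of entity strings per sentence, using spaCy NER."""
    import spacy  # imported lazily; needs the model from the Installation section

    nlp = spacy.load(model)
    doc = nlp(text)
    return [[ent.text for ent in sent.ents] for sent in doc.sents]


def build_graph(edges):
    """Turn weighted entity pairs into an undirected NetworkX graph."""
    import networkx as nx  # imported lazily; only needed to build the graph

    graph = nx.Graph()
    for (source, target), weight in edges.items():
        graph.add_edge(source, target, weight=weight)
    return graph
```

With spaCy and its model installed, `build_graph(cooccurrence_edges(entities_by_sentence(text)))` would produce a graph whose edge weights count shared sentences. Dependency-based relation extraction would be a reasonable alternative to the co-occurrence assumption.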

Installation

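To use this tool, install the following dependencies:

pip install spacy networkx transformers streamlit plotly matplotlib pandas

The Tech Stack section also lists python-docx, pdfplumber, requests, and Beautiful Soup, which the command above does not cover. If they are missing from your environment, install them as well:

pip install python-docx pdfplumber requests beautifulsoup4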

Don't forget to download the spaCy language model:

python -m spacy download en_core_web_sm

Usage

1. Start the Streamlit App: Run the app using Streamlit:

streamlit run wordlit.py

2. Input Text: You can provide text by uploading a file, entering a website URL, or pasting it directly into the text area.
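
The three input routes in this step can be sketched as one helper for uploaded files and one for URLs. This is a minimal sketch built on the libraries named under Tech Stack. The function names, the dispatch on file extension, and the 10-second timeout are assumptions and may differ from what wordlit.py does.

```python
import io
from pathlib import Path


def extract_text(filename, data):
    """Return plain text from uploaded bytes, chosen by file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".txt":
        return data.decode("utf-8", errors="replace")
    if suffix == ".pdf":
        import pdfplumber  # imported lazily; only needed for PDFs

        with pdfplumber.open(io.BytesIO(data)) as pdf:
            # extract_text() can return None for pages without a text layer.
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(io.BytesIO(data))
        return "\n".join(paragraph.text for paragraph in document.paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or filename}")


def fetch_page_text(url, timeout=10):
    """Download a web page and return its visible text."""
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text(separator=" ", strip=True)
```

Pasted text needs no conversion and can go straight to the NLP step.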

3. Customize Graph: Adjust the graph parameters like layout spacing, node size, and color scheme using the sidebar options.
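
A sidebar like the one described in this step could be wired up roughly as shown here. The parameter names, value ranges, defaults, and color scheme choices are illustrative assumptions, as is the use of NetworkX's spring layout, whose `k` argument controls the preferred distance between nodes.

```python
from dataclasses import dataclass


@dataclass
class GraphParams:
    """Hypothetical container for the sidebar settings."""
    layout_spacing: float = 0.5
    node_size: int = 20
    color_scheme: str = "Viridis"


def sidebar_params():
    """Read the settings from Streamlit sidebar widgets."""
    import streamlit as st  # imported lazily; only works inside a Streamlit app

    return GraphParams(
        layout_spacing=st.sidebar.slider("Layout spacing", 0.1, 2.0, 0.5),
        node_size=st.sidebar.slider("Node size", 5, 50, 20),
        color_scheme=st.sidebar.selectbox(
            "Color scheme", ["Viridis", "Plasma", "Blues"]
        ),
    )


def layout_positions(graph, params, seed=42):
    """Compute node coordinates; a larger k spreads the nodes further apart."""
    import networkx as nx

    return nx.spring_layout(graph, k=params.layout_spacing, seed=seed)
```

Fixing `seed` keeps the layout stable between reruns, which helps when comparing the effect of a single parameter change.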

4. Generate Graph: Select 'Generate Graph' to visualize the knowledge graph based on your text.

5. Explore Graph Analytics: View various statistics and metrics related to the generated graph and the input text.
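
The kinds of metrics shown in this step can be approximated in plain Python. The sketch below is not the app's own code. The graph formulas are the standard ones for simple undirected graphs, which `nx.density` and `nx.degree_centrality` also use when there are at least two nodes. The regex tokenizer is a stand-in assumption for spaCy's tokenizer.

```python
import re
from collections import defaultdict


def graph_stats(edges):
    """Counts, density, and degree centrality for an undirected simple graph."""
    neighbors = defaultdict(set)
    for source, target in edges:
        if source != target:  # self-loops are ignored in this sketch
            neighbors[source].add(target)
            neighbors[target].add(source)
    n = len(neighbors)
    m = sum(len(adjacent) for adjacent in neighbors.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    centrality = {
        node: len(adjacent) / (n - 1) if n > 1 else 0.0
        for node, adjacent in neighbors.items()
    }
    return {"nodes": n, "edges": m, "density": density,
            "degree_centrality": centrality}


def text_stats(text):
    """Token count, unique tokens, sentence count, and mean sentence length."""
    # Split after sentence-ending punctuation; a rough rule, not spaCy's.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = re.findall(r"\w+", text.lower())
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    return {
        "tokens": len(tokens),
        "unique_tokens": len(set(tokens)),
        "sentences": len(sentences),
        "avg_sentence_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }
```

For a three-node path A-B-C, `graph_stats` reports two edges and a density of 2/3, with B as the most central node.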

Examples

Below is an example of a knowledge graph generated from a file. The nodes represent entities, and edges represent their relationships. Each node's size corresponds to its connection degree, and colors vary based on the selected color scheme.

Upload.a.File.mp4

An example of a knowledge graph generated from text.

Enter.Text.Manually.mp4

An example of a knowledge graph generated from a website URL.

Enter.Website.URL.mp4
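
The sizing and coloring described in this section can be expressed as a Plotly scatter trace. This is a sketch under assumptions: the linear size rule, its `base` and `scale` constants, and the choice to color nodes by degree are illustrative, not confirmed details of Wordlit.net.

```python
def node_sizes(degrees, base=10, scale=4):
    """Map each node's degree to a marker size that grows linearly."""
    return {node: base + scale * degree for node, degree in degrees.items()}


def node_trace(positions, degrees, color_scheme="Viridis"):
    """Build a Plotly trace with degree-based marker sizes and colors."""
    import plotly.graph_objects as go  # imported lazily; only needed for plotting

    nodes = list(positions)
    sizes = node_sizes(degrees)
    return go.Scatter(
        x=[positions[node][0] for node in nodes],
        y=[positions[node][1] for node in nodes],
        mode="markers+text",
        text=nodes,
        textposition="top center",
        marker=dict(
            size=[sizes[node] for node in nodes],
            color=[degrees[node] for node in nodes],
            colorscale=color_scheme,
            showscale=True,
        ),
    )
```

Here `positions` is a mapping from node to `(x, y)` coordinates, such as the one a NetworkX layout function returns, and `degrees` can be built with `dict(graph.degree())`.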

Tech Stack

Python: The entire code is written in Python.

spaCy: An open-source software library for advanced Natural Language Processing (NLP) in Python. It is used for tokenization, named entity recognition (NER), part-of-speech tagging, and dependency parsing.

NetworkX: A Python library used for building and analyzing network graphs.

Streamlit: An open-source Python library used to build and run the web application.

Plotly: A graphing library used for creating interactive knowledge graph visualizations.

Pandas: An open-source data analysis and manipulation tool built on top of the Python programming language.

Time Module: A Python module that is used here for tracking processing time.

Python-Docx: A Python library for creating and updating Microsoft Word (.docx) files, used here to read text from uploaded Word documents.

Pdfplumber: Used for extracting text from PDF files. It allows detailed access to text, tables, and metadata in PDFs.

Requests: A simple HTTP library for Python, used here to fetch the content of a website URL.

Beautiful Soup (bs4): A Python library used here to parse HTML content.

Contributing

Contributions to enhance Wordlit.net are welcome. Feel free to fork the repository, make changes, and create a pull request.

License

All code contributed to Wordlit.net © 2024 by Sahir Maharaj is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

When using the code from Wordlit.net, please credit as follows:

Code sourced from Wordlit.net, authored by Sahir Maharaj, 2024.

Contact

Report a bug or request a feature: [email protected]

LinkedIn: Sahir Maharaj
