In this homework, we create a knowledge graph from a Wikipedia page using LangChain and Neo4j. The Wikipedia page used is the Marcus Aurelius page. The knowledge graph will have the following structure:
- Marcus Aurelius
  - Name
  - Last Emperor
  - Related
  - Death Date
  - Spouse
  - Children
  - Parents
  - Enemies
  - Succession
The following are required to run the project:
- Python 3.12
- Docker
- Docker Compose
- Clone the repository
git clone https://github.com/theFellandes/Hw3-Knowledge-Graph
- Change the directory to the repository
cd Hw3-Knowledge-Graph
- Run the following command to start the Neo4j database
docker-compose up -d
- Install the required Python packages
pip install -r requirements.txt
To create the knowledge graph, run the following command:
python main.py
To view the knowledge graph generated by the script, open your browser and go to http://localhost:7474/. The default username is neo4j and the password is your_password.
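As a quick sanity check, the same credentials can also be used from Python with the official neo4j driver. This is only a minimal sketch; the Bolt port 7687 and the credentials simply mirror the defaults above:

```python
from neo4j import GraphDatabase

# Bolt port 7687 is the Neo4j default; credentials mirror the defaults above.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your_password"))
driver.verify_connectivity()  # raises an exception if the container is not reachable
print("Connected to Neo4j")
driver.close()
```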
- bart_main.py: The main script that uses the BART LLM to create the knowledge graph from the Wikipedia page.
- main.py: The main script that uses LangChain to create the knowledge graph from the Wikipedia page.
- requirements.txt: The file that lists the required Python packages.
- Dockerfile: The file that contains the Docker configuration for the Neo4j database.
- docker-compose.yml: The file that contains the Docker Compose configuration for the Neo4j database.
- .env: The file that contains the environment variables used by Docker Compose.
- .gitignore: The file that lists the files and directories ignored by git.
- README.md: The file that contains information about the project.
- db/neo4j/neo4j_connector.py: The module that contains the functions for interacting with the Neo4j database (a sketch combining this module with wikipedia_api.py follows this list).
- legacy: The directory that contains the trial-and-error scripts.
- internal: The directory that contains the internal modules of the project.
- internal/langchain: The directory that contains the LangChain module.
- internal/langchain/knowledge_graph_builder.py: The module that contains the functions that build the knowledge graph in Neo4j.
- internal/langchain/wikipedia_api.py: The module that interacts with the Wikipedia API to obtain the page data.
- internal/llm/llm.py: The base class for the LLMs.
- internal/llm/bart.py: The class for interacting with the BART LLM.
- internal/reader/yaml_reader.py: The utility class for reading YAML files.
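As a rough illustration of how wikipedia_api.py and neo4j_connector.py fit together, a minimal sketch could look like the following. The function and class names, the use of the `wikipedia` package, and the connection details are assumptions for illustration, not the repository's actual code; only the `Entity {name: ...}` node shape matches the Cypher query shown later.

```python
import wikipedia
from neo4j import GraphDatabase


def fetch_page_text(title: str = "Marcus Aurelius") -> str:
    """Fetch the raw article text via the Wikipedia API (using the `wikipedia` package)."""
    return wikipedia.page(title).content


class Neo4jConnector:
    """Tiny wrapper around the Neo4j driver for writing (source)-[rel]->(target) triples."""

    def __init__(self, uri: str, user: str, password: str):
        self._driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self._driver.close()

    def write_triple(self, source: str, relationship: str, target: str):
        # MERGE keeps nodes unique by name. Relationship types cannot be passed as
        # Cypher parameters, so the (sanitized) type is interpolated into the query.
        query = (
            "MERGE (a:Entity {name: $source}) "
            "MERGE (b:Entity {name: $target}) "
            f"MERGE (a)-[:{relationship}]->(b)"
        )
        with self._driver.session() as session:
            session.run(query, source=source, target=target)


if __name__ == "__main__":
    text = fetch_page_text()
    connector = Neo4jConnector("bolt://localhost:7687", "neo4j", "your_password")
    connector.write_triple("Marcus Aurelius", "WROTE", "Meditations")
    connector.close()
```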
Using BART for knowledge graph creation, the following results were obtained:
BART is not good at extracting structured data from the Wikipedia page; it is better at generating free text than at extracting structured data, and the knowledge graph it produces is not as good as the one generated by OpenAI.
For BART, the task used was summarization, as sketched below.
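A minimal sketch of that summarization step, assuming the Hugging Face transformers pipeline and the facebook/bart-large-cnn checkpoint (the actual checkpoint used in bart.py may differ):

```python
from transformers import pipeline

# The checkpoint is an assumption; any BART summarization checkpoint behaves similarly.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

chunk = "Marcus Aurelius was Roman emperor from 161 to 180 AD and a Stoic philosopher..."
summary = summarizer(chunk, max_length=130, min_length=30, do_sample=False)
print(summary[0]["summary_text"])
```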
Generated Knowledge Graph Using BART:
Using OpenAI GPT-4o for knowledge graph creation, the following results were obtained:
GPT-4o is much better at extracting structured data from the Wikipedia page, and the knowledge graph it generates is better than the one produced by BART.
Generated Knowledge Graph Using OpenAI version 1:
Generated Knowledge Graph Using OpenAI version 2: This version is generated with the same method as version 1; the only difference is the named relationships.
Generated Knowledge Graph Using OpenAI version 3: This version replaces the spaces in the relationship names with underscores (see the sketch below).
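Version 3's change amounts to turning free-text relationship names into valid Cypher relationship types. A small helper along these lines is enough; this is an illustrative sketch, not the repository's exact code:

```python
import re

def to_relationship_type(name: str) -> str:
    """Convert a free-text relationship name into a Cypher-friendly type,
    e.g. 'was born in' -> 'WAS_BORN_IN'."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", name).strip()
    return re.sub(r"\s+", "_", cleaned).upper()

print(to_relationship_type("was born in"))  # WAS_BORN_IN
```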
An Interesting Relationship Observed In This Graph:
Antoninus Pius adopted Marcus Aurelius and Lucius, and was himself adopted by Hadrian. Although he is not labelled as an Emperor in the graph, Antoninus Pius was indeed an emperor of the Roman Empire.
Script:
MATCH (a:Entity {name: "Antoninus Pius"})-[r]->(b)
WHERE b.name IN ["Marcus Aurelius", "Hadrian", "Lucius"]
RETURN a.name AS Source, type(r) AS Relationship, b.name AS Target
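The same query can also be run from Python instead of the Neo4j Browser; a minimal sketch using the driver (connection details as assumed above):

```python
from neo4j import GraphDatabase

query = """
MATCH (a:Entity {name: "Antoninus Pius"})-[r]->(b)
WHERE b.name IN ["Marcus Aurelius", "Hadrian", "Lucius"]
RETURN a.name AS Source, type(r) AS Relationship, b.name AS Target
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "your_password"))
with driver.session() as session:
    for record in session.run(query):
        print(record["Source"], record["Relationship"], record["Target"])
driver.close()
```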
Current Result Prompt:
Analyze the following text and extract entities and relationships in CSV format.
The CSV should have the following columns: 'source', 'relationship', 'target'.
Text: {chunk.page_content}
Example output:
source,relationship,target
Marcus Aurelius,wrote,Meditations
Marcus Aurelius,was born in,Rome
Your output should be a CSV with the columns 'source', 'relationship', and 'target'.
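Whatever client sends this prompt, the model's reply still has to be parsed into triples. A minimal sketch of parsing a CSV-formatted reply (the reply string below is a stand-in for the actual model output):

```python
import csv
import io

# Stand-in for the text returned by the model for one chunk.
reply = """source,relationship,target
Marcus Aurelius,wrote,Meditations
Marcus Aurelius,was born in,Rome"""

triples = []
for row in csv.DictReader(io.StringIO(reply)):
    # Skip malformed rows instead of failing the whole chunk.
    if row.get("source") and row.get("relationship") and row.get("target"):
        triples.append((row["source"], row["relationship"], row["target"]))

print(triples)
```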
Previous Version With JSON:
Analyze the following text and extract entities and relationships in JSON format.
Please return only valid JSON data, and make sure the keys are 'source', 'relationship', and 'target'.
Each relationship should be a dictionary with these three keys.
Text: {chunk.page_content}
Example output:
[
{{
"source": "Marcus Aurelius",
"relationship": "wrote",
"target": "Meditations"
}},
{{
"source": "Marcus Aurelius",
"relationship": "was born in",
"target": "Rome"
}}
]
Your output should be a list of relationships in JSON format, and nothing else.
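For the JSON variant, the reply is parsed with json.loads instead; a minimal sketch, again with a stand-in reply and a defensive fallback for non-JSON output:

```python
import json

# Stand-in for the text returned by the model for one chunk.
reply = """[
  {"source": "Marcus Aurelius", "relationship": "wrote", "target": "Meditations"},
  {"source": "Marcus Aurelius", "relationship": "was born in", "target": "Rome"}
]"""

try:
    relationships = json.loads(reply)
except json.JSONDecodeError:
    relationships = []  # the model did not return valid JSON for this chunk

for rel in relationships:
    print(rel["source"], "-", rel["relationship"], "->", rel["target"])
```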