This repository implements graph embedding and visualization. It combines several graph embedding methods with dimension reduction methods to produce 2D layouts of graph data.
- Install Python 3.9 (other versions of Python may work, but are not tested).
- Clone the repository. Either use the `Clone Git Repository...` tab in an empty VSCode window, or run the following command in Command Prompt:

      git clone https://github.com/Charlie-XIAO/embedding-visualization-test.git
- Create the Python virtual environment, then activate it, using the following commands in Command Prompt:

      python -m venv myvenv

  On Windows:

      myvenv\Scripts\activate

  On Mac/Linux:

      source myvenv/bin/activate
- Install the required packages in the Python virtual environment:

      pip install -r requirements.txt
- Run `main.py` with the following command:

      python main.py --data <dataset_name> --embed <embedding_method> --vis <visualization_method>

  For example:

      python main.py --data wiki --embed deepwalk --vis t-sne
- To run the program on the large datasets used in Experiment 2 of the essay, download the zipped datasets from this Google Drive link and unzip the file in the `datasets` folder.
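
The three command-line flags shown above could be parsed along the following lines. This is only a minimal sketch using the standard `argparse` module, not the repository's actual `main.py`. The helper `build_parser` and the lists of accepted values are assumptions; only `deepwalk` and `t-sne` appear in the example command, and the other spellings are guesses based on the method names.

```python
import argparse

# Hypothetical value lists: only "deepwalk" and "t-sne" are confirmed by the
# example command; the remaining spellings are assumptions for illustration.
EMBED_METHODS = ["deepwalk", "node2vec", "le", "glee", "sdne"]
VIS_METHODS = ["pca", "t-sne"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --data, --embed and --vis flags."""
    parser = argparse.ArgumentParser(
        description="Embed a graph and project the embedding to 2D."
    )
    parser.add_argument("--data", required=True, help="dataset name, e.g. wiki")
    parser.add_argument("--embed", required=True, choices=EMBED_METHODS,
                        help="graph embedding method")
    parser.add_argument("--vis", required=True, choices=VIS_METHODS,
                        help="dimension reduction method")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--data", "wiki", "--embed", "deepwalk", "--vis", "t-sne"]
)
print(f"dataset={args.data} embed={args.embed} vis={args.vis}")
```

Restricting `--embed` and `--vis` with `choices` makes an unsupported method name fail immediately with a usage message instead of partway through a run.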
Method | Paper | Note |
---|---|---|
DeepWalk | [KDD 2014] DeepWalk: Online Learning of Social Representations | [Graph Embedding] DeepWalk: Algorithm Principles, Implementation, and Applications |
Node2Vec | [KDD 2016] Node2Vec: Scalable Feature Learning for Networks | [Graph Embedding] Node2Vec: Algorithm Principles, Implementation, and Applications |
LE | [NIPS 2001] Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering | [Graph Embedding] LE (Laplacian Eigenmaps) Feature Extraction Method |
GLEE | [arXiv 2019] GLEE: Geometric Laplacian Eigenmap Embedding | |
SDNE | [KDD 2016] Structural Deep Network Embedding | [Graph Embedding] SDNE: Algorithm Principles, Implementation, and Applications |

Each embedding method is invoked as follows:

    # DeepWalk
    model = DeepWalk(self.graph, walk_length=10, num_walks=80, workers=1)
    model.train(embed_size=128, window_size=5, iter=3)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings())
    self.embeddings = embeddings.T

    # Node2Vec
    model = Node2Vec(self.graph, walk_length=10, num_walks=80, p=0.25, q=4, workers=1)
    model.train(embed_size=128, window_size=5, iter=3)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings())
    self.embeddings = embeddings.T

    # LEE
    model = LEE(self.graph)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings(embed_size=128, iter=100))
    self.embeddings = embeddings.T

    # GLEE
    model = GLEE(self.graph)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings(embed_size=128, iter=100))
    self.embeddings = embeddings.T

    # SDNE
    model = SDNE(self.graph, hidden_size=[256, 128])
    model.train(batch_size=3000, epochs=40, verbose=2)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings())
    self.embeddings = embeddings.T

    # ShortestPath
    model = ShortestPath(self.graph)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings(embed_size=128, sampling="random"))
    self.embeddings = embeddings.T

    # SPLEE
    model = SPLEE(self.graph)
    embeddings = pd.DataFrame.from_dict(model.get_embeddings(embed_size=128, iter=10, shape="gaussian", epsilon=6.0, threshold=5))
    self.embeddings = embeddings.T
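
Every snippet above ends with the same `pd.DataFrame.from_dict(...)` followed by `.T`. The sketch below illustrates why the transpose is there, assuming that `get_embeddings()` returns a dictionary mapping each node to a 1-D vector (the source does not show this). The dictionary `fake_embeddings` is a made-up stand-in for that return value.

```python
import numpy as np
import pandas as pd

# Stand-in for model.get_embeddings(): ASSUMED to map node -> 1-D vector.
rng = np.random.default_rng(0)
fake_embeddings = {node: rng.normal(size=4) for node in ["a", "b", "c"]}

# from_dict treats each dictionary key as a column by default,
# so the frame has one column per node and one row per embedding dimension.
by_column = pd.DataFrame.from_dict(fake_embeddings)
print(by_column.shape)

# Transposing gives one row per node, the usual layout for
# feeding samples into a dimension reduction method.
by_row = by_column.T
print(by_row.shape)

# An equivalent single-step form that skips the transpose.
direct = pd.DataFrame.from_dict(fake_embeddings, orient="index")
```

With three nodes and four dimensions, `by_column` has shape `(4, 3)` and `by_row` has shape `(3, 4)`.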
Method | Paper | Note |
---|---|---|
PCA | [WCS 2010] Principal Component Analysis | [Dimension Reduction] A Detailed Explanation of the Principles of Principal Component Analysis (PCA) |
t-SNE | [JMLR 2008] Visualizing Data Using t-SNE | [Dimension Reduction] Dimension Reduction Methods: t-SNE |

Each visualization method is invoked as follows:

    # PCA
    model = PCA(n_components=2, random_state=0)
    self.projections = model.fit_transform(self.X)

    # t-SNE
    model = TSNE(n_components=2, verbose=2, random_state=0)
    self.projections = model.fit_transform(self.X)

    # TSGNE
    model = TSGNE(perplexity=30, n_components=2, verbose=2, random_state=0, knn_matrix=self.knn_matrix, mode="distance")
    self.projections = model.fit_transform(self.X)
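
The PCA and t-SNE calls above match the scikit-learn API, so they can be tried in isolation. The sketch below assumes scikit-learn is the library in use and substitutes a random matrix for `self.X`. The data and its shape are made up for illustration. `TSGNE` is not part of scikit-learn and is not covered here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Made-up stand-in for self.X: 100 nodes with 16-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))

# Linear projection onto the two directions of largest variance.
pca_projections = PCA(n_components=2, random_state=0).fit_transform(X)

# Nonlinear projection that tries to preserve local neighborhoods.
# The default perplexity of 30 must be smaller than the number of samples.
tsne_projections = TSNE(n_components=2, random_state=0).fit_transform(X)

print(pca_projections.shape, tsne_projections.shape)
```

Both calls return an array with one 2D point per input row, which can be passed directly to a scatter plot.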

In the `datasets` folder, create a folder with the name of the dataset. In this folder, put the edgelist file and the labels file (optional), and name them `<dataset_name>_edgelist.txt` and `<dataset_name>_labels.txt` respectively. The program automatically reads graph and label data from the corresponding locations.
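
The folder convention described above can be sketched as a small loader. This is not the repository's own reader; `load_dataset` is a hypothetical helper. The file formats are also assumptions: one whitespace-separated `source target` pair per line in the edgelist file, and one `node label` pair per line in the labels file.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def load_dataset(root: Path, name: str) -> Tuple[List[Tuple[str, str]], Optional[Dict[str, str]]]:
    """Read <root>/<name>/<name>_edgelist.txt and, if present, <name>_labels.txt."""
    folder = root / name

    edges = []
    for line in (folder / f"{name}_edgelist.txt").read_text().splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            edges.append((parts[0], parts[1]))

    # The labels file is optional, so return None when it is absent.
    labels = None
    labels_path = folder / f"{name}_labels.txt"
    if labels_path.exists():
        labels = {}
        for line in labels_path.read_text().splitlines():
            parts = line.split()
            if len(parts) >= 2:
                labels[parts[0]] = parts[1]
    return edges, labels


# Usage: build a tiny made-up dataset called "toy" in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "datasets"
    (root / "toy").mkdir(parents=True)
    (root / "toy" / "toy_edgelist.txt").write_text("0 1\n1 2\n")
    (root / "toy" / "toy_labels.txt").write_text("0 red\n1 blue\n2 red\n")
    edges, labels = load_dataset(root, "toy")

print(edges)
print(labels)
```

Deriving both file names from the dataset name means the `--data` value alone is enough to locate the input files.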