Fixed training config
sardev committed Aug 31, 2024
1 parent 59c9cbc commit 3fb93c0
Showing 2 changed files with 17 additions and 61 deletions.
16 changes: 11 additions & 5 deletions wikipedia_analysis/README.md
@@ -3,6 +3,8 @@
This README contains the steps to perform benchmarking on the Wikipedia datasets. Before running anything else, run the following commands:
```
$ sudo apt update -y && sudo apt upgrade -y
$ sudo apt-get install -y xfsprogs
$ sudo modprobe -v xfs
```
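If the `modprobe` step above succeeds, these optional checks (standard Linux commands, nothing project-specific) confirm that the xfs module is loaded and the filesystem type is registered:
```
$ lsmod | grep xfs                 # module should be listed if it loaded
$ grep xfs /proc/filesystems       # xfs should appear as a supported filesystem
```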

## Mounting the data directory
@@ -45,11 +47,6 @@ $ sudo mount -a
$ sudo chmod ugo+rw -R all_data
```
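For reference, a minimal sketch of preparing the data volume that `sudo mount -a` picks up; the device name `/dev/nvme1n1` is an assumption (check `lsblk` on your instance), and the mount point follows the `/root/all_data` path used later in this README:
```
# Assumed device name; confirm with `lsblk` before formatting.
$ sudo mkfs.xfs /dev/nvme1n1
$ sudo mkdir -p /root/all_data
$ echo '/dev/nvme1n1 /root/all_data xfs defaults,nofail 0 2' | sudo tee -a /etc/fstab
$ sudo mount -a
```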

Verify by running `df -h` inside `all_data` and ensure it produces this output:
```
```

## Setting up docker

First install the nvidia driver using the command:
@@ -88,6 +85,8 @@ $ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --de
$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo nvidia-ctk runtime configure --runtime=docker
$ sudo systemctl restart docker
```

Finally, run `sudo reboot`. Verify the install by running `nvidia-smi`.
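To confirm Docker can also see the GPU after the reboot, one optional check is to run `nvidia-smi` inside a CUDA container; the image tag below is an arbitrary choice, not part of the original instructions:
```
$ sudo docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```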
@@ -121,6 +120,7 @@ $ python3 -m pip install boto3

Then set up AWS credentials using `aws configure` and run the preprocessing with the commands:
```
$ apt install -y lbzip2
$ cd wikipedia_analysis
$ python3 -u preprocess_runner.py &> preprocess.log
```
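Since the preprocessing job is backgrounded and writes to `preprocess.log`, a couple of optional commands for sanity-checking the AWS credentials and following its progress:
```
$ aws sts get-caller-identity   # confirms `aws configure` set up working credentials
$ tail -f preprocess.log        # follow the preprocessing output
```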
@@ -136,4 +136,10 @@ $ cmake ../ -DUSE_CUDA=TRUE -DUSE_OMP=TRUE
and then:
```
$ rm -rf /root/all_data/graph_snapshots/initial_snapshot/marius_formatted/model_* && make marius_train -j && ./marius_train ../wikipedia_analysis/initial_training.yaml
```
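While `marius_train` is running, GPU utilization can be watched from a second shell, and once it finishes the retrained model files should reappear in the `marius_formatted` directory that the command above clears out; both commands below are optional sanity checks:
```
$ watch -n 5 nvidia-smi
$ ls /root/all_data/graph_snapshots/initial_snapshot/marius_formatted/
```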

Once the training is done, upload the results to AWS using the commands:
```
$ tar -I lbzip2 -cvpf ~/all_data/graph_snapshots/trained_initial_snapshot.tar.gz ~/all_data/graph_snapshots/initial_snapshot
$ aws s3 mv ~/all_data/graph_snapshots/trained_initial_snapshot.tar.gz s3://wikidata-update-history
```
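To confirm the upload, and to pull the archive back down later if needed, something like the following should work; it assumes the same bucket layout and that `lbzip2` is installed on the machine doing the extraction (extracting with `-C /` relies on tar having stored the paths relative to `/`, which is what the archive command above does by default):
```
$ aws s3 ls s3://wikidata-update-history/
$ aws s3 cp s3://wikidata-update-history/trained_initial_snapshot.tar.gz ~/all_data/graph_snapshots/
$ tar -I lbzip2 -xvpf ~/all_data/graph_snapshots/trained_initial_snapshot.tar.gz -C /
```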
62 changes: 6 additions & 56 deletions wikipedia_analysis/initial_training.yaml
@@ -1,59 +1,9 @@
model:
  learning_task: LINK_PREDICTION
  encoder:
    train_neighbor_sampling:
      - type: UNIFORM
        options:
          max_neighbors: 32
      - type: UNIFORM
        options:
          max_neighbors: 32
      - type: UNIFORM
        options:
          max_neighbors: 32
    eval_neighbor_sampling:
      - type: UNIFORM
        options:
          max_neighbors: 32
      - type: UNIFORM
        options:
          max_neighbors: 32
      - type: UNIFORM
        options:
          max_neighbors: 32
    layers:
      - - type: EMBEDDING
          output_dim: 32
          bias: true
          init:
            type: GLOROT_NORMAL

      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 32
          output_dim: 32
          bias: true
          init:
            type: GLOROT_NORMAL

      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 32
          output_dim: 32
          bias: true
          init:
            type: GLOROT_NORMAL

      - - type: GNN
          options:
            type: GRAPH_SAGE
            aggregator: MEAN
          input_dim: 32
          output_dim: 32
          output_dim: 128
          bias: true
          init:
            type: GLOROT_NORMAL
@@ -70,7 +20,7 @@ model:
  sparse_optimizer:
    type: ADAGRAD
    options:
      learning_rate: 0.01
      learning_rate: 0.001
storage:
  device_type: cuda
  dataset:
@@ -81,15 +31,15 @@ storage:
    type: DEVICE_MEMORY
  save_model: true
training:
  batch_size: 16
  batch_size: 1024
  negative_sampling:
    num_chunks: 10
    negatives_per_positive: 750
    degree_fraction: 0.0
    degree_fraction: 0.1
    filtered: false
  num_epochs: 50
  epochs_per_shuffle: 1
evaluation:
  batch_size: 16
  batch_size: 1024
  negative_sampling:
    filtered: true
    filtered: false
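Since this commit changes `initial_training.yaml`, a quick parse check before kicking off a long training run can catch indentation mistakes early; this is an optional step and assumes PyYAML is available in the Python environment:
```
$ python3 -c "import yaml; yaml.safe_load(open('wikipedia_analysis/initial_training.yaml')); print('config parses')"
```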
