Tool submission #8

Open
stanleybak opened this issue Jun 13, 2021 · 85 comments

@stanleybak
Owner

The tool testing and submission instructions are in this document.

Please post your tool information according to the instructions in this topic.

@stanleybak
Owner Author

stanleybak commented Jun 13, 2021

This is the entry for the nnenum tool. The tool is available using git:

TOOL_NAME=nnenum
REPO=https://github.com/stanleybak/nnenum.git 
COMMIT=c93a39cb568f58a26015bd151acafab34d2d4929
SCRIPTS_DIR=vnncomp_scripts

@mnmueller

Regarding the tool (and benchmark) submission deadlines: I would like to propose either strictly enforcing both, or giving all participants one week from the last update to any benchmark to submit their tool (i.e., moving that deadline), so that everyone can check that their tool behaves as expected on all the other benchmarks and fix any problems that come up.

@stanleybak
Owner Author

stanleybak commented Jun 13, 2021

I think your second option may be more realistic. Maybe we'll have a cutoff for scored benchmarks... if people submit something late and we still want to run it, then it won't be part of the scoring. How does that sound?

@mnmueller

I have no problem with either suggestion. While it would be great to be able to score all benchmarks, moving the deadline will obviously leave you (and the rest of the organizing team) with a tighter schedule, so I think this decision should be yours.
If we go with excluding some benchmarks from the scoring, it would still be great to be able to test the final (example) benchmarks/networks before we have to submit our tools.

@pat676

pat676 commented Jun 25, 2021

@stanleybak, we intend to compete with a toolkit that hasn't been published yet; how should we proceed with this? We could share it via a Dropbox folder, if that's okay with you.

We also have a dependency on the Xpress solver; it offers free academic licenses, or we could provide the license file (this is all described in our readme). If we are to provide the license file, we need a host-id from the AWS instance; this host-id changes every time the instance is stopped.

@Neelanjana314
Contributor

Neelanjana314 commented Jun 26, 2021

@stanleybak Should we have one result file for each instance run (i.e., one onnx-vnnlib pair)? Or is it a single result (.txt) file containing all the instances run?

Also, as per my understanding there will be 3 categories: mnist, cifar10, and acasxu. In that case, if I want to skip a particular benchmark under any of these categories, how should I indicate that? Or will the categories be the benchmark names?

@Wei-TianHao
Contributor

Wei-TianHao commented Jun 28, 2021

This is the entry for the NeuralVerification.jl tool. The tool is available using git and has passed the docker test:

TOOL_NAME=NeuralVerification.jl
REPO=https://github.com/intelligent-control-lab/NeuralVerification.jl.git
COMMIT=4e612602ba4b34b42416742d85476d9b0dcdcb51
SCRIPTS_DIR=vnncomp_scripts

@pat676

pat676 commented Jun 28, 2021

@stanleybak, we have emailed you instructions on how to obtain the VeriNet toolkit. Please let us know if you need a different submission format.

@stanleybak
Owner Author

stanleybak commented Jun 28, 2021

@pat676 got it. We'll let you know if there are issues with the licensing.

@Neelanjana314 one result for each instance run. The run_instance.sh script gets passed a single onnx path and a single vnnlib path; your tool should run on that single instance and then output the result. Our competition scripts will aggregate results.
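As a concrete illustration of that flow, here is a minimal Python sketch of a per-instance entry point (the verify() placeholder and the argument handling are hypothetical assumptions, not the competition's actual interface):

import sys

def verify(onnx_path, vnnlib_path):
    """Analyze a single instance and return 'holds', 'violated', or 'unknown'."""
    # Tool-specific verification would go here; this placeholder always gives up.
    return "unknown"

if __name__ == "__main__":
    onnx_path, vnnlib_path, result_path = sys.argv[1:4]
    with open(result_path, "w") as f:
        # One result per file; the competition scripts aggregate these afterwards.
        f.write(verify(onnx_path, vnnlib_path) + "\n")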

@alessandrodepalma
Contributor

alessandrodepalma commented Jun 28, 2021

This is the submission for the oval framework:

TOOL_NAME=oval
REPO=https://github.com/oval-group/oval-bab.git 
COMMIT=0f39b4d685927c56f9e2c12307cc3d2b19be8bd6
SCRIPTS_DIR=vnncomp_scripts

@mnmueller

mnmueller commented Jun 28, 2021

This is the submission for the ERAN framework:
A gurobi license has to be acquired manually as detailed in the readme.

TOOL_NAME=ERAN
REPO=https://github.com/mnmueller/eran_vnncomp2021.git
COMMIT=ca42b4afd1ff8cb92ebde5303fcce0db26357b49 (updated later in the topic)
SCRIPTS_DIR=vnncomp_scripts

Edit: ERAN should be run on a GPU instance

@stanleybak
Owner Author

@mnmueller great, thanks. Were you able to test whether / how to get Gurobi working on AWS?

@mnmueller

@stanleybak Yes. If you follow the README.md, or uncomment the following block in install_tool_user.sh and copy a license key into the "###" block, everything should work correctly.

#cd bin
#./grbgetkey #################### < ../../
#cd ../../../

@dlshriver

The submission for DNNF:

TOOL_NAME=DNNF
REPO=https://github.com/dlshriver/DNNF.git 
COMMIT=e2dafcc0017bdd555a777e8f6ae96d0af5813bfb
SCRIPTS_DIR=scripts/vnncomp

The README for the tool submission is here: https://github.com/dlshriver/DNNF/blob/e2dafcc0017bdd555a777e8f6ae96d0af5813bfb/scripts/vnncomp/README.md

@ChristopherBrix

For Debona:

TOOL_NAME=Debona
REPO=https://github.com/ChristopherBrix/Debona
COMMIT=792575d18bb5f83cb8699dda6b9097dc41438e3d
SCRIPTS_DIR=Debona

https://github.com/ChristopherBrix/Debona

Thank you for organizing this competition!

@Neelanjana314
Contributor

Neelanjana314 commented Jun 29, 2021

TOOL_NAME=nnv
REPO=https://github.com/verivital/nnv.git
COMMIT=de4b327fdf112888a03a0bd51c4f4854e1b8f53b
SCRIPTS_DIR=nnv/code/nnv/examples/Submission/VNN_COMP2021/vnncomp_scripts

https://github.com/verivital/nnv.git
Readme : https://github.com/verivital/nnv/tree/master/code/nnv/examples/Submission/VNN_COMP2021/vnncomp_scripts/README.md

@pkouvaros

pkouvaros commented Jun 29, 2021

Here is the entry for venus2. Thank you for organising the competition, @stanleybak.

TOOL_NAME=venus2
REPO=https://github.com/pkouvaros/venus2_vnncomp21
COMMIT=c13f9bf486a5eaf82a9193836bc09d8e862c48f4
SCRIPTS_DIR=vnncomp_scripts

@wu-haoze
Contributor

wu-haoze commented Jun 29, 2021

For Marabou

TOOL_NAME=Marabou
REPO=https://github.com/anwu1219/Marabou_private.git
COMMIT=32bc82e785c570523c0af0a0e6e2b77c7e89986f
SCRIPTS_DIR=vnn-comp-scripts

Repo: https://github.com/anwu1219/Marabou_private.git
Readme : https://github.com/anwu1219/Marabou_private/blob/vnn-comp-21/README.md

@Joe-Vincent

Joe-Vincent commented Jun 29, 2021

For RPM

TOOL_NAME=RPM
REPO=https://github.com/StanfordMSL/Neural-Network-Reach.git
COMMIT=021a811153ae744bdbc49726809bf5670d9f33a2
SCRIPTS_DIR=vnncomp_scripts

@huanzhang12
Contributor

Here is our entry for alpha,beta-CROWN:

TOOL_NAME=alpha-beta-CROWN
REPO=https://github.com/huanzhang12/alpha-beta-CROWN
COMMIT=8144c10a4aa2c182e9556cc302c6654bbf9cbfc3
SCRIPTS_DIR=vnncomp_scripts

Our tool should be run on a GPU instance with Amazon Deep Learning AMI 46.0 (Ubuntu 18.04), and it should run all benchmarks without errors/crashes. It requires a Gurobi license. If you encounter any issues please let us know. Thank you!

@stanleybak
Owner Author

Ok, the list of 12 tools is as follows:

alpha-beta-CROWN
Debona
DNNF
eran_vnncomp2021
Marabou_private
Neural-Network-Reach
NeuralVerification.jl
nnenum
nnv
oval-bab
venus2_vnncomp21
VeriNet

Let me know if I missed anyone.

@stanleybak
Owner Author

@Wei-TianHao Should NeuralVerification.jl be run on a CPU instance or GPU instance? Any custom installation instructions?

@Wei-TianHao
Contributor

It should be run on a CPU instance. The installation is fully automated.

@stanleybak
Owner Author

stanleybak commented Jul 1, 2021

An issue came up on which I'm going to need further input from everyone. The Amazon Deep Learning Base AMI we're using comes with standard deep learning frameworks like tensorflow and pytorch installed, but you still need to select which one to use via conda:

Please use one of the following commands to start the required environment with the framework of your choice:
for AWS MX 1.7 (+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN): source activate mxnet_p36
for AWS MX 1.8 (+Keras2) with Python3 (CUDA + and Intel MKL-DNN): source activate mxnet_latest_p37
for AWS MX (+AWS Neuron) with Python3: source activate aws_neuron_mxnet_p36
for AWS MX (+Amazon Elastic Inference) with Python3: source activate amazonei_mxnet_p36
for TensorFlow (+Keras2) with Python3 (CUDA + and Intel MKL-DNN): source activate tensorflow_p37
for TensorFlow (+AWS Neuron) with Python3: source activate aws_neuron_tensorflow_p36
for TensorFlow 2 (+Keras2) with Python3 (CUDA 10.1 and Intel MKL-DNN): source activate tensorflow2_p36
for TensorFlow 2.3 with Python3.7 (CUDA + and Intel MKL-DNN): source activate tensorflow2_latest_p37
for PyTorch 1.4 with Python3 (CUDA 10.1 and Intel MKL): source activate pytorch_p36
for PyTorch 1.7.1 with Python3.7 (CUDA 11.1 and Intel MKL): source activate pytorch_latest_p37
for PyTorch (+AWS Neuron) with Python3: source activate aws_neuron_pytorch_p36
for base Python3 (CUDA 10.0): source activate python3

If you are using one of these frameworks, could you specify your tool name and which framework I should activate?

Alternatively, if you need multiple frameworks, I don't think this approach will work, since only one environment can be activated (I ran into this with ERAN, @mnmueller, where I needed both pytorch and tensorflow). In that case my idea is to just install the frameworks directly (let me know if you have a better idea). If you don't want me to use the conda environments they provide, could you please specify your tool name and the commands I should use to install the necessary frameworks?

If you don't need any frameworks you don't need to provide further information.

@huanzhang12
Contributor

@stanleybak Our script for alpha-beta-CROWN handles the selection of frameworks, so there is no need to activate one manually. Let us know if you encounter any trouble running our code. Thank you!

Generally, using multiple frameworks (tensorflow+pytorch, as in the case of ERAN) should be fine as long as people provide a requirements.txt listing all required Python packages (e.g., both tensorflow and pytorch can be listed in requirements.txt). During the initial setup you can then install all required packages via python -m pip install -r requirements.txt (if that's not already done in install_tool.sh) in any Python environment (any of the preinstalled ones, or the base/vanilla one). This will install any missing packages listed in requirements.txt; if a package is already provided by the environment, it will not be reinstalled.

For example, in a PyTorch environment (like source activate pytorch_latest_p37) you can install tensorflow and all other required packages automatically using the python -m pip install -r requirements.txt command, and in all following experiments you just need to activate this single environment.

@mnmueller

While everything @huanzhang12 said is true, the ERAN install script was based on the vanilla python environment, and since that might be updated between when I tested things and when you run/initialize the instances, it is probably better to use a well-defined conda environment.
I updated the install instructions and install script to include every step and to install everything in a (custom) conda environment:

TOOL_NAME=ERAN
REPO=https://github.com/mnmueller/eran_vnncomp2021.git
COMMIT=808bfa4a1d3660c7e161ab1550f90392c9fdd2ee
SCRIPTS_DIR=vnncomp_scripts

@dlshriver

For DNNF, our install_tool.sh script should take care of creating a python virtual environment and installing the required frameworks, and the other scripts should automatically activate this virtual environment, so there shouldn't be a need to manually activate a conda environment.

@pat676

pat676 commented Jul 2, 2021

For VeriNet the install_tool.sh script should install all necessary requirements.

@alessandrodepalma
Contributor

The install_tool.sh script for the oval framework installs all the necessary requirements into a new conda environment, which is then used by the prepare_instance.sh and run_instance.sh scripts.
No additional commands should be required.

@stanleybak
Owner Author

stanleybak commented Jul 2, 2021

@huanzhang12 I'm getting an error during install after entering the Gurobi key:

./vnncomp_scripts/install_tool.sh: line 77: /home/ubuntu/anaconda3/envs/pytorch_latest_p37/bin/grbgetkey: No such file or directory

Please modify the install script and provide a new commit hash.

@ChristopherBrix

ChristopherBrix commented Jul 4, 2021

@ChristopherBrix Line 62 in install_tool.sh is missing a $: cd SCRIPT_DIR.

+ cd SCRIPT_DIR
././install_tool.sh: line 62: cd: SCRIPT_DIR: No such file or directory

install_Debona_log.txt

Sorry, fixed that: a7612da6e6fd0b72b480f23cdf0816753c3b5b62
(edited, because the path was still wrong)

@Joe-Vincent

Joe-Vincent commented Jul 5, 2021

@stanleybak OK that error should be fixed now. Here is the new commit:

TOOL_NAME=Neural-Network-Reach
REPO=https://github.com/StanfordMSL/Neural-Network-Reach.git
COMMIT=132643a00cba211f598a0287065ac9d3494e1e4e
SCRIPTS_DIR=vnncomp_scripts

@stanleybak
Owner Author

stanleybak commented Jul 6, 2021

@mnmueller Here's the new output I get for the test category. Most of the benchmark instances are not completing, with slightly different errors. Does this match the test category result on your system? It's not essential to do the benchmarks in this category, but it's more of a sanity check that things are installed correctly. Should we proceed with the measurements?

install_ERAN_log.txt
test_ERAN_log.txt

@stanleybak
Owner Author

stanleybak commented Jul 6, 2021

@Joe-Vincent Here's the new output for Neural-Network-Reach that I get for the test category. None of the benchmark instances produce a result. Does this match the test category result on your system? It's not essential to do the benchmarks in this category, but it's more of a sanity check that things are installed correctly. Should we proceed with the measurements?

ERROR: LoadError: MethodError: no method matching local_map(::Array{BitArray{1},1}, ::Array{Any,1})

install_Neural-Network-Reach_log.txt
test_Neural-Network-Reach_log.txt

@stanleybak
Owner Author

@ChristopherBrix I think Debona installed correctly now. There are still a few errors on the test category, but I think that's more that the networks are weird (for example, the nano one lacks a bias), than anything else. Does this match the result on your system? Should we proceed with the measurements?

install_Debona_log.txt
test_Debona_log.txt

@mnmueller

@stanleybak Sorry about that. I fixed a typo and updated the required onnx version; now everything should work as expected. The new commit is:

8e7c3e42c4319e3b52d8c2d3506daec67ea04a99

@Joe-Vincent

@stanleybak Thanks, that should be fixed now.
The test categories should produce the expected outputs.

COMMIT=861ce6e380e3cc2d439a7bca87b59817e4624af6

@ChristopherBrix

ChristopherBrix commented Jul 6, 2021

@ChristopherBrix I think Debona installed correctly now. There are still a few errors on the test category, but I think that's more that the networks are weird (for example, the nano one lacks a bias), than anything else. Does this match the result on your system? Should we proceed with the measurements?

install_Debona_log.txt
test_Debona_log.txt

You're right, the networks not having a bias messed up my onnx parsing (also, there was still debug code active, setting all existing biases to zero...). That's fixed now: 79f122707fd84c60c76f6363133f0421ece7bc5b

I was surprised to notice that for nn4sys/nets/normal_1000.onnx and test/test_small.onnx, the weights specified in the onnx file seem to be transposed - I was able to adapt the parsing code to detect that and reverse the transpose, but is this expected or a bug in the benchmark?

@stanleybak
Owner Author

stanleybak commented Jul 7, 2021

I was surprised to notice that for nn4sys/nets/normal_1000.onnx and test/test_small.onnx, the weights specified in the onnx file seem to be transposed - I was able to adapt the parsing code to detect that and reverse the transpose, but is this expected or a bug in the benchmark?

@ChristopherBrix how did you know they were transposed? The sizes? The onnxruntime seems to execute them without raising errors.

@changliuliu Any insight on this?

@Wei-TianHao
Contributor

@ChristopherBrix The shape of the weight matrix depends on the input order (specified in the onnx file). If the weight matrix is the first input, then the operation is (W X). If the weight matrix is the second input, then it's (X^T W).
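For illustration, a rough Python sketch of checking this input order with the onnx package (this is not code from any submitted tool; the model path is just an example from this thread):

import onnx
from onnx import numpy_helper

def weight_positions(path):
    # Map initializer name -> numpy array, so we can tell which MatMul input is the constant weight.
    model = onnx.load(path)
    inits = {init.name: numpy_helper.to_array(init) for init in model.graph.initializer}
    for node in model.graph.node:
        if node.op_type != "MatMul":
            continue
        for pos, name in enumerate(node.input):
            if name in inits:
                # pos 0 -> the layer computes W x; pos 1 -> it computes x^T W.
                print(node.output[0], "weight is input", pos, "with shape", inits[name].shape)

weight_positions("test/test_small.onnx")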

@anwu

anwu commented Jul 7, 2021

I'm currently on PTO until next Monday; I'll take a look then.
Also, I think you tagged the wrong @anwu. @anwu1219 maybe?

@huanzhang12
Contributor

@ChristopherBrix @stanleybak We also had that issue with the nn4sys models, where we found the weight matrices were transposed and in the wrong shapes after being converted using the onnx2pytorch package. We had to manually work around that issue by modifying the onnx2pytorch package. In our case this might be a bug in onnx2pytorch, though (we later found that onnx2pytorch also produces slightly incorrect models for a few other VNNCOMP models).

@stanleybak
Owner Author

stanleybak commented Jul 7, 2021

Also, I think you tagged the wrong @anwu. @anwu1219 maybe?

@anwu Oops, you're right. Sorry about that, feel free to ignore.

@stanleybak
Owner Author

stanleybak commented Jul 7, 2021

@huanzhang12 @ChristopherBrix Which instances specifically caused the problem? If multiple tools have issues on specific benchmark instances, it may be best to exclude them from the scoring this time. I'll discuss this with the organizers.

@huanzhang12
Contributor

huanzhang12 commented Jul 7, 2021

@stanleybak Our issue was just with the problematic onnx2pytorch library, which had a bug and did not strictly follow the onnx standard in many places (it did not work on many MLP models, including all nn4sys models). The nn4sys models themselves are good, because onnxruntime can indeed run inference on them correctly. I also did a quick test with another PyTorch-based tool (oval), and it also works well on this benchmark, so I believe these models should be fine if parsed correctly.
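As a rough sketch of that kind of sanity check (illustrative only; the model path and the all-zeros input are placeholders), one can run a network through onnxruntime and compare the output against a tool's own parser:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("test/test_small.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # replace symbolic dims with 1
x = np.zeros(shape, dtype=np.float32)
print(sess.run(None, {inp.name: x})[0])  # compare against your own parser's output on the same input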

I think it is better not to exclude this benchmark from scoring at this time, considering that people from many teams (including us) have made significant efforts to make this benchmark work.

@ChristopherBrix

ChristopherBrix commented Jul 7, 2021

Thanks for the explanation; I didn't realize it was connected to the order of the inputs.

From those benchmarks that I can process (only FFs), the order seems to be swapped for nn4sys/nets/*.onnx, test/test_small.onnx and test/test_tiny.onnx. Not knowing the underlying reason, I just fixed it by checking whether the matrix shapes make sense and transposing otherwise. That detected the first two and missed the tiny one, but there the matrix is 1x1 anyway, so it does not matter.

I have no preference whether the benchmarks should be updated to have a consistent format, but I agree, they should not be excluded.

@stanleybak
Owner Author

Another update: if you're using a virtual environment or conda, make sure it is checked / activated in your prepare_instance.sh or run_instance.sh script, not (just) in your install_tool.sh script. For practical reasons, installation is a separate step from running the benchmarks, so the environment is reset between these two steps.

@dlshriver This may have been an issue with DNNF. I've attached the install log and test log (the test models are failing).
install_DNNF_log.txt
test_DNNF_log.txt

@dlshriver Any update for DNNF?

@stanleybak
Owner Author

@pkouvaros it seems to be having trouble finding the .so for gurobi (license was installed correctly)

File "/home/ubuntu/work/venus2/src/Specification.py", line 12, in <module>
    from gurobipy import *
  File "/usr/local/lib/python3.6/dist-packages/gurobipy/__init__.py", line 1, in <module>
    from .gurobipy import *
ImportError: libgurobi91.so: cannot open shared object file: No such file or directory

I didn't see any obvious mistakes in the install script; any idea how to fix it?

install_venus2_log.txt
test_venus2_log.txt

@pkouvaros Any update for venus2?

@dlshriver

@stanleybak It looks like the installation is failing because it can't find virtualenv. I tried modifying it to use venv instead. The new commit is d3fd0ea5ef5f2a416f4646fdd0d77bb60139254d.

@stanleybak
Owner Author

@dlshriver Here are the results I got on the test category; it doesn't quite seem to be working. Let me know if there's an update or if I should run the remaining measurements.

install_DNNF_log.txt
test_DNNF_log.txt

@dlshriver

@stanleybak I'm not sure what is causing the error, but it seems to be related to concurrency, so I just disabled it. Could you try this commit d4f08b43e4ad622157c65ac071183a3a0f4e6fe0?

@stanleybak
Owner Author

@dlshriver this fixed the errors, but the result is unknown for most of the test networks. Does this match the result on your system? Should I run the remaining measurements?

test,./benchmarks/test/test_nano.onnx,./benchmarks/test/test_nano.vnnlib,.010728855,unknown,2.884330116
test,./benchmarks/test/test_tiny.onnx,./benchmarks/test/test_tiny.vnnlib,.007432943,unknown,2.967444308
test,./benchmarks/test/test_small.onnx,./benchmarks/test/test_small.vnnlib,.006911572,unknown,4.035037926
test,./benchmarks/test/test_sat.onnx,./benchmarks/test/test_prop.vnnlib,.007084138,violated,2.074686221
test,./benchmarks/test/test_unsat.onnx,./benchmarks/test/test_prop.vnnlib,.007649187,unknown,34.466189558
test,./benchmarks/test/test_nano.onnx,./benchmarks/test/test_nano.vnnlib,.007168776,unknown,2.794327169

install_DNNF_log.txt
test_DNNF_log.txt

@dlshriver

@stanleybak I think it's okay. Let's run it

@pkouvaros

@stanleybak Thank you. It should be working now. New commit is "e14b7f356aed82d515804d0a8daa54572ac07f17".

@stanleybak
Owner Author

stanleybak commented Jul 10, 2021

@pkouvaros It looks like venus2 is running now, but all of the networks in the test category output "holds". Does this match the result on your system? Should I proceed with the measurement?

install_venus2_log.txt
test_venus2_log.txt

@pkouvaros

@stanleybak I disabled a feature which may have been causing this. Could you please run the tool on this commit: "1a70cc4a174ebaf11ecd605bd50505180e6f5da7"? Thanks.

@ChristopherBrix

For the report that's currently being written: I propose including the specific commit hash that was used, to make sure that the results can be replicated even if the tools are improved in the future. I went ahead and included it in our description; feel free to remove it if you don't agree.

@stanleybak
Owner Author

For the report that's currently written: I propose to include the specific commit hash that was used

@ChristopherBrix Yes, this is a good idea.

@stanleybak
Owner Author

I'm trying to test things and am getting an initialization error with the tiny instance (run 285):


+ unzip large_models.zip -d large_models
./setup.sh: line 4: unzip: command not found

@ChristopherBrix

I guess that was meant to be posted in this year's repo? stanleybak/vnncomp2022#3

It's fixed, sorry. That's what I get for thinking that a minor change doesn't need to be tested...
