Tool submission #8
This is the entry for the nnenum tool. The tool is available using git:
|
Regarding the tool (and benchmark) submission deadlines: I would like to propose either strictly enforcing both, or giving all participants one week from the last update to any benchmark to submit their tool (i.e., moving that deadline), so that all participants can check that their tools behave as expected on all other benchmarks and potentially fix any problems that come up. |
I think your second option may be more realistic. Maybe we'll have a cutoff for scored benchmarks... if people submit something late and we still want to do them, then they won't be part of the scoring. How does that sound? |
I have no problem with either suggestion. While it would be great to be able to score all benchmarks, moving the deadline will obviously leave you (and the rest of the organizing team) with a tighter schedule, so I think this decision should be yours. |
@stanleybak, we intend to compete with a toolkit that hasn't been published yet; how should we proceed with this? We can share it via a Dropbox folder, if that's okay with you. We also have a dependency on the Xpress solver; they offer free academic licenses, or we could provide the license file (this is all described in our readme). If we are to provide the license file, we need a host-id from the AWS instance; this host-id changes every time the instance is stopped. |
@stanleybak Should we have one result file for each instance run (i.e., one onnx-vnnlib pair), or is it a single result (.txt) file containing all the instances run? Also, as per my understanding, there will be 3 categories: mnist, cifar10 and acasxu. In that case, if I want to skip a particular benchmark under any of these categories, how should I indicate that? Or will the categories be the benchmark names? |
This is the entry for the NeuralVerification.jl tool. The tool is available using git. Passed docker test.
TOOL_NAME=NeuralVerification.jl
REPO=https://github.com/intelligent-control-lab/NeuralVerification.jl.git
COMMIT=4e612602ba4b34b42416742d85476d9b0dcdcb51
SCRIPTS_DIR=vnncomp_scripts |
@stanleybak, we have emailed you instructions on how to obtain the VeriNet toolkit. Please let us know if you need a different submission format. |
@pat676 got it. We'll let you know if there are issues with the licensing. @Neelanjana314 one result file for each instance run. The |
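As a minimal sketch of that one-file-per-instance convention: the directory layout, file naming, and the "unknown" result string below are placeholders, and the official output format (and how the result path is passed to the tool's run script) is defined in the linked instructions document, not here.

```python
# Illustrative only: write one result file per (onnx, vnnlib) instance.
# File naming and the result string are placeholders, not the official format.
from pathlib import Path

def write_instance_result(results_dir: str, onnx_name: str, vnnlib_name: str, result: str) -> None:
    """Write a single .txt result file for one onnx/vnnlib pair."""
    out_file = Path(results_dir) / f"{onnx_name}__{vnnlib_name}.txt"
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(result + "\n")

write_instance_result("results", "acasxu_net_1_1", "prop_1", "unknown")
```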
This is the submission for the oval tool:
TOOL_NAME=oval
REPO=https://github.com/oval-group/oval-bab.git
COMMIT=0f39b4d685927c56f9e2c12307cc3d2b19be8bd6
SCRIPTS_DIR=vnncomp_scripts |
This is the submission for the ERAN framework:
Edit: ERAN should be run on a GPU instance |
@mnmueller great thanks. Were you able to test if / how to get Gurobi working on AWS? |
@stanleybak Yes. If you follow the README.md or uncomment the following block in
|
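A small, purely illustrative way to verify on the instance that Gurobi is installed and the license was picked up (this is not part of ERAN's scripts):

```python
# Quick check that gurobipy is importable and the license is usable.
import gurobipy as gp

m = gp.Model("license_check")  # raises GurobiError if no valid license is found
print("Gurobi", gp.gurobi.version(), "license OK")
```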
The submission for DNNF:
The README for the tool submission is here: https://github.com/dlshriver/DNNF/blob/e2dafcc0017bdd555a777e8f6ae96d0af5813bfb/scripts/vnncomp/README.md |
For Debona:
https://github.com/ChristopherBrix/Debona
Thank you for organizing this competition! |
https://github.com/verivital/nnv.git |
Here is the entry for venus2. Thank you for organising the competition, @stanleybak.
|
For Marabou
Repo: https://github.com/anwu1219/Marabou_private.git |
For RPM
|
Here is our entry for alpha,beta-CROWN:
Our tool should be run on a GPU instance with Amazon Deep Learning AMI 46.0 (Ubuntu 18.04), and it should run all benchmarks without errors/crashes. It requires a Gurobi license. If you encounter any issues please let us know. Thank you! |
Ok, the list of 12 tools is as follows:
Let me know if I missed anyone. |
@Wei-TianHao Should |
It should be run on a CPU instance. The installation is fully automated. |
An issue came up on which I'm going to need further input from everyone. The Amazon Deep Learning Base AMI we're using comes with standard deep learning frameworks like tensorflow and pytorch installed, but you still need to select which one using conda:
If you are using one of these frameworks, could you specify your tool name and which framework I should activate? Alternatively, if you need multiple frameworks, I don't think this is going to work, as you need to select one (I ran into this with ERAN, @mnmueller, where I needed both pytorch and tensorflow). My idea then is that I could just install the frameworks directly in that case (let me know if you have a better idea). If you don't want me to use the conda environment they provide, could you please specify your tool name and the commands I should use to install the necessary frameworks? If you don't need any frameworks, you don't need to provide further information. |
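As a quick, purely illustrative check a participant can run inside the chosen conda environment to see which of these frameworks it actually provides (not part of any submission script):

```python
# Report which deep learning frameworks are importable in the active environment.
import importlib.util

for pkg in ("torch", "tensorflow"):
    available = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'available' if available else 'missing'}")
```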
@stanleybak Our script for Generally, using multiple frameworks (tensorflow+pytorch like the case in ERAN) should be fine as long as people provide their For example, in a PyTorch environment (like |
While everything @huanzhang12 said is true, the ERAN install script was based on the vanilla python environment, and as this might be updated between when I tested things and when you run/initialize the instances, it is probably better to use a well-defined conda environment.
|
For DNNF, our |
For VeriNet the install_tool.sh script should install all necessary requirements. |
The |
@huanzhang12 I'm getting an error during install after entering the Gurobi key:
Please modify the install script and provide a new commit hash. |
Sorry, fixed that: |
@stanleybak OK that error should be fixed now. Here is the new commit:
|
@mnmueller Here's the new output I get for the test category. Most of the benchmark instances are not completing, with slightly different errors. Does this match the test category result on your system? It's not essential to do the benchmarks in this category, but it's more of a sanity check that things are installed correctly. Should we proceed with the measurements? |
@Joe-Vincent Here's the new output for Neural-Network-Reach that I get for the test category. None of the benchmark instances produce a result. Does this match the test category result on your system? It's not essential to do the benchmarks in this category, but it's more of a sanity check that things are installed correctly. Should we proceed with the measurements?
install_Neural-Network-Reach_log.txt |
@ChristopherBrix I think Debona installed correctly now. There are still a few errors on the test category, but I think that's more because the networks are weird (for example, the nano one lacks a bias) than anything else. Does this match the result on your system? Should we proceed with the measurements? |
@stanleybak Sorry about that. Fixed a typo and updated the required onnx version; now everything should work as expected. The new commit is:
|
@stanleybak Thanks, that should be fixed now.
|
You're right, the networks not having a bias messed up my onnx parsing (also, there was still debug code active, setting all existing biases to zero...). That's fixed now: I was surprised to notice that for |
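As an aside on the missing-bias issue, here is a minimal sketch (hypothetical helper name, standard numpy, assuming the common Gemm layout with the activation as the first input and the weight as the second) of falling back to a zero bias when Gemm's optional third input is absent:

```python
import numpy as np

def gemm_weight_and_bias(node, initializers):
    """Return (weight, bias) for a Gemm node; a missing bias becomes zeros.

    `initializers` maps names to arrays, e.g. built with
    {i.name: onnx.numpy_helper.to_array(i) for i in model.graph.initializer}.
    """
    weight = initializers[node.input[1]]  # assumes the weight is Gemm's second (B) input
    if len(node.input) > 2 and node.input[2] in initializers:
        bias = initializers[node.input[2]]
    else:
        # Gemm's C (bias) input is optional -- e.g. the "nano" test network omits it.
        trans_b = next((a.i for a in node.attribute if a.name == "transB"), 0)
        out_dim = weight.shape[0] if trans_b else weight.shape[1]
        bias = np.zeros(out_dim, dtype=weight.dtype)
    return weight, bias
```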
@ChristopherBrix how did you know they were transposed? The sizes? The onnxruntime seems to execute them without raising errors. @changliuliu Any insight on this? |
@ChristopherBrix The shape of the weight matrix depends on the input order (specified in the onnx file). If the weight matrix is the first input, then it's supposed to be (W X). If the weight matrix is the second input, then it's supposed to be (X^T W). |
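To make that rule concrete, here is a small sketch (hypothetical helper, standard onnx package, ignoring Gemm's transA/transB attributes for brevity) that checks on which side of each MatMul/Gemm node the weight initializer sits and returns it in a consistent (out_features, in_features) orientation:

```python
import onnx
from onnx import numpy_helper

def canonical_weights(model_path):
    """Return each MatMul/Gemm weight as an (out_features, in_features) array."""
    model = onnx.load(model_path)
    inits = {i.name: numpy_helper.to_array(i) for i in model.graph.initializer}
    weights = {}
    for node in model.graph.node:
        if node.op_type not in ("MatMul", "Gemm"):
            continue
        if node.input[0] in inits:
            # Weight is the first input: the layer computes W @ x,
            # so the stored matrix is already (out, in).
            weights[node.name] = inits[node.input[0]]
        elif node.input[1] in inits:
            # Weight is the second input: the layer computes x @ W,
            # so the stored matrix is (in, out) and needs a transpose.
            weights[node.name] = inits[node.input[1]].T
    return weights
```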
@ChristopherBrix @stanleybak We also had that issue with the nn4sys models, where we found the weight matrices were transposed and in the wrong shapes after being converted using the |
@huanzhang12 @ChristopherBrix Which instances specifically caused the problem? If multiple tools have issues on specific benchmark instances, it may be best to exclude them from the scoring this time. I'll discuss this with the organizers. |
@stanleybak Our issue was just with the problematic I think it is better not to exclude this benchmark from scoring at this time, considering that people from many teams (including us) have made significant efforts to make this benchmark work. |
Thanks for the explanation; I didn't realize that it was connected to the order of inputs. From those benchmarks that I can process (only FFs), the order seems to be swapped for I have no preference as to whether the benchmarks should be updated to have a consistent format, but I agree that they should not be excluded. |
@dlshriver Any update for DNNF? |
@pkouvaros Any update for venus2? |
@stanleybak It looks like the installation is failing because it can't find virtualenv. I tried modifying it to use venv instead. The new commit is |
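For reference, the standard-library venv module can also be driven from Python directly (the environment path below is illustrative, not Venus's actual install code):

```python
# Create a virtual environment with the standard-library venv module,
# avoiding the external virtualenv dependency.
import venv

venv.create("tool_env", with_pip=True)  # "tool_env" is an illustrative path
```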
@dlshriver Here are the results I got on the test category, which doesn't quite seem to be working. Let me know if there's an update or if I should run the remaining measurements. |
@stanleybak I'm not sure what is causing the error, but it seems to be related to concurrency, so I just disabled it. Could you try this commit |
@dlshriver this fixed the errors, but the result is
|
@stanleybak I think it's okay. Let's run it |
@stanleybak Thank you. It should be working now. New commit is "e14b7f356aed82d515804d0a8daa54572ac07f17". |
@pkouvaros It looks like venus2 is running now, but all of the networks in the |
@stanleybak I disabled a feature which may have been causing this. Could you please run the tool on this commit "1a70cc4a174ebaf11ecd605bd50505180e6f5da7". Thanks. |
For the report that's currently being written: I propose including the specific commit hash that was used, to make sure that the results can be replicated even if the tools are improved in the future. I went ahead and included it in our description; feel free to remove it if you don't agree. |
@ChristopherBrix Yes, this is a good idea. |
I'm trying to test things and get an initialization error with the tiny instance (run 285):
|
I guess that was meant to be posted in this year's repo? stanleybak/vnncomp2022#3 It's fixed, sorry. That's what I get for thinking that a minor change doesn't need to be tested... |
The tool testing and submission instructions are in this document.
Please post your tool information according to the instructions in this topic.