
Category Proposal: Piecewise Linear Activations (ReLU) #2

Closed
ttj opened this issue Apr 6, 2020 · 73 comments
Labels: category-proposal (Proposed category)

Comments

@ttj commented Apr 6, 2020

Networks consisting of piecewise linear activations such as ReLUs

Representative benchmark: ACAS-Xu

Benchmark details: see appendix section 6: https://arxiv.org/abs/1702.01135

Specifications: safety (output lies in specific range based on a set of inputs)

Questions: keep to only ReLUs, or allow other linear and piecewise linear activations?

ttj added the category-proposal label on Apr 6, 2020
ttj changed the title from "Category Proposal: Piecewise Linear (ReLU)" to "Category Proposal: Piecewise Linear Activations (ReLU)" on Apr 6, 2020
@stanleybak

Should the specifications be a single set? Some specs like "Clear of conflict is minimal" require encoding as a disjunction (unsafe if left > clear or right > clear, etc.), which I'm not sure all tools will support. Perhaps it makes sense to provide both encodings if not all tools can handle disjunctions.

@alessiolomuscio

Benchmarks: ACAS is low dimensional. We also need some high dimensional benchmarks.

Activations: I think just ReLUs is sufficient.

Specifications: safety is good but I think we ought to have local robustness here as well.

@ttj commented Apr 8, 2020

> Should the specifications be a single set? Some specs like "Clear of conflict is minimal" require encoding as a disjunction (unsafe if left > clear or right > clear, etc.), which I'm not sure all tools will support. Perhaps it makes sense to provide both encodings if not all tools can handle disjunctions.

Thanks for the feedback, yes, we will need to handle this, e.g., by providing a list of the sets in the disjunction; tools that don't support disjunctions can then check each of these sets in multiple calls, whereas tools that do support disjunctions internally can handle them however they prefer.
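As an illustration of this scheme (a minimal sketch; all names here are hypothetical, not from any particular tool), a disjunctive unsafe region can be passed around as a list of half-space sets and checked one call at a time:

def verify_disjunction(verify_single, disjuncts):
    # verify_single(A, b) -> "safe" or "unsafe" for one set {y : A y <= b}
    for A, b in disjuncts:
        if verify_single(A, b) == "unsafe":
            return "unsafe"  # some disjunct of the unsafe region is reachable
    return "safe"            # no disjunct is reachable

# E.g., "clear-of-conflict is minimal" is violated if left > clear or
# right > clear; each comparison is one linear half-space disjunct.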

@ttj commented Apr 8, 2020

> Benchmarks: ACAS is low dimensional. We also need some high dimensional benchmarks.
>
> Activations: I think just ReLUs is sufficient.
>
> Specifications: safety is good but I think we ought to have local robustness here as well.

Thanks for the feedback. While I agree the ACAS networks are small, if one considers all the properties, the full range of input sets, and all the networks, they are nontrivial to analyze (e.g., I think still at least hours of analysis for state-of-the-art tools).

We certainly welcome other benchmarks if there are suggestions for larger ones.

What I was thinking (though this is open to revision) is that the CNN category would cover the larger networks and robustness specifications (explicitly covering convolutional layers), but if there are networks with only ReLUs for MNIST/CIFAR/etc., we could probably add those as benchmarks for this category, if that is what you're thinking.

@stanleybak

Yes, I think there are some MNIST networks with ReLU / fully connected layers. Local robustness could make sense here.

@alessiolomuscio commented Apr 14, 2020

>> Benchmarks: ACAS is low dimensional. We also need some high dimensional benchmarks.
>> Activations: I think just ReLUs is sufficient.
>> Specifications: safety is good but I think we ought to have local robustness here as well.
>
> Thanks for the feedback. While I agree the ACAS networks are small, if one considers all the properties, the full range of input sets, and all the networks, they are nontrivial to analyze (e.g., I think still at least hours of analysis for state-of-the-art tools).
>
> We certainly welcome other benchmarks if there are suggestions for larger ones.
>
> What I was thinking (though this is open to revision) is that the CNN category would cover the larger networks and robustness specifications (explicitly covering convolutional layers), but if there are networks with only ReLUs for MNIST/CIFAR/etc., we could probably add those as benchmarks for this category, if that is what you're thinking.

To my mind the architectures and the activations are orthogonal to the size of the models. Indeed I would have thought we certainly wanted to have a set of models in each category ranging from small/manageable to large/undoable.

In addition I do think we need to cover extensively models with high dimensional inputs, including in a ReLU category. This is what the AI/ML community use and I think we want to reach out to them and show that our benchmarks cover the sort of models they themselves use (at least in terms of architecture, if not quite of size just yet) and not just case studies normally explored in verification.

@ttj commented Apr 17, 2020

> To my mind the architectures and the activations are orthogonal to the size of the models. Indeed I would have thought we certainly wanted to have a set of models in each category ranging from small/manageable to large/undoable.
>
> In addition I do think we need to cover extensively models with high dimensional inputs, including in a ReLU category. This is what the AI/ML community use and I think we want to reach out to them and show that our benchmarks cover the sort of models they themselves use (at least in terms of architecture, if not quite of size just yet) and not just case studies normally explored in verification.

Yes, these are good points, and I agree. I had not really thought of having classification without convolutional layers, which is why I primarily put ACAS-Xu as the representative benchmark. But assuming that's possible and architectures can be provided, then certainly we could have parameterized examples scaling the size/architecture of the networks within this category as well, for, e.g., MNIST and potentially other data sets.

With respect to high-dimensional inputs, I presume you are thinking of images that would be covered with e.g. MNIST/other data sets in this category, but please let us know if you have other examples in mind that might make appropriate benchmarks.

@alessiolomuscio

So maybe we ought to have 2 main categories only: ReLU and sigmoids. The present CNN could sit as a subcategory of the ReLU one. In the ReLU category we could have high (vision, classifiers) and low (Acas, ...) dimensionality inputs, some fully connected, some convolutional, etc.

Some existing tools may not support all the benchmarks, but I think that's fine.

@souradeep-111

This is to follow up on Taylor's request.
I would like to sign up for this category with Sherlock.
https://github.com/souradeep-111/sherlock

@ttj commented Apr 20, 2020

> So maybe we ought to have 2 main categories only: ReLU and sigmoids. The present CNN could sit as a subcategory of the ReLU one. In the ReLU category we could have high (vision, classifiers) and low (Acas, ...) dimensionality inputs, some fully connected, some convolutional, etc.
>
> Some existing tools may not support all the benchmarks, but I think that's fine.

Thanks for the feedback! Let's see what other feedback comes in as people hopefully start to reply with the categories they're interested in, so we can see whether merging makes sense based on the level of participation, or whether keeping them split is preferable.

@stanleybak

I'd like the nnenum tool to participate in this category. Hopefully we can get a few benchmarks in addition to ACASXu that have higher input dimensionality, and not just vision ones.

@changliuliu

>> So maybe we ought to have 2 main categories only: ReLU and sigmoids. The present CNN could sit as a subcategory of the ReLU one. In the ReLU category we could have high (vision, classifiers) and low (Acas, ...) dimensionality inputs, some fully connected, some convolutional, etc.
>> Some existing tools may not support all the benchmarks, but I think that's fine.
>
> Thanks for the feedback! Let's see what other feedback comes in as people hopefully start to reply with the categories they're interested in, so we can see whether merging makes sense based on the level of participation, or whether keeping them split is preferable.

I agree with Alessio's suggestion. If we are going to bring big models (e.g., MNIST) to this category, it makes sense to reduce to two categories regarding network structure: piecewise linear (including ReLU, convolution, max pooling, etc.) and smooth nonlinear (sigmoid and tanh). Then, orthogonally for each category, we can have big models (e.g., MNIST) and small models (e.g., ACAS). We may see how other tool authors will respond.

@vtjeng commented Apr 24, 2020

>>> So maybe we ought to have 2 main categories only: ReLU and sigmoids. The present CNN could sit as a subcategory of the ReLU one. In the ReLU category we could have high (vision, classifiers) and low (Acas, ...) dimensionality inputs, some fully connected, some convolutional, etc.
>>> Some existing tools may not support all the benchmarks, but I think that's fine.
>
>> Thanks for the feedback! Let's see what other feedback comes in as people hopefully start to reply with the categories they're interested in, so we can see whether merging makes sense based on the level of participation, or whether keeping them split is preferable.
>
> I agree with Alessio's suggestion. If we are going to bring big models (e.g., MNIST) to this category, it makes sense to reduce to two categories regarding network structure: piecewise linear (including ReLU, convolution, max pooling, etc.) and smooth nonlinear (sigmoid and tanh). Then, orthogonally for each category, we can have big models (e.g., MNIST) and small models (e.g., ACAS). We may see how other tool authors will respond.

I'm generally on board with Alessio and Changliu's suggestions, and I like what @changliuliu pointed out about orthogonality. To expand a bit more on it, here are four orthogonal properties of categories that I've identified.

  1. Type of non-linearity used: piecewise linear (ReLU, max pooling) vs. non-piecewise linear (sigmoid, tanh).
  2. Type of linear layers used: fully-connected layers, convolutional layers, average pooling and skip connections are all linear transforms (but not every tool might support a particular linear layer natively).
  3. Input dataset, which affects the dimensionality of the input and the model complexity required to obtain non-trivial results: CIFAR10, MNIST, ACAS, ...
  4. Specifications: safety (as defined above), local robustness. (I'm sure I'm missing something else here.)

Note that I've distinguished the type of non-linearity used from the type of linear layers used in the network structure, since convolutional layers can co-exist with non-piecewise linear activations (e.g., LeNet is tanh+convolution).

We don't have to have a category covering every possible combination of properties (non-linearity, input dataset, specification), but would just note what they are for each category.

To avoid an explosion in the number of categories, rather than having additional categories for property 2, we could just note in the overall summary what each tool is able to support natively.

@vtjeng commented Apr 24, 2020

Separately: I'd like to enter the MIPVerify tool for this category as it is currently described.

More details on the tool:

  • Non-linearities supported: network must be piecewise linear; ReLU and max units are supported, but tanh and sigmoid layers are not.
  • Linear layers natively supported: fully connected layers, convolution layers, skip connections (as used in ResNets)
  • Requirements on specifications: The tool is currently geared towards verifying local robustness, but I intend to add support for ACAS-Xu style safety specifications as that has been a highly requested feature.
  • Input format
    • Networks: The "native" input format for the tool is a .mat file containing a dictionary with all the parameters. I currently also have a script that can convert saved PyTorch models into the .mat input format.
    • Specifications: Since I currently only verify local robustness, the specification is just the size of the L_infinity robustness radius that is to be verified.
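As a rough sketch of such a conversion (this is not MIPVerify's actual script; the function name and key naming here are assumptions), a saved PyTorch model's parameters can be dumped to a .mat dictionary with scipy:

import scipy.io
import torch

def save_params_as_mat(model: torch.nn.Module, path: str) -> None:
    # .mat variable names cannot contain dots or start with a digit,
    # so sanitize the state_dict keys before saving
    params = {"param_" + name.replace(".", "_"): tensor.detach().cpu().numpy()
              for name, tensor in model.state_dict().items()}
    scipy.io.savemat(path, params)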

@GgnDpSngh

I would like to enter our tool ERAN in this category:
https://github.com/eth-sri/eran

Cheers,
Gagandeep Singh

@guykatzz

Hi,
Please add Marabou for this category, too.

Regards,
Guy

@FlorianJaeckle

The oval group would like to sign up for this category. Our work will be based on the methods presented in the journal paper 'Branch and Bound for Piecewise Linear Neural Network Verification' and developments thereof.

Cheers,
Florian Jaeckle

@ttj commented May 2, 2020

Piecewise/ReLU Category Participants:

Oval
Marabou
ERAN
MIPVerify
nnenum
NNV
Sherlock

If anyone else plans to join this category, please add a comment soon, as the participants in this category need to decide on the benchmarks soon (by about May 15).

@haithamkhedr commented May 2, 2020

Hi,
Please add PeregeriNN for this category. We support fully connected layers with ReLU activations.

Regards,
Haitham

@pat676 commented May 4, 2020

Hi,

We would like to enter VeriNet https://vas.doc.ic.ac.uk/software/neural/.

The toolkit supports:

  • Local robustness properties.

  • Fully connected and convolutional layers.

  • ReLU, Sigmoid and Tanh activation functions.

Regards,
Patrick Henriksen

@ttj commented May 8, 2020

Finalized Piecewise/ReLU Category Participants:

Oval
Marabou
ERAN
MIPVerify
nnenum
NNV
Sherlock
VeriNet
PeregeriNN
Venus

For benchmarks, as indicated earlier, we will decide on them as a group, and anyone may propose some. For this category, to move things along, I suggest we use the ACAS-Xu networks as one benchmark set by default, and decide which of the networks and properties to include, if not all of them.

There has been earlier discussion regarding some classification networks in this category as well. If anyone has some and can provide details (e.g., links to models) that would fall into this category and not the CNN category (e.g., without convolutional layers), please speak up. Any other benchmarks are of course welcome, but details need to be provided soon (~May 15) so we may finalize things to give at least a month or so for everyone to look at them.

If we consider classification, as in the CNN category, there is a lot of parameterization (on the network architecture, the data set, the specification, etc.), so those are the details we will want to discuss next and finalize, as well as how all of these aspects will be specified to ease interoperability between the tools.

@pkouvaros

Hi

We would also like to enter Venus (https://vas.doc.ic.ac.uk/software/neural/) in this category.

Best

Panagiotis

@haithamkhedr

I suggest we also consider adversarial robustness for MNIST fully connected NNs with ReLU activations. Some trained MNIST models can be found here

@stanleybak commented May 9, 2020

For ACAS Xu, I suggest a subcategory consisting of the following 10 network/property instances, which I found to be difficult.

ACASXU-HARD:

  1. net 4-6, prop 1
  2. net 4-8, prop 1
  3. net 3-3, prop 2
  4. net 4-2, prop 2
  5. net 4-9, prop 2
  6. net 5-3, prop 2
  7. net 3-6, prop 3
  8. net 5-1, prop 3
  9. net 1-9, prop 7
  10. net 3-3, prop 9

edit: fixed typo in prop 7 to net 1-9... not 1-7
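For scripting a harness around this subset, it can be encoded directly as data; a minimal sketch (the identifier format here is hypothetical):

ACASXU_HARD = [
    ("4_6", 1), ("4_8", 1),                          # property 1
    ("3_3", 2), ("4_2", 2), ("4_9", 2), ("5_3", 2),  # property 2
    ("3_6", 3), ("5_1", 3),                          # property 3
    ("1_9", 7),                                      # property 7
    ("3_3", 9),                                      # property 9
]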

@pat676 commented May 12, 2020

I agree with @haithamkhedr that we should consider adversarial robustness benchmarks for MNIST networks; however, we should also consider networks of different depths, as some toolkits seem to handle deeper networks better than others. I can provide 3 MNIST fully connected networks with 2, 4 and 6 fully-connected layers and 256 nodes in each layer.
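For reference, networks of this shape can be written in a few lines of PyTorch (a sketch of the described architecture only, not the actual trained models discussed later in the thread):

import torch.nn as nn

def mnist_fc(num_hidden: int, width: int = 256) -> nn.Sequential:
    # num_hidden = 2, 4, or 6 hidden fully-connected ReLU layers
    layers = [nn.Flatten(), nn.Linear(784, width), nn.ReLU()]
    for _ in range(num_hidden - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 10))  # 10 class scores for MNIST
    return nn.Sequential(*layers)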

@ttj commented May 12, 2020

> I agree with @haithamkhedr that we should consider adversarial robustness benchmarks for MNIST networks; however, we should also consider networks of different depths, as some toolkits seem to handle deeper networks better than others. I can provide 3 MNIST fully connected networks with 2, 4 and 6 fully-connected layers and 256 nodes in each layer.

Thanks! Can you please provide links so participants can check them out?

@pat676 commented Jun 9, 2020

>> I can provide 3 MNIST fully connected networks with 2, 4 and 6 fully-connected layers and 256 nodes in each layer.
>
> When I load the networks from @pat676 I see some strange structure at the start of the network before the fully connected layers:
>
> [image: ONNX graph screenshot showing extra operations before the fully-connected layers]
>
> In particular, it looks like the start of the computation is node "23"... what's going on before that?

Hi @stanleybak,

That looks like the part of our code that was designed to handle non-standard input dimensions; I didn't expect it to show up in the ONNX model.

I've modified the code to get rid of most of these operations; the only thing left is one flatten operation, as I was told to use inputs of dim (1, 784, 1) while the PyTorch model requires inputs of the form (1, 784).

Let me know if this looks okay to you.
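As a quick sanity check of the exported model (a sketch; the file name here is hypothetical), the input shape and a forward pass can be inspected with onnxruntime:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("mnist_fc_256x2.onnx")  # hypothetical file name
inp = sess.get_inputs()[0]
print(inp.name, inp.shape)  # expect dims like (1, 784, 1), per the discussion above
x = np.zeros((1, 784, 1), dtype=np.float32)
print(sess.run(None, {inp.name: x})[0].shape)  # (1, 10) class scores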

@stanleybak commented Jun 9, 2020

> Let me know if this looks okay to you.

Yes, the network is great now, @pat676.

One more weird thing that doesn't really need to be fixed is that the input images look like they want to be lists of integers, but instead they are floating-point values close to integers. For example, in image1 you have entries like:
84.00000259280205,185.00000417232513,159.0000057220459

I'm not sure if this was on purpose. My guess is you started with a single-precision floating-point value in the range [0, 1], multiplied it by 255, and then forgot to round.
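A minimal reproduction of that guess (an illustration, not the actual generation code):

import numpy as np

pixel = np.float32(84 / 255.0)    # single-precision pixel value in [0, 1]
print(float(pixel) * 255)         # close to, but not exactly, 84
print(round(float(pixel) * 255))  # 84 -- the intended integer value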

@pat676 commented Jun 10, 2020

>> Let me know if this looks okay to you.
>
> Yes, the network is great now, @pat676.
>
> One more weird thing that doesn't really need to be fixed is that the input images look like they want to be lists of integers, but instead they are floating-point values close to integers. For example, in image1 you have entries like:
> 84.00000259280205,185.00000417232513,159.0000057220459
>
> I'm not sure if this was on purpose. My guess is you started with a single-precision floating-point value in the range [0, 1], multiplied it by 255, and then forgot to round.

Thanks, you are right; I updated the images with rounded pixel values now.

@stanleybak commented Jun 12, 2020

Hello @pat676,

The image files look good now. It looks like the expected-classification file got deleted, though it should be easy enough to recreate since all images classify correctly.

@pat676 commented Jun 12, 2020

> Hello @pat676,
>
> The image files look good now. It looks like the expected-classification file got deleted, though it should be easy enough to recreate since all images classify correctly.

I recreated the labels file and uploaded it to the images folder.

@stanleybak

For the images, @pat676, we want a plot of the number of images verified over time, right? So something like this:

[image: mock-up plot of number of images verified vs. time]

What should the max timeout be? 60 secs per image, or more? Should we use a log scale for the time values?

Also, for some of the parameters (say 0.1 epsilon) there may only be a few images that are unsafe... so maybe mixing safe and unsafe in the same graph is better in this case? We probably then want one image per network, and then one image that combines all the networks? Maybe one for all the unsafe images together? What do you think?

@pat676 commented Jun 15, 2020

> For the images, @pat676, we want a plot of the number of images verified over time, right? So something like this:
>
> [image: mock-up plot of number of images verified vs. time]
>
> What should the max timeout be? 60 secs per image, or more? Should we use a log scale for the time values?
>
> Also, for some of the parameters (say 0.1 epsilon) there may only be a few images that are unsafe... so maybe mixing safe and unsafe in the same graph is better in this case? We probably then want one image per network, and then one image that combines all the networks? Maybe one for all the unsafe images together? What do you think?

Yes, the plot looks good, @stanleybak.

I agree that plotting safe, unsafe and all epsilons separately is unnecessary; however, it may be better to merge different epsilon values into one plot than to merge safe/unsafe. The reason is that, in my experience, toolkits vary a lot between safe and unsafe cases, but usually have similar relative performance across different epsilon values.

So: one plot for safe and one for unsafe for each network, resulting in a total of 6 plots for the three networks. One that combines all networks is also a good idea.

For the timeout, 60 seconds seems too short. Normally papers report at least 30 minutes; if possible we should do the same for the results to be meaningful.

@stanleybak

> For the timeout, 60 seconds seems too short. Normally papers report at least 30 minutes; if possible we should do the same for the results to be meaningful.

30 minutes is okay for some benchmarks, but my concern would be that the worst-case runtime would then be 30 minutes * 50 images * 3 epsilon values * 3 networks = 13500 minutes ≈ 9 days! Is there some subset of the images/networks where 30 minutes is more interesting, so that a shorter timeout could be used for the others? Understandably they won't all take the full 30 minutes, but especially on the large networks with large epsilons they might mostly time out.

@pat676 commented Jun 16, 2020

Hi @stanleybak, I see the problem, how about these solutions:

  1. Change the epsilon values. I originally suggested 0.01, 0.03 and 0.05. The first is easy and mostly safe, the second is difficult and about 50/50 safe/unsafe, and the last is mostly unsafe and easy for some toolkits, difficult for others. Instead, we could use 0.02 (mostly safe, but not as easy as 0.01) and 0.05.
  2. Reduce from 50 to 25 input images for each epsilon-network combination.
  3. Reduce the timeout to 15 minutes.

This should reduce the worst-case performance to about 1.5 days and I expect all toolkits to finish in less than 1 day.

Having different timeouts for different cases is, in my opinion, unfortunate as it may make the results more difficult to interpret.

@stanleybak

Yes, that's much better. I think it would be good to have some that have longer timeouts too, but perhaps we can do that in future years.

@Neelanjana314

> I sent an e-mail about this, but since you mentioned it here Taylor: there is a question about the intended precision of the ACAS Xu networks. The ONNX files use single-precision floating point, whereas the original Reluplex files are ASCII text and include scaling terms that look like double precision. For example, the scaling term "7.5188840201005975" would lose precision if stored in a single-precision float (the 32-bit version gets rounded to "7.518884181976318359375", compared with "7.5188840201005975316661533724982" for the 64-bit version).
>
> This will slightly affect the output of the network, and maybe it doesn't matter for the properties or this iteration. I guess we could also ask @guykatzz what the intended precision for the networks should be.

Hi @stanleybak do we have any update on the precision?

@stanleybak

I haven't seen anything, but certain layers are not supported in 64-bit precision by the ONNX runtime, at least the one that installed for me by default; these include ReLU and Gemm layers. So if I want to execute the networks, I need to use 32-bit floats. It probably doesn't matter for this iteration of the competition, as I don't think it changes any of the verification results.
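The precision loss described above is easy to check directly; a small illustration (the rounded value is the one quoted earlier in the thread):

import numpy as np

x64 = np.float64(7.5188840201005975)  # scaling term from the Reluplex files
x32 = np.float32(x64)                 # what a single-precision ONNX file stores
print(f"{x64:.25f}")
print(f"{np.float64(x32):.25f}")      # 7.518884181976318359375..., per the quote above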

@stanleybak commented Jun 23, 2020

In terms of reporting the results, I assume we'll have a PDF like ARCH-COMP, where we each write a section for our tool. With some small coordination we could also produce a combined table and graph, which I think would be useful even if we're all using slightly different computers this time around. I have scripts that can help with this, if desired. The input looks something like this, with one file per tool:

1_1	1	safe	210.341341724037193
1_2	1	safe	211.93920411390718
1_3	1	safe	232.04592358705122
1_4	1	safe	236.39814017806202
...

It's just a tab-separated file with the network as the first entry, the property as the second, the result as the third (safe/unsafe/error/timeout/anything-else), and the runtime in seconds as the fourth.

The scripts make a .tex file that's a table that we can include in the document, as well as a .pdf from gnuplot that looks like this, which is similar to what we discussed earlier:

[image: example output graph from the scripts, similar to the mock-up discussed earlier]

Does this output look okay? Is this something we want to do, @ttj? What were the plans for reporting?
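For concreteness, a consumer of this results format might look like the following sketch (the actual scripts use gnuplot; matplotlib and the file name here are assumptions for illustration):

import csv
import matplotlib.pyplot as plt

def load_results(path):
    # columns: network, property, result, runtime in seconds
    with open(path) as f:
        return [(net, prop, res, float(t))
                for net, prop, res, t in csv.reader(f, delimiter="\t")]

results = load_results("tool_results.tsv")  # hypothetical file name
times = sorted(t for _, _, res, t in results if res in ("safe", "unsafe"))
plt.plot(times, range(1, len(times) + 1))   # cumulative instances verified over time
plt.xscale("log")
plt.xlabel("time (s)")
plt.ylabel("instances verified")
plt.savefig("results_graph.pdf")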

@Neelanjana314

> Hi @stanleybak, I see the problem, how about these solutions:
>
> 1. Change the epsilon values. I originally suggested 0.01, 0.03 and 0.05. The first is easy and mostly safe, the second is difficult and about 50/50 safe/unsafe, and the last is mostly unsafe and easy for some toolkits, difficult for others. Instead, we could use 0.02 (mostly safe, but not as easy as 0.01) and 0.05.

Hi @pat676, what will be the total tolerance bound (i.e., UB - LB) in this case, epsilon or 2*epsilon?

@pat676 commented Jun 27, 2020

>> Hi @stanleybak, I see the problem, how about these solutions:
>>
>> 1. Change the epsilon values. I originally suggested 0.01, 0.03 and 0.05. The first is easy and mostly safe, the second is difficult and about 50/50 safe/unsafe, and the last is mostly unsafe and easy for some toolkits, difficult for others. Instead, we could use 0.02 (mostly safe, but not as easy as 0.01) and 0.05.
>
> Hi @pat676, what will be the total tolerance bound (i.e., UB - LB) in this case, epsilon or 2*epsilon?

Hi @Neelanjana314,

Each pixel should be constrained to LB = pixel_value - eps and UB = pixel_value + eps, clipped to [0, 1] (so the total bound is 2*epsilon, except where clipping applies). The Python commands (assuming the pixels are in floating-point format) are:

import numpy as np

lb = np.clip(img - eps, 0, 1)
ub = np.clip(img + eps, 0, 1)

@GgnDpSngh

In terms of ACAS Xu, are the benchmarks suggested by @stanleybak final?

Cheers,

@stanleybak commented Jul 2, 2020

@GgnDpSngh My understanding was there would be one category ACASXU-ALL that uses all 10 specifications / networks with a 5 minute timeout, and then a second category ACASXU-HARD that uses the 10 instances I suggested with a longer 6 hour timeout. Is this correct @ttj ? Is there an updated timeline for when we're supposed to upload Dockerfiles and/or results?

@vtjeng commented Jul 9, 2020

@pat676 for your MNIST networks, I had to scale the input images by 1/255 (so that they are in the range [0, 1]) to get a correct classification on every sample for those networks. Do your networks include a scaling layer? (Also, presumably the epsilons are scaled to the range [0, 1]?)

@pat676 commented Jul 9, 2020

> @pat676 for your MNIST networks, I had to scale the input images by 1/255 (so that they are in the range [0, 1]) to get a correct classification on every sample for those networks. Do your networks include a scaling layer? (Also, presumably the epsilons are scaled to the range [0, 1]?)

Hi @vtjeng,

That is correct: the images should be scaled to [0, 1], and the epsilons are already scaled to match. The networks only include fully-connected layers, with no scaling layers.

@vtjeng commented Jul 11, 2020

@stanleybak, @ttj, I wanted to make sure that I was getting the conventions for the ACASXU networks correct. Here's what I've been assuming:

Input / Output

Order of variables is as follows:

input: [ρ, θ, ψ, v_own, v_int]
output: [COC, weak left, weak right, strong left, strong right]

Bounds on Input in Properties

In addition to the input constraints specified for the property, we also impose the input constraints for the network. For example, for Property 1:

Input constraints from property: ρ ≥ 55947.691, v_own ≥ 1145, v_int ≤ 60
Additional input constraints from network specification:
ρ ≤ 60760.0
-3.141593 ≤ θ ≤ 3.141593
-3.141593 ≤ ψ ≤ 3.141593
v_own ≤ 1200.0
0.0 ≤ v_int

@stanleybak

The inputs / outputs look correct.

The lower / upper bounds for property 1 I used are:

lb = [55947.691, -3.141592, -3.141592, 1145, 0]
ub = [60760, 3.141592, 3.141592, 1200, 60]

Which looks like it matches. Also make sure you scale the inputs before passing them to the network. I mentioned this in an earlier comment on June 2. Here's the code I used for that:

# 6 means: 5 for the inputs plus the output scaling term discussed above
means_for_scaling = [19791.091, 0.0, 0.0, 650.0, 600.0, 7.5188840201005975]
range_for_scaling = [60261.0, 6.28318530718, 6.28318530718, 1100.0, 1200.0]

# normalize the five inputs before passing them to the network
for i in range(5):
    input[i] = (input[i] - means_for_scaling[i]) / range_for_scaling[i]

The scaled ranges can be found in the Marabou property files, to double-check against what you get: https://github.com/NeuralNetworkVerification/Marabou/tree/master/resources/properties
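Applying the same normalization to the property-1 bounds from the earlier comment gives the scaled input box actually fed to the network (a sketch reusing the variables from the snippets above):

scaled_lb = [(lb[i] - means_for_scaling[i]) / range_for_scaling[i] for i in range(5)]
scaled_ub = [(ub[i] - means_for_scaling[i]) / range_for_scaling[i] for i in range(5)]
# these can be compared against the ranges in the Marabou property files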

@haithamkhedr

Hello,

Is there an Overleaf document to add our results to? Also, when is the due date to submit final results?

Thanks

@stanleybak

> For ACAS-Xu, given there hasn't been a lot of discussion on which properties/networks to focus on, I would suggest using all of them, with timing, verification, etc. results to be generated as a table for each participant.

@ttj, just so everyone's on the same page: for ACASXU-ALL, this is intended to be a 5-minute timeout, using all 45 networks for properties 1-4 (180 instances), as well as properties 5-10, which are defined for one network each (6 more instances), for a total of 186 instances, correct? The original Reluplex paper omits some of the networks for properties 1-4, but in the Marabou paper it looks like they analyzed all 45.

@GgnDpSngh

So overall the benchmarks here are all ACASXU properties with a timeout of 5 min, and the three fully-connected networks from @pat676 with eps = 0.02 and 0.05 with a timeout of 15 min?

Cheers,

@stanleybak

Yes. I suggested a longer timeout for the ACASXU-HARD subset if needed, no longer than 6 hours.
