Category Proposal: Piecewise Linear Activations (ReLU) #2
Comments
Should the specifications be a single set? Some specs, like "Clear of conflict is minimal", require encoding as a disjunction (unsafe if left > clear or right > clear, etc.), which I'm not sure all tools will support. Perhaps it makes sense to provide both if not all tools can handle disjunctions. |
Benchmarks: ACAS is low dimensional. We also need some high dimensional benchmarks. Activations: I think just ReLUs is sufficient. Specifications: safety is good but I think we ought to have local robustness here as well. |
Thanks for the feedback; yes, we will need to handle this, e.g., by providing a list of the sets in the disjunction. Tools that don't support disjuncts can then check each of these sets in separate calls, whereas tools that internally support disjuncts can handle them however they do. |
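As a rough, tool-agnostic illustration of that scheme (the `verify_single` interface and the constraint encoding below are hypothetical, not any participant's actual API):

```python
# A disjunctive unsafe region split into separate conjunctive checks.
# Each disjunct is encoded as a pair (i, j) meaning "output[i] > output[j]".
# With outputs ordered [clear, left, right], "unsafe if left > clear or
# right > clear" becomes two disjuncts:
disjuncts = [(1, 0), (2, 0)]

def check_property(verify_single, network, input_set, disjuncts):
    """verify_single is a hypothetical single-query solver that checks one
    conjunctive unsafe set; tools with native disjunction support could
    instead receive the whole list at once."""
    for disjunct in disjuncts:
        if verify_single(network, input_set, disjunct) == "unsafe":
            return "unsafe"   # a counterexample in any disjunct violates the spec
    return "safe"             # safe only if every disjunct is unreachable
```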
Thanks for the feedback. While I agree the ACAS networks are small, if one considers all the properties, the full range of input sets, and all the networks, they are nontrivial to analyze (e.g., I think still at least hours of analysis for state-of-the-art tools). We certainly welcome other benchmarks if there are suggestions for larger ones. What I was thinking (but am open to revision) is that the CNN category would cover the larger networks and robustness specifications (and explicitly cover convolutional layers), but if there are networks with only ReLUs for MNIST/CIFAR/etc., we could probably add those as benchmarks for this category, if that is what you're thinking. |
Yes, I think there are some MNIST networks with ReLU / fully connected layers. Local robustness could make sense here. |
To my mind the architectures and the activations are orthogonal to the size of the models. Indeed, I would have thought we certainly want a set of models in each category ranging from small/manageable to large/undoable. In addition, I do think we need to cover models with high-dimensional inputs extensively, including in a ReLU category. This is what the AI/ML community uses, and I think we want to reach out to them and show that our benchmarks cover the sort of models they themselves use (at least in terms of architecture, if not quite of size just yet), not just the case studies normally explored in verification. |
Yes, these are good points, and I agree. I had not really thought of having classification without convolutional layers, which is why I had primarily put ACAS-Xu as the representative benchmark, but assuming that's possible and architectures can be provided, then we could certainly have parameterized examples scaling the size/architecture of the networks within this category as well, e.g., for MNIST and potentially other data sets. With respect to high-dimensional inputs, I presume you are thinking of images, which would be covered with e.g. MNIST/other data sets in this category, but please let us know if you have other examples in mind that might make appropriate benchmarks. |
So maybe we ought to have only two main categories: ReLU and sigmoids. The present CNN category could sit as a subcategory of the ReLU one. In the ReLU category we could have high-dimensionality (vision, classifiers) and low-dimensionality (ACAS, ...) inputs, some fully connected, some convolutional, etc. Some existing tools may not support all the benchmarks, but I think that's fine. |
This is to follow up on Taylor's request. |
Thanks for the feedback! Let's see what other feedback comes in as people hopefully start to reply with which categories they're interested in, to see whether this makes sense based on the level of participation, or whether keeping the categories split is ideal. |
I'd like the nnenum tool to participate in this category. Hopefully we can get a few benchmarks in addition to ACASXu that are higher input dimension, but not just vision ones. |
I agree with Alessio's suggestion. If we are going to bring big models (e.g., MNIST) to this category, it makes sense to reduce to two categories regarding network structure: piecewise linear (including ReLU, convolution, max pooling, etc) and smooth nonlinear (sigmoid and tanh). Then orthogonally for each category, we can have big models (e.g., MNIST) and small models (e.g., ACAS). We may see how other tool authors will respond. |
I'm generally on board with Alessio and Changliu's suggestions, and I like what @changliuliu pointed out about orthogonality. To expand a bit more on it, here are four orthogonal properties of categories that I've identified.
Note that I've distinguished the type of non-linearity used from the type of linear layers used in the network structure, since convolutional layers can co-exist with non-piecewise-linear activations (e.g., LeNet is tanh + convolution). We don't have to have a category covering every possible combination of properties. To avoid an explosion in the number of categories, rather than having additional categories for |
Separately: I'd like to enter the MIPVerify tool for this category as it is currently described. More details on the tool:
|
I would like to enter our tool ERAN in this category: Cheers, |
Hi, Regards, |
The oval group would like to sign up for this category. Our work will be based on the methods presented in the journal paper "Branch and Bound for Piecewise Linear Neural Network Verification" and developments thereof. Cheers, |
Piecewise/ReLU Category Participants: Oval. If anyone else plans to join this category, please add a comment soon, as the participants in this category need to decide on the benchmarks soon (by about May 15). |
Hi, Regards, |
Hi, We would like to enter VeriNet https://vas.doc.ic.ac.uk/software/neural/. The toolkit supports:
Regards, |
Finalized Piecewise/ReLU Category Participants: Oval. For benchmarks, as indicated earlier, we will decide the benchmarks as a group, and anyone may propose some. For this category, to move things along, I suggest we use the ACAS-Xu networks as one benchmark set by default, and decide which of the networks and properties to include, if not all of them. There has been earlier discussion regarding some classification networks in this category as well. If anyone has models that would fall into this category and not the CNN category (e.g., without convolutional layers) and can provide details (e.g., links to models), please speak up. Any other benchmarks are of course welcome, but details need to be provided soon (~May 15) so we can finalize things and give everyone at least a month or so to look at them. If we consider classification, as in the CNN category, there is a lot of parameterization (the network architecture, the data set, the specification, etc.), so those are the details we will want to discuss and finalize next, as well as how all of these aspects will be specified to ease interoperability between the tools. |
Hi, we would also like to enter Venus (https://vas.doc.ic.ac.uk/software/neural/) in this category. Best, Panagiotis |
I suggest we also consider adversarial robustness for MNIST fully connected NNs with ReLU activations. Some trained MNIST models can be found here |
For ACAS Xu, I suggest a subcategory consisting of the following set of 10 network / property benchmarks that I found to be difficult. ACASXU-HARD:
edit: fixed typo in prop 7 to net 1-9... not 1-7 |
I agree with @haithamkhedr that we should consider adversarial robustness benchmarks for MNIST networks; however, we should also consider networks of different depths, as some toolkits seem to handle deeper networks better than others. I can provide 3 MNIST fully connected networks with 2, 4 and 6 fully-connected layers and 256 nodes in each layer. |
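For concreteness, a PyTorch sketch of fully connected ReLU classifiers matching the 2/4/6-layer, 256-unit description above (this is only an illustrative constructor, not the actual released models):

```python
import torch.nn as nn

def fc_relu_mnist(num_hidden_layers, width=256, in_dim=784, out_dim=10):
    # Fully connected ReLU classifier with `num_hidden_layers` hidden layers
    # of `width` units each, as in the proposed 2/4/6-layer MNIST networks.
    layers, d = [], in_dim
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# e.g. the deepest of the three proposed networks:
net6 = fc_relu_mnist(6)
```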
Thanks! Can you please provide links so participants can check them out? |
Hi @stanleybak, it looks like this came from the part of our code that was designed to handle non-standard input dimensions; I didn't expect it to show up in the ONNX model. I've modified the code to get rid of most of these operations; the only thing left is one flatten operation, since I was told to use inputs of dim (1, 784, 1), whereas the PyTorch model requires inputs of the form (1, 784). Let me know if this looks okay to you. |
Yes, the network is great now @pat676. One more odd thing that doesn't really need to be fixed: the input images look like they are meant to be lists of integers, but instead they are floating-point values close to integers. For example, in I'm not sure if this was on purpose. My guess was you started with a single-precision floating-point value in the range [0, 1], multiplied it by 255, and then forgot to round. |
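If that guess is right, the fix is presumably a one-line rounding step like the following (function and variable names here are just illustrative):

```python
import numpy as np

def to_integer_pixels(img01):
    # img01: float image with values in [0, 1].
    # Rounding (rather than truncating) avoids the near-integer float
    # values mentioned above, e.g. 24.999998 -> 25.
    return np.round(img01 * 255.0).astype(np.int64)
```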
Thanks, you are right; I updated the images with rounded pixel values now. |
Hello @pat676, the image files look good now. It looks like the expected-classification file got deleted, although it's probably easy enough to recreate since all images classify correctly. |
I recreated the labels file and uploaded it to the images folder. |
For the images @pat676, we want a plot of the number of images verified over time, right? So something like this: What should the max timeout be? 60 seconds per image, or more? Should we use a log scale for the time values? Also, for some of the parameters (say 0.1 epsilon) there may only be a few images that are unsafe... so maybe mixing safe and unsafe in the same graph is better in this case? We probably then want one plot per network, and then one plot that combines all the networks? Maybe one for all the unsafe images together? What do you think? |
Yes, the plot looks good @stanleybak. I agree that plotting safe, unsafe, and all epsilons separately is unnecessary; however, it may be better to merge the different epsilon values into one plot than to merge safe/unsafe. The reason is that, in my experience, toolkits vary a lot between safe and unsafe instances, but usually have a similar relative performance over different epsilon values. So one plot for safe and one for unsafe for each network, resulting in a total of 6 plots for the three networks. One that combines all networks is also a good idea. For the timeout, 60 seconds seems too short. Normally papers report at least 30 minutes; if possible we should do the same for the results to be meaningful. |
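A minimal sketch of the kind of plot being discussed (one curve per tool, per-instance runtimes sorted so the curve shows how many images are verified within a given time budget; the runtimes below are made up, and matplotlib is just one way to draw it):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical per-instance verification times in seconds (timeouts omitted).
runtimes = {
    "tool_a": [1.2, 3.5, 7.9, 15.0, 44.2, 120.3],
    "tool_b": [0.8, 2.1, 9.4, 60.7, 300.5],
}

for name, times in runtimes.items():
    t = np.sort(times)
    # y value at time t = number of images verified within t seconds
    plt.step(t, np.arange(1, len(t) + 1), where="post", label=name)

plt.xscale("log")          # log scale for time, as suggested above
plt.xlabel("time (s)")
plt.ylabel("images verified")
plt.legend()
plt.savefig("verified_over_time.png")
```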
30 minutes is okay for some benchmarks, but my concern would be that the worst-case runtime would then be 30 minutes * 50 images * 3 epsilon values * 3 networks = 13500 minutes, about 9 days! Is there some subset of the images / networks where 30 minutes is more interesting, so that a shorter timeout could be used for the others? Understandably they won't all take the full 30 minutes, but especially for the large networks with large epsilons they might mostly time out. |
Hi @stanleybak, I see the problem, how about these solutions:
This should reduce the worst-case performance to about 1.5 days and I expect all toolkits to finish in less than 1 day. Having different timeouts for different cases is, in my opinion, unfortunate as it may make the results more difficult to interpret. |
Yes, that's much better. I think it would be good to have some that have longer timeouts too, but perhaps we can do that in future years. |
Hi @stanleybak, do we have any update on the precision? |
I haven't seen anything, but certain layers are not supported with 64-bit floats by the ONNX runtime, at least the version that installed for me by default. These include ReLU and Gemm layers, so if I want to execute the networks I need to use 32-bit floats. It probably doesn't matter for this iteration of the competition, as I don't think it changes any of the verification results. |
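For reference, a sketch of running one of the networks with 32-bit inputs via onnxruntime (the model filename and input shape below are placeholders for one of the MNIST networks discussed above):

```python
import numpy as np
import onnxruntime as ort

# Hypothetical model file; any of the category's ONNX networks would do.
sess = ort.InferenceSession("mnist_relu_256x4.onnx")
input_name = sess.get_inputs()[0].name

# Inputs cast to 32-bit floats, per the precision discussion above.
x = np.random.rand(1, 784, 1).astype(np.float32)
outputs = sess.run(None, {input_name: x})
print(outputs[0])
```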
In terms of reporting the results, I assume we'll have some pdf like ARCH-COMP where we each write a section for our tool. With some small coordination we could also produce a combined table and graph that I think would be useful, even if we're all using slightly different computers this time around. I have scripts that can help with this, if desired. The input looks something like this, with one file per tool:
It's just a tab-separated file with the network as the first entry, the property as the second, the result as the third (safe/unsafe/error/timeout/anything-else), and the runtime in seconds as the fourth. The scripts produce a .tex table that we can include in the document, as well as a .pdf from gnuplot that looks like this, which is similar to what we discussed earlier: Does this output look okay? Is this something we want to do @ttj? What were the plans for reporting? |
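To make the format concrete, here is a small sketch of reading one such per-tool file (the filename and the aggregation at the end are just assumptions based on the description above):

```python
import csv

def load_results(path):
    # Each row: network <TAB> property <TAB> result <TAB> runtime_in_seconds
    rows = []
    with open(path, newline="") as f:
        for network, prop, result, runtime in csv.reader(f, delimiter="\t"):
            rows.append((network, prop, result, float(runtime)))
    return rows

# e.g. count the instances a hypothetical tool solved within the timeout:
# solved = sum(1 for _, _, res, _ in load_results("tool_a.tsv")
#              if res in ("safe", "unsafe"))
```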
Hi @pat676, what will be the total tolerance bound (i.e., UB - LB) in this case, epsilon or 2*epsilon? |
Hi @Neelanjana314, Each pixel should be constrained to LB=pixel_value - eps and UB=pixel_value + eps and clipped to [0, 1]. So the python command (assuming that the pixels are in floating-point format) is: lb = np.clip(img - eps, 0, 1) |
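Spelled out as a self-contained snippet, with the upper bound added since it follows directly from the description above:

```python
import numpy as np

def pixel_bounds(img, eps):
    # img: image with pixel values already scaled to [0, 1]
    # Returns per-pixel lower/upper bounds for the robustness query.
    lb = np.clip(img - eps, 0.0, 1.0)
    ub = np.clip(img + eps, 0.0, 1.0)
    return lb, ub
```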
In terms of ACAS Xu, are the benchmarks suggested by @stanleybak final? Cheers, |
@GgnDpSngh My understanding was there would be one category, ACASXU-ALL, that uses all 10 specifications / networks with a 5-minute timeout, and then a second category, ACASXU-HARD, that uses the 10 instances I suggested with a longer 6-hour timeout. Is this correct @ttj? Is there an updated timeline for when we're supposed to upload dockerfiles and/or results? |
@pat676 for your MNIST networks, I had to scale the input images by 1/255 (so that they are in the range [0, 1]) to get a correct classification on every sample for those networks. Do your networks include a scaling layer? (Also, presumably the epsilons are scaled to the range [0, 1]?) |
Hi @vtjeng, That is correct, the images should be scaled to [0, 1] and the epsilons are already scaled correctly. The network only includes fully-connected layers, not scaling layers. |
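So the preprocessing on the verifier side would look roughly like this (array names are illustrative; the stored images hold pixel values in [0, 255]):

```python
import numpy as np

def preprocess(raw_image):
    # raw_image: pixel values in [0, 255], e.g. shape (784,)
    # The networks expect inputs in [0, 1]; per the reply above, the
    # provided epsilons are already on this [0, 1] scale.
    return raw_image.astype(np.float32) / 255.0
```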
@stanleybak, @ttj, I wanted to make sure that I was getting the conventions for the ACASXU networks correct. Here's what I've been assuming: Input / Output: the order of variables is as follows:
Bounds on Input in Properties: in addition to the input constraints specified for the property, we also impose the input constraints for the network. For example, for Property 1:
|
The inputs / outputs look correct. The lower / upper bounds for property 1 I used are:
Which looks like it matches. Also make sure you scale the inputs before passing them to the network. I mentioned this in an earlier comment on June 2. Here's the code I used for that
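The original snippet isn't reproduced here; the usual ACAS Xu preprocessing normalizes each input by its mean and range, roughly as below. The `means` and `ranges` arrays are placeholders only; the real per-input values should be taken from the benchmark/network files (or cross-checked against the Marabou property files linked below).

```python
import numpy as np

# Placeholder normalization constants -- NOT the actual ACAS Xu values;
# take the real per-input means and ranges from the benchmark files.
means = np.zeros(5)
ranges = np.ones(5)

def scale_acasxu_input(x):
    # x: the five raw ACAS Xu inputs, ordered as in the benchmark definition
    return (np.asarray(x, dtype=np.float64) - means) / ranges
```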
The scaled ranges can be found in the marabou property files to double-check with what you get: https://github.com/NeuralNetworkVerification/Marabou/tree/master/resources/properties |
Hello, is there an Overleaf document to add our results to? Also, when is the due date to submit the final results? Thanks |
@ttj, just so everyone's on the same page: for ACASXU-ALL, this is intended to be a 5-minute timeout, using the 45 networks for properties 1-4 (180 instances), as well as properties 5-10, which are defined for one network each (6 more instances), for a total of 186 instances, correct? The original Reluplex paper omits some of the networks for properties 1-4, but in the Marabou paper it looks like they analyzed all 45. |
So overall, the benchmarks here are all ACASXU properties with a timeout of 5 minutes, and the three FCNs from @pat676 with eps = 0.02 and 0.05 with a timeout of 15 minutes? Cheers, |
Yes. I suggested a longer timeout for the ACASXU-HARD subset if needed, no longer than 6 hours. |
Networks with piecewise linear activations such as ReLUs
Representative benchmark: ACAS-Xu
Benchmark details: see appendix section 6: https://arxiv.org/abs/1702.01135
Specifications: safety (the output lies in a specific range for a given set of inputs); see the rough encoding sketch below
Questions: keep to only ReLUs, or allow other linear and piecewise linear activations?
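As a rough illustration of this specification style (the bounds, threshold, and the `network` callable below are purely hypothetical placeholders, not an actual benchmark property):

```python
import numpy as np

# A safety specification couples an input set with an output requirement:
# for every input in the box [input_lb, input_ub], the network's output
# must satisfy the constraint below. All numbers here are illustrative.
input_lb = np.full(5, -1.0)
input_ub = np.full(5, 1.0)

def output_in_safe_range(y):
    # e.g. require the first output score to stay below some bound
    return y[0] <= 10.0

def spot_check(network, num_samples=1000):
    # Sampling is only a sanity check, not verification; a verifier must
    # prove the property for *all* inputs in the box.
    xs = np.random.uniform(input_lb, input_ub, size=(num_samples, 5))
    return all(output_in_safe_range(network(x)) for x in xs)
```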