Firstly, thank you for making this open-source. I've been trying to replicate your paper in order to build on it, but I've failed to train the model using the instructions in the README.
The py-aiger-sat dependency fails to install due to this issue, regardless of the environment, but it doesn't seem to be necessary for training the model.
Conda Install
In a new conda environment, I installed TensorFlow using pip. It pulled version 2.16.1, which satisfies the requirement in setup.py (tensorflow>=2.1.0) but is much newer than the minimum version listed there. Starting the training then failed with:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ilker/deepltl/normal/deepltl/train/train_transformer.py", line 187, in <module>
run()
File "/home/ilker/deepltl/normal/deepltl/train/train_transformer.py", line 142, in run
model = transformer.create_model(vars(params), training=True, custom_pos_enc=params.tree_pos_enc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ilker/deepltl/normal/deepltl/models/transformer.py", line 31, in create_model
predictions, _ = transformer(transformer_inputs, training)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ilker/miniconda3/envs/spotltl/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 123, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/ilker/miniconda3/envs/spotltl/lib/python3.11/site-packages/keras/src/layers/layer.py", line 723, in __call__
raise ValueError(
ValueError: Only input tensors may be passed as positional arguments. The following argument value should be passed as a keyword argument: True (of type <class 'bool'>)
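For anyone hitting the same error, here is a minimal sketch of the failing call pattern and the keyword form that avoids it. It deliberately does not use Keras: `ToyLayer` and its check are hypothetical stand-ins that only mimic the behaviour reported in the traceback above, so the snippet runs without TensorFlow. In the repository itself, the corresponding change would presumably be turning `transformer(transformer_inputs, training)` into `transformer(transformer_inputs, training=training)`.

```python
class ToyLayer:
    """Hypothetical stand-in that mimics the positional-argument check from the traceback."""

    def __call__(self, *args, **kwargs):
        for arg in args:
            # Reject plain Python scalars passed positionally, as the error above does for a bool.
            if isinstance(arg, (bool, int, float, str)):
                raise ValueError(
                    "Only input tensors may be passed as positional arguments. "
                    f"Pass {arg!r} as a keyword argument instead."
                )
        return self.call(*args, **kwargs)

    def call(self, inputs, training=False):
        # Double the inputs only in training mode so the flag has a visible effect.
        return [x * 2 for x in inputs] if training else list(inputs)


layer = ToyLayer()
inputs = [1.0, 2.0]

try:
    layer(inputs, True)  # old style: flag passed positionally, rejected
except ValueError as err:
    positional_error = str(err)

keyword_result = layer(inputs, training=True)  # new style: flag passed by keyword
print(positional_error)
print(keyword_result)
```

The same pattern applies to any other non-tensor flag passed to a layer call, such as the `cache` argument.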
I fixed this issue by passing all training and cache arguments as keyword arguments. Then I encountered this issue:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ilker/deepltl/normal/deepltl/train/train_transformer.py", line 187, in <module>
run()
File "/home/ilker/deepltl/normal/deepltl/train/train_transformer.py", line 142, in run
model = transformer.create_model(vars(params), training=True, custom_pos_enc=params.tree_pos_enc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ilker/deepltl/normal/deepltl/models/transformer.py", line 32, in create_model
predictions = TransformerMetricsLayer(params)([predictions, target])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ilker/miniconda3/envs/spotltl/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 123, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/ilker/deepltl/normal/deepltl/models/transformer.py", line 80, in call
self.add_metric(accuracy)
TypeError: Exception encountered when calling TransformerMetricsLayer.call().
Layer.add_metric() takes 1 positional argument but 2 were given
Arguments received by TransformerMetricsLayer.call():
• args=(['<KerasTensor shape=(None, None, 16), dtype=float32, sparse=False, name=keras_tensor_67>', '<KerasTensor shape=(None, None), dtype=int32, sparse=None, name=target>'],)
• kwargs=<class 'inspect._empty'>
At this point, I gave up trying to run the model with the new TensorFlow version. However, TensorFlow 2.1.0 is not available on pip anymore. Therefore, I installed the official docker image of that TensorFlow version.
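A possible alternative for newer TensorFlow versions, not tested against this repository: TensorFlow 2.16 switched its default to Keras 3, and the TensorFlow release notes describe falling back to Keras 2 by installing the `tf-keras` package and setting the `TF_USE_LEGACY_KERAS` environment variable. The sketch below only shows how the variable would be set from Python before TensorFlow is imported. Whether the rest of the deepltl code then runs unchanged is an assumption.

```python
import os

# Assumption: the tf-keras package (Keras 2) is installed next to TensorFlow,
# e.g. via `pip install tf-keras`. The variable has to be set before the first
# `import tensorflow` in the process, otherwise it has no effect.
os.environ["TF_USE_LEGACY_KERAS"] = "1"

try:
    import tensorflow as tf

    # With the variable set, tf.keras is expected to resolve to Keras 2 (version 2.x).
    keras_version = tf.keras.__version__
except Exception as err:  # TensorFlow or tf-keras is missing in this environment
    keras_version = f"unavailable ({type(err).__name__})"

print("tf.keras version:", keras_version)
```

Exporting the variable in the shell before running `python -m deepltl.train.train_transformer` should have the same effect as setting it in code.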
Docker
Here are the commands I executed:
docker pull tensorflow/tensorflow:2.1.0-gpu-py3-jupyter
# For using GPU in docker:
sudo apt install nvidia-container-toolkit
docker run -u $(id -u):$(id -g) -it --mount type=bind,source=.,target=/tf/deepltl --gpus=all tensorflow/tensorflow:2.1.0-gpu-py3-jupyter bash
# Run inside docker:
python -m deepltl.train.train_transformer --problem='ltl' --ds-name='ltl-35' --epochs=5
It started training, but at the end of the first epoch, it gave an error:
2024-03-12 07:50:20.147556: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_NOT_SUPPORTED
2024-03-12 07:50:20.147617: E tensorflow/stream_executor/cuda/cuda_blas.cc:2301] Internal: failed BLAS call, see log for details
2024-03-12 07:50:20.147665: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas xGEMMBatched launch failed : a.shape=[400,35,32], b.shape=[400,35,32], m=35, n=35, k=32, batch_size=400
[[{{node model/transformer/transformer_encoder/transformer_encoder_layer/multi_head_attention/MatMul}}]]
[[Reshape_640/_568]]
2024-03-12 07:50:20.147711: F tensorflow/core/common_runtime/gpu/gpu_util.cc:291] GPU->CPU Memcpy failed
2024-03-12 07:50:20.147792: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas xGEMMBatched launch failed : a.shape=[400,35,32], b.shape=[400,35,32], m=35, n=35, k=32, batch_size=400
[[{{node model/transformer/transformer_encoder/transformer_encoder_layer/multi_head_attention/MatMul}}]]
Aborted (core dumped)
I couldn't find anything useful about this error on the internet.
I don't know whether you want to maintain this repository or not, but I would appreciate it if you could update the repository for the new TensorFlow version or provide instructions on how to run it with the old version.
If you don't want to do that, TensorBoard logs of your training runs would be useful for me as well. I'm trying to port the code to PyTorch (UPDATE: My PyTorch port is available here), and I would like to compare loss and accuracy values to make sure that my implementation is correct.
Thanks in advance.