-
Recently I have been using the PCMCI algorithm to discover causal relations in binary time series. Specifically, I chose a small dataset with T = 7000 and N = 30. The process configuration is the following.
However, it takes my MacBook nearly 3 hours to finish the discovery process. Any ideas about the bottleneck and how to speed up the process?
Replies: 7 comments 7 replies
-
Hello Wang70880,
(i) Have you looked at the mpi4py parallelization script for PCMCI: https://github.com/jakobrunge/tigramite/blob/developer/run_pcmci_parallel.py ? If you adapt this script, you can use all the cores of your CPU. This requires an MPI library and its headers (e.g. Open MPI), plus the Python module "mpi4py" (available on either Conda or Pip).
(ii) Use version 5.x of Tigramite and ensure that you have Numba installed (Python JIT compilation). Compiled code is much faster than interpreted code.
(iii) If you also use Gaussian processes, ensure that the Torch library and the Python module "GPyTorch" are installed and working with your GPU.
See "Optional packages depending on used functions" in the Tigramite [README.md](https://github.com/jakobrunge/tigramite/tree/developer#readme).
Best regards,
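For reference, a typical invocation of such an MPI-parallel script might look like this (a hypothetical sketch; the rank count and script path depend on your setup):

```shell
# Hypothetical invocation (adjust -np to your CPU core count and the
# path to wherever you placed run_pcmci_parallel.py).
mpirun -np 8 python run_pcmci_parallel.py
```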
-
Hi Nick, Thanks a lot for your advice. I will try it and see if it works. :-)
-
Note: when using the mpi4py script for PCMCI, I found that I had to specify
otherwise I could not plot the causal graph from the results.
-
I'm currently learning how to use this myself, and these notes may be helpful to others.
Using prior knowledge to keep the causal search tractable: N.B. the number of _links to be tested_ scales with the square of the number of variables. The case of 30 variables with 'ParCorr' independence tests should be tractable on current computers, but other Tigramite users may have many more variables. The computational cost can be substantially reduced if you have prior knowledge that excludes some links. You can tell Tigramite functions to consider only certain links by providing "selected_links", a Python dict of possible edges per variable. The structure of the dict is:
e.g. this would be a fully-connected causal graph for 3 variables, tau_min=1, and tau_max=2:
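A minimal sketch of such a dict, assuming Tigramite's convention that each target variable j maps to a list of (i, -tau) parent tuples:

```python
# Hypothetical sketch: a fully connected selected_links dict for
# N = 3 variables, tau_min = 1, tau_max = 2.
N, tau_min, tau_max = 3, 1, 2
selected_links = {
    j: [(i, -tau) for i in range(N) for tau in range(tau_min, tau_max + 1)]
    for j in range(N)
}
# Each target variable j is tested against every (variable, lag) pair,
# i.e. 3 variables x 2 lags = 6 candidate links per target here.
```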
Any elements that you can omit from this dict will reduce the amount of computation required. Search the Tigramite documentation for functions that have a "selected_links" parameter. This technique is also used in the Tigramite parallelization scripts to send different dicts of links to different MPI ranks on different CPU cores; see:
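The per-rank splitting mentioned above can be sketched as follows (a hypothetical helper for illustration; `split_targets` is an invented name, not the actual script's code):

```python
# Hypothetical sketch: distribute target variables among MPI ranks
# round-robin, so each rank tests links into its own subset of targets.
def split_targets(n_vars, n_ranks):
    """Map each rank to the list of target-variable indices it handles."""
    return {
        rank: [j for j in range(n_vars) if j % n_ranks == rank]
        for rank in range(n_ranks)
    }
```

Each rank would then build its own selected_links dict containing only its assigned targets and run the conditional-independence tests for those.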
-
Hi Nick, How do you use the Numba speedup? For which modules (or functions) do you enforce compilation?
-
It turns out that the benefit of Numba is very limited here. Any incompatibility (e.g., in the inference of data types) will cause Numba to fall back to "object mode". I tested the speed and found that when working in object mode, the execution time actually increases by about 10 to 20%. As a result, I don't suggest using Numba unless you can identify functions that use NumPy only.
-
Sorry for the late reply. We have made quite a few improvements lately; does the issue still persist?