Computation speed up via parallelization #54
Conversation
numpy.random might be much faster than random.random; this should be tried --> it turns out this is the case when we can generate a lot of numbers in one batch and use them afterwards, but not when we have to iterate and adapt the free windows of time available for switch-on events.
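As an aside, here is a minimal sketch of the trade-off described above (the bounds and sample size are made up for illustration): batching with numpy only pays off when the draws are independent of the evolving state.

```python
import random
import numpy as np

N = 100_000

# One Python call per draw: significant interpreter overhead.
draws = [random.randint(0, 1439) for _ in range(N)]

# One vectorised call generating the whole batch at once: much faster,
# but only usable when later draws do not depend on earlier outcomes.
rng = np.random.default_rng()
draws = rng.integers(0, 1440, size=N)
```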
Force-pushed from 67c3268 to e423f85
@FLomb @mohammadamint - it seems to me a lot of time is wasted picking switch_on wrongly (i.e. the remaining window size is smaller than the time the appliance should be on, so the switch_on must be resampled). For an appliance we know the windows of available times before we enter the while loop. Each time, the indexes of the switch_on event are computed and, if they fit within the window, they are assigned, thus reducing the range of "free_spots". By simply keeping track of the start and stop times of the free_spots, we can know in advance that a switch-on event will not be possible within a remaining free spot (for example, if the minimal on-time of the appliance is 100 and there is no range of that length within random_window1, it is pointless to even pick the switch-on event within this window). The current algorithm does rule it out, but only after having already calculated the indexes which will not be used.
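A minimal sketch of the idea (all names here are hypothetical; the PR's actual implementation appears in the review diff further down): filter out the free spots that cannot host a full cycle before sampling, instead of sampling first and rejecting afterwards.

```python
import random

def pick_switch_on(free_spots, func_cycle):
    """Pick a switch-on index only among spots that can host a full cycle.

    free_spots: list of range objects describing the remaining free time slots.
    func_cycle: minimal number of consecutive time steps the appliance stays on.
    """
    feasible = [s for s in free_spots if s.stop - s.start >= func_cycle]
    if not feasible:
        return None  # no spot can fit the appliance, so no resampling loop at all
    spot = random.choice(feasible)
    # the switch-on must leave room for func_cycle steps before the spot ends
    return random.randint(spot.start, spot.stop - func_cycle)
```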
What you suggest makes a lot of sense to me. I am convinced that the greatest margins for improving the code lie in these little details that were not done efficiently back then.
Good, I didn't want to start implementing it before you had your say on it; maybe the code is the way it is for some other reason, and I want to avoid breaking anything :) I will hopefully come up with more speed improvements before our next meeting ^^
Force-pushed from 9bb1d93 to d1b8692
These are the profiling results before and after:
Force-pushed from d1b8692 to 44c1ea7
@FLomb - I would like to merge this PR before we merge …
Force-pushed from 5265e89 to 7b311b7
Change profiling of 1000s profiles from 2.657s --> 1.965s
As ma.masked_greater_equal returns a masked array from an unmasked one, there was no need to provide a masked array as input argument. Change profiling of 1000s profiles from 1.241s --> 1.055s
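For reference, a small standalone demonstration of that numpy.ma behaviour (the values are made up): masked_greater_equal accepts a plain ndarray and returns a MaskedArray, so masking the input beforehand is redundant.

```python
import numpy as np
import numpy.ma as ma

daily_use = np.array([0.0, 120.0, 1440.0, 300.0])  # made-up minute values

# A plain ndarray goes in, a MaskedArray comes out:
masked = ma.masked_greater_equal(daily_use, 1440.0)
print(type(masked))  # <class 'numpy.ma.core.MaskedArray'>
print(masked)        # [0.0 120.0 -- 300.0]
```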
Change profiling of 1000s profiles from 1.055s --> 0.947s
The operation ma.notmasked_contiguous on daily_use_masked takes quite some time and can be replaced by keeping a list of available ranges. The method update_available_time_for_switch_on_events updates this list given the time indexes of a new switch-on event.
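A minimal sketch of what such an update could look like (a hypothetical, standalone version of the logic, not the actual method): when a switch-on event occupies part of a free range, that range is replaced by the sub-ranges left on either side of the event.

```python
def update_free_spots(free_spots, event):
    """Remove the slice occupied by a switch-on event from the free ranges.

    free_spots: list of range objects (the available time slots).
    event: range occupied by the new switch-on event.
    """
    new_spots = []
    for spot in free_spots:
        if event.start >= spot.stop or event.stop <= spot.start:
            new_spots.append(spot)  # no overlap, keep the spot unchanged
            continue
        # keep whatever remains before and after the event
        if event.start > spot.start:
            new_spots.append(range(spot.start, event.start))
        if event.stop < spot.stop:
            new_spots.append(range(event.stop, spot.stop))
    return new_spots

# e.g. an event at 300-400 splits the spot 0-1440 into 0-300 and 400-1440
print(update_free_spots([range(0, 1440)], range(300, 400)))
```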
Profiling time went from 1s --> 0.365s
If tot_time == rand_time, the loop will run one more time, and in that case index_adj will always be empty, so there is no need to update the daily_use attribute
from 0.241s to 0.210s
from 0.210s to 0.141s
None values are now replaced by NaN
@FLomb @mohammadamint - this is ready for review. I would suggest you run some of your use cases on the development branch and compare with the results obtained with this new branch.
indexes_choice = []
for s in self.free_spots:
    if s.stop - s.start >= self.func_cycle:
        indexes_choice += [*range(s.start, s.stop - self.func_cycle + 1)]  # this will be fast with cython
This solves the problem raised in #28, as the switch_on can, by design, only be chosen in the part of the available windows such that switch_on + func_cycle <= window upper limit.
I've noticed a few issues that affect both this PR and the latest updates of the development branch, which I hadn't tested this way either. I am running the code from the perspective of a user who relies on an interface, such as Spyder, from which they run the ramp_run.py file for example input file 1.
The first issue that arises is due to an outdated path in the initialise.py file, which tries to look for the input files in a package path that no longer exists (file_module = importlib.import_module(f"ramp.input_files.input_file_{j}")). Judging from the traceback below, this can be easily fixed by updating the path to f"ramp.example.input_file_{j}". When doing so, however, a second issue arises, as reported below:
File "/home/fl/GitHub-repos/RAMP/ramp/../ramp/example/input_file_1.py", line 49, in <module>
Ch_indoor_bulb.windows([1200,1440],[0,0],0.1)
File "/home/fl/GitHub-repos/RAMP/ramp/../ramp/core/core.py", line 829, in windows
window_time = window_time + np.diff(getattr(self, f"window_{i}"))[0]
AttributeError: 'Appliance' object has no attribute 'window_4'
This seems to be something more structural. I am not sure why it went unnoticed in the latest updates of the development branch. I haven't yet had time to dig deeper into it, but perhaps it's something straightforward for you based on the latest changes you made?
This would happen because the attribute …
Indeed, it looks like all the example input files are deprecated, because they used to define a 'name' for the appliance's user as a first parameter (as in: …).
Another issue is that some parts of the code (namely, the post-processing) try to save results in a results folder that does not yet exist in the repository.
I have tested the code based on the qualitative testing functionality I had implemented some time ago. There were some issues in the testing script, too, which I fixed. You can see the results below for the 3 default example input files (this is the automated output of the testing script):
The results seem highly satisfactory, so I would say that the refactoring of the code is consistent with the correct functioning of the algorithm and could be merged. However, we first need to take care of the issues above. I will try to update the PR with my own fixes. In addition, I haven't tested the parallel-computing case yet.
In the updated documentation, there are some broken links to images: https://rampdemand-rtd--54.org.readthedocs.build/en/54/examples/parallel_processing/parallel_processing.html |
Tried testing parallel computing, too.
The image tag source needs to be updated to point to the GitHub repo image dir in docs/source/_static.
I would suggest updating the example scripts with the latest changes of the code, mainly because those examples mostly use positional arguments, which have now changed. I can take care of this by updating all the example files (and preferably use keyword arguments to avoid similar issues in the future).
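To illustrate the point (with a deliberately simplified, hypothetical signature, not RAMP's actual API): positional calls silently break when parameters are reordered or inserted, while keyword calls keep working or fail loudly.

```python
# Hypothetical, simplified signature for illustration only.
def add_appliance(user, number=1, power=0.0, num_windows=1, func_cycle=1):
    ...

# Positional call: if a new parameter is ever inserted before `power`,
# the value 60 silently lands in the wrong slot.
add_appliance("household", 1, 60, 2)

# Keyword call: robust to reordering and new parameters, and self-documenting.
add_appliance(user="household", number=1, power=60, num_windows=2)
```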
I am specifically working on the post-processing, and this issue will be solved in the upcoming changes related to post-processing. However, for now, we can solve the issue by adding the results folder. This will, however, be a little bit tricky: if someone installs RAMP using pip, finding the results directory would be difficult!
Right now the …
@mohammadamint - do you think you would be done soon with the post-processing? Otherwise it should be possible to add a line of code to create the results folder if it does not exist, with the …
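The comment is cut off here; one standard way to do this (an assumption on my part, not necessarily what was meant) is os.makedirs with exist_ok:

```python
import os

# Create the results folder on startup; a no-op if it already exists.
os.makedirs("results", exist_ok=True)
```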
Ok, I did manage to test parallel computing based on .xlsx files created from the example .py files. Results are satisfactory in this case as well, as shown by the testing report below:
Indeed, all I see in the …
Edits part of review of PR#54
I did this in 7159cb2 and it works
I reviewed the PR and made edits, which were pushed to the rl-institut branch and thereby re-incorporated into this PR. The PR is now ready for merging; the only remaining issue is the missing 'results' folder, which leads to conflicts with some post-processing parts of the code. The question is whether or not to make the results folder a default part of the repository, e.g. by adding default results for the three example input files. I would probably go for the former option for the time being, so that we can have a fully functioning version of the code for release.
I think we can merge this PR now, and open a separate PR for the 'results' folder issue.
See the updated documentation for a use case.
refactored the code in order to improve execution time; the use of masks was dropped
added a way to compute ramp profiles for a use case using parallel processes via the generate_daily_load_profiles_parallel method of the UseCase class (option -p for command line input)
closes "switch on + App.func_cycle < rand_window_123[1]" #28 (the changes in PR #28 are included in this PR)
The command … takes 1 min 40 seconds to complete on the development branch. The same command takes 22 seconds to complete on this branch, and the command … takes 18 seconds (I have 4 cores on my laptop).
There might be room for further improvement by playing with the chunksize parameter of the imap_unordered function of the multiprocessing package, and maybe the parallelization helps more on larger use cases and with more cores.