Very high memory usage with write_html.py #15
Comments
I think you could add something like this to the start of every function in write_html.py, and try to determine which function grows memory the most.
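The snippet from this comment didn't survive the export. A minimal sketch of the kind of per-function memory check being suggested might use the standard-library resource module (Unix only; the write_html.py function name below is hypothetical):

```python
import resource

def log_memory(label):
    # Peak resident set size so far: kilobytes on Linux, bytes on macOS.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f'{label}: peak RSS {peak_kb / 1024:.1f} MiB')

def write_subreddit_pages(subreddit):
    # Hypothetical write_html.py function, shown only to illustrate placement.
    log_memory('write_subreddit_pages')
    # ... rest of the function
```

Calling log_memory at the top of each function and comparing the printed peaks across a run would show which function drives the growth.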
I launched the following command for one minute only:
@fturco Awesome man, thanks for putting in the effort. I'll try to fix this this weekend.
Can you try commenting out these two lines and seeing how the memory goes?
The script is intentionally loading all content for a sub into memory, so it's kind of a big logical failure. I've gotta rewrite a bit of it, and maybe a lot of it. Not having all of the comments in memory at once might be enough to get by.
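A rough sketch of that direction, not the project's actual code (the CSV layout and the link_id column are assumptions): stream comments per link with a generator instead of holding every comment in memory at once.

```python
import csv

def comments_for_link(comments_path, link_id):
    # Stream the CSV and yield only the rows belonging to one link,
    # so at most one link's comments are materialized at a time.
    with open(comments_path, newline='') as f:
        for row in csv.DictReader(f):
            if row['link_id'] == link_id:  # column name is an assumption
                yield row
```

Re-scanning the file once per link trades CPU time for memory; pre-sorting comments by link would avoid the repeated passes.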
After commenting out those lines, …
Okay I pushed an update. Not everything was optimized, but I think it should be a lot better. If it's still bad, can you try commenting out this line:
I tried running write_html.py again. I haven't yet tried commenting out the line you specified.
To give you a better idea, I have already archived 16 subreddits:
Thanks for the stats. Well, as it stands now, I'm basically loading all /data/*/links.csv data into memory. So once you have 13GB of link data (links only, not comments) in your archive, you won't be able to generate the HTML. So, uhh, I donno. Maybe we'll leave this open and I'll do more optimization in the future. Bug me when you get to 8GB of data?
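For illustration only, a sketch of the streaming alternative implied here (the directory layout matches the /data/*/links.csv pattern above; the per-link writer is a hypothetical name):

```python
import csv
import glob
import os

def iter_links(data_dir):
    # Yield one links.csv row at a time instead of loading every
    # subreddit's link data into memory up front.
    for path in glob.glob(os.path.join(data_dir, '*', 'links.csv')):
        with open(path, newline='') as f:
            for row in csv.DictReader(f):
                yield row

# Usage sketch:
# for link in iter_links('data'):
#     write_link_page(link)  # hypothetical per-link HTML writer
```

This keeps memory flat regardless of archive size, though anything that needs a global view (index pages, sorting) would still need a separate pass or an on-disk index.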
OK, sure.
Original issue:
On my system, running write_html.py without arguments requires too much memory and takes too long. After more than 30 minutes I had to stop it manually because my system became unresponsive. Memory usage increased slowly but relentlessly, until write_html.py had used all 8 gigabytes of RAM plus 5 gigabytes of swap.
My data directory is currently 3.1 gigabytes, and it will continue to grow because I'm always fetching new subreddits.
How can I help debug this?
P.S. My knowledge of Python is still quite limited...