The Councilmatic server had significant memory issues, beginning around midnight on February 28. /var/log/syslog shows that the OOM killer started killing processes ("python invoked oom-killer") around 12:50 - after the execution of the Chicago cron (45 minutes past the hour) and the LA Metro cron (40 minutes past the hour).
@evz and I rebooted the server and then watched memory consumption as cron tasks executed. We noticed that the LA Metro update_index process required considerable memory: the process (Jetty) consumed about 15% of memory and doubled to around 30% while inserting the data into the Solr index (Java). Such memory use could be hazardous if it overlaps with other indexing processes (i.e., NYC and Chicago).
Additionally, we noticed that the RTF conversion script for NYC sometimes takes longer than 15 minutes to complete, which delays NYC data imports. Let's replace the RTF --> HTML conversion with the actual PDFs. It should be possible via this PR.
The SearchIndex class provides structured data to the search engine. (Note: the search engine is document-based – each record is a single text blob that gets tokenized, analyzed, and indexed – much like a key-value store.) A SearchIndex subclass can define a get_updated_field method, which returns the name of the field that holds an "updated" timestamp. For Councilmatic, the bill model has an updated_at field, and we tell Haystack all about it. Hence, we can use the --age argument.
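Conceptually, --age N restricts reindexing to records whose "updated" timestamp (the field named by get_updated_field) falls within the last N hours. A plain-Python sketch of that window filter, on toy data – this is not Haystack's actual implementation:

```python
from datetime import datetime, timedelta

def bills_to_reindex(bills, age_hours, now):
    # Mirrors what `update_index --age N` does: keep only records whose
    # updated_at timestamp falls within the last N hours.
    cutoff = now - timedelta(hours=age_hours)
    return [b for b in bills if b["updated_at"] >= cutoff]

now = datetime(2019, 3, 1, 1, 0)
bills = [
    {"id": 1, "updated_at": datetime(2019, 3, 1, 0, 30)},   # 30 minutes old
    {"id": 2, "updated_at": datetime(2019, 2, 27, 23, 0)},  # days old
]
print([b["id"] for b in bills_to_reindex(bills, age_hours=1, now=now)])  # -> [1]
```

With --age=1, only the bill updated in the last hour is touched; a larger window (e.g., 48 hours) would sweep in both.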
It looks like Chicago had a big data-import day, resulting in some unusually large bill counts. I queried our Councilmatic database for bills updated in the last hour: 1704. This number aligns with what I saw in the update_index log (also 1704).
In short, the --age argument works as expected and should be implemented in LA Metro (and other Councilmatics, including staging sites, that do not use it).
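As a sanity check, that kind of count can be reproduced against a toy in-memory table – the table and column names here are illustrative stand-ins for the real schema, and the production database is queried with the equivalent SQL:

```python
import sqlite3
from datetime import datetime, timedelta

# Toy reproduction of "bills updated in the last hour" using sqlite3;
# table and column names are assumed, not the actual Councilmatic schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill (id INTEGER, updated_at TEXT)")
now = datetime(2019, 3, 1, 12, 0)
rows = [(1, now - timedelta(minutes=20)),   # updated within the hour
        (2, now - timedelta(hours=3))]      # updated earlier
conn.executemany("INSERT INTO bill VALUES (?, ?)",
                 [(i, t.isoformat(" ")) for i, t in rows])
cutoff = (now - timedelta(hours=1)).isoformat(" ")
count = conn.execute("SELECT COUNT(*) FROM bill WHERE updated_at >= ?",
                     (cutoff,)).fetchone()[0]
print(count)  # -> 1
```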
We identified several next steps:

- Use the --batch argument with LA Metro, as we do with Chicago and NYC (handled by Better cron Metro-Records/la-metro-councilmatic#262).
- We pass the --age argument to Chicago and NYC, but we set it to 1 - could that many bills have been added in an hour? Let's determine how best to use this arg. (See the notes on get_updated_field above.) Handled by Better cron Metro-Records/la-metro-councilmatic#262 and Give an age and batch size to the staging update_index chi-councilmatic#236.
- Write logs to /var/log rather than the /tmp directory - we want to preserve these logs after a reboot. Handled by Better cron Metro-Records/la-metro-councilmatic#262, Put logs in /var/log nyc-council-councilmatic#134, and Move logs to /var/log chi-councilmatic#237.
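Taken together, those changes amount to cron entries along these lines – a hypothetical sketch, where the schedule, log path, and flag values are assumptions rather than the deployed configuration (--age and --batch-size are the option spellings Haystack's update_index accepts):

```
# Hypothetical crontab fragment; schedule, paths, and values are illustrative
40 * * * * python manage.py update_index --age=2 --batch-size=100 >> /var/log/lametro-update-index.log 2>&1
```

Writing the log under /var/log (instead of /tmp, which is typically cleared on reboot) preserves the indexing history across restarts like the one described above.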