Other notes on rsphider...

Should we use a more established search engine, like Apache Solr?

The default answer is "no", for reasons that include:
1. The RACHEL content is already designed to use rsphider
2. The provisioning system can already deploy rsphider
3. We need everything to run on a low-resource machine like the Cubietruck. My guess is that a Java/Tomcat tool like Solr will be too resource-intensive.
...but we could consider an alternative iff it can be clearly demonstrated that:
1. It will run on a Cubietruck without degrading performance
2. It will be easy to adapt the RACHEL content and other content to use it
3. It will be easier to use and maintain than rsphider
One potential argument in favor of Solr: someday I would like to deploy offline versions of Stack Exchange sites. This would be an incredibly useful resource, and they provide handy dumps of their content, plus a server for browsing them. That server uses... Solr. Without Solr, we'll need to adapt their server to use our indexer.
