
Other notes on rsphider...

Brad Smith edited this page on May 7, 2015

Should we use a more established search engine, like Apache Solr?

  • The default answer is "no", for reasons that include:
    1. The RACHEL content is already designed to use rsphider
    2. The provisioning system can already deploy rsphider
    3. We need everything to run on a low-resource machine like the Cubietruck. My guess is that a Java/Tomcat tool like Solr will be too resource-intensive.
  • ...but we could consider an alternative if, and only if, it can be clearly demonstrated that:
    1. It will run on a Cubietruck without a performance impact
    2. It will be easy to adapt the RACHEL content and other content to use it
    3. It will be easier to use and maintain than rsphider
  • One potential argument in favor of Solr: someday I would like to deploy offline versions of StackExchange sites. This would be an incredibly useful resource, and they provide handy dumps of their content, plus a server for browsing them. That server uses... Solr. Without Solr, we'll need to adapt their browsing server to use our indexer.