SolrWayback bundle 5.1.2

@thomasegense thomasegense released this 01 Aug 06:14

The SolrWayback distribution is an out-of-the-box solution for exploring archived webpages in ARC/WARC format.
It runs on Windows, Linux and macOS.

SolrWayback bundle version 5+ now requires Java 11 or Java 17 and no longer runs under Java 8. Tomcat has been upgraded from version 8.5 to 9 and Solr from version 7 to 9.
The SolrWayback webapp remains backwards compatible with a Solr 7 index. If you have a large index built under Solr 7, keep the solr7 folder and do not use the new solr9 folder.

Download: https://github.com/netarchivesuite/solrwayback/releases/download/5.1.2/solrwayback_package_5.1.2.zip

How to install:
Unzip the bundle and read the 'install guide' section in the README.md file in the root of the zip file.
Solr must now be started with a -c (for cloud) argument:
solr-9/bin/solr start -c -m 4g

How to upgrade from a previous version:
Replace the solrwayback folder with the new folder, but keep the solr7 folder if you have already built an index and do not want to reindex.
Compare the properties in solrwayback.properties and solrwaybackweb.properties with your own files and add any new properties that are missing.
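
The property comparison can be done by hand, but a small script makes it harder to overlook a new key. The following is a minimal sketch, not part of the bundle: the function names are made up for illustration, and the parser is deliberately simplified (it treats the first '=' or ':' as the key/value separator and ignores line continuations and escaped separators).

```python
from pathlib import Path


def read_property_keys(path):
    """Return the set of keys defined in a Java-style .properties file.

    Simplified on purpose: no support for line continuations or
    escaped separators inside keys.
    """
    keys = set()
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or '!').
        if not line or line[0] in "#!":
            continue
        # The key ends at the first '=' or ':'; the rest is the value.
        for i, ch in enumerate(line):
            if ch in "=:":
                keys.add(line[:i].strip())
                break
        else:
            # No separator found: the whole line is a key with an empty value.
            keys.add(line)
    return keys


def missing_properties(bundle_file, installed_file):
    """Keys present in the new bundle's file but absent from the installed file."""
    return sorted(read_property_keys(bundle_file) - read_property_keys(installed_file))
```

Call `missing_properties(new_file, your_file)` once for solrwayback.properties and once for solrwaybackweb.properties, passing the path of the file shipped in the new bundle first and the path of your existing file second. The returned list contains the keys you still need to add.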

Changelog:
See https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md

Changes since last 4.2.2 release:

5.1.2
Bug fix: chunking was not removed in all cases. This was only relevant for WARC files created with chunking (not Heritrix).
Dockerfile has been updated to build SolrWayback bundle 5.1.0. (Will be upgraded each release.) See: #456, implemented by @c-vandendyck-kbr
Geo search was not working for Solr 9.4 in cloud mode. The Solr function query syntax had to be rewritten; the new syntax is also backwards compatible with Solr 7.

5.1.1
Minor cleanup of log messages related to shard splitting, to avoid repeated stack traces.
Temporary fix for a Solr 9 bug that caused invalid JSON from Solr. See: #449

5.1.0
Substantial speed-up when exporting (CSV, WARC etc.) from large multi-sharded collections. See #329 (thanks to Toke Eskildsen). This feature still needs a little more testing. Feedback is welcome.

Minor tweaking of log info/debug. Fewer log lines in the default solrwayback.log when running with log level INFO.
Fixed a regression bug where "page resources" was not showing missing resources for the webpage.

Updated the bundle install documentation. Added a new section on how to redeploy the Solr configuration.

5.0.0
Upgrade Java 1.8 → 11, Tomcat 8.5 → 9 and Solr 7 → 9. SolrWayback 5.0.0 is backwards compatible with existing Solr 7 installations.
Better guide for using start and stop scripts.
Fixed CSV/JSON export when more than one facet was selected. (regression bug... sorry)
warc-indexer now also finds ARC files when searching recursively (thanks to @fedorw)
Frontend third-party dependencies updated.

4.4.3
Added Zip Export feature. It is now possible to extract raw files from SolrWayback in a combined zip file. This could for example be used to extract all HTML content, images, video etc. from a search result. (github #382 and #245). To increase the default max file limit, add this additional property to solrwaybackweb.properties: export.zip.maxresults=1000000

Docker support. The Dockerfile installs SolrWayback in the Docker container. You can index WARC files from a folder outside the Docker container. See the Dockerfile for documentation. (Thanks to Trym Bremnes for this PR)

Query hints fix (range queries). The search validation helper did not like range queries and showed a warning even when they were correct. (github #380)
Removed an error message that would be shown while waiting to load "Page resources".

CTRL+click on a facet will open the search-result in a new tab. On macOS use CMD+click. (github #404)

Encoding is now set to UTF-8 when indexing into Solr using the indexing scripts in the bundle install. Some OS/Docker containers may not have UTF-8 as default.