Release 5.1.1 preparation.
 Solr config folder added.
README.md updated with Solr Cloud deploy usage.
Minor README.md improvements.
Thomas Egense committed Mar 26, 2024
1 parent 5038524 commit 5b629a2
Showing 50 changed files with 8,776 additions and 113 deletions.
26 changes: 17 additions & 9 deletions README.md
@@ -1,7 +1,7 @@
# SolrWayback

## SolrWayback 5.0.0 software bundle has been released
SolrWayback bundle release 5.0.0 can be downloaded here: https://github.com/netarchivesuite/solrwayback/releases/tag/5.0.0
## SolrWayback 5.1.1 software bundle has been released
SolrWayback bundle release 5.1.1 can be downloaded here: https://github.com/netarchivesuite/solrwayback/releases/tag/5.1.1

The bundle is the recommended way to get started with SolrWayback. You download the bundle, follow the installation guide and index your own WARC files. Then you are up to speed.

@@ -135,13 +135,10 @@ Documents in SolrWayback are indexed through the [warc-indexer](https://github.c
* A Solr 9+ server with the index built from the Arc/Warc files using the Warc-Indexer version 3.2.0-SNAPSHOT+
* (Optional) chrome/(chromium) installed for page previews to work. (headless chrome)

## Build and usage
## Build and usage for developers
* Build the application with: `mvn package`
* Deploy the `target/solrwayback-*.war` file in a web-container
* Copy `src/test/resources/properties/solrwayback.properties` and `/src/test/resources/properties/solrwaybackweb.properties`
to either the root of the tomcat folder or the `user/home/` folder for the J2EE server.
Alternatively use the [src/main/webapp/META-INF/context.xml](src/main/webapp/META-INF/context.xml) as template
for a context for the SolrWayback WAR and set the paths for the properties directly.
* Copy `properties/solrwayback.properties` and `properties/solrwaybackweb.properties` to the `user/home/` folder.
* Modify the property files. (By default, all URLs point to http://localhost:8080.)
* Open search interface: http://localhost:8080/solrwayback
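
The developer steps above can be collected into a small script. This is only a sketch under a few assumptions that the README does not state: `TOMCAT_DIR` is a hypothetical variable pointing at your Tomcat installation, the WAR is deployed by copying it into Tomcat's `webapps/` folder, and `user/home/` is read as the home directory of the user running Tomcat (`$HOME`). By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/usr/bin/env bash
# Sketch of the developer build steps. Prints each command unless RUN=1.

TOMCAT_DIR="${TOMCAT_DIR:-/path/to/tomcat}"   # assumption: location of your Tomcat install

run() {
  # Execute the command when RUN=1, otherwise just print it.
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

build_and_deploy() {
  run mvn package
  # The WAR file name includes the version, hence the glob.
  run cp target/solrwayback-*.war "$TOMCAT_DIR/webapps/solrwayback.war"
  # Assumption: "user/home/" means the home folder of the current user.
  run cp properties/solrwayback.properties properties/solrwaybackweb.properties "$HOME/"
}

build_and_deploy
```

The printed commands are meant for review before running. Remember to edit the copied property files afterwards, as described in the list above.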

@@ -171,13 +168,14 @@ Unzip and follow the instructions below.

### 1) INITIAL SETUP

* Copy `properties/solrwayback.properties` and `properties/solrwaybackweb.properties` to the `user/home/` folder.
If you want to use a custom location for the properties you can edit and enable the tomcat context environment variables in `/tomcat-9/conf/Catalina/localhost/solrwayback.xml`

* **Optional:** For screenshot previews to work you may have to edit the file `properties/solrwayback.properties` and change the value of the last two properties: `chrome.command` and `screenshot.temp.imagedir`.
Chrome (Chromium) must be installed for image previews to work.

If you encounter any errors when running a script during installation or setup, try changing the permissions of the file (`startup.sh` etc.). On Linux and macOS, this can be done with the following command: `chmod +x filename.sh`
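
If several scripts lack the executable bit, the same `chmod +x` fix can be applied to all of them in one go. A minimal sketch, assuming the scripts end in `.sh`; the function name `make_scripts_executable` is illustrative and not part of the bundle:

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the bundle): mark every *.sh file
# below a folder as executable.

make_scripts_executable() {
  # $1: folder to search; defaults to the current folder.
  find "${1:-.}" -type f -name '*.sh' -exec chmod +x {} +
}

# Usage, from the folder where the bundle was unzipped:
#   make_scripts_executable .
```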

**Note:** Previous versions of the SolrWayback bundle expected the property files to be located at the root of the home folder of the user. If this is preferable, move the two property files `solrwayback.properties` and `solrwaybackweb.properties` from the `properties/` folder in the bundle to the root of the home folder of the user.

### 2) STARTING SOLRWAYBACK
SolrWayback requires both Solr and Tomcat to be running. These processes are started and stopped separately with the following commands:

@@ -317,6 +315,16 @@ If you want to remove and old index and create a new index from scratch, this ca
3. Start solr
4. Start the indexing script

### Update Solr cloud configuration
This section is only for experienced Solr users who want to tweak the Solr configuration.
If you want to make changes to schema.xml or solrconfig.xml, you must use the cloud update script on a running Solr Cloud.
Changes to schema.xml must be made before indexing starts. Changes to solrconfig.xml can be made at runtime.
To update the configuration, use the following two commands (replace the paths to match your system):
`bin/solr zk upconfig -n netarchivebuilder_conf -d "/home/xxx/solrwayback/solrwayback_package_5.1.1/solr_config/conf" -z localhost:9983`
`curl -X POST "http://localhost:8983/api/collections/netarchivebuilder/" -H 'Content-Type: application/json' -d '{"modify":{"config": "netarchivebuilder_conf" } }'`
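
The two update commands can be wrapped in a short script so that the paths and hosts are set in one place. This is a sketch, not a script shipped with the bundle: the variable names and the `update_solr_config` function are made up for illustration, and the default values are the ones used in the commands above (the `/home/xxx/...` path is a placeholder). It only prints the commands unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: upload the Solr config to ZooKeeper and point the collection at it.
# Prints the commands unless RUN=1 is set.

CONF_NAME="netarchivebuilder_conf"   # config set name used in the README
COLLECTION="netarchivebuilder"       # collection name used in the README
# Placeholder path; replace with the solr_config/conf folder on your system.
CONF_DIR="${CONF_DIR:-/home/xxx/solrwayback/solrwayback_package_5.1.1/solr_config/conf}"
ZK_HOST="${ZK_HOST:-localhost:9983}"
SOLR_URL="${SOLR_URL:-http://localhost:8983}"

run() {
  # Execute the command when RUN=1, otherwise just print it.
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

update_solr_config() {
  # Step 1: upload the configuration folder to ZooKeeper.
  run bin/solr zk upconfig -n "$CONF_NAME" -d "$CONF_DIR" -z "$ZK_HOST"
  # Step 2: tell the collection to use the uploaded configuration.
  run curl -X POST "$SOLR_URL/api/collections/$COLLECTION/" \
    -H 'Content-Type: application/json' \
    -d "{\"modify\":{\"config\":\"$CONF_NAME\"}}"
}

update_solr_config
```

Run it from the Solr folder so that `bin/solr` resolves. The printed form drops the shell quoting, so use it to review the values rather than to copy and paste.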



### Faster indexing
A powerful laptop can handle up to 8 simultaneous indexing processes with Solr running on the same laptop.
Using an SSD for the Solr-index will speed up indexing and also improve search/playback performance drastically.
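
To illustrate what capping the number of simultaneous indexing processes looks like, here is a generic sketch using `xargs -P`. The indexing command itself is not shown in this part of the README, so `INDEX_CMD` is a hypothetical placeholder that defaults to a harmless `echo`; `WORKERS` and `index_parallel` are likewise made-up names. Replace `INDEX_CMD` with the real command for indexing a single WARC file.

```shell
#!/usr/bin/env bash
# Generic sketch: run at most $WORKERS indexing processes at the same time.

# Hypothetical stand-in for the real per-file indexing command.
INDEX_CMD="${INDEX_CMD:-echo would-index}"
WORKERS="${WORKERS:-8}"   # the README suggests up to 8 on a powerful laptop

index_parallel() {
  # Each argument is one WARC file; xargs starts one process per file
  # and keeps up to $WORKERS of them running in parallel.
  printf '%s\0' "$@" | xargs -0 -n 1 -P "$WORKERS" $INDEX_CMD
}

# Usage:
#   index_parallel /path/to/warcs/*.warc.gz
```

Lower `WORKERS` if Solr runs on the same machine and the system becomes unresponsive.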
2 changes: 1 addition & 1 deletion pom.xml
@@ -4,7 +4,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>dk.kb.netarchivesuite.solrwayback</groupId>
<artifactId>solrwayback</artifactId>
<version>5.1.0</version>
<version>5.1.1</version>
<packaging>war</packaging>
<name>solrwayback</name>
<url>https://maven.apache.org</url>
22 changes: 22 additions & 0 deletions review.txt
@@ -0,0 +1,22 @@
Non-redirect responses should not set headers?
Redirects should add headers, not overwrite them.

Example 1:
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/http://www.twenty-fourflowers.com/
Example 2:
https://solrwb-test.kb.dk:4000/solrwayback/services/memento/http://prak10k.dk/?page_id=13




ref:
http://timetravel.mementoweb.org/api/json/2013/http://cnn.com



Review:
Bug: fixed. A redirect must not have a payload.
Redirect support only - playback cannot also run under the /memento URL. (Complicated explanation.)
Host -> localhost
TODO comment in DatetimeNegotiationTest
Good unit tests + Solr unit test
106 changes: 3 additions & 103 deletions src/bundle/README.md
@@ -1,104 +1,4 @@
# SolrWayback bundle

Resources used when building the SolrWayback bundle.

- `install SolrWayback bundle`: See install guide [SolrWayback README](https://github.com/netarchivesuite/solrwayback/blob/master/README.md/)
- `indexing`: Scripts for indexing WARC files using [webarchive-discovery](https://github.com/ukwa/webarchive-discovery/)
- `Changes.md`: See version history [SolrWayback](https://github.com/netarchivesuite/solrwayback/blob/master/CHANGES.md/)

- solrwaybackproxy
- Solr 9 config files
- Tomcat 9
- Solr 9

## How to for package managers

### Build WARs and JAR

Create the SolrWayback WAR
```
mvn clean package
```

Build a `warc-indexer-0.3.2-SNAPSHOT-jar-with-dependencies.jar` from [webarchive-discovery](https://github.com/ukwa/webarchive-discovery/).

Build a `solrwaybackrootproxy-4.3.1.war` from [solrwaybackrootproxy](https://github.com/netarchivesuite/solrwaybackrootproxy).

### Folder structure

```
mkdir solrwayback_package_4.5
cd solrwayback_package_4.5/
cp -r ../src/bundle/indexing/ .
cp
cp -r ../src/test/resources/solr_9/ solr_9_files
cp ../README.md ../CHANGES.md .
mkdir properties
cp ../src/test/resources/properties/solrwayback.properties properties/
cp ../src/test/resources/properties/solrwaybackweb.properties properties/
```

Copy the previously generated `warc-indexer-XXX-jar-with-dependencies.jar` to the `indexing/` folder.

### Tomcat 9

Download and unpack Tomcat 9 (in current folder `solrwayback_package_4.5`)
```
wget 'https://dlcdn.apache.org/tomcat/tomcat-9/v9.0.84/bin/apache-tomcat-9.0.84.tar.gz'
tar -xzovf apache-tomcat-9.0.84.tar.gz
mv apache-tomcat-9.0.84 tomcat-9
rm apache-tomcat-9.0.84.tar.gz
```

Copy WAR and context:
```
cp ../target/solrwayback-*.war tomcat-9/webapps/solrwayback.war
mkdir -p tomcat-9/conf/Catalina/localhost/
cp ../src/main/webapp/META-INF/context.xml tomcat-9/conf/Catalina/localhost/solrwayback.xml
```

Edit `tomcat-9/conf/Catalina/localhost/solrwayback.xml` and set
* `solrwayback-config` to `properties/solrwayback.properties`
* `solrwaybackweb-config` to `properties/solrwaybackweb.properties`

Copy and rename the previously generated `solrwaybackrootproxy-4.3.1.war` to `tomcat-9/webapps/ROOT.war`.

### Solr 9

Download and unpack Solr 9 (in current folder `solrwayback_package_4.5`)
```
wget 'https://www.apache.org/dyn/closer.lua/solr/solr/9.4.0/solr-9.4.0.tgz?action=download' -O solr-9.4.0.tgz
tar -xovf solr-9.4.0.tgz
mv solr-9.4.0 solr-9
rm solr-9.4.0.tgz
```

*Optional but makes it easier to debug:* Open Solr to the world instead of just localhost
```
sed -i 's/#SOLR_JETTY_HOST="127.0.0.1"/SOLR_JETTY_HOST="0.0.0.0"/' solr-9/bin/solr.in.sh
sed -i 's/REM set SOLR_JETTY_HOST=127.0.0.1/set SOLR_JETTY_HOST=0.0.0.0/' solr-9/bin/solr.in.cmd
```

Start Solr in cloud mode, create a 1 shard `netarchivebuilder` collection and shut it down
```
solr-9/bin/solr start -c -m 1g
solr-9/bin/solr create_collection -c netarchivebuilder -d solr_9_files/netarchivebuilder/conf/ -n sw_conf_1 -shards 1
solr-9/bin/solr stop
```

### Finishing and packing (in current folder `solrwayback_package_4.5`)

Remove Emacs backup files (if any)
```
find . -iname "*~" | xargs rm
```

Create the bundle
```
cd ..
zip -r solrwayback_package_4.5.zip solrwayback_package_4.5/
```

- `properties`: Default properties for the SolrWayback Bundle
# Solr configuration

This folder contains a copy of the Solr configuration and can be used to upload a new Solr configuration to Solr. Only for experienced Solr users who know what they are doing.
See the 'Update Solr cloud configuration' section in the project README.md.
38 changes: 38 additions & 0 deletions src/bundle/solr_config/conf/elevate.xml
@@ -0,0 +1,38 @@
<?xml version="1.0" encoding="UTF-8" ?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

<!-- If this file is found in the config directory, it will only be
loaded once at startup. If it is found in Solr's data
directory, it will be re-loaded every commit.
See http://wiki.apache.org/solr/QueryElevationComponent for more info
-->
<elevate>
<query text="foo bar">
<doc id="1" />
<doc id="2" />
<doc id="3" />
</query>

<query text="ipod">
<doc id="MA147LL/A" /> <!-- put the actual ipod at the top -->
<doc id="IW-02" exclude="true" /> <!-- exclude this cable -->
</query>

</elevate>
8 changes: 8 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_ca.txt
@@ -0,0 +1,8 @@
# Set of Catalan contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
d
l
m
n
s
t
9 changes: 9 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_fr.txt
@@ -0,0 +1,9 @@
# Set of French contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
l
m
t
qu
n
s
j
5 changes: 5 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_ga.txt
@@ -0,0 +1,5 @@
# Set of Irish contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
d
m
b
23 changes: 23 additions & 0 deletions src/bundle/solr_config/conf/lang/contractions_it.txt
@@ -0,0 +1,23 @@
# Set of Italian contractions for ElisionFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
c
l
all
dall
dell
nell
sull
coll
pell
gl
agl
dagl
degl
negl
sugl
un
m
t
s
v
d
5 changes: 5 additions & 0 deletions src/bundle/solr_config/conf/lang/hyphenations_ga.txt
@@ -0,0 +1,5 @@
# Set of Irish hyphenations for StopFilter
# TODO: load this as a resource from the analyzer and sync it in build.xml
h
n
t
6 changes: 6 additions & 0 deletions src/bundle/solr_config/conf/lang/stemdict_nl.txt
@@ -0,0 +1,6 @@
# Set of overrides for the dutch stemmer
# TODO: load this as a resource from the analyzer and sync it in build.xml
fiets fiets
bromfiets bromfiets
ei eier
kind kinder