ManuelThurner edited this page Nov 17, 2014 · 4 revisions

The Site Monitor is a crawler that looks for errors on the Climate CoLab website. It is a Java program that can be run from the command line.
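This page does not describe how the Site Monitor walks the site. As a rough illustration only, the sketch below shows a generic breadth-first crawl loop of the kind such a tool might use. CrawlSketch, Page, the injected fetch function and the maxPages limit are hypothetical names, not part of the actual org.xcolab.utils.sitemonitor code. The site is replaced by an in-memory map so the example runs without network access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical crawl loop, not the actual SiteMonitor implementation.
final class CrawlSketch {
    // A fetched page: its body and the links found on it (assumed already resolved to crawlable URLs).
    static final class Page {
        final String body;
        final List<String> links;

        Page(String body, List<String> links) {
            this.body = body;
            this.links = links;
        }
    }

    // Breadth-first crawl from start; returns the URLs whose body isError flags, in visit order.
    static List<String> crawl(String start, Function<String, Page> fetch,
                              Predicate<String> isError, int maxPages) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> errors = new ArrayList<>();
        queue.add(start);
        seen.add(start);
        int visited = 0;
        while (!queue.isEmpty() && visited < maxPages) {
            String url = queue.poll();
            visited++;
            Page page = fetch.apply(url);
            if (page == null) {
                continue; // fetch failed; a real monitor would probably report this as well
            }
            if (isError.test(page.body)) {
                errors.add(url);
            }
            for (String link : page.links) {
                if (seen.add(link)) { // enqueue each URL at most once
                    queue.add(link);
                }
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the website.
        Map<String, Page> site = Map.of(
                "/", new Page("home", List.of("/a", "/b")),
                "/a", new Page("Internal error", List.of("/")),
                "/b", new Page("ok", List.of("/a")));
        System.out.println(crawl("/", site::get, body -> body.contains("Internal error"), 100));
        // prints [/a]
    }
}
```

Injecting the fetch function keeps the traversal logic separate from HTTP handling; a real crawler would also restrict links to the site's own host.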

It is located in the repository under other/site-monitor. To run it, change into that directory and execute:

    java -cp "target/dependency/*:target/classes" org.xcolab.utils.sitemonitor.SiteMonitor

It currently runs daily on the cognosis server as a cron job.

The Site Monitor is configured in the file other/site-monitor/src/main/resources/siteMonitor-config.xml. The contents of the configuration file are described below.

Checkers

The checkers configured in this section determine whether a crawled page is considered erroneous. There are currently three types of checkers:

- RegexMatchingChecker: checks whether a given regular expression matches the page. If it does, the page is reported as an error.
- ResponseStatusChecker: checks for a specific HTTP response code (e.g. 500).
- InverseCheckResultChecker: inverts the result of another checker, referenced by its name.
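The following is a minimal sketch of the three checker behaviours. It assumes that a checker simply answers "is this page an error?" and that a regular-expression match anywhere in the page (Matcher.find) counts as a hit. The interface PageCheck and the classes RegexCheck, StatusCheck and InverseCheck are hypothetical stand-ins; the real classes in org.xcolab.utils.sitemonitor.checkers may have different signatures and semantics.

```java
import java.util.regex.Pattern;

// Hypothetical contract: returns true when the page should be reported as an error.
interface PageCheck {
    boolean isError(int statusCode, String body);
}

// Like RegexMatchingChecker (assumed semantics): a match anywhere in the body is an error.
final class RegexCheck implements PageCheck {
    private final Pattern pattern;

    RegexCheck(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean isError(int statusCode, String body) {
        return pattern.matcher(body).find();
    }
}

// Like ResponseStatusChecker (assumed semantics): one specific HTTP status code is an error.
final class StatusCheck implements PageCheck {
    private final int status;

    StatusCheck(int status) {
        this.status = status;
    }

    @Override
    public boolean isError(int statusCode, String body) {
        return statusCode == status;
    }
}

// Like InverseCheckResultChecker: flips the result of the wrapped check.
final class InverseCheck implements PageCheck {
    private final PageCheck inner;

    InverseCheck(PageCheck inner) {
        this.inner = inner;
    }

    @Override
    public boolean isError(int statusCode, String body) {
        return !inner.isError(statusCode, body);
    }
}

final class PageCheckDemo {
    public static void main(String[] args) {
        // Same regular expression as errorNotificationChecker in the configuration below.
        PageCheck errorNotification = new RegexCheck("(alert\\-error|Internal\\s*error)");
        PageCheck serverError = new StatusCheck(500);
        PageCheck inverted = new InverseCheck(errorNotification);

        String page = "<div class=\"alert-error\">Oops</div>";
        System.out.println(errorNotification.isError(200, page)); // true
        System.out.println(serverError.isError(200, page));       // false
        System.out.println(inverted.isError(200, page));          // false
    }
}
```

Modelling the inverse checker as a wrapper around another check mirrors how the configuration refers to the wrapped checker by name.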

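A typical checker configuration is shown below. Each checker has a name, an implementing class, a class-specific configuration value (a regular expression for RegexMatchingChecker, the name of another checker for InverseCheckResultChecker) and a message describing the problem: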

    <checkers>
		<checker>
			<name>errorNotificationChecker</name>
			<class>org.xcolab.utils.sitemonitor.checkers.RegexMatchingChecker</class>
			<configuration>(alert\-error|Internal\s*error)</configuration>
			<message>Error notification found on a page</message>
		</checker>
		<checker>
			<name>noErrorNotificationChecker</name>
			<class>org.xcolab.utils.sitemonitor.checkers.InverseCheckResultChecker</class>
			<configuration>errorNotificationChecker</configuration>
			<message>Error notification found on a page</message>
		</checker>
		<checker>
			<name>unavailablePortletChecker</name>
			<class>org.xcolab.utils.sitemonitor.checkers.RegexMatchingChecker</class>
			<configuration>.*is\s*temporarily\s*unavailable.*</configuration>
			<message>Unavailable portlet found</message>
		</checker>
		<checker>
			<name>noUnavailablePortletsChecker</name>
			<class>org.xcolab.utils.sitemonitor.checkers.InverseCheckResultChecker</class>
			<configuration>unavailablePortletChecker</configuration>
			<message>Unavailable portlet found on a page</message>
		</checker>
	</checkers>
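Because InverseCheckResultChecker entries refer to other checkers by name, a mistyped name is easy to miss. The sketch below assumes only the element names shown above (checker, name, class, configuration, message), reads the checker entries with the JDK's built-in DOM parser, and lists inverse-checker references that no checker defines. CheckerEntry and CheckerConfigReader are hypothetical helpers, not classes from the Site Monitor, and the root element in the example is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// One <checker> entry; fields mirror its child elements.
final class CheckerEntry {
    final String name;
    final String className;
    final String configuration;
    final String message;

    CheckerEntry(String name, String className, String configuration, String message) {
        this.name = name;
        this.className = className;
        this.configuration = configuration;
        this.message = message;
    }
}

// Hypothetical helper, not part of the Site Monitor.
final class CheckerConfigReader {
    // Returns every <checker> element in document order.
    static List<CheckerEntry> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("checker");
            List<CheckerEntry> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                entries.add(new CheckerEntry(text(e, "name"), text(e, "class"),
                        text(e, "configuration"), text(e, "message")));
            }
            return entries;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Cannot parse checker configuration", ex);
        }
    }

    // Trimmed text of the first descendant with the given tag, or "" if there is none.
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    // Checker names referenced by InverseCheckResultChecker entries that no entry defines.
    static List<String> unresolvedReferences(List<CheckerEntry> entries) {
        Set<String> defined = new HashSet<>();
        for (CheckerEntry e : entries) {
            defined.add(e.name);
        }
        List<String> missing = new ArrayList<>();
        for (CheckerEntry e : entries) {
            if (e.className.endsWith(".InverseCheckResultChecker")
                    && !defined.contains(e.configuration)) {
                missing.add(e.configuration);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up root element; the reference below contains a deliberate typo.
        String xml = "<config><checkers>"
                + "<checker><name>errorNotificationChecker</name>"
                + "<class>org.xcolab.utils.sitemonitor.checkers.RegexMatchingChecker</class>"
                + "<configuration>Internal\\s*error</configuration><message>m</message></checker>"
                + "<checker><name>noErrorNotificationChecker</name>"
                + "<class>org.xcolab.utils.sitemonitor.checkers.InverseCheckResultChecker</class>"
                + "<configuration>errorNotificatonChecker</configuration><message>m</message></checker>"
                + "</checkers></config>";
        List<CheckerEntry> entries = read(xml);
        System.out.println(entries.size() + " checkers, unresolved: " + unresolvedReferences(entries));
        // prints 2 checkers, unresolved: [errorNotificatonChecker]
    }
}
```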

A checker can also be mapped so that it is applied only to pages whose URLs match specific patterns.
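The configuration syntax for these mappings is not shown on this page. Purely to illustrate the idea, the sketch below keeps a hypothetical ordered list of URL regular expressions, each mapped to checker names, and selects the checkers whose pattern matches a crawled URL. CheckerMapping, its methods and the example.org URLs are assumptions, not the Site Monitor's actual API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical URL-pattern-to-checker mapping; the real configuration format may differ.
final class CheckerMapping {
    // Insertion-ordered so checkers are returned in the order their patterns were mapped.
    private final Map<Pattern, List<String>> byPattern = new LinkedHashMap<>();

    void map(String urlRegex, List<String> checkerNames) {
        byPattern.put(Pattern.compile(urlRegex), checkerNames);
    }

    // Names of all checkers whose pattern matches the whole URL, without duplicates.
    List<String> checkersFor(String url) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Pattern, List<String>> entry : byPattern.entrySet()) {
            if (entry.getKey().matcher(url).matches()) {
                for (String name : entry.getValue()) {
                    if (!result.contains(name)) {
                        result.add(name);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CheckerMapping mapping = new CheckerMapping();
        mapping.map(".*", List.of("noErrorNotificationChecker"));
        mapping.map(".*/plans/.*", List.of("noUnavailablePortletsChecker"));
        System.out.println(mapping.checkersFor("https://example.org/plans/42"));
        // prints [noErrorNotificationChecker, noUnavailablePortletsChecker]
        System.out.println(mapping.checkersFor("https://example.org/"));
        // prints [noErrorNotificationChecker]
    }
}
```

Whole-URL matching (Matcher.matches) is an assumption here; an implementation could just as well use partial matching with Matcher.find.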