Jhove2 Developer Handbook

Richard Anderson edited this page Jul 26, 2013 · 1 revision

Handbook for JHOVE2 Code Project Collaboration and Release Builds

This document reviews the mechanisms and recommended procedures for making enhancement and bug fix contributions to the JHOVE2 code base. The technologies covered include Bitbucket, Mercurial, and Maven.

The discussion of the development and merge process is unavoidably intertwined with the exposition of these tools' capabilities. The final section covers the tasks associated with production and distribution of a new release.

JHOVE2 Project Forks

We are using the Bitbucket (https://bitbucket.org) source code hosting service to make the JHOVE2 software publicly available and to encourage collaborative development of new features and bug fixes. Bitbucket, like other hosting services, offers a capability for making a "fork" (cloned copy) of a source code project that is stored in the user's personal Bitbucket account space. The difference between a fork and a regular Mercurial clone is that Bitbucket manages the relationship between the original repository and your personal fork. For more information see: Forking a Repository on Bitbucket.

The Bitbucket repository forks managed by the jhove2 user id include:

  • jhove2/main - the production repository for source code, issue tracking, documentation and wiki.
  • jhove2/stage - the development integration, build, and test environment

For the JHOVE2 project our practice to date has been for each developer to base their personal fork on jhove2/main and then clone the personal fork to their own workstation. Ideally, in the future, the jhove2/stage fork should be used as the starting point for personal forks. Developer activity therefore results in an accumulation of new changesets in each developer's respective Bitbucket fork. These changesets can be pulled and merged into other developers' forks to maintain compatibility. When developer code becomes stable, the release build manager should push the accumulated changesets into the stage fork for testing and review. Ultimately a final push into the main stable JHOVE2 repository is done when the staged code is ready for distribution.

The following personal forks exist for the project's current code committers:

Bitbucket Basics

Here are some references which provide background about Bitbucket and the use of Mercurial in that service context.

Bitbucket's Tools for Project Forks

Source code repository services like Bitbucket and GitHub provide centralized mechanisms for collaboration that supplement the code sharing capabilities of a distributed version control system (DVCS). These include the "pull request" and "compare view" features, neither of which has been used (yet) by the JHOVE2 development team, but which merit some discussion if only to stimulate interest for the future.

Bitbucket Pull Requests

Some projects make use of the pull request mechanism provided by Bitbucket to submit code contributions from a repository fork to a parent repository. For information on how this works, see the following documentation:

The JHOVE2 project is not using pull requests at this time. One reason is that most of the committers' project forks were based on the jhove2/main repository instead of the more desirable jhove2/stage copy, whereas our stated policy is to merge our changes into the stage copy first. At the next iteration of development, it would be a good idea to rectify this situation, to help formalize our code review and change approval process as well as to utilize the related "Compare" feature discussed in the next section.

Note that we are also not using the "Pull requests across branches" capability recently added to Bitbucket. This tool is provided for projects in which a small team that works closely shares a single repository, but uses branches within the repository to develop new features or bug fixes. If you're curious about that feature, see:

Bitbucket Compare View

After you create a project fork, either your copy or the parent copy (or both) are likely to change. Bitbucket provides a tool allowing you to compare your fork's work to the original or to create a pull request. This tool works in both directions, so you can use it from your forked copy to view "outgoing" changes or "incoming" changes. See:

A very handy side benefit of this tool is that it lists the Mercurial commands you could enter from a clone of the parent repository to merge the changes from the fork into the parent. This can be handy for testing in advance of submitting a pull request. For example, the following commands were suggested to merge Sheila's contributions into a local copy of a different project fork.

    $ hg update default
    $ hg pull -r default ssh://hg@bitbucket.org/smmorrissey/jhove2-sheila
    $ hg merge 49604476ae7c

Bitbucket Issue Tracker

For an overview see Using your Bitbucket Issue Tracker

Right now, the values that Bitbucket allows for issue tracker status are new, open, resolved, invalid, duplicate, and wontfix. Unfortunately this set of state values is somewhat limiting. For example, how do we signal that an issue is resolved in the developer's fork, but the code hasn't been merged into the main fork yet? It's a little hard to see what issues still remain on a version in the current system. To work around this limitation, the JHOVE2 team has attached the following meanings to these state names:

  • New: issue is not resolved. Assignment to a contributor indicates that work is in progress.
  • Open: issue is resolved, but code fix is still in developer's fork.
  • Resolved: issue is resolved, and code has been pushed into main fork.

The jhove2/main project is configured with the Issues service enabled. This allows us to take advantage of Bitbucket's ability to scan the messages of incoming changesets and automatically set issue status to "Resolved" for fixed items. Therefore, when a developer commits code that fixes an issue, the commit message should reference the issue number using the syntax specified in the Bitbucket documentation:

    <command> <issue id>   

Where <issue id> is "#" followed by the issue number and <command> is any of the following keywords:

    close
    closed
    closes
    closing
    fix
    fixed
    fixes
    resolve
    resolves
    resolved
    resolving

examples:

    close #845
    fixes issue 746
    resolving #3117

If you only use the #{id} syntax like this:

    #155 change added to updated script files
    Issue #173, Issue #174, Issue #175 displayer properties files for Arc, Gzip, and Warc modules
    Issue #165, Issue #166 plus some merge cleanup ...

with no "closed" or "fixed" or "resolved" keyword preceding the issue number, then the Resolved trigger will not be activated.
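
A commit message can be checked against this rule before pushing. The following is a minimal sketch, not Bitbucket's actual matcher: the helper name closes_issue is hypothetical, and the pattern is only an approximation built from the keyword list and examples above (a keyword followed by "#N", "issue N", or "issue #N").

```shell
# Hypothetical helper: succeed if the commit message contains one of the
# keywords listed above, followed by an issue reference.
# The pattern approximates the documented syntax; Bitbucket's own matching
# rules may differ in detail.
closes_issue() {
    printf '%s\n' "$1" | grep -Eiq \
        '(^|[^[:alnum:]])(close|closed|closes|closing|fix|fixed|fixes|resolve|resolves|resolved|resolving)[[:space:]]+(issue[[:space:]]+#?|#)[0-9]+'
}

# Example: warn about a message that only mentions the issue number.
message="Issue #165, Issue #166 plus some merge cleanup"
if ! closes_issue "$message"; then
    echo "commit message will not trigger the Resolved status"
fi
```

A message such as "fixes #165 and fixes #166, plus some merge cleanup" would pass the same check.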

Alternatively one can manually mark an issue as resolved by clicking on the Resolve button while displaying the issue detail. This provides you with the opportunity to enter text such as the following in a comment:

    Fixed by <<changeset e0b041e7f8f8>>   

The above syntax will cause a link to be created to the changeset, provided that changeset has been pushed to the main fork. To link to a changeset in another developer's fork use comment text such as "addressed in changeset 9f75269faa71 in mstrong fork".

Mercurial Basics

The following is a sampling of the online resources available for learning about the Mercurial distributed version control system.

Client Software for Mercurial

The documentation from mercurial.selenic.com covers the installation of Mercurial software for command-line source code management operations, but most users find it very convenient to install client software that wraps the most essential of those capabilities in a graphical user interface. Both of the following programs include installation of the core Mercurial software as well as the GUI, eliminating the need for a separate Mercurial install.

For Windows: TortoiseHg

For Macintosh: MacHg

Alternatively (or additionally) you may use Mercurial from inside an integrated development environment. Plugin integration is available in Eclipse, IntelliJ IDEA, NetBeans, and Visual Studio.

For Eclipse:

For IntelliJ IDEA: the Mercurial plug-in is bundled (available out of the box).

Source Code Synchronization

For the JHOVE2 project our practice to date has been for each developer to create a personal project fork (based on jhove2/main so far, although jhove2/stage is the preferred starting point) and then clone it to a personal workstation. We do development work and commit changes locally and/or pull and merge changesets from other developers' forks, then push our new changesets back to Bitbucket. This can be accomplished by using Mercurial from the command line, or with a client GUI tool such as TortoiseHg or MacHg. A discussion of the core Mercurial commands will illustrate the principles and basic procedures involved in reviewing and combining code from multiple sources.

references:

hg incoming

Mercurial provides the hg incoming command to tell us what changes the hg pull command would pull into the repository, without actually doing the pull operation. The command returns a list of the new changesets that would be incorporated by a subsequent pull command. If you then decide to add those changes to the repository, you should use hg pull -r X where X is the last changeset listed by hg incoming.

    $ hg incoming  https://bitbucket.org/smmorrissey/jhove2-sheila

This also works if you have made a local clone of the repository you will be merging from

    $ hg clone https://[email protected]/jhove2/stage
    $ hg clone https://[email protected]/smmorrissey/jhove2-sheila  sheila
    $ cd stage
	$ hg incoming ../sheila
	comparing with ../sheila
	searching for changes
	changeset:   647:f3108a8d9815
	user:        [email protected]
	date:        Thu Aug 30 13:00:29 2012 -0400
	summary:     ISSUE #146 Typo in DROID Signature File
	
	changeset:   648:436938819a75
	user:        [email protected]
	date:        Wed Oct 17 23:26:38 2012 -0400
	summary:     Issues #148, #160, #161
	...

The --stat option to this command adds more file-level detail about the contents of each commit to be pulled

	$ hg incoming --stat ../sheila 
	comparing with ../sheila
	searching for changes
	changeset:   647:f3108a8d9815
	user:        [email protected]
	date:        Thu Aug 30 13:00:29 2012 -0400
	summary:     ISSUE #146 Typo in DROID Signature File
	
	 config/droid/DROID_SignatureFile_V20.xml                     |  2 +-
	 config/spring/module/format/jhove2-otherFormats-config.xml   |  2 +-
	 pom.xml                                                      |  6 +++---
	 src/main/java/org/jhove2/module/format/BaseFormatModule.java |  5 ++---
	 src/test/resources/config/module/format/sgml/test-config.xml |  2 +-
	 5 files changed, 8 insertions(+), 9 deletions(-)
	...

The --patch option adds even more granular file differences detail

	$ hg incoming -p ../sheila | more
	comparing with ../sheila
	searching for changes
	changeset:   647:f3108a8d9815
	user:        [email protected]
	date:        Thu Aug 30 13:00:29 2012 -0400
	summary:     ISSUE #146 Typo in DROID Signature File
	
	diff -r ce6b6349f804 -r f3108a8d9815 config/droid/DROID_SignatureFile_V20.xml
	--- a/config/droid/DROID_SignatureFile_V20.xml  Thu Mar 22 14:20:32 2012 -0700
	+++ b/config/droid/DROID_SignatureFile_V20.xml  Thu Aug 30 13:00:29 2012 -0400
	@@ -8453,7 +8453,7 @@
	         
	         <!-- JHOVE2-specific extension added for ICC color profile, 2010-08-31 -->
	         <FileFormat ID="10002" Name="ICC Color Profile" Version="4.2.0.0" PUID="x-fmt/10002">
	-               <Extension>icc</Extension>Extension>
	+               <Extension>icc</Extension>
	         </FileFormat>
	     </FileFormatCollection>
	 </FFSignatureFile>
	\ No newline at end of file
	...

References:

hg outgoing

The hg outgoing command is the inverse of the incoming command. It shows the local changesets not found in the specified destination repository or the default push location. These are the changesets that would be transferred if a push command were requested.
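
As a minimal sketch, a push can be previewed in the same way that hg incoming previews a pull. The helper name preview_push is hypothetical. The --stat option is accepted by hg outgoing just as it is by hg incoming, and hg outgoing exits with status 1 when there are no outgoing changesets.

```shell
# Hypothetical helper: list the changesets that "hg push" would send.
# With no argument, Mercurial compares against the default push location.
preview_push() {
    hg outgoing --stat "$@"
    status=$?
    # Exit status 1 means the destination already has every local changeset.
    if [ "$status" -eq 1 ]; then
        echo "nothing to push"
    fi
    return "$status"
}
```

For example, `preview_push ../stage` compares the current repository with a local clone of the stage fork.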

References:

hg pull

The hg pull command finds all new changesets from the specified remote repository and adds them to the local repository copy. If no remote repository path or URL is specified, then the repository from which the local copy was cloned will be used as the source. By default, this does not update the copy of the project in the working directory. In this respect hg pull is equivalent to git fetch.

    $ cd rnanders
	$ hg pull ../stage
	pulling from ../stage
	searching for changes
	adding changesets
	adding manifests
	adding file changes
	added 8 changesets with 16 changes to 14 files
	(run 'hg update' to get a working copy)

As stated above the hg incoming command can be used before doing a pull in order to assess the changes that would be made.
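
The review-then-pull sequence can be sketched as a small shell function. The name pull_reviewed and its two arguments are hypothetical. It limits both commands to the same changeset with --rev, so that exactly what was reviewed is what gets pulled.

```shell
# Hypothetical helper: show what would be pulled up to a given changeset,
# then pull exactly that range.
#   $1 = source repository (path or URL)
#   $2 = last changeset to include, as listed by "hg incoming"
pull_reviewed() {
    source_repo="$1"
    last_rev="$2"
    # hg incoming exits non-zero when there is nothing to pull,
    # in which case the pull is skipped.
    hg incoming --rev "$last_rev" "$source_repo" &&
        hg pull --rev "$last_rev" "$source_repo"
}
```

For example, `pull_reviewed ../sheila 436938819a75` would pull the two changesets shown in the hg incoming listing earlier on this page. In practice you would normally read the hg incoming output yourself before deciding to run the pull.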

If used with the --update option, the pull command combines the effects of hg pull and hg update. The --update option will, however, refuse to merge or overwrite local changes. hg pull -u is roughly equivalent to git pull.

References:

force option of pull and incoming commands

Both the pull and incoming commands have a --force option, the help message for which reads:

    -f --force     run even when remote repository is unrelated

If you attempt to pull in commits from an unrelated repository without using this option, you will get the error

    abort: repository is unrelated

But what does "unrelated" mean? Here is a hypothetical example. Suppose you have two repositories with completely different changeset histories:

    repo A:  CA1 -> CA2 -> CA3
    repo B:  CB1 -> CB2 -> CB3 -> CB4

If you try to pull changesets from repo A to repo B and repo B doesn't have the root changeset of the tree of changesets found in A, then Mercurial will abort with the above error.

In our situation, the various forks of the JHOVE2 project are all related, since they share a common commit ancestry. Therefore we do not need to use the --force option.

References:

hg update

There are two parts to a cloned repository: the local repository and the working directory. hg pull pulls all new changes from a remote repository into the local one but doesn't alter the working directory. This keeps you from upsetting your work in progress, which may not be ready to merge with the new changes you've pulled and also allows you to manage merging more easily.

hg update attempts to update the contents of the local working directory with the specified changeset. If no changeset is specified, the workspace is updated to the tip of the current named branch.

References:

update with no local changes

If the incoming changes are simply being added to the end of the current local branch (i.e. the local tip is an ancestor of the incoming changesets), then the local directory is fast-forwarded to a new tip, and we are done.

	$ hg update
	14 files updated, 0 files merged, 2 files removed, 0 files unresolved

update with uncommitted local changes

The following rules apply when the working directory contains uncommitted changes:

  1. If neither --check nor --clean is specified, and if the requested changeset is an ancestor or descendant of the working directory's parent, the uncommitted changes are merged into the requested changeset and the merged result is left uncommitted. If the requested changeset is not an ancestor or descendant (that is, it is on another branch), the update is aborted and the uncommitted changes are preserved.
  2. With the --check option, the update is aborted and the uncommitted changes are preserved.
  3. With the --clean option, uncommitted changes are discarded and the working directory is updated to the requested changeset.

If case 1 happens, then the result of the merge must be inspected, any conflicts resolved with hg resolve, and the merged changes committed with hg commit once they are ready.

The Mercurial shelve extension may be used to stash uncommitted changes temporarily in order to avoid this situation.

ShelveExtension - Mercurial
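
A minimal sketch of the shelve approach follows. It assumes the shelve extension has been enabled in your Mercurial configuration; the helper name update_with_shelve is hypothetical.

```shell
# Hypothetical helper: update the working directory without merging
# uncommitted changes into the target changeset.
# Assumes the shelve extension is enabled.
update_with_shelve() {
    # Set uncommitted changes aside so the update starts from a clean state.
    hg shelve || return
    # Move the working directory (arguments are passed through to hg update).
    hg update "$@" || return
    # Reapply the shelved changes on top of the new working directory parent.
    hg unshelve
}
```

Note that hg unshelve can itself produce conflicts if the shelved changes touch the same lines as the new changesets, so the result still needs to be inspected.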

update with divergent local commit

If local changes have been committed to the local repo and a new remote changeset pulled, the usual outcome is a repo with two tips that need to be merged.

	$ hg pull ../other
	added 1 changesets with 1 changes to 1 files (+1 heads)
	(run 'hg heads' to see heads, 'hg merge' to merge)

The hg heads command will list two changesets

	$ hg heads
	changeset:   6:b6fed4f21233
	tag:         tip
	parent:      4:2278160e78d4
	changeset:   5:5218ee8aecf3

An attempt to do hg update will fail:

	$ hg update
	abort: crosses branches (use 'hg merge' or 'hg update -C')

References:

hg merge

The current working directory is updated with all changes made in the requested revision since the last common predecessor revision.

Files that changed between either parent are marked as changed for the next commit and a commit must be performed before any further updates to the repository are allowed. The next commit will have two parents.

--tool can be used to specify the merge tool used for file merges. It overrides the HGMERGE environment variable and your configuration files. See hg help merge-tools for options.

If no revision is specified, the working directory's parent is a head revision, and the current branch contains exactly one other head, the other head is merged with by default. Otherwise, an explicit revision with which to merge must be provided.
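
The two-head case can be sketched as follows. The helper name merge_other_head and its default commit message are hypothetical. The sketch relies on hg merge exiting with a non-zero status when files are left unresolved.

```shell
# Hypothetical helper: merge the other head of the current branch and
# commit the result.
#   $1 = commit message (optional)
merge_other_head() {
    message="${1:-Merge}"
    # With exactly two heads on the branch, hg merge picks the other head.
    if hg merge; then
        hg commit -m "$message"
    else
        # List the merge state of each file; entries marked U are unresolved.
        # Fix them, mark each with "hg resolve --mark FILE", then commit by hand.
        hg resolve --list
        return 1
    fi
}
```

For example, `merge_other_head "Merge changes from stage"` after the hg pull shown in the previous section.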

hg resolve must be used to resolve unresolved files.

References:

hg rebase

The rebase extension can clean up your history a bit, although it does not remove the need to pull before you push. It is shipped with Mercurial but disabled by default. When enabled, it adds a new option to pull:

    hg pull --rebase

This pulls the new changes and moves your local changesets linearly on top of them, without creating a merge changeset. However, its use is discouraged here, because rewriting the repository's history loses information. Read this post for information about some issues that this may cause.

hg push

Once all changesets from the various personal forks are merged into a local workspace, the finalized project can be pushed up to one or more remote Bitbucket repository locations.

The simplest form of this command would be

    hg push

With no other options, this assumes you are pushing the changesets up to the remote server from which you cloned your local copy of the project. The URL of this remote location is stored in your local repo's .hg/hgrc file as the property "default". You can override this default by explicitly specifying a URL, like this:

    hg push https://[email protected]/projhome/projname

If using password authentication, you will be prompted for username and password. You can alternatively configure the local repo to use SSH keypair authentication.

references:

Maven

The JHOVE2 code project makes use of Maven to describe the software project, and for specifying both java library dependencies and the build operation.

see also:

Maven and IDEs

Although you can use the "mvn" command to execute Maven lifecycle phases or goals, it is also possible to use the Maven integration supplied by an IDE plugin.

For Eclipse you must install the m2eclipse plugin. See:

For IntelliJ IDEA, Maven integration is supplied out of the box but must be activated. See: IntelliJ and Maven

Open IDEA and make sure that the "Maven Integration" checkbox is checked in the File/Settings/IDE Settings/Plugins list.

Output from compiles or release builds is placed in the target folder.

On a Macintosh, you will need to tell the IDE where Maven is located:

  • Specify the paths to Maven dependencies using File/Settings/Maven
    • Maven home directory: /usr/share/maven (overridden on my Mac system)
    • Maven user settings file: ~/.m2/settings.xml (default)
    • Maven repository directory: ~/.m2/repository (default)

Import the Maven-based project into the IDE

After you have activated your IDE's Maven plugin, you will be able to import the source code as a "Maven project". This is much preferable to using your IDE's "create java project from existing code" wizard.

For Eclipse

  • File > Import > Existing Maven Project

For IntelliJ IDEA see: Importing Project from Maven Model

  • click "Create New Project"
    • click the radio button next to "Import project from external model"
    • select Maven and click "Next" button
    • specify the path to the code project's home directory, other defaults are OK, click "Next" button
    • the Maven project to import should be listed, click "Next" button
    • provide a local name for the project

Specify the location of project source folders and dependencies

One little JHOVE2 "gotcha" when trying to run test code from inside the IDE is that the config and config/droid paths must be appended to the CLASSPATH. When running from an install of the release binaries, those dirs are included in the env.sh or env.cmd file that defines the runtime CLASSPATH. The env.* file is invoked when you run jhove2.sh or jhove2.cmd.
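
When running tests or the command-line class by hand from a shell, the same effect can be sketched as below. This is an illustration only, not the contents of the released env.sh: it assumes JHOVE2_HOME points at the project checkout (defaulting to the current directory) and uses the Unix path separator.

```shell
# Assumption: JHOVE2_HOME is the project checkout; default to the current directory.
JHOVE2_HOME="${JHOVE2_HOME:-$PWD}"

# Append config and config/droid, keeping any existing CLASSPATH entries.
# The ${CLASSPATH:+...} form avoids a leading ":" when CLASSPATH is empty.
CLASSPATH="${CLASSPATH:+$CLASSPATH:}$JHOVE2_HOME/config:$JHOVE2_HOME/config/droid"
export CLASSPATH

echo "$CLASSPATH"
```

On Windows the separator would be ";" and the variable syntax %JHOVE2_HOME%, as in the env.cmd variant.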

For Eclipse see the FAQ on how to configure your classpath for JHOVE2CommandLine and Junit Tests

For IntelliJ IDEA, the trick is to make sure that the config and config/droid folders are specified as source locations, using the following menu:

  • File/Project Structure/Sources

    • make sure that the following paths are listed as sources:
      • src/main/java
      • src/main/resources
      • config (added by right clicking on the folder name and selecting sources from the popup list)
      • config/droid (added by right clicking on the folder name and selecting sources from the popup list)
    • test sources are
      • src/test/java
      • src/test/resources
    • excluded folder is
      • target
  • The Paths tab should specify the location for compiler output

    • output path: target/classes
    • test output path: target/test-classes

The Maven Project Object Model (POM) file

The pom.xml file in the project's root directory is the configuration file that tells Maven all about the code project. It begins with a section that specifies the overall project properties

	<modelVersion>4.0.0</modelVersion>
	<groupId>org.jhove2</groupId>
	<artifactId>jhove2</artifactId>
	<version>2.1.0</version>
	<name>JHOVE2</name>
	<description>JHOVE2: a next-generation architecture for
		format-aware digital object preservation processing</description>

  • modelVersion = the version of Maven's project object model (usually 4.0.0 for Maven 2.x POMs).
  • groupId = A universally unique identifier for a project. It is normal to use a fully-qualified package name to distinguish it from other projects with a similar name (e.g. org.apache.maven).
  • artifactId = The identifier used for jars and binary distribution artifacts produced by the build operation.
  • version = The current version of the jar or other artifacts produced by this project.
  • name = The full name of the project.
  • description = A detailed description of the project, used by Maven whenever it needs to describe the project

references: Maven: The Complete Reference: 9.2. Maven Properties; Maven Model

Project Dependencies

The next section of the POM file specifies the Java library (jar file) dependencies that this project has. For example

		<dependency>
			<groupId>commons-logging</groupId>
			<artifactId>commons-logging-api</artifactId>
			<version>1.1</version>
		</dependency>

specifies that version 1.1 of the commons-logging-api library is required. The groupId, artifactId, and version together are used as the key to locate the item in a remote or local Maven repository.

Project License

	<licenses>
		<license>
			<name>Apache 2</name>
			<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
			<distribution>repo</distribution>
			<comments>A business-friendly OSS license</comments>
		</license>
	</licenses>

This section describes the licenses for this project. The above text is currently part of the JHOVE2 pom.xml file, but is in conflict with the actual BSD license specified in src/main/resources/LICENSE.txt.

Artifact Repositories

Our pom.xml file contains pointers to several remote repositories that are searched to locate Java library jar files, which are then downloaded and cached in a local repository in our personal workspace.

From Maven - Introduction to Repositories:

A repository in Maven is used to hold build artifacts and dependencies of varying types. There are strictly only two types of repositories: local and remote. The local repository refers to a copy on your own installation that is a cache of the remote downloads, and also contains the temporary build artifacts that you have not yet released.

	<repositories>
		<!-- JWAT -->
		<repository>
			<id>Sonatype-releases</id>
			<name>Sonatype Release Repository</name>
			<url>https://oss.sonatype.org/content/repositories/releases/</url>
		</repository>
		<repository>
			<id>Sonatype-snapshots</id>
			<name>Sonatype Snapshot Repository</name>
			<url>https://oss.sonatype.org/content/repositories/snapshots</url>
		</repository>
		<!-- for jargs GNU Command Line Parser -->
		<repository>
			<id>JBOSS</id>
			<name>JBoss Repository</name>
			<url>https://repository.jboss.org/nexus/content/groups/public/</url>
		</repository>
		<!-- for app assembler -->
		<repository>
			<id>Codehaus</id>
			<name>Codehaus Repository</name>
			<url>http://repository.codehaus.org/</url>
		</repository>
		<repository>
			<id>osgeo</id>
			<name>Open Source Geospatial Foundation Repository</name>
			<url>http://download.osgeo.org/webdav/geotools/</url>
		</repository>
		<!--  for berkeley db je -->
		<repository>
			<id>oracleReleases</id>
			<name>Oracle Released Java Packages</name>
			<url>http://download.oracle.com/maven</url>
			<layout>default</layout>
		</repository>
	</repositories>

We reference multiple artifact repositories because some of the jar files we depend upon may only be available from one source location. For example the jargs.jar file used to parse the command line is procured from a repository managed by the JBoss project. We are currently using the URL https://repository.jboss.org/nexus/content/groups/public/ to specify that repository location (as recommended in the Javalobby article Looking for JBoss Maven Repository?). Others had suggested using https://oss.sonatype.org/content/repositories/JBoss per a Sonatype Press Release, but that location turned out not to include the jargs library.

JHOVE2 Issue #128 discusses this history and the rationale for using the jboss.org URL

see also: MavenSettings | JBoss Community Maven Repository | JBoss Community

Project Folder Locations

The remainder of our POM file details the custom behavior that we want to happen when we compile, test, or package our application using Maven commands. The build section begins with

	<sourceDirectory>src/main/java</sourceDirectory>
	<testSourceDirectory>src/test/java</testSourceDirectory>
	<outputDirectory>target/classes</outputDirectory>
	<testOutputDirectory>target/test-classes</testOutputDirectory>
	<resources>
		<resource>
			<directory>src/main/resources</directory>
		</resource>
	</resources>
	<testResources>
		<testResource>
			<directory>src/test/resources</directory>
		</testResource>
	</testResources>

wherein we specify the location of the java source code files, the junit test files, the resources for both, and the location in which to place compiler output.

The Maven Build Lifecycle

from Maven - Introduction to the Build Lifecycle:

Maven 2.0 is based around the central concept of a build lifecycle. What this means is that the process for building and distributing a particular artifact (project) is clearly defined. ... There are three built-in build lifecycles: default, clean and site. The default lifecycle handles your project deployment, the clean lifecycle handles project cleaning, while the site lifecycle handles the creation of your project's site documentation. Each of these build lifecycles is defined by a different list of build phases, wherein a build phase represents a stage in the lifecycle.

The default lifecycle has the following build phases:

  • validate - validate the project is correct and all necessary information is available
  • compile - compile the source code of the project
  • test - test the compiled source code using a suitable unit testing framework. These tests should not require the code be packaged or deployed
  • package - take the compiled code and package it in its distributable format, such as a JAR.
  • integration-test - process and deploy the package if necessary into an environment where integration tests can be run
  • verify - run any checks to verify the package is valid and meets quality criteria
  • install - install the package into the local repository, for use as a dependency in other projects locally
  • deploy - done in an integration or release environment, copies the final package to the remote repository for sharing with other developers and projects.

These build phases are executed sequentially to complete the default lifecycle. … you only need to call the last build phase to be executed, [and the prerequisite phases will be called as well as the specified build phase]
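
The "call the last phase" rule can be illustrated with a small shell sketch that does not invoke Maven at all. The function name phases_run_by is hypothetical, and the phase list is the abridged one quoted above rather than Maven's complete default lifecycle.

```shell
# Phases of the default lifecycle, in order (abridged list from this page).
PHASES="validate compile test package integration-test verify install deploy"

# Hypothetical helper: print every phase that "mvn <phase>" would run.
phases_run_by() {
    target="$1"
    # Reject names that are not in the list above.
    case " $PHASES " in
        *" $target "*) ;;
        *) echo "unknown phase: $target" >&2; return 1 ;;
    esac
    # Print each phase in order, stopping after the requested one.
    for phase in $PHASES; do
        printf '%s\n' "$phase"
        if [ "$phase" = "$target" ]; then
            return 0
        fi
    done
}

phases_run_by package
```

In other words, running `mvn package` also validates, compiles, and tests the project before packaging it.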

Maven Plugins

From: Maven: The Complete Reference: 1.4. Universal Reuse through Maven Plugins:

Maven has been designed to delegate most responsibility to a set of Maven Plugins … Most of the action in Maven happens in plugin goals which take care of things like compiling source, packaging bytecode, publishing sites, and any other task which need to happen in a build. The Maven you download from Apache doesn’t know much about packaging a WAR file or running JUnit tests; most of the intelligence of Maven is implemented in the plugins and the plugins are retrieved from the Maven Repository. … The Maven Surefire plugin is the plugin that is responsible for running unit tests.

From Apache Maven - Wikipedia:

A plugin provides a set of goals that can be executed using the following syntax: mvn [plugin-name]:[goal-name] For example, a Java project can be compiled by running mvn compiler:compile.

Maven Compiler Plugin

Our project's specification for the compiler plugin looks like this:

	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-compiler-plugin</artifactId>
		<version>2.1</version>
		<configuration>
			<source>1.6</source>
			<target>1.6</target>
		</configuration>
	</plugin>

This specifies that we expect the source code to be compatible with Java 1.6 and the output jar files will require a Java 1.6 runtime engine.

Maven Surefire Plugin

The surefire plugin (used for running junit tests) is told to add the config and config/droid folders to the classpath before running the tests.

	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-surefire-plugin</artifactId>
		<version>2.5</version>
		<configuration>
			<additionalClasspathElements>
				<additionalClasspathElement>config/</additionalClasspathElement>
				<additionalClasspathElement>config/droid</additionalClasspathElement>
			</additionalClasspathElements>
		</configuration>
	</plugin>

Maven Dependency Plugin

The dependency plugin's build-classpath goal outputs a classpath string of dependencies from the local repository to a file or log.

	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-dependency-plugin</artifactId>
		<executions>
		<!-- build a UNIX classpath  -->
		<execution>
			<id>build-unix-classpath</id>
			<phase>prepare-package</phase>
			<goals>
				<goal>build-classpath</goal>
			</goals>
			<configuration>
				<prefix>$JHOVE2_HOME/lib</prefix>
				<!-- Unix uses / as file separator and : as a path separator -->
				<pathSeparator>:</pathSeparator>
				<fileSeparator>\/</fileSeparator>
				<outputFilterFile>true</outputFilterFile>
				<outputFile>${project.basedir}/src/main/assembly/classpath_sh.properties</outputFile>
			</configuration>
		</execution>
		<!-- build a Windows classpath  -->
		<execution>
			<id>build-windows-classpath</id>
			<phase>prepare-package</phase>
			<goals>
				<goal>build-classpath</goal>
			</goals>
			<configuration>
				<prefix>%JHOVE2_HOME%\lib</prefix>
				<!-- Windows uses \ as file separator and ; as a path separator -->
				<pathSeparator>;</pathSeparator>
				<fileSeparator>\\</fileSeparator>
				<outputFilterFile>true</outputFilterFile>
				<outputFile>${project.basedir}/src/main/assembly/classpath_cmd.properties</outputFile>
			</configuration>
		</execution>
		</executions> 
	</plugin>

We define separate executions of the build-classpath goal in order to support both the Unix and the Windows conventions for path separators and file separators. This causes two files to be generated in the src/main/assembly folder:

  • classpath_sh.properties
  • classpath_cmd.properties
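
As far as I understand the dependency plugin, setting outputFilterFile to true makes each generated file hold a single property line of the form classpath=... (this is an assumption worth verifying against a real build). Under that assumption, the following shell sketch lists the entries of the Unix file one per line. The function name list_classpath_entries is made up for this illustration.

```shell
# Print each entry of a filter-style classpath file on its own line.
# Assumption: the file contains one line "classpath=entry1:entry2:...",
# using ":" as the path separator (the Unix variant).
list_classpath_entries() {
    props_file=$1
    # Keep only the value of the classpath property, then split it on ":".
    sed -n 's/^classpath=//p' "$props_file" | tr ':' '\n'
}
```

For example: list_classpath_entries src/main/assembly/classpath_sh.properties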

Maven Jar Plugin

This plugin provides the capability to build and sign a jar archive.

	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-jar-plugin</artifactId>
		<version>2.3</version>
		<configuration>
			<archive>
				<addMavenDescriptor>false</addMavenDescriptor>
				<!-- make jar executable -->
				<manifest>
					<mainClass>org.jhove2.app.JHOVE2CommandLine</mainClass>
				</manifest>
				<manifestEntries>
					<Sealed>false</Sealed>
				</manifestEntries>
			</archive>
			<forceCreation>true</forceCreation>
			<finalName>${project.build.finalName}</finalName>
		</configuration>
	</plugin>

We indicate here that we want to make the jar file executable and which class contains the main method we would like to call when we give the command "java -jar jhove2.jar".

The directive was changed from false to true in order to fix an error message that occurred when the released application was run from the command line to characterize a zip file containing a shape file (worldBorders.zip). It ran OK in Eclipse, but not after being built using Maven and then invoked from the command line.

	<j2:value>[ERROR/PROCESS] Shapefile could not be parsed: Shapefile could not be parsed: sealing violation: package org.geotools.data.shapefile is sealed</j2:value>

The problem turned out to be related to our inclusion of the org.geotools.data.shapefile package inside our source tree. The Shapefile Module relies on some protected methods in the GeoTools library, and the only way to get access to those methods was to place some code inside that same package name. We don't have this problem with the org.uk.gov.nationalarchives.droid code because we are not using those jars; we just replicated their code.
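
To check whether a jar declares itself sealed, one can look at its manifest. The sketch below is a hypothetical helper (it is not part of JHOVE2) that reads manifest text on standard input and prints the value of the first Sealed attribute it finds. It assumes the attribute is spelled exactly "Sealed", although manifest attribute names are case-insensitive.

```shell
# Read a jar manifest on stdin and print the value of the first
# "Sealed:" attribute, or nothing when the manifest has none.
manifest_sealed_value() {
    # Manifests may use CRLF line endings, so drop carriage returns first.
    tr -d '\r' | sed -n 's/^Sealed:[[:space:]]*//p' | head -n 1
}
```

A possible use, assuming the unzip utility is installed: unzip -p target/jhove2-2.1.0.jar META-INF/MANIFEST.MF | manifest_sealed_value. The same check can be pointed at a dependency jar to see whether it seals its packages.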

Maven Javadoc Plugin

The Javadoc Plugin uses the Javadoc tool to generate javadocs for the specified project.

	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-javadoc-plugin</artifactId>
		<version>2.6.1</version>
		<configuration></configuration>
	</plugin>

Maven Assembly Plugin

The Assembly Plugin allows users to aggregate the project output along with its dependencies, modules, site documentation, and other files into a single distributable archive.

	<plugin>
		<artifactId>maven-assembly-plugin</artifactId>
		<version>2.2.2</version>
		<executions>
			<execution>
				<id>distro-assembly</id>
				<phase>package</phase>
				<goals>
					<goal>single</goal>
				</goals>
				<configuration>
					<descriptors>
						<descriptor>src/main/assembly/jhove2_release.xml</descriptor>
					</descriptors>
					<appendAssemblyId>false</appendAssemblyId>
					<tarLongFileMode>gnu</tarLongFileMode>
				</configuration>
			</execution>
		</executions>
	</plugin>

The descriptor element points to "src/main/assembly/jhove2_release.xml", which holds the recipe and the input file locations used to create the distribution package. The package ends up in the "target" folder, which is Maven's default build output directory, since no other location is explicitly specified.

Producing the Release

Release Checklist

Create or Revise Documentation

  • Review the wiki for completeness (e.g. new module documentation)
  • Review the status of the tickets assigned to this release in the Issue Tracker
  • Draft Release Notes, including lists of issue fixes and known bugs
  • Draft an announcement message for the various mailing lists

For the Bitbucket jhove2/stage repository:

  • Merge all stable committer changesets into a local clone of the stage repository
  • Run all unit tests
  • Check displayer and message property files using automated scans
  • Build the jhove2.jar file and distribution packages (zip & tar.gz)
    • Use maven build commands outside of the IDE environment and/or
    • Use the IDE's maven integration to build the package
  • Push the source code up to Bitbucket
  • Upload the package files and release notes to the Bitbucket downloads page
  • Test package installs on all platforms
  • Run volume tests, if possible
  • Tag the repository with an appropriate tag, e.g. "JHOVE2 2.1.0 distribution"
  • Push the tag up to Bitbucket

For the Bitbucket jhove2/main repository:

  • Merge the stage changesets into a local clone of the main repository
  • Run all unit tests
  • Push the source code up to Bitbucket
  • Upload the package files and release notes to the Bitbucket downloads page
  • Test package installs on a few platforms
  • Make sure the issue tracker is updated to mark issues Resolved
  • Announce the release
  • Take a well-deserved vacation

Pull and Push Changesets using Client Software

This author (Richard Anderson) is currently using MacHg and IntelliJ IDEA as his primary tools for release build management. I will shift now into a first person narrative to describe the actions that took place during the creation of a build for the version 2.1.0 release.

MacHg allows one to create representations of multiple remote and local repositories and visualize the status of the commit relationships between them. For example, I have created a proxy called "jhove2-main" that represents https://bitbucket.org/jhove2/main. I was then able to clone that remote repository and create a local copy in my workspace. I have similar pairings of server and local repositories for https://bitbucket.org/jhove2/stage and for each of the jhove2 committers' project forks.

This arrangement helps me to see at a glance the relative incoming/outgoing (pull/push) changeset status of each project fork and clone, and it helps simplify the process of pulling together changesets from other committers, then pushing the merged project up to Bitbucket.

I have based the version 2.1.0 release candidate on Sheila Morrissey's fork of the project. I cloned her fork locally, and then pulled her latest changesets into "rnanders", the local copy of my moabrichard project fork. To do this using MacHg, I click on the name of my local repo in the repository list, then click on the "Pull" button on the MacHg toolbar. A window pops up asking me to specify which other repository to pull from, and I select my locally cloned copy of Sheila's fork. There is now an arrow in this dialog pointing from my "sheila" folder to my "rnanders" folder with the number 41 next to it. This is a count of how many changesets I will be pulling. I click on the Advanced Options button and check the "Update after Pull" checkbox, then click on the Pull button at the bottom right of the dialog. I then get a Results window that confirms the pull's success:

	pulling from /Users/rnanders/jhove2/sheila
	searching for changes
	adding changesets
	adding manifests
	adding file changes
	added 41 changesets with 689 changes to 429 files
	424 files updated, 0 files merged, 2 files removed, 0 files unresolved

In order to keep my rnanders local and remote in sync and to get more familiar with the MacHg push mechanism, I pushed the new changesets from my updated local rnanders repo to the rnanders (moabrichard) remote. The MacHg push dialog is similar to the pull dialog, except for the direction of the arrow and the difference in options.

	pushing to https://moabrichard:***@bitbucket.org/moabrichard/jhove2
	searching for changes
	remote: adding changesets
	remote: adding manifests
	remote: adding file changes
	remote: added 49 changesets with 705 changes to 441 files
	remote: bb/acl: moabrichard is allowed. accepted payload.

Since that all worked OK, I will repeat the above cycle with the jhove2/stage project fork.

Result of pulling from local rnanders into local stage:

	pulling from /Users/rnanders/jhove2/rnanders
	searching for changes
	adding changesets
	adding manifests
	adding file changes
	added 41 changesets with 689 changes to 429 files
	424 files updated, 0 files merged, 2 files removed, 0 files unresolved

Result of pushing from local stage to bitbucket jhove2/stage:

	pushing to https://moabrichard:***@bitbucket.org/jhove2/stage
	searching for changes
	remote: adding changesets
	remote: adding manifests
	remote: adding file changes
	remote: added 41 changesets with 689 changes to 429 files
	remote: bb/acl: moabrichard is allowed. accepted payload.

Run the unit tests

Each module developer is expected to supply JUnit tests and sample files (fixtures) that exercise the modules. Ideally, we'd use coverage instrumentation to ensure that each line of source code is exercised.

You can use the IDE to run selected tests and/or Maven to run the entire test suite.

The only unit tests that failed in my environment were the ones that depend on onsgmls. I was going to install OpenSP and try again, but I prefer not to use MacPorts, and there doesn't seem to be a Homebrew package available.

Review displayer and message properties files

A module developer also needs to supply displayer and message properties files. Sheila created a tool that scans the Java code for message properties file keys (basically, it looks for new Message() in classes) and then compares what it finds to the message keys in the message properties files.

The class is org.jhove2.app.util.messages.MessagesChecker. The two parameters to its main() method are the full path to the properties file and the full path to the base directory under which all .java files should be checked.
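
For a rough cross-check without running the Java tool, a shell sketch along the following lines may help. It is a simplified stand-in and not MessagesChecker itself: it assumes plain "key = value" lines in the properties file and merely reports keys that never occur as literal text in any .java file under the given directory. The function name unused_message_keys is hypothetical.

```shell
# List keys from a .properties file that never occur in any .java file
# under a source directory. Assumes simple "key = value" lines.
unused_message_keys() {
    props_file=$1
    src_dir=$2
    # Drop comment lines, then keep the text before the first "=".
    sed -n -e '/^[[:space:]]*[#!]/d' \
        -e 's/^[[:space:]]*\([^=[:space:]]*\)[[:space:]]*=.*/\1/p' "$props_file" |
    while IFS= read -r key; do
        [ -n "$key" ] || continue
        # grep -l prints the names of files containing the fixed string;
        # the second grep succeeds only if at least one name was printed.
        if ! find "$src_dir" -name '*.java' -exec grep -lF -- "$key" {} + | grep -q .; then
            printf '%s\n' "$key"
        fi
    done
}
```

The two arguments mirror those of MessagesChecker: the properties file first, then the base directory of the Java sources.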

[ToDo: add more details to this section]

Attempt "mvn package" from the command line

It should be possible to build a release package by cd'ing to the project root directory and issuing a "mvn" command.

Because OpenSP is not installed on my machine, a few of the JUnit tests fail. This also caused the following build command (which had worked for Marisa) to fail with an error message enumerating the failed unit tests.

	mvn clean org.apache.maven.plugins:maven-javadoc-plugin:2.6.1:javadoc org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly

Marisa suggested the following, which should skip the tests:

    mvn assembly:assembly -DskipTests

But this too failed for me, with an error message:

    [ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2.2:assembly (default-cli) on project jhove2: Error reading assemblies: No assembly descriptors found. -> [Help 1]

Nicholas suggested these commands:

	* mvn –DskipTests clean package
	* mvn –DskipTests=true clean package
	* mvn –DskipTests clean org.apache.maven.plugins:maven-javadoc-plugin:2.6.1:javadoc org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly

Unfortunately, I could not get any of those to work for me either. Instead I got the error:

	[INFO] Scanning for projects...
	[INFO]                                                                        
	[INFO] ------------------------------------------------------------------------
	[INFO] Building JHOVE2 2.1.0
	[INFO] ------------------------------------------------------------------------
	[INFO] ------------------------------------------------------------------------
	[INFO] BUILD FAILURE
	[INFO] ------------------------------------------------------------------------
	[INFO] Total time: 0.166s
	[INFO] Finished at: Wed Feb 13 12:51:52 MST 2013
	[INFO] Final Memory: 2M/81M
	[INFO] ------------------------------------------------------------------------
	[ERROR] Unknown lifecycle phase "?DskipTests". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
	[ERROR] 
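
The "?DskipTests" in this error suggests that the dash in front of DskipTests reached Maven as a non-ASCII character (for example an en dash introduced by copying from an email or word processor) rather than as an ASCII hyphen, so Maven treated the argument as a lifecycle phase. This is my reading of the message, not something I verified at the time. A small shell sketch for checking a pasted command before running it, with a made-up function name:

```shell
# Succeed (exit status 0) when the given text contains any byte outside
# the printable ASCII range, such as the bytes of an en dash.
has_non_ascii() {
    # In the C locale every byte is one character, so the bracket
    # expression matches anything that is not between space and "~".
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}
```

For example, has_non_ascii "$cmd" && echo "retype the dashes by hand" would flag a command line whose options were mangled in transit.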

An alternative strategy from Nicholas would be to add the following profile to the pom.xml or the .m2/settings.xml and activate it with -P alternative.

  <profiles>
    <profile>
      <id>alternative</id>
      <properties>
        <maven.test.failure.ignore>true</maven.test.failure.ignore>
      </properties>
    </profile>
  </profiles>

I did not try this last approach, but ended up using my IDE to perform the package build.

Build the binary distribution package

  • Opened the Maven Projects dialog (Window/Tool Windows/Maven Projects)

  • Right clicked on JHOVE2/Lifecycle/Package and selected "Run Maven Build"

  • If you did not set the maven home directory correctly you will get the error:

      Error running jhove2 [package]: No valid Maven installation found. 
      Either set the home directory in the configuration dialog or set the M2_HOME environment variable on your system.
    
  • If you do not have OpenSP installed and do not toggle the "skip tests" mode, then you will get a build error

      Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.5:test (default-test) on project jhove2: There are test failures.
    

Here is the output from a successful release build:

	/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home/bin/java -Dclassworlds.conf=/usr/share/java/maven-3.0.4/bin/m2.conf -Dmaven.home=/usr/share/java/maven-3.0.4 -Dfile.encoding=MacRoman -classpath /usr/share/java/maven-3.0.4/boot/plexus-classworlds-2.4.jar org.codehaus.classworlds.Launcher --no-plugin-registry --fail-fast --no-plugin-updates --strict-checksums --update-snapshots -DskipTests=true -f /Users/rnanders/jhove2/rnanders/pom.xml package
	[WARNING] Command line option -npu is deprecated and will be removed in future Maven versions.
	[WARNING] Command line option -npr is deprecated and will be removed in future Maven versions.
	[INFO] Scanning for projects...
	[INFO]                                                                         
	[INFO] ------------------------------------------------------------------------
	[INFO] Building JHOVE2 2.1.0
	[INFO] ------------------------------------------------------------------------
	[INFO] 
	[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ jhove2 ---
	[debug] execute contextualize
	[INFO] Using 'UTF-8' encoding to copy filtered resources.
	[INFO] Copying 19 resources
	[INFO] 
	[INFO] --- maven-compiler-plugin:2.1:compile (default-compile) @ jhove2 ---
	[INFO] Nothing to compile - all classes are up to date
	[INFO] 
	[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ jhove2 ---
	[debug] execute contextualize
	[INFO] Using 'UTF-8' encoding to copy filtered resources.
	[INFO] Copying 369 resources
	[INFO] 
	[INFO] --- maven-compiler-plugin:2.1:testCompile (default-testCompile) @ jhove2 ---
	[INFO] Nothing to compile - all classes are up to date
	[INFO] 
	[INFO] --- maven-surefire-plugin:2.5:test (default-test) @ jhove2 ---
	[INFO] Tests are skipped.
	[INFO] 
	[INFO] --- maven-dependency-plugin:2.1:build-classpath (build-unix-classpath) @ jhove2 ---
	[INFO] Wrote classpath file '/Users/rnanders/jhove2/rnanders/src/main/assembly/classpath_sh.properties'.
	[INFO] 
	[INFO] --- maven-dependency-plugin:2.1:build-classpath (build-windows-classpath) @ jhove2 ---
	[INFO] Wrote classpath file '/Users/rnanders/jhove2/rnanders/src/main/assembly/classpath_cmd.properties'.
	[INFO] 
	[INFO] --- maven-jar-plugin:2.3:jar (default-jar) @ jhove2 ---
	[INFO] Building jar: /Users/rnanders/jhove2/rnanders/target/jhove2-2.1.0.jar
	[INFO] 
	[INFO] --- maven-assembly-plugin:2.2.2:single (distro-assembly) @ jhove2 ---
	[INFO] Reading assembly descriptor: src/main/assembly/jhove2_release.xml
	[INFO] Building tar : /Users/rnanders/jhove2/rnanders/target/jhove2-2.1.0.tar.gz
	[INFO] Building zip: /Users/rnanders/jhove2/rnanders/target/jhove2-2.1.0.zip
	[INFO] ------------------------------------------------------------------------
	[INFO] BUILD SUCCESS
	[INFO] ------------------------------------------------------------------------
	[INFO] Total time: 36.275s
	[INFO] Finished at: Wed Mar 06 17:14:17 MST 2013
	[INFO] Final Memory: 8M/81M
	[INFO] ------------------------------------------------------------------------
	
	Process finished with exit code 0

The resulting output is placed in the project's target folder. Specifically, the package files are named:

  • jhove2-2.1.0.jar
  • jhove2-2.1.0.zip
  • jhove2-2.1.0.tar.gz
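
Before uploading anything, it can be worth confirming that all three files were actually produced. The following shell sketch does that; the function name missing_release_files is hypothetical, and the file naming pattern is taken from the list above.

```shell
# Print the path of every expected release file that is missing from
# the given build directory. Prints nothing when all files are present.
missing_release_files() {
    target_dir=$1
    version=$2
    for ext in jar zip tar.gz; do
        file="$target_dir/jhove2-$version.$ext"
        [ -f "$file" ] || printf '%s\n' "$file"
    done
}
```

For example: missing_release_files target 2.1.0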

Upload binary distribution files for the release

After pushing the source code changesets for the new release up to the Bitbucket stage or main fork, it is time to upload the binary files that have been generated from the source code project.

The stage fork's downloads page is: downloads.
It includes a button allowing you to upload files. The process is very simple and self-explanatory.

jhove2-2.1.0.zip and jhove2-2.1.0.tar.gz are now available for download by anyone.

Add Release Notes

Release notes in MS Word and PDF formats were also uploaded to the same downloads page on Bitbucket: JHOVE2-2.1.0-release-notes.docx and JHOVE2-2.1.0-release-notes.pdf.

Other documentation changes (such as new module specifications) were added by committers directly to the main project wiki.

Tag the Release

The Committers should have uniformly set the version and release date fields of all module classes to the current release values, e.g.

    /** Module version identifier. */
    public static final String VERSION = "2.1.0";

    /** Module release date. */
    public static final String RELEASE = "2013-02-11";

Mercurial lets you give a permanent name to any revision using the hg tag command. The command "hg tags" gives you a list of existing tags:

	tip                              687:02302f4df5e1
	stable_tag                       645:e89363e5d24f
	Production Release v2.0.0 04192011   642:3106854c8d4d
	Production Release v2.0.0 04122011   640:f35c1ef002e2
	Codeline Release 12032010        539:6cc9b5736ea3
	JHOVE2-2.0.0-Beta                452:9bce90337401
	release-v0.6.0/trunk             283:d255560d2858

The tip tag is a special "floating" tag, which always identifies the newest revision in the repository.

To create a new tag:

    hg tag 'Production Release v2.1.0 2013-03-11'
    hg push

reference: Managing releases and branchy development
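
The tagging step could be wrapped in a small script so that the tag name is always built the same way. This is only a sketch under my own assumptions: the function names are hypothetical, the name format copies the 2.1.0 example above, and tag_release requires Mercurial plus push access to the default remote.

```shell
# Build a tag name in the style used for the 2.1.0 release.
release_tag_name() {
    version=$1
    release_date=$2
    printf 'Production Release v%s %s\n' "$version" "$release_date"
}

# Tag the current revision and publish the tag.
tag_release() {
    tag=$(release_tag_name "$1" "$2")
    # Refuse to reuse a name that "hg tags" already lists.
    if hg tags | grep -qF -- "$tag"; then
        echo "tag already exists: $tag" >&2
        return 1
    fi
    # "hg tag" records the tag in a new changeset; "hg push" publishes it.
    hg tag "$tag" && hg push
}
```

For example: tag_release 2.1.0 2013-03-11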

Test the Release Binaries

The release binaries should be tested on as many of the following platforms as possible

  • Linux (32 and 64 bit)
  • Mac OS X 10.6, 10.7, 10.8
  • SunOS (Solaris) 5
  • Windows XP
  • Windows 7 (32 and 64 bit)
  • Windows Server 2003

The JHOVE2 Users Guide provides help for installing and running the JHOVE2 binary. Note that Java 1.6 or later is required. Otherwise you will see an error message such as:

    Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
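
A test machine can be checked for a suitable Java before installing. The sketch below is an illustration with hypothetical function names. It relies on "java -version" writing a line containing version "x.y.z" to standard error, and it treats both the old "1.6.0_65" style and the newer "17.0.2" style of version string.

```shell
# Print the version string reported by the java on the PATH.
installed_java_version() {
    # "java -version" writes to stderr; take the first quoted token.
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when a version string is 1.6 or later.
java_version_ok() {
    v=$1
    major=${v%%.*}             # text before the first dot
    major=${major%%[!0-9]*}    # drop suffixes such as "-ea"
    rest=${v#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}
    [ -n "$major" ] && [ -n "$minor" ] || return 1
    if [ "$major" -gt 1 ]; then return 0; fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}
```

For example: java_version_ok "$(installed_java_version)" || echo "Java 1.6 or later is required"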

To download the install package to a Unix or Linux system, use curl or wget:

    curl -k -O 'https://bitbucket.org/jhove2/main/downloads/jhove2-2.1.0.tar.gz'
    or
    wget --no-check-certificate 'https://bitbucket.org/jhove2/main/downloads/jhove2-2.1.0.tar.gz'

Then extract the files from the package using one of the following commands. (GNU tar has the -z option)

	tar -xzf jhove2-2.1.0.tar.gz  
	or 
	gunzip -c jhove2-2.1.0.tar.gz | tar -xf -

If you see error messages that look like the following and you find a file called @LongLink in the directory

	tar: ././@LongLink: typeflag 'L' not recognized, converting to regular file

then you are encountering an incompatibility between the GNU tar used to create the archive and the Solaris tar that you are trying to use for extraction. Solaris tar cannot handle very long file names. If your SunOS host has GNU tar (gtar) installed, you should use it instead of the Solaris tar. Or you can try using the pax utility.
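
The advice above can be sketched as a pair of shell helpers that prefer gtar when it is on the PATH and fall back to plain tar otherwise. The function names are hypothetical, and the sketch assumes that a program called gtar, if present, is GNU tar.

```shell
# Pick a tar program: prefer GNU tar installed as "gtar", else "tar".
pick_tar() {
    if command -v gtar >/dev/null 2>&1; then
        echo gtar
    else
        echo tar
    fi
}

# Extract a .tar.gz into the current directory without relying on
# tar's -z option, which not every tar implementation supports.
extract_release() {
    archive=$1
    gunzip -c "$archive" | "$(pick_tar)" -xf -
}
```

For example: extract_release jhove2-2.1.0.tar.gz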

For technical details of GNU tar's @LongLink 'trick' see:

From the latter of the above references:

..pre-pax tar formats had limits on the name and linkname fields in the tar header. GNU tar avoided these limits by writing out a dummy file before the file which needed the long name or linkname. The dummy file is named ././@LongLink. The type field is 'L' for a long name or 'K' for a long link name. The contents of the dummy file are the value to use for the appropriate field in the following non-longlink header.

Stage looks OK, Push to Main

Result of pulling local "stage" to local "main". Note that the 42nd commit is the new tag recently added.

	pulling from /Users/rnanders/jhove2/stage
	searching for changes
	adding changesets
	adding manifests
	adding file changes
	added 42 changesets with 690 changes to 430 files
	(run 'hg update' to get a working copy)

Pushing to Bitbucket required me to use the command line to get the authentication to work:

	 hg push
	pushing to https://moabrichard@bitbucket.org/jhove2/main
	warning: bitbucket.org certificate with fingerprint 24:9c:45:8b:9c:aa:ba:55:4e:01:6d:58:ff:e4:28:7d:2a:14:ae:3b not verified (check hostfingerprints or web.cacerts config setting)
	searching for changes
	http authorization required
	realm: Bitbucket.org HTTP
	user: moabrichard
	password: 
	warning: bitbucket.org certificate with fingerprint 24:9c:45:8b:9c:aa:ba:55:4e:01:6d:58:ff:e4:28:7d:2a:14:ae:3b not verified (check hostfingerprints or web.cacerts config setting)
	remote: adding changesets
	remote: adding manifests
	remote: adding file changes
	remote: added 42 changesets with 690 changes to 430 files
	remote: bb/acl: moabrichard is allowed. accepted payload.
	warning: bitbucket.org certificate with fingerprint 24:9c:45:8b:9c:aa:ba:55:4e:01:6d:58:ff:e4:28:7d:2a:14:ae:3b not verified (check hostfingerprints or web.cacerts config setting)

Also uploaded the release notes.

Make sure that issues are correctly updated

The build manager should check all issues that are tagged with the release version number to ensure that their status has been correctly updated to resolved. Bitbucket will do this update automatically if the changesets being pushed to the main repository have commit messages that use the proper syntax.

If not, then one can manually mark an issue resolved by clicking on the Resolve button while displaying the issue detail. This provides one with the opportunity to enter text such as the following in a comment:

    Fixed by <<changeset e0b041e7f8f8>>   

See issue #54 for an example of where this is used.
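
When several issues have to be resolved by hand, the comment text can be generated rather than typed. A minimal sketch with hypothetical function names; short_hash requires Mercurial and must be run inside a clone of the repository.

```shell
# Format the comment text used when resolving an issue by hand.
fix_comment() {
    printf 'Fixed by <<changeset %s>>\n' "$1"
}

# Print the short hash of a revision in the current repository.
short_hash() {
    hg log -r "$1" --template '{node|short}\n'
}
```

For example, fix_comment "$(short_hash tip)" produces the comment for the most recent changeset.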