fmondaca edited this page Aug 17, 2015 · 1 revision

1. Database structure

The structure of the database used in Maalr is rather simple: each dictionary entry is represented as a 'document', which contains an arbitrary number of key-value pairs (such as 'english_word=nose', 'german_word=Nase', etc.). There is no limit on the number of key-value pairs, keys do not need to be declared in advance, and all of them are optional. This makes it possible, for instance, to use different keys to describe different word classes. Imagine a spreadsheet where each entry occupies a single row, and where some of the columns contain values while others do not.

Dictionary entries are versioned: Whenever an entry is changed, a modified copy of the data is created and stored. This allows editors to view the modification history of an entry, and, for instance, to revert to an older version.
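To make this model concrete, here is a minimal Java sketch that treats each entry as a schemaless `Map<String, String>` and stores every change as a new, immutable copy. `VersionedEntry` and its methods are hypothetical and only illustrate the idea; Maalr itself keeps entries in MongoDB, and its actual classes may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: an entry is a schemaless set of key-value pairs,
// and every modification is stored as a new copy instead of overwriting.
class VersionedEntry {
    private final List<Map<String, String>> versions = new ArrayList<>();

    VersionedEntry(Map<String, String> initial) {
        versions.add(Collections.unmodifiableMap(new HashMap<>(initial)));
    }

    // The most recent version is the "current" entry.
    Map<String, String> current() {
        return versions.get(versions.size() - 1);
    }

    // A change creates a modified copy; older versions stay untouched.
    void update(String key, String value) {
        Map<String, String> copy = new HashMap<>(current());
        if (value == null) {
            copy.remove(key); // keys are optional, so a value may simply disappear
        } else {
            copy.put(key, value);
        }
        versions.add(Collections.unmodifiableMap(copy));
    }

    // Reverting appends the old version again, so history is never rewritten.
    void revertTo(int version) {
        versions.add(versions.get(version));
    }

    int versionCount() {
        return versions.size();
    }

    Map<String, String> version(int index) {
        return versions.get(index);
    }
}
```

In this sketch a revert is itself recorded as a new version, so editors can still see that the revert happened.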

2. Index setup

All dictionary entries are stored in a database (MongoDB) as well as in a search-optimized index (Lucene). Whereas the database does not have to be configured, the index requires some setup, depending on the kind of queries the application should support. The most important configuration file is 'searchconfig.xml', which is described below.

2.1 searchconfig.xml

2.1.1 Configuring the index

The first part of this configuration file contains the index configuration, that is, it describes which columns of the database will be used in the dictionary.

The index configuration might look like this:

<databaseColumns>
    <column name="english_word"/>
    <column name="german_word"/>
…
</databaseColumns>

By default, the content of a column is interpreted 'as is'. In case a column represents multiple, comma-separated values, this can be expressed by adding the attribute 'type="CSV"' to a column.
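A minimal sketch of what `type="CSV"` implies for indexing, assuming values are separated by plain commas with optional surrounding whitespace and no quoting or escaping. The `ColumnValues` class is hypothetical; Maalr's actual tokenization may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a column declared with type="CSV" yields several
// index values, while a plain column is kept "as is".
class ColumnValues {
    static List<String> split(String raw, boolean csv) {
        List<String> values = new ArrayList<>();
        if (raw == null || raw.isEmpty()) {
            return values; // empty cells contribute nothing
        }
        if (!csv) {
            values.add(raw); // default: the whole cell is a single value
            return values;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed); // drop blanks left by trailing commas
            }
        }
        return values;
    }
}
```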

2.1.2 Configuring queries

The easiest way to provide a search interface is a single text field and a 'submit' button. However, even then it is not clear where to search for the entered text: in one specific column, in all columns, or in a subset of the available columns. Nor is it clear HOW to search in the field(s): Lucene offers different query types (such as a 'PrefixQuery' to find all entries which start with the search term, a 'TermQuery' for exact matches, or a 'FuzzyLikeThisQuery' for soft matching). To tackle this, Maalr uses the concept of 'Column Selectors' and 'Query Builders', which are explained below using a simple scenario: a search form as shown below.

TODO: Add screenshot

Imagine a query interface for the example dictionary used above (that is, a very simple dictionary consisting of the two fields 'german_word' and 'english_word'), which provides a single text field for entering a search phrase and, in addition, two combo boxes. One of them defines the language direction (for instance, 'English->German', 'German->English', or 'Both'), and the other one defines the query type (for instance, 'Exact' or 'Default').

The query key would be defined as follows:

 <queryKeys>
 <queryKey id="searchPhrase" />
 </queryKeys>

Note that the key has an id which is neither 'english_word' nor 'german_word' but 'searchPhrase' instead. This is because it is not yet clear where to search for the input - it depends on the column selection made by the user. Therefore, this choice (and all possible options) would have to be configured:

 <columnSelectors>
	<columnSelector id="language">
		<options>
			<option id="german">
				<columns>
					<column reference="german_word" />
				</columns>
			</option>
			<option id="english">
				<columns>
					<column reference="english_word" />
				</columns>
			</option>
			<option id="both" default="true">
				<columns>
					<column reference="german_word" />
					<column reference="english_word" />
				</columns>
			</option>
		</options>
	</columnSelector>
</columnSelectors>

This configuration snippet defines a query parameter named 'language'. Its purpose is to define where to search for the value assigned to 'searchPhrase', by listing all options and, within each option, all index fields to search in. Note that the list of fields is interpreted disjunctively: in the case of the 'both' option (which is also the default option, defined through the attribute 'default="true"'), the search phrase needs to occur in 'german_word' OR 'english_word', not in 'german_word' AND 'english_word'.
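The OR semantics can be sketched in plain Java without Lucene. In Lucene itself, this is what a `BooleanQuery` whose clauses are all `SHOULD` expresses, although this sketch does not claim that Maalr builds its queries that way. `ColumnSelector` here is a hypothetical in-memory stand-in that mirrors the 'language' selector above and uses the default option when none is given.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical in-memory model of a column selector such as 'language'.
class ColumnSelector {
    private final Map<String, List<String>> options = new HashMap<>();
    private final String defaultOption;

    ColumnSelector(String defaultOption) {
        this.defaultOption = defaultOption;
    }

    ColumnSelector option(String id, String... columns) {
        options.put(id, Arrays.asList(columns));
        return this;
    }

    // A missing option id falls back to the default option.
    List<String> columnsFor(String optionId) {
        String id = optionId != null ? optionId : defaultOption;
        List<String> columns = options.get(id);
        if (columns == null) {
            throw new IllegalArgumentException("Unknown option: " + id);
        }
        return columns;
    }

    // Disjunctive: a single matching column is enough (OR, not AND).
    boolean matches(Map<String, String> entry, String optionId, String phrase,
                    BiPredicate<String, String> matcher) {
        for (String column : columnsFor(optionId)) {
            String value = entry.get(column);
            if (value != null && matcher.test(value, phrase)) {
                return true;
            }
        }
        return false;
    }
}
```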

The definition of how to create and execute the query could look as follows:

<queryBuilders>
	<queryBuilder id="method" queryKeyId="searchPhrase" columnSelectorId="language">
		<options>
			<option id="default" default="true" preset="DEFAULT" />
			<option id="infix" preset="INFIX" />
			<option id="prefix" preset="PREFIX" />
			<option id="suffix" preset="SUFFIX" />
			<option id="exact" preset="EXACT" />
		</options>
	</queryBuilder>
</queryBuilders>

The queryBuilder combines user-specified input (referenced by 'queryKeyId') with a column selection (referenced by 'columnSelectorId') and a search mode (through the list of options). When Maalr receives a query like

 searchPhrase=Eagle & language=english & method=prefix

it can analyze the chosen query builder option and column selector option and perform the query on the required fields. Since both query builders and column selectors must define a default option, the same holds for a query like

searchPhrase=Eagle

which would be expanded to

 searchPhrase=Eagle & language=both & method=default
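The default expansion can be sketched as follows, assuming the ` & `-separated notation used above (a real HTTP request would be URL-encoded) and a defaults map built from the `default="true"` options. `QueryExpansion` is a hypothetical helper, not part of Maalr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: parameters missing from a query are filled in with
// the default option of the corresponding column selector or query builder.
class QueryExpansion {
    static Map<String, String> parse(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            String[] kv = pair.trim().split("=", 2);
            if (kv.length == 2 && !kv[0].trim().isEmpty()) {
                params.put(kv[0].trim(), kv[1].trim());
            }
        }
        return params;
    }

    static Map<String, String> expand(Map<String, String> params, Map<String, String> defaults) {
        Map<String, String> expanded = new LinkedHashMap<>(params);
        for (Map.Entry<String, String> d : defaults.entrySet()) {
            expanded.putIfAbsent(d.getKey(), d.getValue()); // explicit choices win
        }
        return expanded;
    }
}
```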

Maalr offers the query builder options listed above, which can be referred to via the 'preset' attribute. However, if a special kind of query is required, a custom query builder can be registered:

<queryBuilder ...>
	<options>
		<option id="prefix" builderClass="fully.qualified.ClassName" />
		...
	</options>
</queryBuilder>

In case you want to create a custom query builder, have a look at de.uni_koeln.spinfo.maalr.lucene.config.interpreter.MaalrQueryBuilder and its subclasses for details.
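The idea of named, pluggable match modes can be illustrated with a small hypothetical registry. This is not the `MaalrQueryBuilder` API: real presets produce Lucene queries, whereas this sketch compares strings, and the meanings given to 'exact', 'prefix', 'suffix' and 'infix' are inferred from the preset names. The DEFAULT preset is left out because its behavior is not described here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical stand-in for query builder options: each option id maps to a
// match rule. Case-insensitive string comparison stands in for Lucene queries.
class MatchModes {
    private final Map<String, BiPredicate<String, String>> modes = new HashMap<>();

    MatchModes() {
        register("exact", (value, term) -> norm(value).equals(norm(term)));
        register("prefix", (value, term) -> norm(value).startsWith(norm(term)));
        register("suffix", (value, term) -> norm(value).endsWith(norm(term)));
        register("infix", (value, term) -> norm(value).contains(norm(term)));
    }

    // Rough analogue of pointing 'builderClass' at a custom implementation.
    final void register(String optionId, BiPredicate<String, String> rule) {
        modes.put(optionId, rule);
    }

    boolean matches(String optionId, String value, String term) {
        BiPredicate<String, String> rule = modes.get(optionId);
        if (rule == null) {
            throw new IllegalArgumentException("Unknown query builder option: " + optionId);
        }
        return rule.test(value, term);
    }

    private static String norm(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```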

To create more complex query forms, you can repeat the steps described above and add more query keys, column selectors and query builders. However, sometimes a column selector might be dependent on another selector. In that case, the dependency can be specified by using the attribute 'depends', as shown below.

	<columnSelector id="gender" depends="language">
		<options>
			<option id="german">
				<columns>
					<column reference="german_gender" />
				</columns>
			</option>
			<option id="english">
				<columns>
					<column reference="english_gender" />
				</columns>
			</option>
			<option id="both" default="true">
				<columns>
					<column reference="german_gender" />
					<column reference="english_gender" />
				</columns>
			</option>
		</options>
	</columnSelector>

Note that if you define a dependency, the option ids must be identical to the option ids of the referenced selector. Also note that dependencies cannot be chained: if column selector A depends on selector B, B cannot itself depend on another selector C.
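Both rules are easy to check mechanically. The following hypothetical validator takes the option ids of each selector and the 'depends' relations and reports violations; its error messages are invented for illustration and are not Maalr's log output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical validator for the two dependency rules described above.
class DependencyCheck {
    // optionIds: selector id -> its option ids
    // dependsOn: selector id -> id of the selector named in its 'depends' attribute
    static List<String> validate(Map<String, Set<String>> optionIds, Map<String, String> dependsOn) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> dependency : dependsOn.entrySet()) {
            String selector = dependency.getKey();
            String target = dependency.getValue();
            Set<String> targetOptions = optionIds.get(target);
            if (targetOptions == null) {
                errors.add(selector + " depends on unknown selector " + target);
                continue;
            }
            // Rule 1: option ids must be identical to those of the referenced selector.
            if (!targetOptions.equals(optionIds.get(selector))) {
                errors.add(selector + " must use the same option ids as " + target);
            }
            // Rule 2: no chains; the referenced selector must not depend on another one.
            if (dependsOn.containsKey(target)) {
                errors.add(selector + " depends on " + target
                        + ", which itself depends on " + dependsOn.get(target));
            }
        }
        return errors;
    }
}
```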

2.2 Configuring language-based column mapping

Now that queries can be interpreted, it must be defined which columns belong to which language, and how those columns shall be represented in editors as well as in the list of results. The following gives an example:

<languages>
<language id="german" mainColumn="german_word">
	<editors>
		<frontend_editor>
			<fields>
				<field column="german_word"></field>
			</fields>
		</frontend_editor>
		<backend_editor>
			<fields>
				<field column="german_word" allowNull="false"></field>
			</fields>
		</backend_editor>
	</editors>
	<results>
		<fields>
			<field column="german_word" format="{0}"></field>
			<field column="german_gender" format="&lt;i&gt;({0})&lt;/i&gt;"></field>
		</fields>
	</results>
</language>
<language id="english" mainColumn="english_word">
	<editors>
		<frontend_editor>
			<fields>
				<field column="english_word"></field>
			</fields>
		</frontend_editor>
		<backend_editor>
			<fields>
				<field column="english_word" allowNull="false"></field>
			</fields>
		</backend_editor>
	</editors>
	<results>
		<fields>
			<field column="english_word" format="{0}"></field>
			<field column="english_gender" format="&lt;i&gt;({0})&lt;/i&gt;"></field>
		</fields>
	</results>
</language>
</languages>

For each language, the fields shown in the frontend editor and in the backend editor must be defined. The frontend editor is presented to visitors who want to suggest a new entry or a modification, whereas the backend editor can only be used after logging in.

Additionally, the fields shown in the result list must be defined, together with their format. In the example above, English entries are displayed as the value of the english_word column, followed by the value of the english_gender column, if any. The values can be formatted: in the example above, english_gender is shown in italics and enclosed in parentheses, as in _(gender)_.
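The `{0}` placeholders look like `java.text.MessageFormat` patterns. Assuming that is how they are interpreted, a result row could be rendered roughly as follows; `ResultFormatter`, the space separator, and skipping empty columns are assumptions made for illustration.

```java
import java.text.MessageFormat;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical rendering of one result row from a <results> field list.
class ResultFormatter {
    // fieldFormats: column -> format pattern; pass a LinkedHashMap to keep display order
    static String render(Map<String, String> entry, Map<String, String> fieldFormats) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> field : fieldFormats.entrySet()) {
            String value = entry.get(field.getKey());
            if (value == null || value.isEmpty()) {
                continue; // optional columns such as german_gender may be missing
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(MessageFormat.format(field.getValue(), value));
        }
        return out.toString();
    }
}
```

If the patterns are indeed MessageFormat-based, keep in mind that a literal apostrophe must be written as two apostrophes (`''`) inside a pattern.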

Note that for each language, the 'mainColumn' must be defined - this is the column which contains the actual word or phrase, and it is used to automatically generate the alphabetic index as well as all related pages.

2.3 Configuring Search Interfaces

Once the configuration described in section 2 is in place, the user interface can be set up. Four different use cases have to be configured: visitors using the 'default' search UI, visitors using the 'advanced' search UI, and editors using either the 'default' or the 'advanced' search UI. All of them share the same syntax, and the configuration is not very difficult, as the example below demonstrates:

When the configuration is converted to an HTML form, the items are rendered from top to bottom. The id of each uifield either refers to an id in the search configuration (a query key, a column selector, or a query builder), or to a built-in function provided by Maalr ('highlight', 'pageSize' and 'suggestions' in the example above). The built-in functions provide features such as highlighting the search phrase or showing unverified changes in the list of results, and do not need any additional configuration.

The optional attribute 'submit="true"' shows a submit button next to the UI element it belongs to (note that only one submit attribute is allowed per UI configuration).
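The reference rule and the single-submit rule can be sketched as a hypothetical check over a UI configuration. The built-in ids are the three named above; the set of configuration ids would be collected from the query keys, column selectors and query builders. The messages only loosely mirror the errors shown under 2.4 Troubleshooting and are not Maalr's actual output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check of a UI configuration against the search configuration.
class UiFieldCheck {
    // Built-in functions named in the text; they need no configuration.
    static final Set<String> BUILT_INS =
            new HashSet<>(Arrays.asList("highlight", "pageSize", "suggestions"));

    // One <uifield>: its id and whether it carries submit="true".
    static final class UiField {
        final String id;
        final boolean submit;

        UiField(String id, boolean submit) {
            this.id = id;
            this.submit = submit;
        }
    }

    // configIds: ids of all query keys, column selectors and query builders.
    static List<String> validate(List<UiField> fields, Set<String> configIds) {
        List<String> errors = new ArrayList<>();
        int submits = 0;
        for (UiField field : fields) {
            if (!configIds.contains(field.id) && !BUILT_INS.contains(field.id)) {
                errors.add("Invalid ui field reference: " + field.id);
            }
            if (field.submit) {
                submits++;
            }
        }
        if (submits > 1) {
            errors.add("Only one submit attribute is allowed, found " + submits);
        }
        return errors;
    }
}
```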

The 'type' attribute defines how the field is rendered, and can be TEXT, RADIO, CHECKBOX, COMBO, or ORACLE. RADIO and COMBO are interchangeable and support the options of the column selectors and query builders defined in the search configuration. TEXT fields are rendered as text boxes and should be used for free-text input, such as a search phrase. An ORACLE is useful if a text field should not accept free text but must support a large number of values that cannot reasonably be displayed in a combo box (for instance, if each dictionary entry were assigned one or more categories out of a fixed list of several hundred categories).

2.4 Troubleshooting

During startup, the dictionary configuration is analyzed and interpreted. During that phase, Maalr will try to detect logical errors in the configuration, and will log them as errors. When creating a new Maalr configuration, look for ERROR-level log statements made by the LuceneIndexManager:

ERROR LuceneIndexManager   | 2 errors have been detected in the configuration: 
ERROR LuceneIndexManager   |    Invalid ui field reference: There is not query key, column selector or query builder named language2 
ERROR LuceneIndexManager   |    Query Builder method does not have a unique id in the configuration 

If no errors have been detected, the following statement will be printed:

INFO LuceneIndexManager   | No errors have been detected in the configuration.

3. (Optionally) Defining Overlays

In some cases, it might be useful to offer additional information for a dictionary entry, for instance declensions or conjugations. For such (optional) cases, Maalr provides 'overlays'. These are defined in a file named 'overlays.xml', which looks as follows:

<overlays>
<overlay firstLanguage="false" type="V">
	<form>maalr_config/overlays/conjugation.html</form>
	<editor>maalr_config/overlays/conjugation-editor.xml</editor>
</overlay>
<overlay firstLanguage="false" type="N">
...
</overlays>

An overlay item defines the language it belongs to (first or second) and the type of overlay. Multiple types can be defined, but currently only one type can be assigned to a single dictionary entry. The overlay itself consists of two definitions: the form presented to the user, and an editor configuration used to create or modify overlays. Note that the values shown in an overlay must be defined in the search configuration, for instance

    <items source="imperfectsing1" dest="imperfectsing1" stored="true" analyzed="false" type="STRING"/>
    <items source="imperfectsing2" dest="imperfectsing2" stored="true" analyzed="false" type="STRING"/>
    <items source="imperfectsing3" dest="imperfectsing3" stored="true" analyzed="false" type="STRING"/>
    <items source="imperfectplural1" dest="imperfectplural1" stored="true" analyzed="false" type="STRING"/>
    <items source="imperfectplural2" dest="imperfectplural2" stored="true" analyzed="false" type="STRING"/>
    <items source="imperfectplural3" dest="imperfectplural3" stored="true" analyzed="false" type="STRING"/>
…

If a dictionary entry with an overlay is shown in the list of results, a link labeled with the overlay type ('V' or 'N' in this example) is displayed next to the result. Clicking this link opens a dialog containing the HTML snippet defined through the 'form' element of the overlay definition. This snippet may contain arbitrary HTML and is transformed by replacing all variables (written as ${FIELD_NAME}, such as ${imperfectsing1}) with the corresponding values. The following is an example of an overlay form:

<h3>${german_word}</h3>
<br/>
<table cellspacing="7" cellpadding="0" width="100%">
<tr>
    <td width="25%">
        <b>Imperfect</b><br/>
        ${imperfectsing1}<br/>
        ${imperfectsing2}<br/>
        ${imperfectsing3}<br/>
        ${imperfectplural1}<br/>
        ${imperfectplural2}<br/>
        ${imperfectplural3}<br/>
    </td>
...
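Variable substitution in such a form can be sketched with a regular expression. `OverlayTemplate` is hypothetical, and rendering unknown fields as empty strings is an assumption; a production version would probably also HTML-escape the values before inserting them.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: every ${FIELD_NAME} in the overlay HTML is replaced
// by the entry's value for that field.
class OverlayTemplate {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String fill(String html, Map<String, String> entry) {
        Matcher m = VARIABLE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = entry.get(m.group(1));
            // Assumption: missing fields render as an empty string.
            // quoteReplacement keeps '$' or '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```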

If one or more overlay types are defined, they can be assigned to a dictionary entry within the entry editor. The second element in the overlay definition ('editor') defines how the overlay can be modified. This configuration file describes how the overlay-values are rendered in the editor, as shown below:

<overlayEditor>
<rows>
	<row>
		<column id="present">
			...
		</column>
		<column id="imperfect">
			<item id="imperfectsing1"></item>
			<item id="imperfectsing2"></item>
			<item id="imperfectsing3"></item>
			<item id="imperfectplural1"></item>
			<item id="imperfectplural2"></item>
			<item id="imperfectplural3"></item>
		</column>
		<column id="conjunctive">
			...
		</column>
	</row>
	…
</rows>
</overlayEditor>

To simplify the creation of new overlays, a 'preset chooser' can be defined within an overlay editor. Given a particular word form (such as an infinitive), a preset option (such as a verb class) and of course a language- and use case-specific generator, regular forms can be generated semi-automatically. The related configuration looks as follows:

<presetchooser id="type" base="german_word" generator="de.uni_koeln.spinfo.maalr.overlays.generator.SomeVerbClassGenerator">
	<options>
		<option value="1"/>
		<option value="2"/>
		<option value="3"/>
	</options>
</presetchooser>

Note that the values defined in the attributes 'id' and 'base' must both be defined in the search configuration. This already is the case for 'german_word', but not yet for 'type', so one would have to add

<items source="type" dest="type" stored="true" analyzed="false" type="STRING"/>

to the search configuration.

The class referenced in the 'generator'-attribute must implement the interface 'de.uni_koeln.spinfo.maalr.common.server.IOverlayGenerator', which defines a single method:

HashMap<String, String> buildPreset(String presetId, String base) throws GenerationFailedException;

Preset option and word form will be passed to the method, and it should either return a HashMap containing the generated key-value-pairs or throw a GenerationFailedException.
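A sketch of a generator for the signature above. To keep it compilable without Maalr on the classpath, it declares local stand-ins for `IOverlayGenerator` and `GenerationFailedException` with the same method shape; with Maalr available, implement the real `de.uni_koeln.spinfo.maalr.common.server.IOverlayGenerator` instead and drop the stand-ins. `WeakVerbGenerator` and the meaning of preset "1" (regular German past tense) are hypothetical.

```java
import java.util.HashMap;

// Local stand-ins (assumptions) mirroring the method shape quoted above.
class GenerationFailedException extends Exception {
    GenerationFailedException(String message) {
        super(message);
    }
}

interface IOverlayGenerator {
    HashMap<String, String> buildPreset(String presetId, String base) throws GenerationFailedException;
}

// Hypothetical generator: preset "1" = regular (weak) German verbs, whose
// past tense is stem + -te, -test, -te, -ten, -tet, -ten.
class WeakVerbGenerator implements IOverlayGenerator {
    @Override
    public HashMap<String, String> buildPreset(String presetId, String base)
            throws GenerationFailedException {
        if (!"1".equals(presetId)) {
            throw new GenerationFailedException("Unsupported preset: " + presetId);
        }
        if (base == null || base.length() < 3 || !base.endsWith("en")) {
            throw new GenerationFailedException("Expected an infinitive ending in -en: " + base);
        }
        String stem = base.substring(0, base.length() - 2); // "machen" -> "mach"
        HashMap<String, String> forms = new HashMap<>();
        // Keys must match the items defined in the search configuration.
        forms.put("imperfectsing1", stem + "te");
        forms.put("imperfectsing2", stem + "test");
        forms.put("imperfectsing3", stem + "te");
        forms.put("imperfectplural1", stem + "ten");
        forms.put("imperfectplural2", stem + "tet");
        forms.put("imperfectplural3", stem + "ten");
        return forms;
    }
}
```

The rule is deliberately simplified: stems ending in -t or -d (such as 'arbeiten', past tense 'arbeitete') need an extra -e- and are not handled here.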

4. Internationalization

Now that the index structure, the query mechanism and the optional overlays have been configured, the last step to complete the setup is the internationalization of the application. For each language the UI should support, two files have to be defined, in addition to two fallback files which are used if no language code has been provided. The default files are named 'lemma-description.properties' and 'user-searchui.properties', whereas language-specific files must be named 'lemma-description_LANGUAGE_CODE.properties' and 'user-searchui_LANGUAGE_CODE.properties' (e.g. 'user-searchui_de.properties', 'user-searchui_en.properties', etc.).
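The naming scheme can be sketched as a lookup order. `BundleFiles` is hypothetical, and falling back to the default file when a language-specific file is missing is an assumption; the text above only states that the fallback files are used when no language code is provided. Java's own `ResourceBundle` follows a similar `baseName_language.properties` convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical lookup order: language-specific file first, then the fallback.
class BundleFiles {
    static List<String> candidates(String baseName, String languageCode) {
        List<String> names = new ArrayList<>();
        if (languageCode != null && !languageCode.isEmpty()) {
            names.add(baseName + "_" + languageCode + ".properties");
        }
        names.add(baseName + ".properties"); // fallback without language code
        return names;
    }

    // Returns the first candidate that exists, or null if none does.
    static String resolve(String baseName, String languageCode, Set<String> existingFiles) {
        for (String name : candidates(baseName, languageCode)) {
            if (existingFiles.contains(name)) {
                return name;
            }
        }
        return null;
    }
}
```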

In user-searchui.properties, a mapping of all ids defined in the search configuration file must be provided. In this example, this would be

searchPhrase=Search Phrase
searchPhrase_submit=Search
highlight=Highlight
suggestions=Show unverified results

language=Language
german=German
english=English
both=Both

method=Method
exact=Exact
default=Default

more_options=More Options...
less_options=Less Options...
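Since every id from the search configuration needs an entry here, a quick consistency check can catch missing translations. This hypothetical `TranslationCheck` loads the properties text with `java.util.Properties` and lists the configuration ids that have no translation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical check: which configuration ids lack an entry in
// user-searchui.properties?
class TranslationCheck {
    static List<String> missingKeys(String propertiesText, List<String> configIds) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        List<String> missing = new ArrayList<>();
        for (String id : configIds) {
            if (props.getProperty(id) == null) {
                missing.add(id);
            }
        }
        return missing;
    }
}
```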

lemma-description.properties must contain translations for all indexed fields, plus translations for some configurable UI elements, such as the title of the modify editor:

suggest.title=Missing something?
suggest.subtext=Suggest a new entry
modify.title=Thinking different?
modify.subtext=Suggest a modification of this entry
button.clear=Clear
button.cancel=Cancel
button.ok=Send
description.modify=You can use this form to modify an entry. 
description.suggest=Please use this form to contact us.
dialog.saving=Saving...
dialog.failure=Failed to save suggestion
dialog.success=Thanks!
dialog.comment.header=Comment or Question
dialog.comment.placeholder=Use this field to leave a comment or a question. 
dialog.email.header=Email Address
dialog.email.placeholder=Enter your email address
maalr.query.results={0} (results {1} to {2} of {3})
maalr.query.nothing_found=The Search for "{0}" did not return any results. Click on "{1}" to suggest a new entry.
maalr.query.result_modify=Modify...
suggest.button=Suggest...
maalr.query.results_first_page=First
maalr.query.results_last_page=Last
german=German
english=English
maalr.query.result_title=Results for {0}
dict.lang1_lang2=German-English
dict.lang2_lang1=English-German
dict.title_lang1=English translations of '{0}'
dict.title_lang2=German translations of '{0}'

editor.modify.subtitle=Modify entry...
editor.suggest.subtitle=
editor.suggest.title=New Entry...

header.modify=My fancy app
header.suggest=My fancy app

mail.subject=Your request to my fancy app

english_word=English
type=Type
present=Present
imperfect=Imperfect
...
imperfectsing1=1. pers. sing.
imperfectsing2=2. pers. sing.
imperfectsing3=3. pers. sing.
imperfectplural1=1. pers. plur.
imperfectplural2=2. pers. plur.
imperfectplural3=3. pers. plur.
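Patterns such as `maalr.query.results` contain numbered placeholders. Assuming they are filled in with `java.text.MessageFormat`, a summary line could be produced as sketched below (`ResultSummary` is hypothetical). The numbers are passed as strings on purpose: `MessageFormat` formats `Number` arguments with the locale's number format, which would render 1234 as "1,234" in English locales.

```java
import java.text.MessageFormat;

// Hypothetical rendering of maalr.query.results-style patterns.
class ResultSummary {
    static String summary(String pattern, String phrase, int from, int to, int total) {
        // Strings avoid locale-dependent digit grouping of Number arguments.
        return MessageFormat.format(pattern, phrase,
                String.valueOf(from), String.valueOf(to), String.valueOf(total));
    }
}
```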