Query analysis pipeline framework
groove is a library for pipeline construction for query analysis. The groove pipeline comprises a query source (the format of the queries), a statistic source (a source for computing information retrieval statistics), preprocessing steps, any measurements to make, and any output formats.
The groove library is primarily used in boogie which is a front-end DSL for groove. If using groove as a Go library, refer to the simple example below which loads Medline queries and analyses them using Elasticsearch and finally outputs the result into a JSON file.
In the below example, we would like to use Elasticsearch to measure some query performance predictors on some Medline queries. For the experiment, we would like to pre-process the queries by making each one only contain alpha-numeric characters, and in lowercase. Finally, we would like to output the results of the measures into a JSON file.
// Construct the pipeline.
pipelineChannel := make(chan groove.Result)
p := pipeline.NewGroovePipeline(
query.NewTransmuteQuerySource(query.MedlineTransmutePipeline),
stats.NewElasticsearchStatisticsSource(stats.ElasticsearchHosts("http://localhost:9200"),
stats.ElasticsearchIndex("medline"),
stats.ElasticsearchField("abstract"),
stats.ElasticsearchScroll(true),
stats.ElasticsearchSearchOptions(stats.SearchOptions{
Size: 10000,
RunName: "qpp",
})),
pipeline.Measurement(preqpp.AvgICTF, preqpp.SumIDF, preqpp.AvgIDF, preqpp.MaxIDF, preqpp.StdDevIDF, postqpp.ClarityScore),
pipeline.Evaluation(eval.PrecisionEvaluator, eval.RecallEvaluator),
pipeline.MeasurementOutput(output.JsonMeasurementFormatter),
pipeline.EvaluationOutput("medline.qrels", output.JsonEvaluationFormatter),
pipeline.TrecOutput("medline_qpp.results"))
// Execute it on a directory of queries. A pipeline executes queries in parallel.
go p.Execute("./medline", pipelineChannel)
for {
// Continue until completed.
result := <-pipelineChannel
if result.Type == groove.Done {
break
}
switch result.Type {
case groove.Measurement:
// Process the measurement outputs.
err := ioutil.WriteFile("medline_qpp.json", bytes.NewBufferString(result.Measurements[0]).Bytes(), 0644)
if err != nil {
log.Fatal(err)
}
case groove.Evaluation:
// Process the evaluation outputs.
err := ioutil.WriteFile("medline_qpp_eval.json", bytes.NewBufferString(result.Evaluations[0]).Bytes(), 0644)
if err != nil {
log.Fatal(err)
}
}
}
If you use this work for scientific publication, please reference
@inproceedings{scells2018framework,
author = {Scells, Harrisen and Locke, Daniel and Zuccon, Guido},
title = {An Information Retrieval Experiment Framework for Domain Specific Applications},
booktitle = {The 41st International ACM SIGIR Conference on Research \&\#38; Development in Information Retrieval},
series = {SIGIR '18},
year = {2018},
}
The Go gopher was created by Renee French, licensed under Creative Commons 3.0 Attributions license.