Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update tantivy to 0.6.1 #28

Merged
merged 7 commits into from
Jul 23, 2018
Merged

Conversation

nocduro
Copy link
Contributor

@nocduro nocduro commented Jul 22, 2018

Hi, I couldn't get cargo install tantivy-cli to compile, so I went and tried to fix the errors.

I ended up updating tantivy to 0.6.1

Unfortunately I couldn't find the TimerTree stuff in tantivy 0.6, so I deleted/commented out the timer code for now. I'm not sure what the replacement for that is?

  • no longer requires nightly
  • uses tantivy 0.6.1
  • changed version number to match tantivy (0.6.1)
  • update readme to no longer require nightly when installing
  • ran cargo update
  • fix issue Field name must match the pattern #12

I tested the basic examples from the readme running the webserver, and fixed a bug in two of the examples caused by the trailing linefeed in the query. This was my first time looking at the tantivy code, so I'm not 100% sure if I changed the functions to their correct counterparts in 0.6 (example: Index::open() to Index::open_in_dir())

nocduro added 4 commits July 22, 2018 16:32
* change from create() to create_in_dir()

* fix issue quickwit-oss#12 "Field name must match the pattern"
* remove tantivy::TimerTree code
* change Index::open() to Index::open_in_dir()
* change Index::create() to Index::create_in_dir()
README.md Outdated
run `rustup run nightly cargo install tantivy-cli`. (`cargo install tantivy-cli` will work
as well if nightly is your default toolchain).
If you are a Rust programmer, you probably have `cargo` installed and you can just
run `cargo install tantivy-cli`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool you even fixed that

README.md Outdated
run `rustup run nightly cargo install tantivy-cli`. (`cargo install tantivy-cli` will work
as well if nightly is your default toolchain).
If you are a Rust programmer, you probably have `cargo` installed and you can just
run `cargo install tantivy-cli`

Alternatively, you can directly download a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this paragraph. I'll readd it if I build binaries.

@@ -85,6 +82,7 @@ fn run_bench(index_path: &Path,
let mut top_collector = TopCollector::with_limit(10);
query.search(&searcher, &mut top_collector)
.map_err(|e| format!("Failed while retrieving document for query {:?}.\n{:?}", query, e))?;
/*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the commented block.

@fulmicoton
Copy link
Collaborator

Thanks for that. I got a bit sloppy with keeping tantivy-cli up to date.

I wrote a few minor comments. Can you address them?
We should be good to go after that.

@nocduro
Copy link
Contributor Author

nocduro commented Jul 23, 2018

Thanks for the quick review :)

I've made the changes, just wondering if I should remove some of the println()s that mention the timer? For example in bench.rs:

 println!("{}\t{}", "query", "time in microsecs");

@fulmicoton
Copy link
Collaborator

@nocduro Right! Ideally we want to have TimerTree in the tantivy-cli.

The file can be copied from here: https://github.com/tantivy-search/tantivy/blob/0.5.2/src/common/timer.rs

And call it manually in the bench.

fn run_bench(index_path: &Path,
             query_filepath: &Path,
             num_repeat: usize) -> Result<(), String> {
    
    println!("index_path : {:?}", index_path);
    println!("Query : {:?}", index_path);
    println!("-------------------------------\n\n\n");
    
    let index = Index::open(index_path).map_err(|e| format!("Failed to open index.\n{:?}", e))?;
    let searcher = index.searcher();
    let default_search_fields: Vec<Field> = extract_search_fields(&index.schema());
    let queries = read_query_file(query_filepath).map_err(|e| format!("Failed reading the query file:  {}", e))?;
    let query_parser = QueryParser::new(index.schema(), default_search_fields, index.tokenizers().clone());
    
    println!("SEARCH\n");
    println!("{}\t{}\t{}\t{}", "query", "num_terms", "num hits", "time in microsecs");
    for _ in 0..num_repeat {
        for query_txt in &queries {
            let query = query_parser.parse_query(&query_txt).unwrap();
            // let num_terms = query.num_terms();
            let mut top_collector = TopCollector::with_limit(10);
            let mut count_collector = CountCollector::default();
            let mut timing = TimerTree::default();
            {
                let _search = timing.open("search");
                let mut collector = chain().push(&mut top_collector).push(&mut count_collector);
                query.search(&searcher, &mut collector)
                    .map_err(|e| format!("Failed while searching query {:?}.\n\n{:?}", query_txt, e))?;
            }
            println!("{}\t{}\t{}", query_txt, count_collector.count(), timing.total_time());
        }
    }
    
    
    println!("\n\nFETCH STORE\n");
    println!("{}\t{}", "query", "time in microsecs");
    for _ in 0..num_repeat {
        for query_txt in &queries {
            let query = query_parser.parse_query(&query_txt).unwrap();
            let mut top_collector = TopCollector::with_limit(10);
            query.search(&searcher, &mut top_collector)
                .map_err(|e| format!("Failed while retrieving document for query {:?}.\n{:?}", query, e))?;
            let mut timer = TimerTree::default();
            {
                let _scoped_timer_ = timer.open("total");
                for doc_address in top_collector.docs() {
                    searcher.doc(&doc_address).unwrap();
                }
            }
            println!("{}\t{}", query_txt, timer.total_time());
        }
    }
    
    Ok(())
}

@fulmicoton
Copy link
Collaborator

Let me know if you can do that. If not just leave it as is, I will merge it and fix that later.

@nocduro
Copy link
Contributor Author

nocduro commented Jul 23, 2018

Sounds good, I'll add it back in

@fulmicoton
Copy link
Collaborator

Awesome !

@nocduro
Copy link
Contributor Author

nocduro commented Jul 23, 2018

Should be back to how it was now. I tested the web server part and am getting timings at the bottom of the page.

@fulmicoton
Copy link
Collaborator

Awesome. I'll merge once CI has finished its run.

@fulmicoton fulmicoton merged commit a81b270 into quickwit-oss:master Jul 23, 2018
@fulmicoton
Copy link
Collaborator

Thank you @nocduro

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants