Add grok or regex support. #1

Open
wuranbo opened this issue Aug 16, 2016 · 2 comments

Comments

@wuranbo
Contributor

wuranbo commented Aug 16, 2016

Elasticsearch 5.0 has added this: https://www.elastic.co/guide/en/elasticsearch/reference/master/grok-processor.html
With regex support, tantivy-cli would be more practical, e.g. it could take an Nginx or Apache log directly as its input file.
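
To illustrate the idea, here is a minimal sketch of a grok-like step that turns one access-log line into a JSON document. It is written in Python with the standard `re` module only to keep it short and self-contained; it is not tantivy code. The pattern, the field names (`client`, `path`, `status`, ...) and the `log_line_to_json` helper are all hypothetical, and it assumes the indexer accepts one JSON document per line.

```python
import json
import re

# Hypothetical grok-like pattern for the start of a common/combined access log line.
# The group names are illustrative only; they are not a tantivy schema.
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def log_line_to_json(line):
    """Return a JSON document for one log line, or None if the line does not match."""
    match = ACCESS_LOG.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # Convert the numeric fields; a size of "-" means no body was sent.
    doc["status"] = int(doc["status"])
    doc["size"] = 0 if doc["size"] == "-" else int(doc["size"])
    return json.dumps(doc)


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(log_line_to_json(sample))
```

Lines that do not match return `None`, so a caller could skip or report them instead of indexing half-parsed documents.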

@fulmicoton what do you think? This is what I want to do with https://github.com/BurntSushi/fst in my own project.

So I will take it.

But as a complete Rust newbie, it may take me some time. Even if you think it is a bad idea, I will still do it in my fork to get familiar with the tantivy code base. ^_^

@fulmicoton
Collaborator

Hi @wuranbo.

For several reasons, I think log analytics could be a great use case for tantivy.
Analytics is a use case of search where very low-level performance
really matters, and it might be easier to reach Lucene's performance rapidly
on that front... and quite possibly top it.

A "grok" preprocessor for our document would be a great addition... in the future.

Right now tantivy's document processing is close to nonexistent, so it will
be difficult to add a module to something that is anything but modular.

This does not necessarily mean you should not go on with your project.
As you said you can fork it.
When tantivy gets a more mature analysis, hopefully we will be able to reintegrate your code.

I don't have a clear idea of what the analysis pipeline API should look like.
If you are interested, you can also start with that and plug your grok processor into it.

@wuranbo
Contributor Author

wuranbo commented Aug 17, 2016

@fulmicoton Totally agree that the pipeline is a big thing that we need to think through and design carefully before building it. And I'm interested in it.

My bigger concern right now is making tantivy-cli friendly to use for more use cases (like logs), so that the project attracts more people at first glance. For now, I think the only thing people can easily try is the two JSON files downloaded in the README, which is not sexy.

Besides logs, I think another use case should be HTML parsing.

I think a quick demo of log and HTML input, implemented just in tantivy-cli (for now), may be enough to attract people at first glance. In the future, we can do it properly.

Imagine someone who first notices that tantivy-cli can be pointed at a 'real' log file from their daily work, or at some 'real' HTML pages from the Internet, and then used to search them. This is what I imagine should have happened when I first saw Lucene several years ago. If it had, I might be better at Lucene by now. ^_^
