Added example #48

Closed
wants to merge 1 commit into from

GHLgh
Member

@GHLgh GHLgh commented Apr 25, 2017

@danyaljj Example for the first bullet point in #44

We can close this PR after the example is put into an IPython notebook.

@@ -0,0 +1,43 @@
from sioux import remote_pipeline
Member

Maybe we should create an examples folder/module?

Member

Actually, I'm thinking maybe we should make it an IPython notebook. All the tutorials and examples could be IPython notebooks. @bhargav, any comments on this?

Contributor

It would be good to have ipython notebooks. 👍

@danyaljj
Member

danyaljj commented Apr 25, 2017

How about we change it slightly, like this?
In the example, retrieve a random document (say, this one: https://github.com/ryanmcdermott/trump-speeches/blob/master/speeches.txt).
Then count all the verbs (POS = VB, VBD, VBG, VBN, VBP, VBZ) that occur "immediately after" a person (NER = PER). (By "immediately after" I mean after the person, in the same sentence, within a window of 3 words.)

What do you think?

@GHLgh
Member Author

GHLgh commented Apr 25, 2017

It's doable; I can try that.

When you say 3 words, do you mean 3 tokens? I ask because punctuation marks are also counted as tokens, right?

@danyaljj
Member

Yeah tokens should be fine.
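A minimal sketch of the counting task described above, assuming the annotations have already been pulled out of the pipeline into plain Python lists. The `Sentence` container, its `(start, end, label)` NER spans (end exclusive), and `count_verbs_after_person` are hypothetical names, not part of sioux. The sketch uses the Penn Treebank verb tags, keeps the window inside one sentence, and counts punctuation as tokens, as agreed above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Penn Treebank verb tags.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


@dataclass
class Sentence:
    # Hypothetical container: parallel token/POS lists plus NER spans
    # given as (start, end_exclusive, label) token offsets.
    tokens: List[str]
    pos: List[str]
    ner: List[Tuple[int, int, str]]


def count_verbs_after_person(sentences: List[Sentence], window: int = 3) -> int:
    """Count verb tokens within `window` tokens after a PER mention,
    staying inside the same sentence. Punctuation counts as a token."""
    count = 0
    for sent in sentences:
        counted = set()  # avoid double counting a verb near two mentions
        for start, end, label in sent.ner:
            if label != "PER":
                continue
            # Tokens end, end+1, ..., end+window-1, clipped to the sentence.
            for i in range(end, min(end + window, len(sent.tokens))):
                if sent.pos[i] in VERB_TAGS and i not in counted:
                    counted.add(i)
                    count += 1
    return count


example = Sentence(
    tokens=["John", "Smith", ",", "who", "left", ",", "said", "."],
    pos=["NNP", "NNP", ",", "WP", "VBD", ",", "VBD", "."],
    ner=[(0, 2, "PER")],
)
print(count_verbs_after_person([example]))  # 1
```

In the usage example the comma right after the name takes one of the three window slots, so only `left` is counted; `said` would only fall inside a window of 5 tokens.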

@danyaljj
Member

BTW, we shouldn't send everything to the pipeline all at once. We can split the text on newlines and tabs before sending it to the pipeline.
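One way to do this splitting, as a sketch: `split_for_pipeline` is a hypothetical helper, and the actual call to `remote_pipeline` is left out because its signature is not shown in this thread.

```python
import re
from typing import List


def split_for_pipeline(text: str) -> List[str]:
    """Split raw text on runs of newlines and tabs, dropping empty chunks,
    so each chunk can be sent to the pipeline separately."""
    chunks = re.split(r"[\n\t]+", text)
    return [c.strip() for c in chunks if c.strip()]


raw = "First speech line.\n\nSecond line.\tThird part.\n"
for chunk in split_for_pipeline(raw):
    # Hypothetical: replace print with the real pipeline call.
    print(chunk)
```

Stripping each chunk also removes a leftover `\r` from Windows line endings, and blank lines between speeches produce no empty requests.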

@bhargav
Contributor

bhargav commented Apr 25, 2017

Also, as a general comment, the usage is not easy. We should somehow make it easier to access neighboring tokens: https://github.com/CogComp/sioux/pull/48/files#diff-dc8b50acc65729bc37a3b573f4ab541eR31

Also, being able to iterate over a view would be useful, IMO.

for ner_token in pipeline.get_ner(doc):
    print(ner_token['label'])
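A sketch of how a view class could support the loop above, assuming each constituent is a dict with a `label` key; `NerView` is a hypothetical name, not the current sioux class. Defining `__iter__` is enough for `for ... in view` to work, and `__getitem__` adds lookup by index.

```python
from typing import Dict, Iterator, List


class NerView:
    """Hypothetical view wrapper: stores constituents as dicts and
    supports `for con in view` plus indexing, instead of get_cons()."""

    def __init__(self, constituents: List[Dict]):
        self._cons = constituents

    def __iter__(self) -> Iterator[Dict]:
        # A fresh iterator on every call, so the view can be looped over repeatedly.
        return iter(self._cons)

    def __len__(self) -> int:
        return len(self._cons)

    def __getitem__(self, index: int) -> Dict:
        return self._cons[index]


view = NerView([
    {"label": "PER", "start": 0, "end": 2},
    {"label": "LOC", "start": 5, "end": 6},
])
for ner_token in view:
    print(ner_token["label"])  # PER, then LOC
```

Returning `iter(self._cons)` makes the view re-iterable; a class that is its own iterator (defining `__next__`) would be exhausted after the first loop.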

@GHLgh
Member Author

GHLgh commented Apr 25, 2017

Good idea. I can make the class an iterator; then we can get rid of some_view_class.get_cons().

@bhargav, how would you like to make it easier to access neighboring tokens? If we can iterate over the view and find a constituent by index, would that be sufficient?

I can make the usage simpler by adding the corresponding tokens to each constituent (then ner_con['tokens'] would hold the tokens of that constituent). Right now we have to do some_view.get_cons(key='token')[constituent_index].
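A sketch of the two ideas above, assuming constituents are dicts with `start`/`end` token offsets (end exclusive) into a shared token list; `attach_tokens` and `neighbors` are hypothetical helpers, not existing sioux functions. The first adds a `tokens` entry to each constituent; the second returns up to `k` neighboring tokens on each side by index.

```python
from typing import Dict, List


def attach_tokens(constituents: List[Dict], tokens: List[str]) -> List[Dict]:
    """Return copies of the constituents with a 'tokens' entry holding
    the tokens they cover (start inclusive, end exclusive)."""
    return [dict(con, tokens=tokens[con["start"]:con["end"]]) for con in constituents]


def neighbors(con: Dict, tokens: List[str], k: int = 3) -> Dict[str, List[str]]:
    """Up to k tokens on each side of a constituent."""
    return {
        "left": tokens[max(0, con["start"] - k):con["start"]],
        "right": tokens[con["end"]:con["end"] + k],
    }


tokens = ["John", "Smith", ",", "who", "left", ",", "said", "."]
ner = attach_tokens([{"label": "PER", "start": 0, "end": 2}], tokens)
print(ner[0]["tokens"])           # ['John', 'Smith']
print(neighbors(ner[0], tokens))  # {'left': [], 'right': [',', 'who', 'left']}
```

Copying each dict in `attach_tokens` leaves the view's original constituents untouched.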

@GHLgh GHLgh closed this Apr 26, 2017