This repository has been archived by the owner on Mar 6, 2019. It is now read-only.

Issues parsing irrational numbers #13

Open
jackmcdade opened this issue Apr 25, 2018 · 3 comments


jackmcdade commented Apr 25, 2018

Have you guys run into issues trying to parse strings with fractions that have repeating decimal expansions, such as 1/3? For example, the following all return null for qty:

1/3 cup flour
2/3 tsp almond extract
14/15 gallon milk
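
For reference, the three fractions above are all rational numbers; what they have in common is a decimal expansion that never terminates. The sketch below only illustrates that arithmetic distinction using Python's standard `fractions` module. The helper name `has_terminating_decimal` is made up for this example, and nothing here is a diagnosis of the tagger's behavior.

```python
from fractions import Fraction


def has_terminating_decimal(frac: Fraction) -> bool:
    """Return True if the fraction's decimal expansion terminates.

    A fraction in lowest terms terminates exactly when its denominator
    has no prime factors other than 2 and 5.
    """
    d = frac.denominator  # Fraction always stores lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1


if __name__ == "__main__":
    for text in ["1/2", "1/4", "1/3", "2/3", "14/15"]:
        f = Fraction(text)
        print(text, float(f), has_terminating_decimal(f))
```

If a pipeline rounds quantities to a fixed number of decimal places somewhere along the way, fractions in the second group are the ones that lose information, so that may be one place worth checking.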

adammck commented Apr 25, 2018

Could you provide a bit more information here? How are you using this library? Some more examples of things which do vs don't work? I haven't touched this for a long time, so don't have much context.

It does look like your third example won't work (we're only matching a single digit on either side of the slash), but I'd be surprised if "1/3 cup flour" wasn't working, since 1/3 appears so frequently in our training data.
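
To illustrate the single-digit limitation described above, here is a minimal sketch. Both regular expressions are hypothetical and written for this example; they are not copied from the library's source, so treat them only as a picture of the general idea.

```python
import re

# Hypothetical patterns for illustration only (not the library's code).
# One digit on each side of the slash:
SINGLE_DIGIT_FRACTION = re.compile(r"^(\d)/(\d)$")
# One or more digits on each side of the slash:
MULTI_DIGIT_FRACTION = re.compile(r"^(\d+)/(\d+)$")


def matches(pattern: "re.Pattern[str]", token: str) -> bool:
    """Return True if the whole token matches the given fraction pattern."""
    return pattern.match(token) is not None


if __name__ == "__main__":
    for token in ["1/3", "2/3", "14/15"]:
        print(
            token,
            "single-digit:", matches(SINGLE_DIGIT_FRACTION, token),
            "multi-digit:", matches(MULTI_DIGIT_FRACTION, token),
        )
```

Under these assumed patterns, "14/15" is rejected by the single-digit form but "1/3" and "2/3" are accepted, which is consistent with the expectation that only the third example should fail at this step.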


jackmcdade commented Apr 25, 2018

We're using it inside a PHP application as an API, but we get the same behavior even when using only the included nyt-ingredients-snapshot-2015.csv data and the basic CLI instructions from the README. 14/15 is obviously not something you'd ever encounter in a recipe; I was just trying to probe the edges of what's actually happening under the hood here.

For example, here's the tagged result of 1/3 cup milk given the basic training model.

# 0.951035
1/3     I1    L4    NoCAP    NoPAREN    OTHER/0.998681
cup     I2    L4    NoCAP    NoPAREN    B-UNIT/0.956263
milk    I3    L4    NoCAP    NoPAREN    B-NAME/0.994245

1/3 is being tagged as OTHER, while 1/2 and 1/4 work just fine.
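
When comparing many phrases, it can help to read this tagged output programmatically. The sketch below assumes the layout shown above: a `#` line carrying the sequence score, then one whitespace-separated row per token whose last field is `TAG/confidence`. The names `TaggedToken`, `parse_tagged`, and `suspicious_fractions` are invented for this example.

```python
import re
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    token: str
    tag: str
    confidence: float


def parse_tagged(output: str) -> List[TaggedToken]:
    """Parse tagger output rows into (token, tag, confidence) tuples."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the sequence-score line
        fields = line.split()
        # The last field looks like "B-UNIT/0.956263".
        tag, _, conf = fields[-1].rpartition("/")
        rows.append(TaggedToken(fields[0], tag, float(conf)))
    return rows


def suspicious_fractions(rows: List[TaggedToken]) -> List[TaggedToken]:
    """Return tokens that look like fractions but were tagged OTHER."""
    return [
        r for r in rows
        if r.tag == "OTHER" and re.fullmatch(r"\d+/\d+", r.token)
    ]


SAMPLE = """\
# 0.951035
1/3     I1    L4    NoCAP    NoPAREN    OTHER/0.998681
cup     I2    L4    NoCAP    NoPAREN    B-UNIT/0.956263
milk    I3    L4    NoCAP    NoPAREN    B-NAME/0.994245
"""

if __name__ == "__main__":
    for row in suspicious_fractions(parse_tagged(SAMPLE)):
        print(row)
```

Running the same check over output for 1/2, 1/4, 1/3, and 2/3 would make it easy to list exactly which fraction tokens end up as OTHER.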

jackmcdade commented

I too would have assumed it wouldn't be an issue on this side, and spent a large amount of time ruling out every other possibility, retraining it with many different subsets of our user-submitted data with no luck. I finally decided to start from the ground up here and noticed that your dataset behaves exactly the same.

Definitely surprised. I'm hoping you have even the slightest idea of what's going on. 🙏
