Performance gains using readr::read_file() #74
Hi @lmullen, just getting back to this now that I have time. We're also preparing a CRAN release. I'd love to gain 30x more performance on the most commonly read type of file (text). I have no problem with adding a readr import. If you want to issue a PR with this change, by all means go ahead! I wonder however how much of the performance is caused by extra
I experimented with this in a branch, and it's trickier than it looks. Yes, I'm putting this on the back burner for now, but it's definitely something to address in the next revision. I also think we can remove the
Thanks for the update, @kbenoit. I was just about to start work on this. Sounds like I should hold off for now, but happy to help out when you say the time is right. Looking forward to your first CRAN release.
readtext is great. My students will thank you.
For reading in a directory of plain text files, you can get substantial time savings (roughly 30x on my machine) by using readr::read_file() instead of read_lines() and then pasting the lines together.
Benchmarks for a smallish corpus:
If you're willing to take a dependency on readr, then I would be happy to send a PR. What do you think?
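For anyone who wants to try the comparison themselves, here is a minimal sketch of the two approaches. It is not the original benchmark: the temporary corpus, the file names, and the helper functions `read_by_lines()` and `read_whole()` are made up for illustration, and it assumes that the line-based approach uses `readr::read_lines()` followed by `paste()`. Timings will depend on the machine and the corpus.

```r
library(readr)

# Hypothetical setup: write a few small text files to a temporary directory.
corpus_dir <- file.path(tempdir(), "example_corpus")
dir.create(corpus_dir, showWarnings = FALSE)
for (i in 1:20) {
  writeLines(
    rep(paste("Some example text for file", i), 200),
    file.path(corpus_dir, sprintf("doc%02d.txt", i))
  )
}
files <- list.files(corpus_dir, pattern = "\\.txt$", full.names = TRUE)

# Approach 1: read each file line by line, then paste the lines together.
read_by_lines <- function(paths) {
  vapply(
    paths,
    function(p) paste(read_lines(p), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

# Approach 2: read each file as a single string in one call.
read_whole <- function(paths) {
  vapply(paths, read_file, character(1), USE.NAMES = FALSE)
}

# Compare elapsed time for the two approaches on the same files.
system.time(texts_lines <- read_by_lines(files))
system.time(texts_whole <- read_whole(files))
```

One difference to keep in mind: `read_file()` returns the file contents verbatim, including any trailing newline, whereas pasting the output of `read_lines()` drops it.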