issues with get_nrc_sentiment #36
Comments
Hi, thanks for raising your issue, and welcome to using R! Do you have a sample of code that shows the issue you are facing?
I am using the following code. I have about a million tweets.

```r
# strip retweet/via markers and the handles that follow them
clean_tweets <- gsub('(RT|via)((?:\\b\\W*@\\w+)+)', '', tweets)
clean_tweets
text_data <- df_new$text
```
Thanks for the code sample. I'm assuming the cleansing is working fine and it's get_nrc_sentiment that is taking up most of the time - is that correct, and can you run the code on a subset of your million tweets?

Depending on what machine you are running your code on, you could partition the tweets into different groups, perhaps by starting letter or a range of letters, and then run this in parallel: https://www.r-bloggers.com/2017/10/running-r-code-in-parallel/

Apart from running this code on a more powerful cloud instance, all I can suggest is leaving it to run overnight. I hope this helps!
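A minimal sketch of that parallel idea using the base parallel package; it assumes a cleaned character vector called clean_tweets, and the number of cores is just a placeholder, so adjust for your machine:

```r
library(syuzhet)
library(parallel)

# assumed input: a character vector of cleaned tweets called clean_tweets
n_cores <- max(1, detectCores() - 1)
chunks  <- split(clean_tweets, cut(seq_along(clean_tweets), n_cores, labels = FALSE))

cl <- makeCluster(n_cores)
clusterEvalQ(cl, library(syuzhet))

# score each chunk on its own core, then stitch the results back together
results <- parLapply(cl, chunks, get_nrc_sentiment)
stopCluster(cl)

sentiment <- do.call(rbind, results)
```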
Yes, the cleansing is fine. get_nrc_sentiment took hours to complete on a subset of my data (~200k tweets). I got the results when I left it to run for a couple of hours. It looks like I will just repeat this process on small chunks of data. Thank you for pointing me in the right direction.
No problem. If you come up with something that helps, do post a snippet back so it can help others.
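A rough sketch of that chunked approach, run sequentially and saving each piece as it finishes so a long run can be resumed; it assumes a cleaned character vector called clean_tweets, and the chunk size and file names are just placeholders:

```r
library(syuzhet)

chunk_size <- 10000
chunk_ids  <- split(seq_along(clean_tweets),
                    ceiling(seq_along(clean_tweets) / chunk_size))

sentiment_list <- vector("list", length(chunk_ids))
for (i in seq_along(chunk_ids)) {
  sentiment_list[[i]] <- get_nrc_sentiment(clean_tweets[chunk_ids[[i]]])
  # write intermediate results to disk so a crash does not lose earlier chunks
  saveRDS(sentiment_list[[i]], sprintf("nrc_chunk_%03d.rds", i))
}

sentiment <- do.call(rbind, sentiment_list)
```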
Hello, I am facing the same issue here. My data consists of approximately 340,000 tweets and I am trying to use get_nrc_sentiment on it. I left it to run overnight but it didn't finish, so I timed the code to estimate the total run time, and it comes out at about 14 days in my case. Since you mentioned it took some hours in your comment, I wondered if there is something wrong, or whether someone has come up with a solution (also, has anyone tried parallelisation successfully)? Is it normal for it to take that long? This is my current code:

Thanks in advance
Hi,
I am trying to perform sentiment analysis using the NRC lexicon on Twitter data; however, when I use get_nrc_sentiment it takes too long to compute. I do have a huge dataset.
How can I reduce the time consumption?
Please advise. Also, I am new to R.
Thank you.