Hi, and thanks for an excellent package!
I'm trying to run the searchK function on a dataset of around 400k social media messages from various platforms (short tweets as well as longer discussion forum posts). I've been searching for the optimal number of topics over the range K = 10 to 300. However, when K is close to or above 200, models start converging after just a couple of iterations, and their results are suboptimal compared with models that run longer. I've tried different random seeds for generating the heldout set, and the seed seems to affect the issue: under some random splits the K = 200 model converges in 3 iterations, whereas under others it takes over 200 iterations.
Would you have any idea what might be causing this, and whether this is expected model behavior? I'm also trying to figure out how to assess the reliability of such results, possibly by running something like a 10-fold validation with different random seeds.
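For reference, here is a minimal sketch of the repeated-seed check I have in mind. It is not my exact code: `documents` and `vocab` stand for the output of `prepDocuments()`, and the seeds and the K grid are arbitrary values chosen for illustration. It relies on the `heldout.seed` argument of `searchK` and on the `em.its` and `heldout` columns of the returned `results` table.

```r
library(stm)

# Assumed inputs: `documents` and `vocab` as returned by prepDocuments().
seeds  <- c(101, 202, 303)   # arbitrary heldout seeds, for illustration only
k_grid <- c(100, 200, 300)   # illustrative subset of the K range

runs <- lapply(seeds, function(s) {
  fit <- searchK(documents, vocab, K = k_grid, heldout.seed = s)
  # Columns of fit$results can be lists in some stm versions, so flatten them.
  res <- as.data.frame(lapply(fit$results, unlist))
  res$seed <- s
  res
})
all_runs <- do.call(rbind, runs)

# Compare EM iteration counts and heldout likelihood across seeds for each K.
all_runs[order(all_runs$K, all_runs$seed), c("K", "seed", "em.its", "heldout")]
```

If the heldout likelihood for a given K varies widely across seeds along with `em.its`, I would take that as a sign that the early-converging runs should not be trusted.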
Many thanks for your help!