
implementation details about textmodel_svm ? #47

Open
randomgambit opened this issue Feb 25, 2021 · 7 comments

Comments

@randomgambit

Hello there!

I hope all is well during these difficult times! I was playing with the great quanteda and discovered the nice textmodel_svm classification model. However, unlike textmodel_nb, which has a small example reproducing Jurafsky's toy case, I cannot find anything comparable for textmodel_svm.

Are any additional details available about this function (a quanteda tutorial, a toy example, etc.)? What happens under the hood when textmodel_svm is used with dfms? Can we get back the coefficients for each token?

Thanks!

@randomgambit
Author

@kbenoit For instance, I see in https://github.com/cran/quanteda.textmodels/blob/a1c52468a8004e9c8a23b67eee9584677f2dab71/tests/testthat/test-textmodel_svm.R that you check that the coefficients should equal

    expect_equal(
        coef(tmod)[1, 1:3, drop = FALSE],
        matrix(c(0.5535941, 0.1857624, 0.1857624), nrow = 1,
               dimnames = list(NULL, c("Chinese", "Beijing", "Shanghai"))),
        tol = .0000001
    )

How do you know that? The IR example only covers Naive Bayes.

Thanks again!

@randomgambit
Author

Actually, @kbenoit @koheiw, looking at the manual https://cran.r-project.org/web/packages/LiblineaR/LiblineaR.pdf, it seems that textmodel_svm passes the default type = 0, which runs a penalized logistic regression, not an SVM. More generally, I would be interested in finding the reference for the "official" token coefficients shown in the unit test above.

Thanks again! quanteda rocks!
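For reference, here is a small base-R lookup of the LiblineaR classification solver codes, transcribed from the manual linked above (worth double-checking against the current release of the PDF). It makes the point in the comment concrete: the default type = 0 is a penalized logistic regression, while types 1–5 are the actual SVM solvers.

```r
## Classification solver types from the LiblineaR manual (transcribed
## from the PDF linked above; double-check against the current release).
liblinear_types <- c(
  "0" = "L2-regularized logistic regression (primal)",
  "1" = "L2-regularized L2-loss support vector classification (dual)",
  "2" = "L2-regularized L2-loss support vector classification (primal)",
  "3" = "L2-regularized L1-loss support vector classification (dual)",
  "4" = "support vector classification by Crammer and Singer",
  "5" = "L1-regularized L2-loss support vector classification",
  "6" = "L1-regularized logistic regression",
  "7" = "L2-regularized logistic regression (dual)"
)

## The default (type = 0) is logistic regression, not an SVM:
liblinear_types[["0"]]
## Passing, e.g., type = 2 through the ... argument of textmodel_svm()
## would select a true SVM solver instead (per the discussion above).
```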

@kbenoit kbenoit transferred this issue from quanteda/quanteda Mar 1, 2021
@kbenoit
Contributor

kbenoit commented Mar 1, 2021

Yes, we realised this recently; see #45. It is easily overridden through type, which is passed via the ... argument.

The references in the textmodel_svm() documentation explain how this works, methodologically speaking.

@randomgambit
Author

Thanks @kbenoit, I saw the docs, but I was curious where you got the coefficients matrix(c(0.5535941, 0.1857624, 0.1857624) in https://github.com/cran/quanteda.textmodels/blob/a1c52468a8004e9c8a23b67eee9584677f2dab71/tests/testthat/test-textmodel_svm.R

Were these values computed in another textbook example, so that you are simply verifying that textmodel_svm produces the same correct coefficients? How do you know these are correct?

Thanks!

@kbenoit
Contributor

kbenoit commented Mar 1, 2021

I think they came from running the code outside of the quanteda structure, so we are verifying against a non-quanteda run of the same model on the same data. Not a very strong test, but it does check whether something went amiss in our wrapper.

Would be delighted for more critical tests or feedback, if you have it.

@randomgambit
Author

I am looking for some relevant docs. By the way, out of curiosity, do you know how predict_svm recovers the predicted probabilities, for instance when using penalized logistic classification? Given K classes, does the algorithm fit K one-vs-rest classifiers, compute all K probabilities, and then normalize each probability by their sum (so that they indeed sum to one)? What do you think?

@kbenoit
Copy link
Contributor

kbenoit commented Mar 17, 2021

That's in the paper describing the method, but for multinomial logistic regression (of which the penalised approach is a special case), these are equivalent. The standard way is to compute it as in the last equation of https://en.wikipedia.org/wiki/Multinomial_logistic_regression#As_a_set_of_independent_binary_regressions.
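To make that normalization concrete, here is a minimal base-R sketch of the softmax equation referenced above, using made-up per-class scores (not taken from any fitted model): given a linear score for each of the K classes, the softmax turns them into probabilities that sum to one.

```r
## Softmax over per-class linear scores, as in the last equation of the
## Wikipedia section linked above. The scores below are invented purely
## for illustration.
scores <- c(classA = 1.2, classB = -0.4, classC = 0.3)

softmax <- function(z) {
  z <- z - max(z)          # shift by the max for numerical stability
  exp(z) / sum(exp(z))
}

probs <- softmax(scores)
probs                      # three probabilities, largest for classA
sum(probs)                 # 1, up to floating point
```

Note that shifting all scores by a constant leaves the probabilities unchanged, which is why subtracting the maximum is a safe way to avoid overflow in exp().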
