On Mac, Japanese Characters are garbled in output generated by textplot_network #14

Open
hidekoji opened this issue Apr 3, 2021 · 0 comments

hidekoji commented Apr 3, 2021

Describe the bug

When I run the script below on macOS, the Japanese characters in the plot are garbled. I found a similar issue, quanteda/quanteda#1317, and followed its suggestion, but that did not solve the problem.

macOS Big Sur (11.2.3)

library(quanteda)
library(dplyr)

## Load extra fonts
extrafont::font_import()

## Tokenize the tweet text
tokens <- Twitter_Search_Source1 %>%
  select(text) %>%
  quanteda::corpus() %>%
  quanteda::tokens(what = "word", remove_punct = TRUE, remove_numbers = TRUE,
                   remove_symbols = TRUE, remove_twitter = TRUE, remove_hyphens = TRUE,
                   remove_separators = TRUE, remove_url = TRUE)

## Remove Japanese stopwords and tokens made of only one or two hiragana
stopwords_to_remove <- exploratory::get_stopwords(lang = "japanese")
tokens <- tokens %>% quanteda::tokens_remove(stopwords_to_remove, valuetype = "fixed")
tokens <- tokens %>% quanteda::tokens_remove(stringr::str_c("^[\\\u3040-\\\u309f]{1,", 2, "}$"), valuetype = "regex")

## Build the feature co-occurrence matrix and plot the top 30 features
fcmat <- quanteda::fcm(tokens, context = "window", tri = FALSE)
feat <- names(topfeatures(fcmat, 30))
quanteda::fcm_select(fcmat, pattern = feat) %>%
  quanteda::textplot_network(min_freq = 0.5)

This generates the following plot:

[Image: textplot_network output in which the Japanese labels are garbled]

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS  10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] quanteda_2.1.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5         pillar_1.4.7       compiler_4.0.2     tools_4.0.2        stopwords_2.1      digest_0.6.27      evaluate_0.14      lifecycle_0.2.0   
 [9] tibble_3.0.1       gtable_0.3.0       lattice_0.20-41    pkgconfig_2.0.3    rlang_0.4.10       Matrix_1.2-18      fastmatch_1.1-0    rstudioapi_0.13   
[17] yaml_2.2.1         xfun_0.20          dplyr_1.0.2        knitr_1.30         generics_0.1.0     vctrs_0.3.6        fs_1.5.0           grid_4.0.2        
[25] tidyselect_1.1.0   glue_1.4.2         data.table_1.13.6  R6_2.5.0           rmarkdown_2.6      ggplot2_3.3.3      purrr_0.3.4        magrittr_2.0.1    
[33] scales_1.1.1       ellipsis_0.3.1     htmltools_0.5.0    usethis_2.0.0      colorspace_2.0-0   stringi_1.5.3      RcppParallel_5.0.2 munsell_0.5.0     
[41] crayon_1.3.4  

Fonts loaded in R:

 > extrafont::fonts()
  [1] ".SF Compact Rounded"     ".Keyboard"               ".New York"               ".SF Compact"             "System Font"            
  [6] ".SF NS Mono"             ".SF NS Rounded"          "Academy Engraved LET"    "Andale Mono"             "Apple Braille"          
 [11] "AppleMyungjo"            "Arial Black"             "Arial"                   "Arial Narrow"            "Arial Rounded MT Bold"  
 [16] "Arial Unicode MS"        "Bodoni Ornaments"        "Bodoni 72 Smallcaps"     ""                        "Brush Script MT"        
 [21] "Comic Sans MS"           "Courier New"             "DIN Alternate"           "DIN Condensed"           "Georgia"                
 [26] "Impact"                  "Khmer Sangam MN"         "Lao Sangam MN"           "Luminari"                "Microsoft Sans Serif"   
 [31] "Noto Sans Adlam"         "Noto Sans Avestan"       "Noto Sans Bamum"         "Noto Sans Bassa Vah"     "Noto Sans Batak"        
 [36] "Noto Sans Bhaiksuki"     "Noto Sans Brahmi"        "Noto Sans Buginese"      "Noto Sans Buhid"         "Noto Sans Carian"       
 [41] "Noto Sans CaucAlban"     "Noto Sans Chakma"        "Noto Sans Cham"          "Noto Sans Coptic"        "Noto Sans Cuneiform"    
 [46] "Noto Sans Cypriot"       "Noto Sans Duployan"      "Noto Sans EgyptHiero"    "Noto Sans Elbasan"       "Noto Sans Glagolitic"   
 [51] "Noto Sans Gothic"        "Noto Sans HanifiRohg"    "Noto Sans Hanunoo"       "Noto Sans Hatran"        "Noto Sans ImpAramaic"   
 [56] "Noto Sans InsPahlavi"    "Noto Sans InsParthi"     "Noto Sans Kaithi"        "Noto Sans Kayah Li"      "Noto Sans Kharoshthi"   
 [61] "Noto Sans Khojki"        "Noto Sans Khudawadi"     "Noto Sans Lepcha"        "Noto Sans Limbu"         "Noto Sans Linear A"     
 [66] "Noto Sans Linear B"      "Noto Sans Lisu"          "Noto Sans Lycian"        "Noto Sans Lydian"        "Noto Sans Mahajani"     
 [71] "Noto Sans Mandaic"       "Noto Sans Manichaean"    "Noto Sans Marchen"       "Noto Sans MeeteiMayek"   "Noto Sans Mende Kikakui"
 [76] "Noto Sans Meroitic"      "Noto Sans Miao"          "Noto Sans Modi"          "Noto Sans Mongolian"     "Noto Sans Mro"          
 [81] "Noto Sans Multani"       "Noto Sans Nabataean"     "Noto Sans Newa"          "Noto Sans NewTaiLue"     "Noto Sans N'Ko"         
 [86] "Noto Sans Ogham"         "Noto Sans Ol Chiki"      "Noto Sans OldHung"       "Noto Sans Old Italic"    "Noto Sans OldNorArab"   
 [91] "Noto Sans Old Permic"    "Noto Sans OldPersian"    "Noto Sans OldSouArab"    "Noto Sans Old Turkic"    "Noto Sans Osage"        
 [96] "Noto Sans Osmanya"       "Noto Sans Pahawh Hmong"  "Noto Sans Palmyrene"     "Noto Sans PauCinHau"     "Noto Sans PhagsPa"      
[101] "Noto Sans Phoenician"    "Noto Sans PsaPahlavi"    "Noto Sans Rejang"        "Noto Sans Runic"         "Noto Sans Samaritan"    
[106] "Noto Sans Saurashtra"    "Noto Sans Sharada"       "Noto Sans Shavian"       "Noto Sans Siddham"       "Noto Sans SoraSomp"     
[111] "Noto Sans Sundanese"     "Noto Sans Syloti Nagri"  "Noto Sans Syriac"        "Noto Sans Tagalog"       "Noto Sans Tagbanwa"     
[116] "Noto Sans Tai Le"        "Noto Sans Tai Tham"      "Noto Sans Tai Viet"      "Noto Sans Takri"         "Noto Sans Thaana"       
[121] "Noto Sans Tifinagh"      "Noto Sans Tirhuta"       "Noto Sans Ugaritic"      "Noto Sans Vai"           "Noto Sans Wancho"       
[126] "Noto Sans WarangCiti"    "Noto Sans Yi"            "Noto Serif Ahom"         "Noto Serif Balinese"     "Party LET"              
[131] "Tahoma"                  "Times New Roman"         "Trattatello"             "Trebuchet MS"            "Verdana"                
[136] "Webdings"                "Wingdings"               "Wingdings 2"             "Wingdings 3"             "Yu Gothic"              
> 