Several non-English words made it into the list #187
@lserni What was your methodology for collecting this? Funny timing: I sat down today to write a script to remove every non-word and non-English word from this list. Besides the language issues you pointed out, I have noticed a number of other non-English words and 'artifacts' of English, such as guttural sounds.
I stumbled across one word because of a misspelling (it might have been 'kischen'), realized it was German, and started looking for that suffix ("-schen"). Then one thing led to another (e.g. finding "kirschen" led me to look for "kirsch", and so on).
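The search described above (spot a German suffix, then follow the stems it turns up) can be partly automated. A minimal sketch, assuming the word list is already loaded as a list of strings; the helper names `words_with_suffix`, `words_containing`, and `follow_stems` are hypothetical and not part of this repository:

```python
# Sketch only: hypothetical helpers for a suffix-driven search of a word list.


def words_with_suffix(words, suffix):
    """Return the words that end with `suffix` (case-insensitive), sorted."""
    suffix = suffix.lower()
    return sorted(w for w in words if w.lower().endswith(suffix))


def words_containing(words, fragment):
    """Return the words that contain `fragment` anywhere (case-insensitive), sorted."""
    fragment = fragment.lower()
    return sorted(w for w in words if fragment in w.lower())


def follow_stems(words, hits, strip):
    """Strip the ending `strip` from each hit and collect every word containing the stem.

    For example, "kirschen" with strip="en" yields the stem "kirsch",
    which also finds "kirsch" and "kirschwasser".
    """
    strip = strip.lower()
    related = set()
    for hit in hits:
        if hit.lower().endswith(strip) and len(hit) > len(strip):
            stem = hit[: len(hit) - len(strip)]
            related.update(words_containing(words, stem))
    return sorted(related)


if __name__ == "__main__":
    sample = ["kirschen", "kirsch", "kirschwasser", "menschen", "mensch", "kitchen"]
    hits = words_with_suffix(sample, "schen")
    print(hits)  # ['kirschen', 'menschen']
    print(follow_stems(sample, hits, "en"))
    # ['kirsch', 'kirschen', 'kirschwasser', 'mensch', 'menschen']
```

This only surfaces candidates. A suffix such as "-berg" also matches ordinary English words like "iceberg", so every hit still needs a human decision.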
@lserni Gotcha. So from a processing/scripting standpoint, it was more or less a "manual/organic" search process? I plan to programmatically parse this to get a pure list of "guaranteed English" words. I've thought up two general strategies so far:
Please lmk if you have any ideas 💯
I tried using some Python dictionary modules, but even the ones backed by online databases are missing some words I know are real.
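Since the reference dictionaries are incomplete, one option is to use them only to sort words into review buckets rather than to delete anything outright. A minimal sketch, assuming `reference` is whatever set of known English words is at hand (for example, loaded from one of those modules) and that the suspicious fragments are picked by hand; `triage` is a hypothetical helper:

```python
# Sketch only: `triage` is a hypothetical helper, and `reference` stands in for
# whatever (incomplete) English word set is available.


def triage(words, reference, suspicious_fragments):
    """Sort words into review buckets; nothing is deleted automatically.

    - "known": found in the reference set
    - "likely_foreign": not in the reference and containing a suspicious fragment
    - "review": not in the reference, no suspicious fragment (may still be real English)
    """
    known_words = {r.lower() for r in reference}
    fragments = [f.lower() for f in suspicious_fragments]
    buckets = {"known": [], "likely_foreign": [], "review": []}
    for word in words:
        lowered = word.lower()
        if lowered in known_words:
            buckets["known"].append(word)
        elif any(f in lowered for f in fragments):
            buckets["likely_foreign"].append(word)
        else:
            buckets["review"].append(word)
    return buckets


if __name__ == "__main__":
    words = ["kitchen", "groschen", "platypus", "wobbegong", "chateaux"]
    reference = {"kitchen", "platypus"}
    print(triage(words, reference, ["schen", "aux"]))
    # {'known': ['kitchen', 'platypus'], 'likely_foreign': ['groschen', 'chateaux'], 'review': ['wobbegong']}
```

Treating "missing from the dictionary" as "needs a look" rather than "not English" keeps real words that the reference happens to lack.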
German words ending in -schen: boeschen, goschen, groschen, guldengroschen, hamantaschen, hanschen, kischen, leschen, mariengroschen, menschen, neugkroschen, neugroschen, silbergroschen.
German words ending in -ung: anschauung, aufklarung, delundung, aufklrung, gelandesprung, geldesprung, gelndesprung (the last four are also spelled wrongly), gterdmerung (probably ASCII-filtered from a wrongly-spelled "Götterdämmerung"), kaolikung, lautverschiebung, quellung, quersprung, sturmabteilung, verwanderung, vorstellung.
German-sounding place names that end in -berg: aaberg, amberg, arlberg, baden-wtemberg (should be Baden-Wurttemberg), bamberg, beberg, bemberg, berg, bloxberg, bromberg, bundaberg, clayberg, cohberg, desberg, drakensberg, dusenberg, egeberg, ehrenberg, eisenberg (probably Heisenberg), faberg, feinberg, fineberg, flamberg, floeberg, freberg, frederiksberg, freudberg, friedberg, fromberg, ginsberg, ginzberg, godesberg, goldberg, goldenberg, gomberg, greenberg, grosberg, gruenberg, grunberg, gutenberg, guttenberg, hamberg, hardenberg, hedberg, heidelberg, heisenberg, hertberg, herzberg, hollenberg, houlberg, ingaberg, ingeberg, inselberg, judenberg, kapfenberg, knigsberg (probably Konigsberg), koenigsberg, konigsberg, kornberg, landenberg, lansberg, lederberg, lemberg, lichtenberg, lindberg, lindeberg, lundberg, marshallberg, memberg, mengelberg, moberg, mollberg, mossberg, msterberg (probably Musterberg), muhlenberg, nberg (probably Nueberg), newberg, nyberg, nieberg, noonberg, nuremberg, oberg, overberg, ramberg, rehnberg, reichenberg, rydberg, romberg, rosenberg, rotberg, rothberg, rothenberg, schberg, schoenberg, schonberg, schulberg, shimberg, shinberg, shirberg, sjoberg, slosberg, solberg, spitzenberg, steinberg, sternberg, stormberg, strasberg, strindberg, stromberg, sundberg, svedberg, taberg, tamberg, tanberg, tannenberg, tuneberg, vandenberg, venusberg, vilberg, vorarlberg, waterberg, wattenberg, weinberg, weisberg, weissberg, westberg, wittenberg, wtemberg, wurttemberg.
Swedish-sounding names ending in -borg: aalborg, bjneborg, carlsborg, friborg, goteborg, gteborg, helsingborg, hsingborg, ingaborg, ingeborg, kreymborg, lindsborg, seaborg, swedeborg, swedenborg, valborg, viborg, vyborg, volborg, wiborg.
German words with -rsch- or -wasser: goldwasser, kirschwasser, beterschap, borsch, borsches, bursch, burschenschaft, burschenschaften, clairschach, clairschacher, dauerschlaf, hersch, herschel, herscher, hirsch, hirschfeld, kirsches, kirschner, kursch, lautverschiebung, moersch.
German words that might be considered controversial: sieg, heil, hitler, mein, fuehrer, fuhrer, gott, mit, uns, SchutzStaffel.
French sounding words containing "aux": aboideaux, aboiteaux, agneaux, auxf, auxier, auxil, auxvasse, bandeaux, bateaux, batteaux, beaux, beaux-arts, beaux-esprits, beauxite (should be bauxite), boyaux, boisseaux, bordereaux, boudreaux, capiteaux, carpeaux, castrop-rauxel, chalumeaux, chapeaux, chateaux, cheneaux, chevaux, chevaux-de-frise, ciseaux, clervaux, clitoridauxe, colauxe, coteaux, couteaux, cryptoglaux, cristineaux, dermatauxe, eaux, enterauxe, esquimaux, fabliaux, faux, fauxbourdon, faux-bourdon, faux-na, flambeaux, fricandeaux, gateaux, glaux, hanotaux, hemiauxin, hepatauxe, jambeaux, jouhaux, kastrop-rauxel, knisteneaux, kristinaux, lascaux, laux, malraux, mantappeaux, manteaux, margaux, margeaux, marivaux, mastauxe, maux, meraux, michaux, myelauxe, morceaux, moureaux, nephrauxe, nouveaux, oophorauxe, paravauxite, pauxi, plateaux, portmanteaux, proces-verbaux, prostatauxe, radeaux, raveaux, reseaux, rinceaux, roncevaux, rondeaux, rouleaux, salteaux, splenauxe, subbureaux, tableaux, thibodaux, tonneaux, torteaux, trichauxis, trousseaux, trumeaux, vassaux, vauxite, veneaux, vitraux, wibaux, bureaux.
Loan words that are probably okay but, strictly, still not English: brehmsstrahlung, weltanschauung, volkerwanderung, ubermensch, borscht, borschts, kirsch, meerschaum, meerschaums, Messerschmitt, Rorschach, bordeaux.
East Asian words: Wa-palaung, Telukbetung, bagong, Ronggeng.
Chinese city names: Tzekung, Kaolikung.
Korean names: Kyung, Kyaung, Keung.
Kyrgyz name: Issyk-Kul
Icelandic name: Jokul (should be Jokull)
Not sure if this counts: mallangong (Australian name for the platypus), wobbegong (Australian name of the carpet shark)
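Several pairs in the lists above (knigsberg/konigsberg, gteborg/goteborg, aufklrung/aufklarung, gelndesprung/gelandesprung) fit the "ASCII-filtered" explanation suggested for "gterdmerung": one variant drops the accented letter entirely, the other keeps its base letter. The sketch below, using only Python's standard `unicodedata` module, contrasts the two conversions. It illustrates a plausible cause; it is not a confirmed account of how the list was built:

```python
import unicodedata


def drop_non_ascii(word):
    """Naive filter: discard every non-ASCII character (e.g. "ö" simply vanishes)."""
    return "".join(ch for ch in word if ord(ch) < 128)


def fold_to_ascii(word):
    """Decompose accented letters (NFKD) and drop only the combining marks.

    "ö" becomes "o" rather than disappearing. This is not full transliteration:
    the German convention "ö" -> "oe" is not applied, and letters without a
    decomposition, such as "ß", pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ["Königsberg", "Göteborg", "Aufklärung", "Geländesprung"]:
        print(name, drop_non_ascii(name).lower(), fold_to_ascii(name).lower())
    # Königsberg knigsberg konigsberg
    # Göteborg gteborg goteborg
    # Aufklärung aufklrung aufklarung
    # Geländesprung gelndesprung gelandesprung
```

Running `drop_non_ascii` over the list and flagging words whose folded form differs would not find these entries (they are already ASCII), but the pattern suggests checking any "consonant cluster where a vowel should be" against its folded twin.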