Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

translation-server should exclude css/html/js from metadata - site is returning js and css in author field. #111

Open
mvolz opened this issue Feb 11, 2020 · 1 comment

Comments

@mvolz
Copy link
Contributor

mvolz commented Feb 11, 2020

curl -d 'https://www.milliyet.com.tr/gundem/canan-dagdeviren-kimdir-2392696' -H 'Content-Type: text/plain' http://127.0.0.1:1969/web

gives the results

[{"key":"F4KIRRQ3","version":0,"itemType":"webpage","creators":[{"firstName":"player-inline {display: inline-block;padding-bottom: 56 25%;position: relative;width: 100%;z-index: 5;} player-box {height: 100%;left: 0;position: absolute;top: 0;width: 100%;}$ ready{quarkPlayer = new","lastName":"QuarkPlayer","creatorType":"author"},{"name":"bufferLength:5","creatorType":"author"},{"firstName":"autoPlay:","lastName":"false","creatorType":"author"},{"firstName":"subTitles:","lastName":"false","creatorType":"author"},{"firstName":"showAds:","lastName":"true","creatorType":"author"},{"firstName":"showNotification:","lastName":"false","creatorType":"author"},{"name":"showB","creatorType":"author"},{"firstName":"widthSelector:","lastName":"true","creatorType":"author"},{"firstName":"customMenu:","lastName":"false","creatorType":"author"},{"name":"Preload: 'None'","creatorType":"author"},{"firstName":"Playsinline:","lastName":"True","creatorType":"author"},{"firstName":"Live:","lastName":"False","creatorType":"author"},{"name":"Poster: 'Https://I2.milimaj.com/I/Milliyet/75/800x450/5e4231ac55427f1b70cf438a.jpg'","creatorType":"author"},{"name":"sources:","creatorType":"author"},{"name":"playType: \"newsdetail\"","creatorType":"author"},{"name":"adTags","creatorType":"author"},{"name":"cust_params=keyword%3dVid_duration_1_3%2cVid_pubdate_new%2cseeding_false%2cautoplay_false%2csilentstart_false%2cst_none%2cpremium_video%26contentid%3d6142190%26kategori%3dml_mtv_milliyet-tv_haberler%26catlist%3dc1_milliyet-tv%2cc2_haberler%2cCct_sivas%2cCct_soguk-hava%2cCct_sicak-su%2cCct_buz%2cct_sivas%2cct_soguk-hava%2cct_sicak-su%2cct_buz%26pub_name%3dmilliyet","creatorType":"author"},{"name":"vpos=preroll\"}","creatorType":"author"},{"name":"{\"id\":\"overlay\"","creatorType":"author"},{"name":"\"offset\":\"00:00:05.000\"","creatorType":"author"},{"name":"\"type\":\"nonlinear\"","creatorType":"author"},{"name":"\"url\":\"https://pubads.g.doubleclick.net/gampad/ads?sz=640x360","creatorType":"author"},{"name":"iu=/9927946/milliyet/sitegeneli/overlay","creatorType":"author"},{"name":"impl=s","creatorType":"author"},{"name":"gdfp_req=1","creatorType":"author"},{"name":"env=vp","creatorType":"author"},{"name":"output=vast","creatorType":"author"},{"name":"unviewed_position_start=1","creatorType":"author"},{"name":"url=https://www.milliyet.com.tr/gundem/canan-dagdeviren-kimdir-2392696","creatorType":"author"},{"name":"description_url=http%3a%2f%2fwww.milliyet.com.tr%2fgundem%2fcanan-dagdeviren-kimdir-2392696","creatorType":"author"},{"name":"correlator=","creatorType":"author"},{"name":"cust_params=keyword%3dVid_duration_1_3%2cVid_pubdate_new%2cseeding_false%2cautoplay_false%2csilentstart_false%2cst_none%2cpremium_video%26contentid%3d6142190%26kategori%3dml_mtv_milliyet-tv_haberler%26catlist%3dc1_milliyet-tv%2cc2_haberler%2cCct_sivas%2cCct_soguk-hava%2cCct_sicak-su%2cCct_buz%2cct_sivas%2cct_soguk-hava%2cct_sicak-su%2cct_buz%26pub_name%3dmilliyet","creatorType":"author"},{"name":"vpos=overlay","creatorType":"author"},{"name":"overlay=1\"}","creatorType":"author"},{"name":"{\"id\":\"postroll\"","creatorType":"author"},{"name":"\"offset\":\"end\"","creatorType":"author"},{"name":"\"type\":\"linear\"","creatorType":"author"},{"name":"\"url\":\"https://pubads.g.doubleclick.net/gampad/ads?sz=640x360","creatorType":"author"},{"name":"iu=/9927946/milliyet/sitegeneli/postroll","creatorType":"author"},{"name":"impl=s","creatorType":"author"},{"name":"gdfp_req=1","creatorType":"author"},{"name":"env=vp","creatorType":"author"},{"name":"output=vast","creatorType":"author"},{"name":"unviewed_position_start=1","creatorType":"author"},{"name":"url=https://www.milliyet.com.tr/gundem/canan-dagdeviren-kimdir-2392696","creatorType":"author"},{"name":"description_url=http%3a%2f%2fwww.milliyet.com.tr%2fgundem%2fcanan-dagdeviren-kimdir-2392696","creatorType":"author"},{"name":"correlator=","creatorType":"author"},{"name":"cust_params=keyword%3dVid_duration_1_3%2cVid_pubdate_new%2cseeding_false%2cautoplay_false%2csilentstart_false%2cst_none%2cpremium_video%26contentid%3d6142190%26kategori%3dml_mtv_milliyet-tv_haberler%26catlist%3dc1_milliyet-tv%2cc2_haberler%2cCct_sivas%2cCct_soguk-hava%2cCct_sicak-su%2cCct_buz%2cct_sivas%2cct_soguk-hava%2cct_sicak-su%2cct_buz%26pub_name%3dmilliyet","creatorType":"author"},{"name":"vpos=postroll\"}]","creatorType":"author"},{"name":"plugins:","creatorType":"author"},{"name":"stats: {gemius: {identifier: 'bIFA4t.SzzEb53fr9ZSQl2ZVzQXZZ4NyqW0wgJzlvwb.e7'}","creatorType":"author"},{"name":"Clicks: {portal: \"Webtv\"","creatorType":"author"},{"name":"Action: \"Video\"","creatorType":"author"},{"firstName":"pathname: \"O ilimizde hava eksi 21 dereceyi gördü! Hayat buz kesti | Haberler |","lastName":"sivas","creatorType":"author"},{"firstName":"Soğuk","lastName":"Hava","creatorType":"author"},{"firstName":"Sıcak","lastName":"Su","creatorType":"author"},{"firstName":"Buz | 117 |","lastName":"Newsdetail\"","creatorType":"author"},{"name":"newsCategory : '/milliyet-tv/haberler/'","creatorType":"author"},{"name":"Base_url: 'Https://Www.milliyet.com.tr/Milliyet-Tv/O-Ilimizde-Hava-Eksi-21-Dereceyi-Gordu-Hayat-Buz-Kesti-6142190'}","creatorType":"author"},{"name":"Bluekai: {}}","creatorType":"author"},{"firstName":"htvThumbnails: {showThumbnail:","lastName":"false","creatorType":"author"},{"name":"thumbnailUrl : '//videocdn.milliyet.com.tr/2020/02/11/mtv_6142190_thmb.jpg'","creatorType":"author"},{"name":"thumbnailWidth: '128'","creatorType":"author"},{"name":"thumbnailHeight: '72'}","creatorType":"author"},{"firstName":"hotkeys: {enableVolumeScroll:","lastName":"false}","creatorType":"author"},{"firstName":"suggestedVideos: {showSuggestedVideos:","lastName":"true","creatorType":"author"},{"name":"nextVideoSummonTime:7","creatorType":"author"},{"name":"autoNextSuggestedVideos:false","creatorType":"author"},{"firstName":"suggestedVideoList: null}});});O ilimizde hava eksi 21 dereceyi gördü! Hayat buz kestiSivas'ta gece saatlerinde termometreler eksi 21 dereceyi gösterdi Havaya serpilen sıcak su yere buz taneciği olarak düştü Sivas'ta günlerdir etkisini sürdüren soğuk hava gece yarısı eksi 21 dereceye düştü Hayat adeta buz","lastName":"kesti","creatorType":"author"},{"firstName":"Caddeler Tamamen Boşaldı daha Fazla Video","lastName":"Için","creatorType":"author"}],"tags":[],"title":"Canan Dağdeviren kimdir?","websiteTitle":"Milliyet","url":"https://www.milliyet.com.tr/gundem/canan-dagdeviren-kimdir-2392696","abstractNote":"Canan Dağdeviren kimdir? Dünyanın en iyi akademisyenlerini Boğaziçi Lectures kapsamında konuşmacı olarak misafir edecek olan Boğaziçi Üniversitesi Giyilebilir kalp pilinin mucidi Dr. Canan Dağdeviren'i konuk ediyor. Bilimsel anlamda birçok başarıya imza atan Canan Dağdeviren aynı zamanda Forbes dergisinin 30 yaş altı Bilim insanı listesinde de yer alıyor","language":"tr","accessDate":"2020-02-11T12:25:16Z"}]

This is a security issue we automatically block the edit, but it's not ideal.

I think I brought this up before, years ago, and it was recommended I file a bug in the translators repo instead, but I really think this is something that should be protected against here ideally as well! Bug for translators lib here: zotero/translators#2117

dstillman added a commit that referenced this issue Feb 16, 2020
JSDOM doesn't support `innerText` [1], so our shim was just forwarding
`textContent`. Instead, remove `script` and `style` elements, which
won't do the same thing as in the browser but will hopefully remove most
unwanted content.

Addresses #111

[1] jsdom/jsdom#1245
@dstillman
Copy link
Member

This isn't really something we can fix generally, but I've committed a change that, when paired with an updated EM translator, will mostly fix the above example. We can't fully emulate innerText in the browser, because it's not supported by JSDOM, but we can at least exclude script and style content when doing fallback author parsing (i.e., elements with byline or vcard classes, which have no actual meaning but usually indicate the presence of an author).

Note that this isn't really a security issue in any general sense. The issue here is just junk data in creator entries — what exactly it is shouldn't matter. translation-server returns text, not code. If it's being inserted into an HTML document, it should be properly escaped, as with any other untrusted input. (The one exception might be the few tags allowed by Zotero/citeproc-js, which it's possible some translators include in returned data, but that would be a strict whitelist.)

mvolz added a commit to mvolz/translation-server that referenced this issue Sep 24, 2020
Merge contains following changes:

commit f0cff95
Author: Dan Stillman <[email protected]>
Date:   Sun Feb 16 14:45:41 2020 -0500

    Don't include `script` and `style` elements in innerText

    JSDOM doesn't support `innerText` [1], so our shim was just forwarding
    `textContent`. Instead, remove `script` and `style` elements, which
    won't do the same thing as in the browser but will hopefully remove most
    unwanted content.

    Addresses zotero#111

    [1] jsdom/jsdom#1245

Bug: T245092
Change-Id: I5e24a0f1813960802e50c38f4b80f9bc0f23ef95
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants