At present, the Wikipedia plugin gets plain text from Wikipedia pages by using regexes to strip the HTML tags from the HTML returned by the API. Unfortunately, the first several paragraphs of many pages include a lot of text that is hard to interpret without those tags. It would be better to:
- identify selectors for generally unwanted page components, and strip those elements from the API response. This should include, at minimum, redirect notices, section edit links, TOC elements, summary tables, and perhaps all images and captions.
- rewrite relative links as absolute links, so they work when clicked.
- replace `numberofwords` with `numberofparagraphs` paragraphs.
- currently all wiki plugins live inside the same div. Is that the right decision? Shouldn't each one have its own context? Add a wrapper around each instance, which should also simplify the code a bit.
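The first three items could be sketched roughly as follows. This is only an illustration of the logic, written with Python's standard library rather than in the plugin's own code. The class names in `UNWANTED_CLASSES`, the tag list in `UNWANTED_TAGS`, and the names `WikiCleaner` and `clean_extract` are all assumptions made for this example; the real selectors would need to be checked against actual API output. The sketch also assumes reasonably well-formed markup, since it tracks nesting by counting open and close tags.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed class names for unwanted components; verify against real API output.
UNWANTED_CLASSES = {"hatnote", "redirectMsg", "mw-editsection", "toc", "infobox", "thumb"}
# Assumed tags to drop wholesale (images and their captions).
UNWANTED_TAGS = {"img", "figure"}
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class WikiCleaner(HTMLParser):
    """Hypothetical cleaner: drops unwanted elements, absolutizes links,
    and stops after a fixed number of paragraphs."""

    def __init__(self, base_url, max_paragraphs):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.max_paragraphs = max_paragraphs
        self.out = []
        self.skip_depth = 0   # > 0 while inside an unwanted element
        self.paragraphs = 0   # number of <p> elements emitted so far

    @property
    def done(self):
        return self.paragraphs >= self.max_paragraphs

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag in UNWANTED_TAGS or classes & UNWANTED_CLASSES:
            if tag not in VOID_TAGS:
                self.skip_depth = 1
            return
        parts = [tag]
        for name, value in attrs:
            value = value or ""
            if tag == "a" and name == "href":
                # Resolve relative links against the page URL.
                value = urljoin(self.base_url, value)
            parts.append('{}="{}"'.format(name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if self.done or tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append("</{}>".format(tag))
        if tag == "p":
            self.paragraphs += 1

    def handle_data(self, data):
        if not self.done and not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_extract(html, base_url, max_paragraphs):
    """Return cleaned HTML limited to max_paragraphs paragraphs."""
    parser = WikiCleaner(base_url, max_paragraphs)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Calling `clean_extract(page_html, "https://en.wikipedia.org/wiki/Foo", 2)` would return at most two paragraphs of HTML, with links kept and resolved against the page URL. In a browser the same idea is probably simpler with DOM selectors than with a streaming parser.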
titaniumbones changed the title from "update Wikipedia plugin to generate HTML" to "update Wikipedia plugin to generate HTML, and other improvements" on Nov 17, 2019