update Wikipedia plugin to generate HTML, and other improvements #84

titaniumbones · 2019-11-17T20:30:26Z

At present, the Wikipedia plugin grabs plain text from Wikipedia pages by using regexes to strip HTML tags from the HTML provided by the API. Unfortunately, the first several paragraphs of many pages include a lot of text that is hard to interpret without those tags. It would be better to:

- identify selectors for generally unwanted page components, and strip those from the API response. This should include at minimum redirect notices, section edit links, TOC elements, summary tables, and perhaps all images & captions.
- rewrite relative links as absolute links, so they can be clicked and resolve correctly.
- consider adding a default target or an onbeforeunload event handler to reduce unwanted navigation away from the page.
- replace numberofwords with ~~numberofparagraphs~~ paragraphs
- ~~currently all wiki plugins live inside the same div. Is that the right decision? Shouldn't each one have its own context?~~ Add a wrapper around each instance, simplifying the code a bit.
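
As a rough illustration of the link-rewriting and default-target items above, here is a minimal sketch. The helper names `toAbsoluteHref` and `rewriteLinks` and the `WIKI_BASE` constant are hypothetical and not taken from the plugin; the sketch also assumes English Wikipedia and double-quoted `href` attributes, as the API's HTML usually provides.

```typescript
// Assumption: pages come from English Wikipedia; adjust for other wikis.
const WIKI_BASE = "https://en.wikipedia.org";

// Resolve a single href against the wiki origin.
// Fragment-only links (e.g. "#History") are returned unchanged.
function toAbsoluteHref(href: string, base: string = WIKI_BASE): string {
  if (href.startsWith("#")) return href;
  try {
    return new URL(href, base).href;
  } catch {
    return href; // leave values the URL parser rejects untouched
  }
}

// Rewrite the href of every <a> tag in an HTML fragment and make
// non-fragment links open in a new tab, so readers stay on the page.
function rewriteLinks(html: string, base: string = WIKI_BASE): string {
  return html.replace(
    /<a\b([^>]*?)\shref="([^"]*)"([^>]*)>/g,
    (match: string, pre: string, href: string, post: string): string => {
      if (href.startsWith("#")) return match;
      const abs = toAbsoluteHref(href, base);
      return `<a${pre} href="${abs}"${post} target="_blank" rel="noopener">`;
    }
  );
}

// Usage: a relative article link becomes absolute; the in-page anchor is kept.
console.log(rewriteLinks('<a href="/wiki/Cat" title="Cat">cat</a> <a href="#History">history</a>'));
```

This stays string-based only so that it runs outside a browser. Inside the plugin, parsing the response with `DOMParser` and using `querySelectorAll` would likely be more robust, and the same approach could remove the unwanted components listed above by selector.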

titaniumbones changed the title ~~update Wikipedia plugin to generate HTML~~ update Wikipedia plugin to generate HTML, and other improvements Nov 17, 2019
