update Wikipedia plugin to generate HTML, and other improvements #84

Open
4 of 5 tasks
titaniumbones opened this issue Nov 17, 2019 · 0 comments

titaniumbones commented Nov 17, 2019

At present, the Wikipedia plugin grabs plain text from Wikipedia pages by using regexes to strip the tags from the HTML provided by the API. Unfortunately, the first several paragraphs of many pages include a lot of text that is hard to interpret without those tags. It would be better to:

  • Identify selectors for generally unwanted page components, and strip those elements from the API response. This should include at minimum redirect notices, section edit links, TOC elements, summary tables, and perhaps all images and captions.
  • Rewrite relative links as absolute links, so that clicking them works.
  • Consider adding a default target or an onbeforeunload event handler to reduce unwanted navigation away from the page.
  • Replace numberofwords with numberofparagraphs.
  • Currently all wiki plugins live inside the same div. Is that the right decision? Shouldn't each one have its own context? Add a wrapper around each instance, which should also simplify the code a bit.
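
A rough sketch of how these steps might fit together, offered as a starting point rather than the plugin's actual code. Everything named here is an assumption: the selector list (class names such as `.hatnote` and `.mw-editsection` should be checked against a real API response), the `WIKI_BASE` constant, the `wikipedia-plugin-instance` class name, and the function names themselves. It also assumes the API response has already been parsed into a `Document`, for example with `new DOMParser().parseFromString(html, 'text/html')`.

```javascript
// ASSUMPTION: selectors guessed from typical Wikipedia markup; verify before use.
const UNWANTED_SELECTORS = [
  '.redirectMsg',    // redirect notices
  '.hatnote',        // "For other uses, see ..." notes
  '.mw-editsection', // section edit links
  '#toc', '.toc',    // table of contents
  'table.infobox',   // summary tables
  'figure', '.thumb' // images and captions
];

// ASSUMPTION: English Wikipedia; a real plugin would derive this from its config.
const WIKI_BASE = 'https://en.wikipedia.org';

// Resolve a possibly relative href against the wiki's base URL.
// Falls back to the original string if it cannot be parsed.
function toAbsoluteUrl(href, base = WIKI_BASE) {
  try {
    return new URL(href, base).href;
  } catch (err) {
    return href;
  }
}

// Hypothetical helper: takes a parsed Document and returns a wrapper <div>
// holding the first `numberOfParagraphs` non-empty paragraphs.
function cleanWikipediaHtml(doc, numberOfParagraphs = 3, base = WIKI_BASE) {
  // 1. Strip unwanted page components.
  doc.querySelectorAll(UNWANTED_SELECTORS.join(','))
    .forEach((el) => el.remove());

  // 2. Make links absolute and open them in a new tab, so that a click
  //    does not navigate away from the current page.
  doc.querySelectorAll('a[href]').forEach((a) => {
    a.setAttribute('href', toAbsoluteUrl(a.getAttribute('href'), base));
    a.setAttribute('target', '_blank');
    a.setAttribute('rel', 'noopener');
  });

  // 3. Keep whole paragraphs instead of counting words.
  const paragraphs = Array.from(doc.querySelectorAll('p'))
    .filter((p) => p.textContent.trim() !== '')
    .slice(0, numberOfParagraphs);

  // 4. Give each plugin instance its own wrapper element.
  const wrapper = doc.createElement('div');
  wrapper.className = 'wikipedia-plugin-instance';
  paragraphs.forEach((p) => wrapper.appendChild(p));
  return wrapper;
}
```

Setting `target="_blank"` is only one of the two options raised above; an `onbeforeunload` handler would be an alternative if links should stay in the same tab. Note that fragment-only links such as `#cite_note-1` resolve against `WIKI_BASE` rather than the article URL, so a real implementation would probably want to pass the article's own URL as `base`.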
titaniumbones changed the title from "update Wikipedia plugin to generate HTML" to "update Wikipedia plugin to generate HTML, and other improvements" on Nov 17, 2019