Skip to content

Latest commit

 

History

History
20 lines (11 loc) · 1.21 KB

readme.md

File metadata and controls

20 lines (11 loc) · 1.21 KB

Wiki XML Dump to PDF

The purpose of this is essentially to extract all the text from the XML1 in a (somewhat) readable format.

This was used to create PDF files to upload an entire wiki to NotebookLM.

Explanation of files

  1. one-pdf.py will convert the xml dump to one large pdf.
  2. max-word-count.py will break up the text into multiple PDFs with a maximum of 499,999 words to abide by NotebookLM word limits. You can edit the word limit here.

ko-fi

Notes

The script assumes you are using http://www.mediawiki.org/xml/export-0.11/' # Example namespace; adjust after printing structure but it's possible your XML is using a different namespace.

The script prints out the namespace when running and will throw an error if the namespace is incorrect but in the print it should indicate the correct namespace so you can update the namespace in Line 20 max-word-count.py or one-pdf.py.

Footnotes

  1. The XML was generated utilizing wikiteam3