Wiki XML Dump to PDF

This project extracts all the text from a wiki XML dump¹ and writes it out as a (somewhat) readable PDF.

This was used to create PDF files to upload an entire wiki to NotebookLM.

Explanation of files

one-pdf.py converts the XML dump into one large PDF.
max-word-count.py splits the text into multiple PDFs of at most 499,999 words each, to stay within NotebookLM's word limit. You can change the limit by editing the script.
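
As a rough illustration of the word-limit splitting described above, here is a minimal sketch. It is not the code from max-word-count.py: the function name `split_by_word_limit` and its parameters are hypothetical, and it only produces text chunks (writing each chunk to a PDF is left out). It also assumes that splitting on whitespace is an acceptable way to count words, and it ignores page boundaries when a chunk fills up.

```python
def split_by_word_limit(texts, max_words=499_999):
    """Group page texts into chunks of at most max_words words each.

    Hypothetical helper for illustration; words are counted by
    splitting on whitespace, and a page may be cut across two chunks.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    chunks = []      # finished chunks, each a single string
    current = []     # words collected for the chunk being built
    for text in texts:
        words = text.split()
        while words:
            room = max_words - len(current)   # words that still fit
            current.extend(words[:room])
            words = words[room:]
            if len(current) == max_words:     # chunk is full, start a new one
                chunks.append(" ".join(current))
                current = []
    if current:                               # keep the final partial chunk
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk could then be handed to whatever PDF writer you use, one output file per chunk.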

Notes

The scripts assume the XML namespace http://www.mediawiki.org/xml/export-0.11/, but your XML may use a different namespace.

Each script prints the namespace when it runs and throws an error if the configured namespace is incorrect. The printed output should show the correct namespace, which you can then set on line 20 of max-word-count.py or one-pdf.py.
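
If you would rather not edit the namespace by hand, one possible approach is to read it from the root element of the dump. The sketch below uses only the standard library; `namespace_of` and `iter_pages` are hypothetical names, not functions from this repository, and the page/title/revision/text layout is assumed from the usual MediaWiki export format.

```python
import xml.etree.ElementTree as ET


def namespace_of(element):
    """Return the namespace URI of an element's tag, or '' if it has none."""
    tag = element.tag
    # ElementTree writes namespaced tags as "{uri}localname".
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def iter_pages(root):
    """Yield (title, text) for every page element under the root.

    Assumes each page holds a title and a revision/text element.
    """
    ns = namespace_of(root)
    prefix = "{" + ns + "}" if ns else ""
    for page in root.iter(prefix + "page"):
        title = page.findtext(prefix + "title", default="")
        text = page.findtext(prefix + "revision/" + prefix + "text", default="")
        yield title, text


if __name__ == "__main__":
    # Tiny made-up document standing in for a real dump.
    sample = (
        '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/">'
        "<page><title>Example</title>"
        "<revision><text>Hello wiki</text></revision></page>"
        "</mediawiki>"
    )
    root = ET.fromstring(sample)
    print("Namespace:", namespace_of(root))
    for title, text in iter_pages(root):
        print(title, "->", text)
```

Note that `ET.fromstring` and `ET.parse` load the whole document into memory, which may be a problem for a very large dump.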

¹ The XML was generated using wikiteam3.

jacobrosenfeld/wiki-to-pdf
