
Static site generator requirements analysis

Renoir Boulanger edited this page Sep 4, 2015 · 1 revision

Now that we have imported all our ~5,100 documents and ~37,000 edits into Git, we have to convert the raw data into static pages. See the reports in webplatform/mediawiki-conversion.

Limitations

  • With more than 5,000 text files, it would be computationally expensive to regenerate the full site on every contribution. It's better to find a way to regenerate only the changed files.
  • Support i18n; some content (101 pages) is translated

Project vision

  • Be as simple as possible, but in a way that lets us add features incrementally
  • If something can be delivered by the web server, let it do the work (e.g. a directory listing with the NGINX FancyIndex module). That way we'll get all child pages for free. We could eventually emulate pagination with JavaScript, but the output would remain the same.
  • Be lean about which libraries we use and import; we want page generation to be quick
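
As a sketch of the directory-listing idea, the FancyIndex module can be enabled per location in NGINX. The directive names come from the ngx_http_fancyindex module; the `/docs/` path is a hypothetical example:

```nginx
# Hypothetical location block: let NGINX list child pages itself
location /docs/ {
    fancyindex on;               # enable the fancy directory listing
    fancyindex_exact_size off;   # human-readable file sizes
    fancyindex_localtime on;     # show mtimes in server local time
}
```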

Requirements

  • Capacity to get all changed documents in Git history, and run the generator only on them
  • Use raw JavaScript: no CoffeeScript, and no working exclusively through configuration in JSON files (e.g. Gulp rather than Grunt)
  • Contents written as Markdown text files, so we can browse them on GitHub and use it as a publication platform
  • Leverage Git and GitHub to manage contents. Anybody can fork or clone the repo, contributors can push to master, and someone who doesn't have a GitHub account can still create a patch. No need to build an account system; the target audience (web developers) commonly knows how to use source control
  • Ability to compile the full docs locally and preview them locally
  • Capacity to easily adjust (i.e. in raw JavaScript) maintenance scripts, e.g. regenerating pages, compiling Sass/LESS, sending generated HTML into an Elasticsearch index, etc.

Bookmarks of things to have an eye on

  • Use raw HTML as templates
  • Support multiple templating engines? consolidate
  • Handle templates asynchronously? QEJS
  • Use the MultiMarkdown (and that too?) format so that it would be possible to convert to other formats, and to allow more liberty in the syntax. (proposed by Mike Siera)

Thoughts

Get last changed pages between commits and regenerate HTML only to them

Most static site generators delete the output folder and recreate every file.

We may not be able to find a static site generator that can be told explicitly which files to regenerate.

The assumed reason is that many plugins generate page listings or "tag clouds", and those need every bit of content. That kind of feature seems nice but might not be useful for a documentation site. While we want lists of sub-pages, we could instead use the web server to list child folders. That would speed up page generation and let us make the static site generator compile exclusively the files we changed, by isolating them in a temporary folder and running the compilation step from there.

With that possible approach to regenerating changed files, we can use Git to know which files changed between two commits.

Here is a starting point with Git:

# HEAD~3 gets all files changed in the last 3 commits
TMPDIR=$(mktemp -d) ; cp $(git diff HEAD~3 --name-only | grep html) "$TMPDIR"/

Once we have the changed files from the html/ folder, we can copy the templates and ask the static site generator to regenerate the HTML for those pages only.

As for publishing the changes, we can have a script listen for GitHub webhooks and run the compilation step.

Other thoughts

  • Capability to adjust the generated HTML (e.g. add an id to a title, read more metadata from the front matter?)
  • Capability to be loaded both asynchronously (from an HTTP request to an external file) and to be used by the static site generator?
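
The first bullet — adjusting the generated HTML, such as adding an id to a title — could be a small post-processing pass over the generator's output. The slug rules below are an assumption, not any generator's actual behaviour:

```javascript
'use strict';

// Hypothetical post-processing pass: give every plain <h1>-<h6> tag
// (one with no attributes) an id derived from its text.

// Assumed slug rules: lowercase, non-alphanumeric runs become dashes.
function slugify(text) {
  return text.toLowerCase().trim()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

function addHeadingIds(html) {
  return html.replace(/<h([1-6])>([^<]+)<\/h\1>/g, function (m, level, text) {
    return '<h' + level + ' id="' + slugify(text) + '">' +
      text + '</h' + level + '>';
  });
}
```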

Other links

  • Marked
  • Remarkable: this one explains how to extend and redefine how the HTML is written
  • Strapdown: what I like about that one is that only the web browser does the rendering.