SLiM reference documentation in a separate cross-platform doc browser would be great #489

bhaller · 2025-01-11T12:37:13Z

Folks such as @petrelharp have said for some time now that it would be great to have the SLiM documentation in a different form that is more easily searchable/browsable. Porting the doc itself out of Pages (a macOS app from Apple) into something like LaTeX would be an immense amount of work, both in itself and because a whole workflow for creating the doc inside SLiMgui is based upon the Pages doc, with a chain going from Pages to RTF to HTML to SLiMgui; I don't see such a port happening any time soon. However, it occurred to me this morning that it would certainly be a possibility to build a new doc-export pipeline that would go from the Pages doc to some other destination – especially one that is easy to derive from the RTF or HTML versions of the doc.

On macOS there is a very nice documentation browser app called Dash; I use it to browse doc for Unix, Qt, C++, etc., as I'm working. I think it might be fairly straightforward to get SLiM's doc into a Dash docset (https://kapeli.com/docsets). But that is only a macOS solution. Is there a good cross-platform doc browser that I could target – perhaps something free and open-source? What do folks on Linux use? @petrelharp please weigh in here; what would you like to see happen here? Feel free to share this issue link with others who you think might be interested.

The ideal solution, from my perspective, would be that somebody reading this issue would take it upon themselves to provide some kind of automated solution. I can imagine a script, running somewhere (maybe inside GitHub Actions?) that would watch the SLiM repo – specifically the RTF or HTML documentation files that live inside it, which derive from the Pages doc – and would produce downloadable docset files, hosted on GitHub or somewhere else, for each new release version and (nightly or weekly) for the current master branch head. Such a script could also potentially create a website based upon the doc, for those who want to view the doc in their browser. I have no idea how the tskit folks make their doc (tagging @benjeffery here), but maybe an auto-generated website like https://tskit.dev/tskit/docs/stable/introduction.html would be possible. I don't know how to make such an auto-running doc-processing script, but I imagine there is someone in the community who does. So, I'm looking for a volunteer who is inspired to make this happen.

@benjeffery I don't imagine you want to volunteer to actually do this (but great if you do!), but since you know more about this kind of thing than anyone else I know, a sketch from you of what you think a good solution would look like might be very helpful. Is running inside GitHub Actions a sensible idea? Where might the produced files – docsets, websites, whatever they are – be hosted? Can a GitHub Action produce new files that live inside the same repository – could the produced files themselves live inside the SLiM repository somehow, and if that is possible, would it be a good idea? And what do you think a good doc solution would look like – is there a doc reader you prefer to use?

For people interested in looking into this: as mentioned above, the doc is all available inside the SLiM repo. RTF versions of the SLiM doc live inside SLiMguiLegacy (called SLiMgui in the repository, for historical reasons), such as https://github.com/MesserLab/SLiM/blob/master/SLiMgui/SLiMHelpClasses.rtf. RTF versions of the Eidos doc live inside EidosScribe, such as https://github.com/MesserLab/SLiM/blob/master/EidosScribe/EidosHelpFunctions.rtf. HTML versions of both SLiM and Eidos doc live inside SLiMgui (called QtSLiM in the repository, for the same historical reasons), such as https://github.com/MesserLab/SLiM/blob/master/QtSLiM/help/SLiMHelpClasses.html. All the other doc files can be found in the same directories, with .rtf or .html extensions. The set of filenames used has not changed in quite a while, and is unlikely to change in future. (The Pages doc is not available directly online; it is in a proprietary binary format, so I'm not sure it would be very useful to start from anyway, and GitHub doesn't really like hosting large binary files – the SLiM manual is 36.7 MB at present.)
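As a possible starting point for anyone exploring those HTML files, here is a minimal sketch that pulls the visible text out of each `.html` file in a local checkout of the help directory. It uses only the Python standard library. The function names (`html_to_text`, `load_help_texts`) are made up for illustration, and the sketch assumes nothing about the internal structure of the help files beyond their being HTML.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping <style> and <script>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside <style> or <script>
        self.chunks = []       # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def load_help_texts(help_dir):
    """Map each .html filename in help_dir (hypothetical local path) to its plain text."""
    return {
        p.name: html_to_text(p.read_text(encoding="utf-8", errors="replace"))
        for p in sorted(Path(help_dir).glob("*.html"))
    }
```

With a local clone, something like `load_help_texts("QtSLiM/help")` would give a dictionary of plain text that other tooling (search, docset building) could consume.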

So. Any takers?


bhaller · 2025-01-11T12:44:52Z

Tagging @bryce-carson @currocam @grahamgower as people who seem like they might possibly be interested in this. :->

currocam · 2025-01-11T13:17:19Z

Hi!

It would certainly be possible to host a documentation website easily with "GitHub Pages". One option would be to create a separate branch with static HTML files and deploy those.

I guess it is not strictly necessary to create these HTML files inside a GitHub Action; it could be a manual step instead. Still, it would be pretty nice!

I've created documentation websites for projects with specific tools to extract docs from the comments in the source code, or from markdown files with MkDocs. I have no experience with RTF files or specific macOS formats whatsoever.

I'm happy to help with (1) setting up the deployment of the website from built static HTML files, (2) scripting the CI to build those HTML files if someone knows how :) My intuition is that something that just works should be fairly easy, but something prettier and with a nice searchable menu (such as the tskit one) would require lots of extra CSS and JavaScript effort (although perhaps not!)

bryce-carson · 2025-01-11T18:16:33Z

> It would certainly be possible to host a documentation website easily with "GitHub Pages". One option would be to create a separate branch with static HTML files and deploy those.

Yep, deploying to Pages from a docs branch would be the easiest.

> I'm happy to help with (1) setting up the deployment of the website from built static HTML files, (2) scripting the CI to build those HTML files if someone knows how :) My intuition is that something that just works should be fairly easy, but something prettier and with a nice searchable menu (such as the tskit one) would require lots of extra CSS and JavaScript effort (although perhaps not!)

Simply copy the appropriate folders from the main branch to the docs branch.

An index.html file can be generated quite easily using one of an innumerable number of static site generators. There is a nauseating set of existing GitHub workflows to choose from. With some searching you could simply choose one that does what we want and then change the paths appropriately.
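Even without a static site generator, a bare-bones index page is only a few lines of scripting. The following is a rough sketch, not a recommendation of any particular tool: it writes an `index.html` that links to every other `.html` file in a directory. The function name `build_index` and the default page title are placeholders.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_index(doc_dir, title="SLiM and Eidos reference"):
    """Write doc_dir/index.html linking to every other .html file; return the page list."""
    doc_dir = Path(doc_dir)
    # Skip index.html itself so that re-running the script is harmless.
    pages = sorted(p.name for p in doc_dir.glob("*.html") if p.name != "index.html")

    items = "\n".join(
        '    <li><a href="{}">{}</a></li>'.format(quote(name), html.escape(Path(name).stem))
        for name in pages
    )
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '  <meta charset="utf-8">\n'
        "  <title>{t}</title>\n"
        "</head>\n<body>\n"
        "  <h1>{t}</h1>\n"
        "  <ul>\n{items}\n  </ul>\n"
        "</body>\n</html>\n"
    ).format(t=html.escape(title), items=items)

    (doc_dir / "index.html").write_text(page, encoding="utf-8")
    return pages
```

Run against a copy of the help folder on the docs branch, this would produce a plain list of links; styling and search would still need to come from elsewhere.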

Hypertext

I think the biggest issue will be the lack of hyperlinks from one part of the text to another. I imagine going from Pages to RTF and HTML doesn't add anything new, and I don't think the Pages documentation we have uses hyperlinks.

Simply having the HTML available in a separate branch to be hacked upon is a step in the desired direction, though.

benjeffery · 2025-01-13T14:13:18Z

The tskit-related repos all use Sphinx, often through Jupyter Book, to generate their docs from a set of Markdown files. This works well as it allows deep linking not only intra-repo but also inter-repo. The Sphinx system allows building to HTML, PDF and TeX.

As for deployment, we use a custom GitHub Action that fetches all the different docs across the ecosystem and builds them. This runs whenever a push happens, and once a day on a cron job: https://github.com/tskit-dev/tskit-site/actions/runs/12746400032
This action pushes the HTML to a named branch of the repo that just contains a single commit of the site, which is then auto-deployed on GitHub Pages.

Looking through the SLiM manual, it looks like it would be not too much work to convert to Sphinx; it looks like you could get an LLM to do most of the work.

petrelharp · 2025-01-13T14:53:33Z

This would be great, especially if it ended up with something that is more searchable than the manual (for instance, one where we can search for "pages that have both strings A and B"; since in the PDF you can just search for a single string).
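For illustration, the "both strings A and B" kind of search is straightforward once the doc exists as per-page plain text. This is only a sketch under that assumption: `pages` is a hypothetical dictionary mapping a page name to its text, and matching is a simple case-insensitive substring test rather than a real search engine.

```python
def pages_matching_all(pages, terms):
    """Return the sorted names of pages whose text contains every term (case-insensitive)."""
    lowered = [t.lower() for t in terms]
    return sorted(
        name
        for name, text in pages.items()
        if all(term in text.lower() for term in lowered)
    )
```

For example, `pages_matching_all(pages, ["A", "B"])` would return only the pages that mention both strings.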

Ah, hold on - @bhaller, is the full text of the manual in here somewhere? Or, just the API docs?

bhaller · 2025-01-13T15:07:39Z

> This would be great, especially if it ended up with something that is more searchable than the manual (for instance, one where we can search for "pages that have both strings A and B"; since in the PDF you can just search for a single string).
>
> Ah, hold on - @bhaller, is the full text of the manual in here somewhere? Or, just the API docs?

Just the API docs. The full text of the manual would be considerably harder to vend online with the present doc architecture, I think; it has lots of embedded images, for one thing, and also lots of syntax-colored scripts, sometimes tables and bullet lists, etc. The reference doc is much simpler in its structure and formatting.

As for searchability of the PDF, that is of course not a limitation of PDF, but of a particular PDF reader; perhaps there's a different PDF reader that would provide better search? But this is a general critique of software from PDF readers to email clients to file managers – search always seems to be a primitive and neglected feature for some reason. :-< That's why I was wondering about maybe vending to a doc reader, like Dash on macOS, rather than just to HTML; good doc readers like Dash do tend to have good search, whereas HTML does not intrinsically have much searchability at all; you have to rely on Google, or build your own search engine on top of your HTML website.
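To make the doc-reader route a bit more concrete: as far as I understand the docset format described at https://kapeli.com/docsets, a docset is a folder containing an `Info.plist`, a `Documents` directory with the HTML, and a SQLite file `docSet.dsidx` with a `searchIndex` table. The sketch below builds that skeleton with the Python standard library. The function name `build_docset` and the way the entries are supplied are assumptions; copying the HTML into `Documents` and deciding which entries to index are left out.

```python
import plistlib
import sqlite3
from pathlib import Path


def build_docset(root, name, entries):
    """Create <name>.docset under root. entries is a list of (name, type, path) tuples,
    where path is relative to Contents/Resources/Documents. Returns the docset path."""
    docset = Path(root) / (name + ".docset")
    contents = docset / "Contents"
    resources = contents / "Resources"
    (resources / "Documents").mkdir(parents=True, exist_ok=True)

    # Minimal Info.plist identifying the folder as a Dash-style docset.
    with open(contents / "Info.plist", "wb") as f:
        plistlib.dump(
            {
                "CFBundleIdentifier": name.lower(),
                "CFBundleName": name,
                "DocSetPlatformFamily": name.lower(),
                "isDashDocset": True,
            },
            f,
        )

    # The search index is a SQLite database with a single searchIndex table.
    conn = sqlite3.connect(str(resources / "docSet.dsidx"))
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS searchIndex"
            "(id INTEGER PRIMARY KEY, name TEXT, type TEXT, path TEXT)"
        )
        conn.execute(
            "CREATE UNIQUE INDEX IF NOT EXISTS anchor ON searchIndex (name, type, path)"
        )
        conn.executemany(
            "INSERT OR IGNORE INTO searchIndex(name, type, path) VALUES (?, ?, ?)",
            entries,
        )
        conn.commit()
    finally:
        conn.close()
    return docset
```

An illustrative call would be `build_docset("out", "SLiM", [("Individual", "Class", "SLiMHelpClasses.html")])`; the entry shown is just an example, and real entries would have to be extracted from the help files.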

petrelharp · 2025-01-13T17:31:12Z

Well, having the API linkable would be very nice.

I actually wasn't aware there was something called a "doc reader"! Looks like the Linux analogues are Zeal and DevDocs, but I'm not impressed by their documentation, ironically.

bhaller · 2025-01-13T17:55:33Z

> I actually wasn't aware there was something called a "doc reader"!

To me, reading doc in a client-side app is just a way better experience than reading it in a web browser. Even when a web interface is done relatively well, as with the tskit doc, a client app is just easier to use and, crucially, faster. I kinda hate that everything in the world is turning into HTML. But YMMV. The really hard thing about providing a good documentation user experience is that every user has a different idea of what that would look like!

bryce-carson · 2025-01-13T21:45:34Z

> Looking through the SLiM manual, it looks like it would be not too much work to convert to Sphinx; it looks like you could get an LLM to do most of the work.

Better, more specialized tooling already exists which would do most of the work.

https://www.publisha.org/conversion/markdown/2017/02/10/frompages2markdown-md/

@bhaller said that the current pipeline goes from Pages (documenting the API) to RTF then HTML.

Ideally, @bhaller should create a new repository under the MesserLab containing the SLiM and Eidos Pages format manuals, or at least create a branch focused on documentation alone or simply a new folder. This should contain all documentation: the manual, the algorithm flowcharts, diagrams, everything that exists.

We can use this as a starting point to hack together an ideal solution and eventually we will end up with HTML (which is in some respects just XML) which is a good interchange format for documents. We can then still support the current authoring workflow from Pages to HTML for SLiMgui and then have HTML we can use for anything else we want to target.

bhaller · 2025-01-14T19:37:37Z

> Better, more specialized tooling already exists which would do most of the work.
>
> https://www.publisha.org/conversion/markdown/2017/02/10/frompages2markdown-md/

That's interesting; Pandoc looks potentially useful, if it actually works and can handle a document the size of the SLiM manual. Seems like a good direction to explore, although then we will depend upon that specific tool. It looks like it is actively maintained, though.

> @bhaller said that the current pipeline goes from Pages (documenting the API) to RTF then HTML.
>
> Ideally, @bhaller should create a new repository under the MesserLab containing the SLiM and Eidos Pages format manuals, or at least create a branch focused on documentation alone or simply a new folder. This should contain all documentation: the manual, the algorithm flowcharts, diagrams, everything that exists.

OK, I've made a new repo called SLiM-Doc. It is private, at least for now; it's a place where we can experiment. I invited everybody who has commented in this chat. I've put the original Pages doc there, and the Word export, for both the Eidos and SLiM manuals. Those should contain all of the images and such, embedded. I used the version 4.3 manuals, since the current manuals are very much under construction for the work I'm doing. If we come up with a workflow that can process this doc, presumably it will work for the next version's doc equally well; the format is pretty stable, although chapter/section numbers are always in flux and should not be hard-coded anywhere. I made a separate repo because these files are fairly large, and do not (and probably never will) live in the SLiM repo itself; so doing all this in a branch just sounded like a hassle, and might bog down usage of the main SLiM repo. Also, I did want to keep it private for now.

> We can use this as a starting point to hack together an ideal solution and eventually we will end up with HTML (which is in some respects just XML) which is a good interchange format for documents. We can then still support the current authoring workflow from Pages to HTML for SLiMgui and then have HTML we can use for anything else we want to target.

OK, sounds good. Don't worry about the SLiMgui path for the time being; maybe that will eventually get replaced by some other workflow here, but for now the Pages > RTF > HTML workflow that I have works fine for that, and it might be fussy to replace since SLiM is looking for the HTML doc to be in quite a specific form (in order to build the index for the help window and such).

If Pandoc does end up being a good solution, there will doubtless be aspects of it that fall short. One key question is whether it captures the text styling, such as body text vs. (monospace) code, code syntax coloring, etc. Another is whether it understands where images are placed in the text flow, or if it just dumps all the embedded images to a folder and you have to place them into the text flow yourself on the other end. Another is places where the doc has a table, or a bullet list, etc. If some elements, like tables, need to be replaced with a figure graphic in PDF or something, that can be done on the Pages side to improve the Pandoc output; but if it doesn't even understand code formatting, then it probably fails the test. We shall see! Have at it!
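For anyone who wants to try this, a first experiment could be as small as the following sketch. It assumes Pandoc is installed and on the PATH, and that the input is a Word (.docx) export; the file names are placeholders, not the actual names in the SLiM-Doc repo. `--extract-media` asks Pandoc to write embedded images into a folder and reference them from the output, which would help answer the image-placement question.

```python
import shutil
import subprocess


def pandoc_command(src, dest, media_dir):
    """Build the argument list for converting a .docx file to Markdown with Pandoc."""
    return [
        "pandoc",
        str(src),
        "--from=docx",
        "--to=markdown",
        "--extract-media={}".format(media_dir),  # embedded images are written here
        "--output={}".format(dest),
    ]


def convert(src, dest, media_dir):
    """Run the conversion; raises if Pandoc is missing or exits with an error."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_command(src, dest, media_dir), check=True)
```

Something like `convert("manual.docx", "manual.md", "media")` (hypothetical file names) would then show how well code styling, tables and bullet lists survive the trip.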

bhaller added the enhancement, help wanted, and long-term labels on Jan 11, 2025
