Libretexts libraries #1035

Open
Popolechien opened this issue Jun 10, 2024 · 12 comments

Comments

@Popolechien
Collaborator

  • Website URL: https://libretexts.org/platforms/libraries/
  • License: Creative Commons
  • Desired ZIM Title: Libretexts XX Bookshelf (see list below)
  • Desired ZIM Description: Textbooks curated by the LibreTexts team
  • Desired ZIM Icon –png (URL or attach one): Pending. Will be put on drive.farm.openzim.org (though I see it already in the test that was done with Engineering currently on dev.library.kiwix.org)
  • Language (ISO 639-3): eng
  • Is this a MediaWiki?: no

The following libraries should be zimed up:
Biology
Business
Engineering
Geoscience
Medicine
Humanities
K-12 Education
Mathematics
Physics
Social sciences
Statistics
Workforce
Espanol
Ukrayinska

@Popolechien
Collaborator Author

Icon is available in drive.

@benoit74
Contributor

Thank you, planned for end of summer as discussed

@benoit74
Contributor

For the record, I'm beginning work on this project.

I've created a project where I've initialized first discovery tasks on https://github.com/openzim/librechef/

I'm now investigating whether it is better to update (and maintain in the future) this sushichef recipe, or whether we should instead change our plans and use the zimit scraper. A very first test with zimit seems to indicate it is at least not impossible.

@benoit74
Contributor

Regarding the last point, using zimit is probably not even an option. The "killer" reason is that it is currently impossible to compress YouTube videos, at least not without significant investment in warc2zim, and that has already been discussed as a no-go: we do not want to alter things in warc2zim more than necessary for HTML/JS to work.

There are also some significant issues which were discovered in the first try (see dropped issues in project), not to mention the custom CSS and behaviors needed to get everything in place.

The balance would probably have been quite even without the first "killer" reason, but here it is!

ATM, I hence consider that using the kolibri scraper is the way to move this forward, and I've already found how to fix the most obvious issues to make something run end-to-end and create a very first draft ZIM. I only need to assemble this into meaningful PRs ready for review ^^

And of course, there are still a bunch of issues to solve at kolibri v2 level.

@benoit74
Contributor

After investigating a bit more into librechef and kolibri, I now consider this strategy is also not the optimal one.

librechef imposes a set of constraints (e.g. navigation by topic, no description longer than 200 chars on topics, ...) which are going to be painful.

librechef is hard to debug: for instance, fixing a UI bug only present in HTML requires rerunning the whole process of crawling the website, pushing to the Studio, and creating the ZIM.

Using librechef also poses problems in terms of deployment: we have no idea how to run this in production, or at least we know it is going to be a "trick" (see kiwix/operations#262).

Another concern is that librechef is based on ricecooker, which seems to be barely maintained (e.g. it still depends on Python 3.10 while 3.11 has been available for 2 years, and 3.12 for 1 year).

All this could have made sense if we knew that we were going to have more and more kolibri channels to ZIM, but it does not look like that is going to happen in the coming months.

So I now think we have to seriously consider creating a new libretexts scraper, because this is going to be the cheapest solution in terms of initial creation AND in terms of maintenance. Not to mention that it will reduce our dependency on a partner (should Kolibri Studio stop working, librechef would become a problem as well).

I will investigate this way forward in the coming days.

@benoit74
Contributor

Some remarks:

  • the suggested title in the first comment is too long for the 30-character limit: I propose "xx LibreTexts"
  • there are not only bookshelves but also courses, all libraries would have the same description, and the description does not mention LibreTexts (for those who focus more on the description than on the title), so I suggest updating the description to "xxx courses and bookshelves from LibreTexts.org"
  • the icon is automatically populated from online icons (supposed to be identical to the ones at https://libretexts.org/platforms/libraries/, tbc)
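
To illustrate the title remark, here is a minimal sketch that checks both title patterns against the 30-character limit. It is not part of any scraper or Zimfarm recipe: the function names are hypothetical, the library names are the English ones listed in the first comment, and counting with `len()` is an assumption that holds for these plain ASCII names.

```python
# Hypothetical helper names; only the patterns and the 30-char limit
# come from this thread.
LIBRARIES = [
    "Biology", "Business", "Engineering", "Geoscience", "Medicine",
    "Humanities", "K-12 Education", "Mathematics", "Physics",
    "Social sciences", "Statistics", "Workforce",
]

MAX_TITLE_LENGTH = 30  # title limit mentioned in the remark above


def original_title(library: str) -> str:
    """Title pattern requested in the first comment."""
    return f"Libretexts {library} Bookshelf"


def proposed_title(library: str) -> str:
    """Shorter title pattern proposed in this comment."""
    return f"{library} LibreTexts"


def proposed_description(library: str) -> str:
    """Description pattern proposed in this comment."""
    return f"{library} courses and bookshelves from LibreTexts.org"


def too_long(titles: list[str]) -> list[str]:
    """Return the titles that exceed the limit (assumes len() == char count)."""
    return [title for title in titles if len(title) > MAX_TITLE_LENGTH]


if __name__ == "__main__":
    print("Original pattern, over the limit:")
    for title in too_long([original_title(name) for name in LIBRARIES]):
        print(f"  {title!r} ({len(title)} chars)")
    print("Proposed pattern, over the limit:")
    print(f"  {too_long([proposed_title(name) for name in LIBRARIES])}")
```

With the original pattern, longer library names such as "Social sciences" go over the limit, while the proposed pattern keeps every listed library within it.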

@benoit74
Contributor

For the record, I've created the requested recipes: https://farm.openzim.org/recipes?name=libretexts.org

What is not yet requested / configured:

  • all global libraries (French, Portugues, ...) => all probably easily feasible if we decide we want them
  • chemistry => the biggest one ... should be feasible, but I'm glad we do not have to do it; it might have been a pain in terms of duration and hence risk of interruption by an upstream issue ^^

@Popolechien
Collaborator Author

Well here is the bad news: we should do chemistry. Not having it listed was an oversight, obviously.

@benoit74
Contributor

First ZIMs are mostly ready

We have 3 ZIMs close to being ready at https://dev.library.kiwix.org/#lang=eng&q=libretexts (stats, k-12 and workforce; ignore the others if they are still there).

@Popolechien @kelson42 @rgaudin can you please make a first review of these?

The known remaining issues are:

  • Licensing, Detailed Licensing and Table of Content pages are not fetched
  • what is broken online ... is still broken inside the ZIM

Foreign languages split

It was requested to create one Espanol and one Ukrayinska ZIM. I think this is wrong: both collections are extremely large. I think we should create one ZIM per topic, just like in English: one ZIM per topic in https://espanol.libretexts.org/ and one ZIM per topic in https://ukrayinska.libretexts.org/

Are we OK?

It means that I also need the ZIM title and description in both Spanish and Ukrainian (the current Zimfarm recipes I've configured are plain wrong, they even say that the ZIM content is in English ...)

@rgaudin
Member

rgaudin commented Nov 25, 2024

Great work! I spent a few minutes browsing stats and it all looked very clean and complete. Only things I noticed:

  • slight delay showing the LaTeX code before it is rendered. Not an issue at all. I guess it's the same online anyway.
  • Suggestions are not very useful, since they list the titles of chapters (with number) but one doesn't know which book they are part of.
  • Search results contain no extract. Only the entry title.
  • The careful printing style is prepended by an ugly first page due to kiwix-serve.

@benoit74
Contributor

> slight delay showing the LaTeX code before it is rendered. Not an issue at all. I guess it's the same online anyway.

We can live with it I think, I don't want to open an issue which will live here forever because there is no real solution and limited added value anyway.

> Suggestions are not very useful, since they list the titles of chapters (with number) but one doesn't know which book they are part of.

I've opened openzim/mindtouch#94 to discuss this

> Search results contain no extract. Only the entry title.

I've opened openzim/mindtouch#95

> The careful printing style is prepended by an ugly first page due to kiwix-serve.

openzim/mindtouch#17 (comment) and kiwix/libkiwix#1163

@benoit74
Contributor

Discussed live: we don't need Espanol and Ukrayinska in fact
