FeatureRequest/HelpNeeded: highlight is not an exact subset of the text content #179

thiswillbeyourgithub · 2024-02-24T13:27:05Z

Hi,

I'm the dev behind LogseqMarkdownParser and am working on a small script to directly turn highlights into Anki flashcards.

It's not yet working because I'm running into an issue with text formats.

You see, I don't just want to send the highlight to Anki. I want to grab the roughly 1000 characters before and after the highlight, make a cloze card from it (i.e. put a hole in the text where the highlight is, so you have to guess the content), then send that to Anki.
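For illustration, the card-building step described above could be sketched as follows. This is a minimal sketch, not the actual script: `make_cloze` and its parameters are hypothetical names, it assumes the highlight is an exact substring of the article text, and it only builds the card text using Anki's `{{c1::...}}` cloze syntax (sending the card to Anki is not shown).

```python
def make_cloze(article: str, highlight: str, context: int = 1000) -> str:
    """Build cloze text: up to `context` chars on each side of the highlight.

    Hypothetical helper. Assumes `highlight` appears verbatim in `article`.
    """
    start = article.find(highlight)
    if start == -1:
        # This is the failure case discussed in this issue: the highlight
        # is plain text while the article still contains MathJax markup.
        raise ValueError("highlight is not an exact substring of the article")
    end = start + len(highlight)
    before = article[max(0, start - context):start]
    after = article[end:end + context]
    # Anki cloze deletion syntax: {{c1::hidden text}}
    return before + "{{c1::" + highlight + "}}" + after
```

The exact `str.find` lookup is the step that breaks when the highlight and the article are not rendered the same way.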

The main issue is that, for example, I have this highlight:
For example, suppose ΔW is the weight update for a weight matrix W∈RA×B.
And the relevant section of text is this:
For example, suppose \\(\\Delta W\\) is the weight update for a weight ' 'matrix \\(W \\in \\mathbb{R}^{A \\times B}\\).

I'm guessing this is MathJax.

I can't seem to find a good Python lib to parse MathJax into text, or text into MathJax, let alone do it reliably.

So is it possible to either:

- add {{{rawText}}} for the highlight, which would not be parsed (so it would still contain the MathJax), or
- parse the content of the article the same way as the highlight (currently only the highlight is parsed to text)?

Also, the highlight positions seem to be broken: they are all equal to 0 on my end. Is this normal?

Thanks!


thiswillbeyourgithub · 2024-03-14T19:08:36Z

Hi! Just a quick bump, as I would really like to wrap up my project while I have some free time :) But if you can't find the time to take a look, that's totally fine of course!

jacksonh · 2024-03-15T06:40:51Z

Hi, I think what you are seeing in the highlight text is raw text, or at least markdown. Can you post a screenshot of the highlight itself?

thiswillbeyourgithub · 2024-03-16T09:32:26Z

Here's the highlighted section of the text:

The article link is: https://sebastianraschka.com/blog/2023/llm-finetuning-lora.html

thiswillbeyourgithub · 2024-03-23T14:34:12Z

Hi,

I decided to go the "most robust way" anyway and implement a function that finds the substring of a corpus that best matches the highlight. This is computationally intensive and will probably be an issue for very long texts, but at least I can move on towards finishing this.
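For readers facing the same mismatch, one way such a best-substring search could look is sketched below. This is not the function from the project: `find_best_span`, `step`, and `scales` are hypothetical names, and the approach (a brute-force sliding window scored with the standard library's `difflib.SequenceMatcher`) is an assumption chosen for simplicity. Several window widths are tried because the marked-up passage is usually longer than the plain-text highlight.

```python
from difflib import SequenceMatcher


def find_best_span(highlight: str, corpus: str, step: int = 5,
                   scales: tuple = (1.0, 1.5, 2.0)) -> tuple:
    """Return (start, end, score) of the corpus window most similar to highlight.

    Hypothetical sketch: cost grows with len(corpus) * len(highlight),
    so it is slow on very long texts.
    """
    best = (0, 0, 0.0)
    matcher = SequenceMatcher(autojunk=False)
    matcher.set_seq2(highlight)  # seq2 is preprocessed once and reused
    for scale in scales:
        # Window width relative to the highlight, capped at the corpus length.
        width = min(len(corpus), max(1, int(len(highlight) * scale)))
        for start in range(0, len(corpus) - width + 1, step):
            matcher.set_seq1(corpus[start:start + width])
            score = matcher.ratio()
            if score > best[2]:
                best = (start, start + width, score)
    return best
```

A larger `step` trades accuracy of the span boundaries for speed; the returned span is approximate and may need trimming before it is turned into a cloze.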

When I finish this project, if I think it's worth it I'll come back to you to see if that's worth a mention in a blog post or whatever :)

In the meantime, although I still think my request is legit and someone might have a real need for more precise filter access in the API, I'll let you decide if you want to close this or not :)

Have a nice day!

thiswillbeyourgithub changed the title ~~FR/Help needed: Omnivore to Anki~~ FeatureRequest/HelpNeeded: highlight is not an exact subset of the texte content Mar 14, 2024

thiswillbeyourgithub changed the title ~~FeatureRequest/HelpNeeded: highlight is not an exact subset of the texte content~~ FeatureRequest/HelpNeeded: highlight is not an exact subset of the text content Apr 22, 2024
