pandoc adapter fails on *.htm files #205

Open
tionis opened this issue Jan 30, 2024 · 4 comments
Labels
bug Something isn't working

Comments

tionis commented Jan 30, 2024

Describe the bug
When searching across ebooks the pandoc adapter fails due to an Unknown input format htm error in pandoc.
I originally wanted to solve this by defining a custom adapter, but then the two adapters conflict with each other.

Possible Solution
Instead of doing something like

Command { std: "pandoc" "--from=htm" "--to=plain" "--wrap=none" "--markdown-headings=atx", kill_on_drop: false }

do something like this

Command { std: "pandoc" "--from=html" "--to=plain" "--wrap=none" "--markdown-headings=atx", kill_on_drop: false }

for htm files.
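
The proposed fix amounts to normalizing the file extension before it is passed to pandoc as the input format. A minimal sketch of that idea follows. It is not taken from the ripgrep-all source; the names `EXTENSION_TO_PANDOC_FORMAT`, `pandoc_input_format`, and `pandoc_args` are hypothetical and only illustrate the mapping.

```python
# Hypothetical helper, not taken from the ripgrep-all source.
# Maps a file extension to the name pandoc expects for --from.
EXTENSION_TO_PANDOC_FORMAT = {
    "htm": "html",  # pandoc rejects "htm" with "Unknown input format htm"
}


def pandoc_input_format(extension: str) -> str:
    """Return the pandoc input format for a file extension."""
    ext = extension.lower().lstrip(".")
    # Assumption: any extension without an explicit mapping is used as-is,
    # which appears to be how the failing command ended up with --from=htm.
    return EXTENSION_TO_PANDOC_FORMAT.get(ext, ext)


def pandoc_args(extension: str) -> list[str]:
    """Build the argument list shown in the issue for a given extension."""
    return [
        "pandoc",
        f"--from={pandoc_input_format(extension)}",
        "--to=plain",
        "--wrap=none",
        "--markdown-headings=atx",
    ]


print(pandoc_args("htm"))
```

With this mapping, an `.htm` file would be converted with `--from=html`, while other extensions keep their current behavior.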

Operating System and Version
Manjaro Linux

Output of rga --version
ripgrep-all 0.10.6

tionis added the bug label Jan 30, 2024

KlyithSA commented Apr 21, 2024

This custom adapter works without conflicting with normal html:

        {
            "name": "htm custom",
            "version": 1,
            "description": "fix for https://github.com/phiresky/ripgrep-all/issues/205",

            "extensions": ["htm"],
            "mimetypes": ["application/x-extension-htm"],

            "binary": "pandoc",
            "args": ["--from=html", "--to=plain", "--wrap=none", "--markdown-headings=atx"],
            "disabled_by_default": false,
            "match_only_by_mime": false
        }

tionis commented Apr 22, 2024

Oh right, I forgot to update the issue!
I figured out a similar config, but the standard config should probably still handle this correctly.
Thanks though!

g-berthiaume commented Apr 26, 2024

Hi!
I think I just encountered the same issue.

G:\...\myfile.htm.txt adapter: postprocprefix
Unknown input format htm
Error: copying adapter output to stdout

Caused by:
    0: subprocess: Command { std: "pandoc" "--from=htm" "--to=plain" "--wrap=none" "--markdown-headings=atx", kill_on_drop: false }
    1: ExitStatus(ExitStatus(21))
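
To confirm that the failure comes from pandoc itself rather than from rga, the same format names can be passed to pandoc directly. The following is a rough sketch under a few assumptions: pandoc is on the `PATH`, the helper name `try_pandoc_format` is made up for illustration, and `--markdown-headings=atx` is left out so the check also runs on older pandoc versions.

```python
import shutil
import subprocess


def try_pandoc_format(input_format: str, html_text: str = "<p>hello</p>"):
    """Run pandoc with the given --from value on a small HTML snippet.

    Returns (exit code, stderr), or None when pandoc is not installed.
    """
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        ["pandoc", f"--from={input_format}", "--to=plain", "--wrap=none"],
        input=html_text,       # pandoc reads from stdin when no file is given
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr.strip()


# Compare the format name rga passes ("htm") with the one pandoc knows ("html").
for fmt in ("htm", "html"):
    print(fmt, try_pandoc_format(fmt))
```

If the `htm` run reports a non-zero exit code and the `html` run does not, the problem is the format name and not the file.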

The custom adapter doesn't work for me.
That said, it could be my fault: I have never used custom adapters before.

tionis commented Apr 26, 2024

You have to add the config for the custom adapter to the rga config file. The location may vary depending on system configuration, but normally it should be at ~/.config/ripgrep-all/config.jsonc.
Mine, for example, looks like this:

{
  "$schema": "./config.schema.json",
  "custom_adapters": [
    {
      "name": "htm-pandoc",
      "version": 1,
      "description": "Uses pandoc to transform htm files",
      "extensions": ["htm"],
      "mimetypes": ["application/x-extension-htm","application/htm"],
      "binary": "pandoc",
      "args": ["--from=html", "--to=plain", "--wrap=none", "--markdown-headings=atx"],
      "disabled_by_default": false,
      "match_only_by_mime": false
    },
    {
      "name": "gron",
      "version": 1,
      "description": "Transform JSON into discrete JS assignments",
      "extensions": ["json"],
      "mimetypes": ["application/json"],
      "binary": "gron",
      "args": [],
      "disabled_by_default": false,
      "match_only_by_mime": false
    }
  ]
}
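
One way to check that a custom adapter is actually registered for `.htm` is to load the config and list the adapters that claim the extension. This is only a sketch: the function name `adapters_for_extension` is hypothetical, the embedded config is a trimmed copy of the one above, and it assumes the file contains no comments, because Python's standard `json` module does not parse JSONC comments.

```python
import json

# Trimmed copy of the config above. Assumes no comments in the file,
# since the standard json module cannot parse JSONC comments.
CONFIG_TEXT = """
{
  "custom_adapters": [
    {"name": "htm-pandoc", "extensions": ["htm"], "binary": "pandoc"},
    {"name": "gron", "extensions": ["json"], "binary": "gron"}
  ]
}
"""


def adapters_for_extension(config: dict, extension: str) -> list[str]:
    """Return the names of custom adapters that list the given extension."""
    ext = extension.lower().lstrip(".")
    return [
        adapter["name"]
        for adapter in config.get("custom_adapters", [])
        if ext in adapter.get("extensions", [])
    ]


config = json.loads(CONFIG_TEXT)
print(adapters_for_extension(config, "htm"))  # ['htm-pandoc']
```

An empty list for `htm` would suggest that the adapter entry is missing or that it was added outside the `custom_adapters` array.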
