<img ....> gets corrupted when using parse-html #20

fappelman · 2019-05-06T18:09:40Z

Mercury Parser API Version: Latest
Node Version: 8

Expected Behavior

The parser should leave the <img> tags, including their src URLs, intact.

Current Behavior

The original <img> tag is

  <img src="https://cdn.example-domain.com/example1.jpg"/>

and after parsing it becomes

 <img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">

Steps to Reproduce

Take the following HTML:

<html>
<head>
<body>     
Main content
<br/>
<img src="https://cdn.example-domain.com/example1.jpg"/>
More content
<br/>
More Content to Simulate main content.
<img src="https://cdn.example-domain.com/example2.jpg"/>
</body>
</html>

Call the API at the path /parse-html. The endpoint takes a POST with a JSON object containing a URL and the HTML. The HTML is the markup from the previous step, first converted to the following format:

<html>\\n<head>\\n<body>\\nMain content\\n<br/>\\n<img src=\"https://cdn.example-domain.com/example1.jpg\"/>\\nMore content\\n<br/>\\nMore Content to Simulate main content.\\n<img src=\"https://cdn.example-domain.com/example2.jpg\"/>\\n</body>\\n</html>\\n

The URL value passed is https://www.example-domain.com.
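For comparison, here is a minimal TypeScript sketch of building the request body. It assumes the JSON fields are named url and html (the field names and the buildParseHtmlBody helper are hypothetical; the report only says "a URL and HTML"). Passing the raw markup to JSON.stringify escapes quotes and newlines exactly once. Escaping the HTML by hand first and then serializing it again leaves literal backslashes in the string the server decodes.

```typescript
// Assumed request shape: the field names "url" and "html" are hypothetical.
interface ParseHtmlRequest {
  url: string;
  html: string;
}

// Hypothetical helper: serialize the raw markup once and let
// JSON.stringify escape the quotes and newlines.
function buildParseHtmlBody(url: string, rawHtml: string): string {
  const payload: ParseHtmlRequest = { url, html: rawHtml };
  return JSON.stringify(payload);
}

const pageUrl = "https://www.example-domain.com";
const rawHtml =
  '<body>\nMain content\n<img src="https://cdn.example-domain.com/example1.jpg"/>\n</body>\n';

// Escaped once: decoding the body yields the original markup.
const goodBody = buildParseHtmlBody(pageUrl, rawHtml);
const goodHtml = (JSON.parse(goodBody) as ParseHtmlRequest).html;
console.log(goodHtml === rawHtml); // true

// Escaped twice: the HTML is first turned into JSON string contents
// (outer quotes stripped), then serialized again inside the payload.
const preEscaped = JSON.stringify(rawHtml).slice(1, -1);
const badBody = buildParseHtmlBody(pageUrl, preEscaped);
const badHtml = (JSON.parse(badBody) as ParseHtmlRequest).html;
console.log(badHtml.includes('src=\\"https://')); // true: a literal backslash survives
```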

The content returned in the JSON result includes the main content and the images. However, the image URLs are corrupted:

<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">
<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example2.jpg/%22/">
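These values match what you get if the src attribute still contains literal backslashes when it is made absolute. As an illustration only (it does not claim the parser resolves URLs this exact way), the sketch below resolves such a value against the page URL with the WHATWG URL class, which is global in Node.js 10 and later. The leading backslash is treated like "/", so the value becomes a path on www.example-domain.com, and the quotes are percent-encoded as %22.

```typescript
// Base URL passed to /parse-html in this report.
const base = "https://www.example-domain.com";

// <img src=\"https://...jpg\"/> has no real quotes around the value, so a
// spec-following HTML parser takes everything up to ">" as the src value,
// backslashes and trailing "/" included.
const srcWithBackslashes = '\\"https://cdn.example-domain.com/example1.jpg\\"/';

// For http(s) URLs, WHATWG URL parsing treats "\" like "/" and
// percent-encodes '"' in the path as %22.
const resolved = new URL(srcWithBackslashes, base).href;
console.log(resolved);
// https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/

// Without the backslashes, the absolute URL is left unchanged.
const clean = new URL("https://cdn.example-domain.com/example1.jpg", base).href;
console.log(clean); // https://cdn.example-domain.com/example1.jpg
```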

Question/Comment

Am I using the API correctly? I could not find any documentation, so this is a bit of reverse engineering.

The reason for not doing this directly, i.e. using /parser?url=....., is that I am trying to work around a problem where a TypeError is returned. See. The page responds with a 202, which the parser cannot handle. As a workaround, I now download the content myself and pass the HTML to the API instead. Unfortunately, it does not behave as I expected.
