<html><head><body>
Main content
<br/><imgsrc="https://cdn.example-domain.com/example1.jpg"/>
More content
<br/>
More Content to Simulate main content.
<imgsrc="https://cdn.example-domain.com/example2.jpg"/></body></html>
Call the API at the path /parse-html. The API takes a POST with a JSON object containing a URL and HTML. The HTML is the HTML as provided in step 1, but it is first converted to the following format:
<html>\\n<head>\\n<body>\\nMain content\\n<br/>\\n<imgsrc=\"https://cdn.example-domain.com/example1.jpg\"/>\\nMore content\\n<br/>\\nMore Content to Simulate main content.\\n<imgsrc=\"https://cdn.example-domain.com/example2.jpg\"/>\\n</body>\\n</html>\\n
and the URL value that is passed is https://www.example-domain.com
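As a point of comparison, here is a minimal Python sketch of how such a request body could be built. The field names `url` and `html`, the base URL `http://localhost:3000`, and the helper `build_parse_html_request` are assumptions for illustration, since the API is undocumented. The sample markup is shortened and written as `<img src=...>`. The sketch lets the JSON encoder do the escaping, so newlines and quotes are escaped exactly once; whether hand-escaping plays any role in the corruption is not confirmed.

```python
import json
import urllib.request

# Hypothetical base URL; the issue does not say where the API is hosted.
API_BASE = "http://localhost:3000"


def build_parse_html_request(page_url: str, html: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for /parse-html."""
    # The field names "url" and "html" are assumptions, not documented API.
    payload = {"url": page_url, "html": html}
    # json.dumps escapes newlines and double quotes once, so the raw HTML
    # can be passed in unmodified.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE + "/parse-html",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


raw_html = (
    "<html>\n<body>\nMain content\n<br/>\n"
    '<img src="https://cdn.example-domain.com/example1.jpg"/>\n'
    "</body>\n</html>\n"
)
request = build_parse_html_request("https://www.example-domain.com", raw_html)
# Sending it would be: urllib.request.urlopen(request)
```

Decoding `request.data` with `json.loads` gives back the original HTML string unchanged, which is a quick way to check that nothing was escaped twice before the body reaches the API.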
The returned JSON contains the main content, including the images. However, the image values are corrupted:
Am I using the API correctly? I could not find any documentation, so this is a bit of reverse engineering.
The reason for not doing this directly, i.e. using /parser?url=....., is that I am trying to work around a problem where a TypeError is returned. See. The page gives back a 202, which the parser cannot handle. I am now downloading the content and trying to pass the HTML into the API as a workaround instead. Unfortunately, it doesn't react as I expected.
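The download step of this workaround could look roughly like the sketch below. It is an illustration only: `fetch_html` and its `opener` parameter are hypothetical names, and it assumes the 202 response already carries the full page body, which may not hold for every site.

```python
import urllib.request


def fetch_html(page_url, opener=urllib.request.urlopen):
    """Download a page and return its body as text, accepting 200 or 202."""
    # urlopen does not raise for 2xx codes, so a 202 arrives here as a
    # normal response and is checked explicitly.
    with opener(page_url) as response:
        status = response.status
        if status not in (200, 202):
            raise RuntimeError(f"unexpected status {status} for {page_url}")
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then be placed in the HTML field of the JSON object that is posted to /parse-html. The `opener` argument exists only so the function can be exercised without network access.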
Expected Behavior
The parser should not corrupt the <img> content.
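One way to make this expectation checkable is to compare the image sources before and after parsing. The sketch below uses Python's standard `html.parser`; the helper names `img_sources` and `missing_images` are hypothetical, and it assumes the tags are written as `<img src="...">` and that the caller has already pulled the content string out of the JSON result.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are routed here as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def img_sources(html):
    """Return the list of <img> src values found in an HTML string."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


def missing_images(original_html, parsed_content):
    """Return the src values present in the original but not in the parsed content."""
    parsed = set(img_sources(parsed_content))
    return [src for src in img_sources(original_html) if src not in parsed]
```

An empty list from `missing_images` would mean every original image URL survived parsing unchanged; any entry points at an image whose `src` was dropped or altered.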
Current Behavior
The <img> tag originally is
and after parsing