This repository has been archived by the owner on Apr 26, 2020. It is now read-only.

Remove control characters from input HTML #14

Open: 1 commit, base branch: master

jagermesh

Control characters in the input cause problems on servers with libxml 2.6.7 (most current Linux servers). In some cases the resulting HTML becomes incorrect (extra opening/closing body/html tags are added):

Input:

string(128) "<html><head><meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
</head></body><b>BEL</b><b>normal</b></body></html>"

Output:

string(250) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><meta http-equiv="Content-Type"
content="text/html;charset=UTF-8"></head>
<body><b></b></body>
<html><b>normal</b></html>
</html>
"

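For illustration, here is a minimal sketch of the kind of pre-filtering this change is about: stripping control characters before the HTML reaches the parser. It is written in Python only to show the idea and is not the code from this commit. The function name `strip_control_chars` and the exact set of characters removed (C0 controls other than tab, line feed and carriage return, plus DEL) are assumptions.

```python
import re

# Assumed set of characters to strip: C0 control characters except
# tab (0x09), line feed (0x0A) and carriage return (0x0D), plus DEL (0x7F).
# The actual patch may use a different range.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(html: str) -> str:
    """Return html with the control characters matched above removed."""
    return _CONTROL_CHARS.sub("", html)


if __name__ == "__main__":
    # \x07 is the BEL character from the example input.
    sample = "<body><b>\x07</b><b>normal</b></body>"
    print(repr(strip_control_chars(sample)))
```

With the BEL character removed up front, the parser never sees the byte that triggers the extra html/body tags in the output above. Whitespace controls are left alone so that line breaks and indentation in the input survive.
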