About some web content encoding issues #197

Open
l3m0n opened this issue Mar 26, 2019 · 7 comments

Comments

@l3m0n

l3m0n commented Mar 26, 2019

I found that some web content is encoded as \ufffd (U+FFFD, the Unicode replacement character).
E.g.:
010-4200-4200.com
010apartment.com

The data stored in the database is garbled.
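A minimal Go sketch of how this can happen. It assumes the banner is held in a Go `string` and serialized with the standard `encoding/json` package, which is an assumption because zgrab2's actual output path is not shown here. `encoding/json` coerces strings to valid UTF-8, so every byte that is not part of a valid UTF-8 sequence is written as `\ufffd`. `marshalBanner` is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalBanner is a hypothetical helper that serializes a banner string
// with encoding/json, which replaces invalid UTF-8 bytes with \ufffd.
func marshalBanner(banner string) string {
	out, err := json.Marshal(banner)
	if err != nil {
		panic(err) // marshaling a plain string does not fail; defensive only
	}
	return string(out)
}

func main() {
	// 0x89 starts the PNG signature; on its own it is not valid UTF-8.
	raw := string([]byte{0x89, 'P', 'N', 'G'})
	fmt.Println(marshalBanner(raw)) // "\ufffdPNG"
}
```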

@imfht

imfht commented Sep 19, 2019

You can modify the code to base64-encode the content before output.
Have a look at https://github.com/commonscan/zgrab2/blob/a7a93f3f5da9fc7a09a57274cc6799d24844266d/modules/rapid7banner/scanner.go#L167

@p-l-

p-l- commented Jun 2, 2020

@imfht do you have, by any chance, a copy of the content? The link is dead now, and I could also use base64 encoding for the output.

@mzpqnxow
Contributor

@l3m0n I could be wrong, but I believe this is the correct way to encode these characters in JSON. If you load the data with a standards-compliant JSON loader (e.g. Python 3's json.load or Pandas' pandas.read_json()), it should come out as expected.

Are you processing the JSON data without actually deserializing it? If so, that is expected behavior.
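For text that is valid UTF-8, the `\uXXXX` escapes are ordinary JSON and any compliant decoder resolves them. A small Go illustration using `encoding/json` (`decodeJSONString` is a hypothetical helper). Note that decoding `\ufffd` yields the replacement character U+FFFD itself, not whatever byte was originally replaced:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper that parses a JSON string
// literal, resolving \uXXXX escapes as a compliant decoder does.
func decodeJSONString(literal string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(literal), &s)
	return s, err
}

func main() {
	s, err := decodeJSONString(`"caf\u00e9"`)
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // café
}
```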

@p-l-

p-l- commented Jun 22, 2021

@mzpqnxow I have issues with binary content: many different non-printable bytes are all encoded as the same \ufffd, which makes the output unusable regardless of the tool used to read the JSON file.
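The loss described here can be shown directly: two different byte sequences that are not valid UTF-8 serialize to identical JSON, so no reader can tell them apart afterwards. A sketch assuming Go's `encoding/json` string coercion; `encodeAsString` is a hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeAsString is a hypothetical helper that serializes raw bytes as a
// JSON string; encoding/json replaces each invalid UTF-8 byte with \ufffd.
func encodeAsString(raw []byte) string {
	out, _ := json.Marshal(string(raw)) // marshaling a string cannot fail
	return string(out)
}

func main() {
	a := []byte{0x80, 0x01} // two different byte sequences, neither valid UTF-8
	b := []byte{0xfe, 0x01}
	fmt.Println(encodeAsString(a) == encodeAsString(b)) // true: the original bytes are lost
}
```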

@mzpqnxow
Contributor

mzpqnxow commented Jun 22, 2021

@p-l- here is a capture from 010apartment.com, though I'm not having any issues loading it with any apps or libraries despite the presence of the escaped characters. Maybe it's helpful to you, maybe not:

echo '/Td6WFoAAAFpIt42A8CcKa6UAyEBFgAAXA9FJuDKLRSUXQA9iIiHM/vVEuWguaLfwdGA0sKn8mUPuAItNB8h4ttMUuZpvPs5+Wa+nnJbCpb+KB88aVMEEJH3AWVwB6ksnfjE1ZEfg5o9vFlhYeiHFUmqEId3UKtQfTcMzz4sjVU84r05U9Nlv4mG8gJqnRcMjrAZ52egmGCBsRoTRhSpM1vqwQEQfpxBv7xU9dnN9dQty5zueBSm0fzKySgu+yf7oq4RJlbvd1nT++lyTE6X3TK6MYD/ZO0xM8YVcQU0hwO7xq3VYqSMd3oZzrFXldbMpmyo3GE9Ezao8vzdcXR2IvwDCbO6sRh0d9D/PyNLluwINEQKJKa6prAWjGPsa0iU2lIgKqza7/rrsjFedsml1z93f6tfX1LHTYBIV5WzzoA3z3AjY9BKug/t1NrHqXae5MrrIYE/KoKA+IWTDGIBmqLjVL4KA7mDIZbhPJe1Rv3toep4++l+Pbg2rdU8lQsJcr587IV54TLgfNvj324dHiiqvW2PA19Oicc8jBiTuQD8fk2ynTcgjueK5KTDB5/3jyURg/h/DWgYgvCuIsoTLbjuLVlCqEiyFXj4FAZ0Xi/EBgv7buJm5Ek+kiQcEqln+4zhzr10hkf7JbxR6GaWjzsNXO9ipHJN55ioy+TMd+Xk5AdDrfk7gpXbju2JkExz4jVenKJZ7ThBCWcDejcnq6oWJr+wuUqzzIowrgNH42RPvwfWGK6WTCMqFfH2FTItkRNPSB5eovs4Tf61UPeWiyuUJ4RYyT8Udy2F9S+IoSwe3B+u68LyNIqfBIjrYJOWlj5NFSaxduuDCkdc4605dYecdNH/dhkkS1F8qUrBFnJWhzMAix7w8uxyqs4gjasnSma2MiAJBb2qAJALkxlYkJxzNXvLTmaHWHTQILYDC8sEm9ckE9qPWeviA8MdEE4zBfEY5jh35X/JzGOFRyd9UUX8XUHAjPbvvIDIsbuiBTNEc1IxOIIUjDDpV74MYplaBQRVLqe60OmMp5MDNZZUHLnbLWahVI5tnMMSTrospq9U5H0ZKVnjNMJtjn77pd7Sx4jiV9RtcHmvMws7HBxfYZ+Mermnx0GV1kpWtWRv2WSac7yuQarnXiCg3oEcHwligUsq9E2M7vTwO9k03zsbm7bwVJ9HxYtr5QCQIpB9RNRomGLkXgbp4uUyzwuqVuvkh4lt3yMesHiFxN8XXpat/JC+Aq+G9q3aKr0YG6EBpwoEkzq1lK26WmoBK8ViXsBay8OSWJ7ytGHLZ30abTiGfwu5SEX02pYkl3WoL1BvnDTqgz9UzIR6E0J5MBifZg1ESn1hyLK2T51V12Nilq8zD1JZoE1zjvTlz2i5B5rXlPar1nAn2s24pL+69hStFcFIELyjNeCZBO8avCB6Mk51j5XiyDSffGPQGbtU0U5NI7fOHWDyfRV9eLj1LiRSe0fRIL6Y61TiBfIJ4LlL4VCckFFmmuya1DRvzU8pWys4woE3xoFPMyG8ute5ypJrZrOiMl5/TThA4PtUMCoIBrKuwwxwE4wcwZ2NvJ9lbx/XzHq+OyBhFPx6Oe6xw5hrZaF/14mBD+5sdwRFaLvk7ILcA4I31lw0HYwtPdXQMcAUOFBqHIdCkV0CTbXf86D9jtwRGhXrLtREQmePjwekqYxJeD1UXqpqo6zul864cjPHJP+/MX0dFy1Asu6gHKNam8Wj6mcB11rLk1YmCNNup0LDEzscuvZ41jAZ1vKhNrKUZLKliM/qo6hOPmdChJ/DzY1Djf7TFXJa9fPnLfhBPkwM7PxcpcxTw7yMsA0wWJeYsXiL5kSPHIyG7VXVbDqS7/CqzpKPSVfl3XRd7WswlqNFxQyPrSz6n207GLtx0d//LN30CFkAqyd+nfkNQ0VhBwqip5bcZDgTNS6en9x0tz7L3RvmcRn7fRM3SAg/nuDO7RYDphQsgQ7SUOMI8HQ+NZxnSuZS2/bNZo
yDAbmXYt0BQP7Bj7wKdAXoxRbp739KrHY11T1YxrH7IWEuz3sJdouXUNocj+T+fOokbLYplxtdwDiXgaJcvFB55DwJo9JpA/WvyYqYTEgCLaTwhlJ5gL4l9sEsa/OnmZiM164syFeJNA+B3riBbkoRpDz0QFPkcCsn/fgCgTPM+E9S+U4Gaqur8RVb7tu7hZc+v5XR5qb8nC9OL/lqM0/DhfDuMMg9eRl6aHk3ZCgmP69Pu1YGxbDZdeyAhDv5OxnwTKtUIBZ9VYbhF6f4/PXd0DouKqs93k4icQM39dm+2Dk1Uq+K1dlaFtc5/Og05qz8fjVprZcQ9iY0dohr5ohkWGdSKWzzARKdptB4l8JGflcCHWxMNTnNo9J0+RkGhlzFcjjvyv/6b0fgjZHL/02+kw8BfVdSOQ240A9u84q+5rEexj8R6cRi/P7oss+HtDd4ExmCQs6hfl0SujkqKd4qKa/HO2hvY7JCNq026Cs81dV2AJOKMuypbJIpGUXtu4tlYlVny9X22h8SbBSXsg6zOxPvRyhCSOIjT0ti8Dm0lO5o0Oug7WQcS2moNCkh+h7xXecdPyPVEwor/8eSs8oTO8HNu/L0tDXy/XtXF9bjQrXUOHfiS/wCTz5v/Np3e3fWehBVmKEbNWtA4A4lzDaim9cFaw5USRRGwZFBa5L4YrVf2u/eMqjrn6NvEoMpaeJLSxNzriosKMPcT1FCJ/H2r90H+s4LjW0quRVYCoAjVwbNhnaws7wzAcC/ccVwGZs7aWVRWE+et4tfnzwUNYS96BwdJh4VkBlrev3Tiy5BYbTnMkhhg9wzlrA2j2WizE7sf7kn61Q3n2VwgpaDVc9t0yzEq/A+/eDu9C7xW56dXdWWe0kA3cwJWRaPejNq2qw/d0fVhVgbPvfeAlefqPbw2OdxQ3NREnaZNq8W4fttFYlJuVGAc7CU9b8vgYKVDmMHnSE5ybfwkbk0J9BZAgL7AR9nj0SYGBKXLN7+L+Wyc9FcKcEP8vwOOI7XjwihnZ5yDbnWM986/bpHF5HhPNJiW1qrQDApqd91uiN1AyVGolpVDXH/ikK4Sy82uqHWUSi8yuJqAHqtN0VuFmAVbKUZWVZvBeQMuqQtToJ7Br2/uxzYjd9xjNRdDjLwWn3ujLNIr8C8AETtKa5LXpRBhpYQefnWLgtvgrEOUx7QT20ZV501QsruXhylfnezBIqqOCYyZZrqzXjqIjbuqBhZ3bjp7vswdNJnDpdbVQIopmoWJ4E8HjAF5GejYL6gP+lM4KM3tot+Z2r7zuq8GXJxKRIcBk6ktgmKS82dbZIa8VMYVB1qYrfbUPn7XovWKTpJISSGAELpSsLgiZoJX1k4RwblepeciOsW+qs1uS35EOZ6CC1n0cCtXZ9Qg+9CiScOifSIdRDp3E2IY6YXE/M5ZXHFTSPnogj0wGmgOwhU5ScQR9kMX9M0f7A1HrxDGk5cFLQ3a10lAWK0UaNTkuRdmWaR1fXiP0JB4XyKN2q2aYh3xCuZdY3isT6IosfIopTfsg5pr+ZFpzH0fZMS5jOeN/5kgWlgN3URx011wIgaiOMxoaJsMm5PoYTBcYAKS7AT/oKXqoAFaNaBebU93QVkyDElFPVdLOSRPQGRimgkJ0xSrYmQrAewe5myMZ5aGCL4MD9to2Q6jONp4Qt2/LvbG1EPXN4xmL736pYbOX9CJ898alENPejI2ZqhPBxj6GZlmStHlyTHeBh0dye/VwwgxEbu3ANXZcCBN3uxaoQWYyO2/Ta5bu0DGL4SaEghuEljyU6dDD3mW+2zg/VXboNu405/AQV0bHoiQdkD/g6S+Gcgs+eNvhH/3xrFIRM5WmdTiz0h6rwweF6HMAevDUPx2tny7dadCp+j63nltF8lXWrLcV5cSf5ZfG7hR3qF7sklAm+bx/12gCfv1aWqhPC7b1bKjsevbRIp6Uc7YKvNNbbKdgD24aWc4t597fx3QwvvwXZBrRa4tIILU8id9v
ELcWBeVqEud41ak42SgEHPpOF8L2r2hb+tl2/8cVaJLm3ylBHeDPNXLlsXkU3Q7lfgE+dUiMaBdeYm8RQansA0Lo/uQho4r7dSYrvQP44rja8JPysX3vXCdQVcDMDlmBF4FbVt90FMbbcJudid1A79h6WtHMSsv7ue8k9DvRDOVOKEtNRYhp7kDZhoxEEJLCXKaNbpVVSlRh2NGvpDQ0f4jkXgAK7n6ueRcOL10EnJ8IoKsS3bfs/wIvhR5/Zgi2jE9anEF+t7z1NoGWFp0yTGKczo7wlXATSxvXjNZWBp2r7NJAEtBBdnqo+FJEeSG4HzWcd7haa0ZjVsPY1BOb3VP1aI6OrN1f1JCVkOgNzdbjnhofhk0YQCWDcNq7b8h+tiWCgTRWdyDRIu3w1DLtD7XEB2ZciJJCh2qZwnJ9Ub5tFsWBtduJWbDilLp2sLkM0Z0rKJ1mZZihUrRckxXXiD/NEkUaGd6yqiy09stzjoP3ep+lpdIml1cvnW2UgUI+zONuTp0rE1OpjdcHJ2JmE5ZTWP7WOd14R1x0ff2Rb+gfEz0Tjbc6EUOHLIo3OcIS3ouYN7ma6DrzmEX+nn0I8nh+rvvhHbo/El0eFtPw1U4jbeMHfiPBYaUcBonYO5ebNOqNnKchEP/rQ6mjrTyj+jAuP6paQ2VDh7JW2o3/2dsCCheFivfb5J7pkC1gdJ8rASwbghR6CLQIj/Dl3BNcaxWqqf0gOv3TWRTXC7cH9jJ3FjED2ejOqgGZBD2jNEgY6qfqdisTYQW9RDXcYmWIIQ9GJx84uuNKYnPLGbzoLxiyr3oxeb+D0RMdXV/LiYh6RCppgiUCWpm2kaKPQg7fVvYYZFbTNNaxcXRltbOzPusH6B2JOCEBxj6+843iCtCaiHGPatKMhp9pJ5JRoQ1DbvySXXYUnJMFWM2FAyelImRNM02lCZUTYwY6+QUw8Q1WDsxr7JnRj61X6GUBgA+9GYFqNHon84eNhOkLJ9QuZD998YnTAunqgZWyllIWfGZyfTJXZsXohmJ27D7UIDDrrxzeskq0Dogvhg1nvn/uqOmuwZvRY/0v7RTHDnDUFEPD66aC1nCyB3p9EKQcc1cajYY4yARvbphoGsFX2BuYybBHx+hk2lV1UbmpU66xl9Ww4S0JLauE/yDeb+tCfl9kGbZ+gpsjzls4I6ncCBMfXSeGjP2r6HZ7SYRUR+KRWlCBbNCgtLyCHjCRwh0l1BS0jra7wHS9iQvO6wJAXV5BXYrZPWUDFSc2H0tdBI81SufpENs7ZQsXHgnMrgg+L+CAm+2/AAm2Ir32No6l8Zr6Qmj8qwoS1Dk13Lrvk0oeoWkkK6dtxmn66fQlQwkZ+px4amcxFTnDkXYbxYzGwbET48Kjs1LfZqUJ0ztvN1Kw+t5OPel6+Q8gNOLc84dYOoJC8P5ph/2xpb2JYK4ZS6StNfPgVD06fkk7+4xpgVKmyHBHisrtVYq9Tw21yDK5gY2cqBIGkwzBr+97xH/JdjBAqMDElI1XBAQLT0HtTUa5U0e0Hay0OuGP8mOORTLHYsBnQ+IxG9yIq0JWN5kdAlW8DfIRyqY8pzBoBhv5PFaoVtF6K/Yx113uvSaEklpOK2AEKXIcoazkythQWqqchMJBFBajN6OiO7dInBJzO9ERQ0SDJQOTzOXVazLikDfCGU40fNR/pZVLzMDwmSePpPJE6z26bhuigJBp3m9Z53RiXwJHPmJo73GWkrHS3U+VXMUqnVN35k0HSmAvFuJGu/AWcgbqZxVYD1/53DIgVOLavoibdfrzK7j0RID66iAXedhwAjwrJNFXNhABXKb7Ho0DbBtW5DaTy4vHNRENoCfPuKCUQlNETeMw+IGnqfgWjqxQPw6VmfC1aJp62zOauW2qKd7j+Izw/a1qccdZYPf/8e0qUG7179W03PhseWWjzrBZtFu130Q+KyRiezyx2VrcQZCD2hjzya8pWygwHA5EmqGaW7CwVDX9gOy8rdWS
LcDuAFIzl7EeHwJIOhacg3p0w2UAeOmtqpro2D/GtI6mre744gzSqaPu2d4+ierfxDEfBRzrHDHneTYFLcjJtxeabXY1uAhsLCpQOscwvx5mryG7Iw6f6liEHBb2u8qbXWaF8Ao0FcA116++bY2K7If/yt7/Z8s7CUZW0I7V6o9Vr3XNSNIQRDWUcXbT/OmdhVtdczPiishGhs2ELZiDubBB+3rMPhmf2omdybzYj1p+VAVI2edykCO8T9gkJc5vub19OZ+PI+dm0XG4WBG9BqUtP+CoLpTIRtBu6c/TSE5h9QGmoABerbO1aK3u7xHGDuW6uldhpGLaWsibsiOCd4ZYhkCLt7Sd/39SKssL88a+KQJGlJsLQnQmkSnkZIZKhCtKVWDXCzjT1pRoy/Loe85t4rv2F34xHRvjHvUaja42q5o/vATlY0fDnOXJuugQ0vQRZaO7vvg9e1QjIGx8JB+SvA+zXB7A6If7qfEx60VCa4l12QgM25g3lY50fDOhRWDTEh1Ey/Dqz1/r5cAcWCWLJrbJomV9+c4+0rBhOvW/5KpfSVJ2RdjlfnDtKVG+RAs7oQN7ndjNE+b0ilQSyK0I/mZvlffXiY6BuUvYM53NHNK5icAtEzS0SJGItcF8EtiqU2L/mEdshY8f+DtLQpfNyqtr7v+7PyjAFxj3Joxh5SLaeVTqPbLIaGI8M9GaMB1w3SKPkbBJh0X7f/YNMj+ykvPkFVG/vjkt2IhEj0ex531tVbl16K5gSUYOjzCbYjzC3G87tCpHkvbEFycFx3HLYKFtIY0GKbKNqrLD1ZespB33r6GwoSFyGZuByzYhsWzMUmcVcrb94e0homAkcok4XMZ5VeyTlqlMBV61MWcRE5YRGweovUVhRERUIqUVgL8l4+FKLS145L8ATJhuKTCBUXt2iR2fEkDkn4yruvjXx6FTWN2nyg9rGKFjCY6fnuN776sWnZzZL9nPtBsRWL/jPsacgdVT7ALhqVVPkLzHz4ffm+ijjlVdAzWotmy4Bu8KnCel/3oKXhVxiI2yt7PwN7r7onQIMuivyV4f0oym0V7tiZrAAA88APQAABsCmulAMABKfNaT4wDYsCAAAAAAFZWg==' | base64 -d | xz -d > 010apartment.com.ndjson

@p-l-

p-l- commented Jun 24, 2021

@mzpqnxow I have no issues with that particular site. My use case is, for example, making a GET request to /favicon.ico files.
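Binary responses such as favicons are exactly the case where UTF-8 coercion destroys data. One possible approach, sketched in Go under the assumption that the raw body is available as `[]byte`: keep bodies that are valid UTF-8 as text and base64-encode everything else. `encodeBody` and its `"raw"`/`"base64"` labels are hypothetical, not a zgrab2 API:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

// encodeBody is a hypothetical helper: it keeps valid UTF-8 bodies readable
// and base64-encodes anything that UTF-8 coercion would alter.
func encodeBody(body []byte) (encoding, value string) {
	if utf8.Valid(body) {
		return "raw", string(body)
	}
	return "base64", base64.StdEncoding.EncodeToString(body)
}

func main() {
	// The 8-byte PNG signature, which a PNG-format favicon begins with.
	png := []byte{0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a}
	enc, val := encodeBody(png)
	fmt.Println(enc, val) // base64 iVBORw0KGgo=
	enc, val = encodeBody([]byte("<html>ok</html>"))
	fmt.Println(enc, val) // raw <html>ok</html>
}
```

A consumer then needs the label to know whether to base64-decode the value.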

@mzpqnxow
Contributor

> @mzpqnxow I have no issues with that particular site. My use case is, for example, making a GET request to /favicon.ico files.

Ahh, that makes much more sense. Thanks for clarifying!

svbatalov added a commit to svbatalov/zgrab2 that referenced this issue Aug 20, 2021
Conversion of binary responses to UTF-8 occasionally yields U+FFFD [replacement characters](https://en.wikipedia.org/wiki/Specials_(Unicode_block))
(see zmap#197, zmap#263). As a result, it is not possible to restore the original response.

This introduces the `--hex` option to the `banner` module. When enabled,
the `banner` value will contain server response in hex.

Refs zmap#197, zmap#263
dadrian pushed a commit that referenced this issue Aug 29, 2021

#325

4 participants