This was my command:
python photon.py -u "https://en.wikipedia.org/wiki/Tom_Crean_(explorer)" -l 2
and this was the output:
____ __ __
/ __ \/ /_ ____ / /_____ ____
/ /_/ / __ \/ __ \/ __/ __ \/ __ \
/ ____/ / / / /_/ / /_/ /_/ / / / /
/_/ /_/ /_/\____/\__/\____/_/ /_/ v1.3.2
Level 1: 1 URLs
Progress: 1/1
Level 2: 478 URLs
Progress: 478/478
Crawling 1 JavaScript files
Progress: 1/1
Traceback (most recent call last):
  File "photon.py", line 385, in <module>
    writer(datasets, dataset_names, output_dir)
  File "C:\Users\Tejaswa\Documents\GitHub\Photon\core\utils.py", line 85, in writer
    out_file.write(str(joined.encode('utf-8').decode('utf-8')))
  File "C:\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0142' in position 17758: character maps to <undefined>
I (think I) added a mapping in lib\encodings\cp1252.py by doing:
.
.
'\xff' # 0xFF -> LATIN SMALL LETTER Y WITH DIAERESIS
'\u0142' # 0xFF -> LATIN SMALL LETTER L WITH DIAERESIS
)
### Encoding table
encoding_table=codecs.charmap_build(decoding_table)
But I doubt this is correct (the hex values max out at \xff, too).
Is there any parameter to ignore such encoding problems that I can specify with photon itself? Or some underlying file to edit?
Thanks
I encountered a similar issue and tracked it to the writer function in utils.py, line 83. I fixed it like this:
with open(filepath, 'w+', encoding='utf-8') as out_file:
I'm not smart enough to do pull requests or anything...
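For anyone who wants to see the change in context, here is a minimal sketch of why it works. The `write_lines` helper is a hypothetical stand-in, not Photon's actual `writer`; only the `open(..., encoding='utf-8')` call is taken from the fix above. The assumption is that the crash comes from Windows defaulting to cp1252 for text files, which is a single-byte codec with no slot for '\u0142'.

```python
import os
import tempfile


def write_lines(filepath, lines):
    """Hypothetical stand-in for a writer: join the lines and save them as UTF-8."""
    joined = '\n'.join(lines)
    # An explicit encoding avoids the platform default (cp1252 on many Windows setups).
    with open(filepath, 'w+', encoding='utf-8') as out_file:
        out_file.write(joined)


# cp1252 cannot represent U+0142, which is what the traceback reports.
try:
    '\u0142'.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Made-up sample data containing the problematic character.
sample = ['Tom Crean', 'Wis\u0142a']
path = os.path.join(tempfile.mkdtemp(), 'out.txt')
write_lines(path, sample)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding='utf-8') as in_file:
    round_trip = in_file.read()
```

If editing the source is not an option, running Python in UTF-8 mode (`python -X utf8 photon.py ...`, available since Python 3.7) should make `open()` default to UTF-8 as well, though I have not tried that with Photon.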