Commit
Replace the deprecated HTMLParser.unescape, which was removed in Python 3.9, while keeping backward compatibility with older Python versions.
Python 3.9.0 changelog: https://docs.python.org/release/3.9.0/whatsnew/changelog.html
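A minimal sketch of the idea behind this change, not the commit's exact code: pick the entity-decoding function once, at import time, depending on the interpreter version. It falls back to the Python 2 standard library module `HTMLParser` instead of `six.moves`, so it has no third-party dependency; the names `html_unescape` and `decode_entities` are hypothetical and used only for illustration.

```python
import sys

if sys.version_info[0] >= 3:
    # html.unescape has been available since Python 3.4.
    from html import unescape as html_unescape
else:
    # Python 2 fallback: the parser instance still provides unescape() there.
    from HTMLParser import HTMLParser
    html_unescape = HTMLParser().unescape


def decode_entities(text):
    """Hypothetical helper: decode HTML entities such as &amp; in text."""
    return html_unescape(text)


print(decode_entities("Tom &amp; Jerry &lt;3"))
```

Callers then use the chosen function everywhere instead of creating an `HTMLParser` instance and calling `.unescape` on it.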
5490a99
I came here from coursera-dl/coursera-dl#778.
This works perfectly!
5490a99
How do I use it? I am using coursera-dl.
5490a99
Edit the `edx_dl/utils.py` file, just like this commit did: copy and paste the changed code into the corresponding position. @Saksham2k1
5490a99
It works. Thank you.
5490a99
Worked for me! Thanks!
5490a99
Great fix, thanks. It worked for me too.
5490a99
How do I locate this file in the system files?
5490a99
I made those changes and it still didn't fix it (Windows).
For the suggested snippet

    if sys.version_info[0] >= 3:
        import html
    else:
        from six.moves import html_parser
        html = html_parser.HTMLParser()

I get "Code is unreachable" from Pylance.
Here is what my utils.py file looks like:
```python
# -*- coding: utf-8 -*-

"""
This module provides utility functions that are used within the script.
"""

import six
import sys
if sys.version_info[0] >= 3:
    import html
else:
    from six import html_parser
    html = html_parser.HTMLParser()
import os
import re
import time
import json
import errno
import random
import string
import logging
import datetime

from bs4 import BeautifulSoup as BeautifulSoup_
from xml.sax.saxutils import escape, unescape
from six import iteritems

if six.PY3:  # pragma: no cover
    from urllib.parse import urlparse, urljoin
else:
    from urlparse import urlparse, urljoin

# Python3 (and six) don't provide string
if six.PY3:
    from string import ascii_letters as string_ascii_letters
    from string import digits as string_digits
else:
    from string import letters as string_ascii_letters
    from string import digits as string_digits

from .define import COURSERA_URL, WINDOWS_UNC_PREFIX


# Force us of bs4 with html.parser
def BeautifulSoup(page): return BeautifulSoup_(page, 'html.parser')


if six.PY2:
    def decode_input(x):
        stdin_encoding = sys.stdin.encoding
        if stdin_encoding is None:
            stdin_encoding = "UTF-8"
        return x.decode(stdin_encoding)
else:
    def decode_input(x):
        return x


def spit_json(obj, filename):
    with open(filename, 'w') as file_object:
        json.dump(obj, file_object, indent=4)


def slurp_json(filename):
    with open(filename) as file_object:
        return json.load(file_object)


def is_debug_run():
    """
    Check whether we're running with DEBUG loglevel.
    """


def random_string(length):
    """
    Return a pseudo-random string of specified length.
    """
    valid_chars = string_ascii_letters + string_digits


# Taken from: https://wiki.python.org/moin/EscapingHtml
# escape() and unescape() takes care of &, < and >.
HTML_ESCAPE_TABLE = {
    '"': "&quot;",
    "'": "&apos;"
}

HTML_UNESCAPE_TABLE = dict((v, k) for k, v in HTML_ESCAPE_TABLE.items())


def unescape_html(s):
    h = html_parser.HTMLParser()
    s = h.unescape(s)
    s = unquote_plus(s)
    return unescape(s, HTML_UNESCAPE_TABLE)


def clean_filename(s, minimal_change=False):
    """
    Sanitize a string to be used as a filename.
    """


def normalize_path(path):
    r"""
    Normalizes path on Windows OS. This means prepending
    \\?\ to the path to get access to
    Win32 device namespace instead of Win32 file namespace.
    See https://msdn.microsoft.com/en-us/library/aa365247%28v=vs.85%29.aspx#maxpath
    """


def get_anchor_format(a):
    """
    Extract the resource file-type format from the anchor.
    """


def mkdir_p(path, mode=0o777):
    """
    Create subdirectory hierarchy given in the paths argument.
    """


def clean_url(url):
    """
    Remove params, query and fragment parts from URL so that
    `os.path.basename` and `os.path.splitext` can work correctly.
    """


def fix_url(url):
    """
    Strip whitespace characters from the beginning and the end of the url
    and add a default scheme.
    """
    if url is None:
        return None


def is_course_complete(last_update):
    """
    Determine is the course is likely to have been terminated or not.
    """


def total_seconds(td):
    """
    Compute total seconds for a timedelta.
    """


def make_coursera_absolute_url(url):
    """
    If given url is relative adds coursera netloc,
    otherwise returns it without any changes.
    """


def extend_supplement_links(destination, source):
    """
    Extends (merges) destination dictionary with supplement_links
    from source dictionary. Values are expected to be lists, or any
    data structure that has `extend` method.
    """


def print_ssl_error_message(exception):
    """
    Print SSLError message with URL to instructions on how to fix it.
    """
    message = """
#####################################################################
ATTENTION! PLEASE READ THIS!
The following error has just occurred:
%s %s
Please read instructions on how to fix this error here:
https://github.com/coursera-dl/coursera-dl#sslerror-errno-1-_sslc504-error14094410ssl-routinesssl3_read_bytessslv3-alert-handshake-failure
#####################################################################
""" % (type(exception).__name__, str(exception))
    logging.error(message)
```
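One thing that stands out in the file pasted above: `unescape_html` still creates `html_parser.HTMLParser()` and calls `h.unescape(s)`, which is the call that no longer exists on Python 3.9 and later, so the `html` name set up at the top does not appear to be used there. Below is a hedged sketch of how that one function might look once it goes through `html.unescape`. It is written for Python 3 only, and the `urllib.parse` import for `unquote_plus` is an assumption, since the paste does not show where that name comes from.

```python
import html
from urllib.parse import unquote_plus  # assumed source of unquote_plus
from xml.sax.saxutils import unescape

# Same tables as in the pasted file.
HTML_ESCAPE_TABLE = {
    '"': "&quot;",
    "'": "&apos;",
}
HTML_UNESCAPE_TABLE = dict((v, k) for k, v in HTML_ESCAPE_TABLE.items())


def unescape_html(s):
    # Decode HTML entities without touching HTMLParser.unescape.
    s = html.unescape(s)
    # Decode URL percent-escapes and turn '+' into spaces.
    s = unquote_plus(s)
    # Handle any remaining &amp;, &lt;, &gt;, &quot; and &apos;.
    return unescape(s, HTML_UNESCAPE_TABLE)


print(unescape_html("Tom&#39;s+notes"))
```

The Pylance "Code is unreachable" hint on the `else` branch is only a static-analysis note that the branch cannot run under the configured Python 3 interpreter; it is not the runtime error.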
5490a99
Thanks, worked for me.
5490a99
Can you please give instructions on how to paste it? I can't wrap my head around it. I have Notepad++, of course, but I don't know where to paste the code.
5490a99
I updated the utils.py file and still get the same error:
"File "/XXX/env/lib/python3.9/site-packages/coursera/utils.py", line 118, in clean_filename
    s = h.unescape(s)
AttributeError: 'HTMLParser' object has no attribute 'unescape'"
The line `s = h.unescape(s)` is not on line 118 in my copy, and I already set `h = html` before that line. Any suggestions? Thanks!
Update: I figured it out. I had edited the utils.py file from a previous download. After I fixed the correct file, it worked.
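For anyone else unsure which copy of the file Python actually loads, here is a small sketch that asks the import system for the path. `locate_module_file` is a hypothetical helper name; `coursera.utils` is the module named in the traceback above, and the function returns `None` when the package is not installed in the interpreter you run it with.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the path of the file Python would import for module_name, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ImportError:
        # Raised when the parent package of a dotted name is not installed.
        return None
    if spec is None:
        return None
    return spec.origin


# Run this with the same interpreter (or virtualenv) that runs coursera-dl.
print(locate_module_file("coursera.utils"))
```

The printed path is the file to edit; a copy sitting in a separate download folder is not the one being imported.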
5490a99
problem not resolved
5490a99
thanks a lot, it worked for me!
5490a99
Thanks a lot, it worked for me too!
5490a99
Thanks, this works fine. I had the same error while trying to install the flasgger package on Python 3.10. I edited /usr/local/lib/python3.10/dist-packages/setuptools/py33compat.py, applied the fix, and everything works now. I guess the error is caused by `html_parser.HTMLParser().unescape`, which no longer exists on Python 3.9 and later.