Http error for downloading the datasets #39

Open
GeoffNN opened this issue May 2, 2020 · 8 comments

@GeoffNN
Collaborator

GeoffNN commented May 2, 2020

Got the following error when trying to load the Madelon dataset.

madelon dataset is not present in the folder /home/geoff/copt_data/madelon. Downloading it ...
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-5-4c820c436ec3> in <module>
----> 1 A, b = cp.datasets.load_madelon()

~/PycharmProjects/copt/copt/datasets.py in load_madelon(subset, data_dir)
    153         * :ref:`sphx_glr_auto_examples_frank_wolfe_plot_vertex_overlap.py`
    154     """
--> 155     return _load_dataset("madelon", subset, data_dir)
    156 
    157 

~/PycharmProjects/copt/copt/datasets.py in _load_dataset(name, subset, data_dir)
     58         )
     59         url = "https://storage.googleapis.com/copt/datasets/%s.tar.gz" % name
---> 60         local_filename, _ = urllib.request.urlretrieve(url)
     61         print("Finished downloading")
     62 

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in urlretrieve(url, filename, reporthook, data)
    245     url_type, path = splittype(url)
    246 
--> 247     with contextlib.closing(urlopen(url, data)) as fp:
    248         headers = fp.info()
    249 

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in http_response(self, request, response)
    639         if not (200 <= code < 300):
    640             response = self.parent.error(
--> 641                 'http', request, response, code, msg, hdrs)
    642 
    643         return response

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    501         for handler in handlers:
    502             func = getattr(handler, meth_name)
--> 503             result = func(*args)
    504             if result is not None:
    505                 return result

~/anaconda3/envs/copt/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 403: Forbidden
@fabianp fabianp self-assigned this May 7, 2020
@fabianp
Member

fabianp commented May 7, 2020

Looking into it.

@gideonite
Collaborator

Seems to be a problem with the URL. The following also gives 403: Forbidden.

wget "https://storage.googleapis.com/copt/datasets/gisette.tar.gz"
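
The same check can be scripted for several dataset names at once. The following is a minimal sketch, not part of copt: the helper name `url_status` and its `opener` and `timeout` parameters are hypothetical, and only the URL pattern is taken from the traceback above. It sends a HEAD request and reports the status code instead of raising.

```python
import urllib.error
import urllib.request

# URL pattern copied from copt/datasets.py as shown in the traceback above.
URL_TEMPLATE = "https://storage.googleapis.com/copt/datasets/%s.tar.gz"


def url_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code for a HEAD request to `url`.

    Hypothetical helper: `opener` is injectable so the function can be
    exercised without network access.
    """
    # HEAD asks for headers only, so nothing is downloaded.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on non-2xx responses; the status code is on the exception.
        return exc.code
```

Looping over names such as `"madelon"` and `"gisette"` with `url_status(URL_TEMPLATE % name)` would show whether the 403 affects every archive in the bucket or only some of them. Some servers answer HEAD differently from GET, so a 200 here is a hint rather than a guarantee that the download works.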

@fabianp
Member

fabianp commented May 13, 2020 via email

@arokem
Collaborator

arokem commented May 13, 2020

From the peanut gallery: maybe best to put it on figshare, or somesuch, where persistent URLs can be minted and you don't have to worry about keeping Google storage paid for.

@fabianp
Member

fabianp commented May 15, 2020

Thanks @arokem, that's a good idea. The issue is fixed, but I'll leave it open to look into figshare.

@fabianp
Member

fabianp commented May 15, 2020

@arokem do you know if one can upload to figshare and easily access it using urllib, i.e., without authentication?

@arokem
Collaborator

arokem commented May 21, 2020

Sorry, I missed the notification for your message until now. Yes, you can download directly from figshare with no authentication. The files have somewhat wonky URLs (e.g., https://ndownloader.figshare.com/files/5273800), but that's not an issue.
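
For illustration, an unauthenticated download followed by unpacking might look like the sketch below. It assumes the hosted file is a `.tar.gz` archive, as the current copt datasets are. The function name `fetch_and_extract` is hypothetical and not part of copt. Only `urllib.request.urlretrieve` is taken from the traceback above.

```python
import os
import tarfile
import urllib.request


def fetch_and_extract(url, dest_dir):
    """Download a .tar.gz archive from `url` and unpack it into `dest_dir`.

    Hypothetical helper. No credentials are sent, so this only works for
    publicly readable URLs. Returns the sorted names found in `dest_dir`.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Same call that copt/datasets.py makes; it returns the path of a local copy.
    local_filename, _ = urllib.request.urlretrieve(url)
    with tarfile.open(local_filename, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions offer extraction filters that reject unsafe members.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return sorted(os.listdir(dest_dir))
```

The `ndownloader.figshare.com` URL quoted above is only an example of the URL shape. Whether it points at a tar.gz archive is not stated, so the real dataset URL would need to be substituted once the files are uploaded.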

@fabianp
Member

fabianp commented May 22, 2020

TODO for myself: re-upload kdd10, kdd12, news20, criteo


4 participants