[BUG] Chapter 15: wrong path specification for loading CTA_-_Ridership_-_Daily_Boarding_Totals.csv #177

Open
MiladKetabGhale opened this issue Jan 9, 2025 · 0 comments

Describe the bug
There is a FileNotFoundError in cell 7 of the notebook when attempting to load the CSV file CTA_-_Ridership_-_Daily_Boarding_Totals.csv. The file path specified (datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv) does not exist after running the code in cell 6.

To Reproduce
The issue can be reproduced as follows:

  1. Run the code in cell 6:

    tf.keras.utils.get_file(
        "ridership.tgz",
        "https://github.com/ageron/data/raw/main/ridership.tgz",
        cache_dir=".",
        extract=True
    )
  2. Run the code in cell 7:

    import pandas as pd
    from pathlib import Path
    
    path = Path("datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")
    df = pd.read_csv(path, parse_dates=["service_date"])
    df.columns = ["date", "day_type", "bus", "rail", "total"]  # shorter names
    df = df.sort_values("date").set_index("date")
    df = df.drop("total", axis=1)  # no need for total, it's just bus + rail
    df = df.drop_duplicates()  # remove duplicated months (2011-10 and 2014-07)
  3. This results in the following error:

    ---------------------------------------------------------------------------
    

FileNotFoundError Traceback (most recent call last)
Cell In[10], line 5
2 from pathlib import Path
4 path = Path("datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")
----> 5 df = pd.read_csv(path, parse_dates=["service_date"])
6 df.columns = ["date", "day_type", "bus", "rail", "total"] # shorter names
7 df = df.sort_values("date").set_index("date")

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
1013 kwds_defaults = _refine_defaults_read(
1014 dialect,
1015 delimiter,
(...)
1022 dtype_backend=dtype_backend,
1023 )
1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
617 _validate_names(kwds.get("names", None))
619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
622 if chunksize or iterator:
623 return parser

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
1617 self.options["has_index_names"] = kwds["has_index_names"]
1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
1878 if "b" not in mode:
1879 mode += "b"
-> 1880 self.handles = get_handle(
1881 f,
1882 mode,
1883 encoding=self.options.get("encoding", None),
1884 compression=self.options.get("compression", None),
1885 memory_map=self.options.get("memory_map", False),
1886 is_text=is_text,
1887 errors=self.options.get("encoding_errors", "strict"),
1888 storage_options=self.options.get("storage_options", None),
1889 )
1890 assert self.handles is not None
1891 f = self.handles.handle

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/common.py:873, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
868 elif isinstance(handle, str):
869 # Check whether the filename is to be opened in binary mode.
870 # Binary mode does not support 'encoding' and 'newline'.
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
876 encoding=ioargs.encoding,
877 errors=errors,
878 newline="",
879 )
880 else:
881 # Binary mode
882 handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv'

Expected behavior
The file CTA_-_Ridership_-_Daily_Boarding_Totals.csv should be available in the datasets/ridership directory after extracting ridership.tgz in cell 6.
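To check where the archive actually ended up after running cell 6, the CSV files under the datasets directory can be listed. This is only a minimal diagnostic sketch: the helper name `find_csv_files` is hypothetical and not part of the notebook, and it assumes the notebook's working directory contains the `datasets` folder created by cell 6.

```python
from pathlib import Path


def find_csv_files(root="datasets"):
    """Return every CSV file found under `root`, sorted; [] if `root` is missing.

    Hypothetical helper for diagnosis only, not part of the notebook.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.csv"))


# Print whatever CSV files exist under ./datasets (prints nothing if none do).
for csv_path in find_csv_files("datasets"):
    print(csv_path)
```

If the printed path contains `ridership_extracted`, the mismatch described in this issue is present in your environment.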

How To Fix
There are two ways to fix the error:

  • change the path under which the extracted files are saved. It is currently ./datasets/ridership_extracted; changing it to ./datasets/ridership would fix the code in cell 7, which is currently broken.

  • change the path specified in cell 7. It is currently:

    path = Path("datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")

    For it to work with cell 6 as is, it can be changed to:

    path = Path("datasets/ridership_extracted/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")

I believe the second option takes less effort.
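A variant of the second option that tolerates both directory layouts could look like the sketch below. It is not the notebook's code: the function name `resolve_ridership_csv` and the constant `CSV_NAME` are hypothetical, and the two candidate locations are simply the two paths mentioned in this issue.

```python
from pathlib import Path

CSV_NAME = "CTA_-_Ridership_-_Daily_Boarding_Totals.csv"


def resolve_ridership_csv(base="datasets"):
    """Return the first existing candidate path for the ridership CSV.

    Hypothetical helper; the candidates are the two layouts named in this issue.
    """
    base = Path(base)
    candidates = [
        base / "ridership" / CSV_NAME,  # layout expected by cell 7
        base / "ridership_extracted" / "ridership" / CSV_NAME,  # layout reported here
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"Ridership CSV not found; tried: {tried}")
```

In cell 7, `path = resolve_ridership_csv()` would then replace the hard-coded `Path(...)` line, with the rest of the cell left unchanged.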
