[BUG] Chapter 15: wrong path specification for loading CTA_-_Ridership_-_Daily_Boarding_Totals.csv #177

MiladKetabGhale · 2025-01-09T06:27:58Z

Describe the bug
There is a FileNotFoundError in cell 7 of the notebook when attempting to load the CSV file CTA_-_Ridership_-_Daily_Boarding_Totals.csv. The file path specified (datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv) does not exist after running the code in cell 6.

To Reproduce
The issue can be reproduced as follows:

Run the code in cell 6:

```
tf.keras.utils.get_file(
    "ridership.tgz",
    "https://github.com/ageron/data/raw/main/ridership.tgz",
    cache_dir=".",
    extract=True
)
```

Run the code in cell 7:

```
import pandas as pd
from pathlib import Path

path = Path("datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")
df = pd.read_csv(path, parse_dates=["service_date"])
df.columns = ["date", "day_type", "bus", "rail", "total"]  # shorter names
df = df.sort_values("date").set_index("date")
df = df.drop("total", axis=1)  # no need for total, it's just bus + rail
df = df.drop_duplicates()  # remove duplicated months (2011-10 and 2014-07)
```

This results in the following error:

```
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[10], line 5
2 from pathlib import Path
4 path = Path("datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")
----> 5 df = pd.read_csv(path, parse_dates=["service_date"])
6 df.columns = ["date", "day_type", "bus", "rail", "total"] # shorter names
7 df = df.sort_values("date").set_index("date")

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
1013 kwds_defaults = _refine_defaults_read(
1014 dialect,
1015 delimiter,
(...)
1022 dtype_backend=dtype_backend,
1023 )
1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
617 _validate_names(kwds.get("names", None))
619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
622 if chunksize or iterator:
623 return parser

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
1617 self.options["has_index_names"] = kwds["has_index_names"]
1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
1878 if "b" not in mode:
1879 mode += "b"
-> 1880 self.handles = get_handle(
1881 f,
1882 mode,
1883 encoding=self.options.get("encoding", None),
1884 compression=self.options.get("compression", None),
1885 memory_map=self.options.get("memory_map", False),
1886 is_text=is_text,
1887 errors=self.options.get("encoding_errors", "strict"),
1888 storage_options=self.options.get("storage_options", None),
1889 )
1890 assert self.handles is not None
1891 f = self.handles.handle

File ~/Library/Python/3.9/lib/python/site-packages/pandas/io/common.py:873, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
868 elif isinstance(handle, str):
869 # Check whether the filename is to be opened in binary mode.
870 # Binary mode does not support 'encoding' and 'newline'.
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
876 encoding=ioargs.encoding,
877 errors=errors,
878 newline="",
879 )
880 else:
881 # Binary mode
882 handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv'
```

Expected behavior
The file CTA_-_Ridership_-_Daily_Boarding_Totals.csv should be available in the datasets/ridership directory after extracting ridership.tgz in cell 6.
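To confirm where the archive actually ended up, one quick check is to list the CSV files below the cache directory. This is only a minimal sketch: `list_csv_files` is a hypothetical helper name (not part of the notebook), and it assumes the working directory contains the `datasets` folder created by cell 6.

```python
from pathlib import Path


def list_csv_files(root="datasets"):
    """Return every CSV file found under `root`, as sorted relative POSIX paths.

    Hypothetical helper for diagnosing the extraction layout.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing was downloaded or extracted here.
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))


if __name__ == "__main__":
    for name in list_csv_files():
        print(name)
```

If the listing shows the file under `ridership_extracted/ridership/` rather than `ridership/`, the path in cell 7 does not match the layout produced by cell 6.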

How To Fix
There are two ways to fix the error:

First, change the path under which the extracted file is saved. It is currently ./datasets/ridership_extracted; you could change it to ./datasets/ridership, and the broken code in cell 7 would then work unchanged.

Second, change the path specified in cell 7. It is currently:

`path = Path("datasets/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")`

For it to work with cell 6 as is, it can be changed to:

`path = Path("datasets/ridership_extracted/ridership/CTA_-_Ridership_-_Daily_Boarding_Totals.csv")`

I believe the second option takes less effort.
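A variant of the cell 7 fix that tolerates either layout is to probe both candidate locations before reading the file. The sketch below is an illustration rather than the notebook's code: `find_ridership_csv` and `CSV_NAME` are hypothetical names, and the two candidate paths are simply the ones mentioned in this issue.

```python
from pathlib import Path

CSV_NAME = "CTA_-_Ridership_-_Daily_Boarding_Totals.csv"


def find_ridership_csv(root="datasets"):
    """Return the path to the ridership CSV under `root`, whichever layout exists.

    Hypothetical helper; the candidate paths are assumptions taken from this issue.
    """
    root = Path(root)
    candidates = [
        root / "ridership" / CSV_NAME,                          # layout cell 7 expects
        root / "ridership_extracted" / "ridership" / CSV_NAME,  # layout reported here
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Last resort: search the whole tree for the file name.
    matches = sorted(root.rglob(CSV_NAME))
    if matches:
        return matches[0]
    raise FileNotFoundError(f"{CSV_NAME} not found under {root}")
```

In cell 7 the hard-coded path could then be replaced with `path = find_ridership_csv()`, leaving the `pd.read_csv(path, parse_dates=["service_date"])` call as it is.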


MiladKetabGhale mentioned this issue Jan 9, 2025

BUG Fix: Chapter 15: Path string in the code cell number 7 is incorrect #178

Open
