Bug Fix: #60343 Construction of Series / Index fails from dict keys when "str" dtype is specified explicitly #60383

Open: wants to merge 20 commits into base: main
4 changes: 4 additions & 0 deletions .gitignore
@@ -137,3 +137,7 @@ doc/source/savefig/
# Interactive terminal generated files #
########################################
.jupyterlite.doit.db

# Ignore virtual environments
pandas-env/
*.pyc
4 changes: 2 additions & 2 deletions doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -110,7 +110,7 @@ SAS provides ``PROC IMPORT`` to read csv data into a data set.
The pandas method is :func:`read_csv`, which works similarly.

.. ipython:: python

    import pandas as pd
    url = (
        "https://raw.githubusercontent.com/pandas-dev/"
        "pandas/main/pandas/tests/io/data/csv/tips.csv"
@@ -523,7 +523,7 @@ the first entry for each.
In pandas this would be written as:

.. ipython:: python

    import pandas as pd
    tips.groupby(["sex", "smoker"]).first()
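
A minimal, self-contained sketch of the same ``groupby(...).first()`` call. It uses a small hand-built frame standing in for ``tips`` (the rows are made up for illustration) and shows that each ``(sex, smoker)`` pair keeps the first value seen in that group:

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset
df = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Female", "Male"],
        "smoker": ["No", "No", "No", "Yes"],
        "tip": [1.01, 1.66, 3.50, 3.31],
    }
)

# One row per (sex, smoker) pair, holding the first tip recorded for that pair
first_tips = df.groupby(["sex", "smoker"]).first()
print(first_tips)
```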
@@ -98,7 +98,7 @@ dataset from the pandas tests, which is a CSV file. In Excel, you would download
In pandas, you pass the URL or local path of the CSV file to :func:`~pandas.read_csv`:

.. ipython:: python

    import pandas as pd
    url = (
        "https://raw.githubusercontent.com/pandas-dev"
        "/pandas/main/pandas/tests/io/data/csv/tips.csv"
@@ -379,7 +379,7 @@ entering the first two or three values and then dragging.
This can be achieved by creating a series and assigning it to the desired cells.

.. ipython:: python

    import pandas as pd
    df = pd.DataFrame({"AAA": [1] * 8, "BBB": list(range(0, 8))})
    df
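
A sketch of the "create a series and assign it to the desired cells" step, assuming the fill should cover rows 2 through 5 of column ``AAA`` (the row range is chosen here for illustration). Note that ``.loc`` slices are label-based and include both endpoints:

```python
import pandas as pd

df = pd.DataFrame({"AAA": [1] * 8, "BBB": list(range(0, 8))})

# The values Excel would produce by typing 1, 2 and dragging the fill handle
series = list(range(1, 5))

# Label-based, inclusive slice: rows 2, 3, 4 and 5 receive the four values
df.loc[2:5, "AAA"] = series
print(df)
```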

@@ -397,7 +397,7 @@ Excel has built-in functionality for `removing duplicate values <https://support
This is supported in pandas via :meth:`~DataFrame.drop_duplicates`.

.. ipython:: python

    import pandas as pd
    df = pd.DataFrame(
        {
            "class": ["A", "A", "A", "B", "C", "D"],
@@ -426,7 +426,7 @@ In Excel, we use the following configuration for the PivotTable:
The equivalent in pandas:

.. ipython:: python

    import pandas as pd
    pd.pivot_table(
        tips, values="tip", index=["size"], columns=["sex"], aggfunc=np.average
    )
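
A self-contained sketch of the same pivot on made-up data. It passes the string ``"mean"`` instead of ``np.average``; without weights both compute the arithmetic mean. Cells with no matching rows come out as ``NaN``, much like blank cells in an Excel PivotTable:

```python
import pandas as pd

# Hypothetical miniature version of ``tips``
tips = pd.DataFrame(
    {
        "size": [2, 2, 2, 3],
        "sex": ["Female", "Female", "Male", "Male"],
        "tip": [1.0, 3.0, 2.0, 5.0],
    }
)

# Rows = party size, columns = sex, cells = average tip
table = pd.pivot_table(
    tips, values="tip", index=["size"], columns=["sex"], aggfunc="mean"
)
print(table)
```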
@@ -438,7 +438,7 @@ Adding a row
Assuming we are using a :class:`~pandas.RangeIndex` (numbered ``0``, ``1``, etc.), we can use :func:`concat` to add a row to the bottom of a ``DataFrame``.

.. ipython:: python

    import pandas as pd
    df
    new_row = pd.DataFrame([["E", 51, True]],
                           columns=["class", "student_count", "all_pass"])
@@ -453,13 +453,13 @@ takes you to cells that match, one by one. In pandas, this operation is generall
entire column or ``DataFrame`` at once through :ref:`conditional expressions <10min_tut_03_subset.rows_and_columns>`.

.. ipython:: python

    import pandas as pd
    tips
    tips == "Sun"
    tips["day"].str.contains("S")
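
A small sketch of the "whole column at once" idea on made-up data: an element-wise comparison and a substring test each produce a boolean mask, and the mask then selects every matching row in one step:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame(
    {"day": ["Sun", "Sat", "Thur", "Fri"], "tip": [1.0, 2.0, 3.0, 4.0]}
)

# Exact match: True only where the cell equals "Sun"
is_sun = tips["day"] == "Sun"

# Substring match: True wherever the value contains a capital "S"
has_s = tips["day"].str.contains("S")

# Use a mask to pull out all matching rows together
sundays = tips[is_sun]
print(sundays)
```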

pandas' :meth:`~DataFrame.replace` is comparable to Excel's ``Replace All``.

.. ipython:: python

    import pandas as pd
    tips.replace("Thu", "Thursday")
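
A sketch of ``replace`` on made-up data. With a plain string, it matches whole cell values rather than substrings, and it returns a new DataFrame, so the original is left as it was:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"day": ["Thu", "Fri", "Thu"], "tip": [1.0, 2.0, 3.0]})

# Every cell equal to "Thu" becomes "Thursday" in the returned copy
replaced = tips.replace("Thu", "Thursday")
print(replaced)
```

For substring replacement within values, ``Series.str.replace`` (or ``replace`` with ``regex=True``) is the usual route.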
52 changes: 26 additions & 26 deletions doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -15,7 +15,7 @@ the data into a DataFrame called ``tips`` and assume we have a database table of
structure.

.. ipython:: python

    import pandas as pd
    url = (
        "https://raw.githubusercontent.com/pandas-dev"
        "/pandas/main/pandas/tests/io/data/csv/tips.csv"
@@ -43,7 +43,7 @@ to select all columns):
With pandas, column selection is done by passing a list of column names to your DataFrame:

.. ipython:: python

    import pandas as pd
    tips[["total_bill", "tip", "smoker", "time"]]

Calling the DataFrame without the list of column names would display all columns (akin to SQL's
@@ -59,7 +59,7 @@ In SQL, you can add a calculated column:
With pandas, you can use the :meth:`DataFrame.assign` method to append a new column; it returns a new DataFrame and leaves the original unchanged:

.. ipython:: python

    import pandas as pd
    tips.assign(tip_rate=tips["tip"] / tips["total_bill"])
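
A minimal sketch with made-up numbers, showing that ``assign`` builds the calculated column on a copy rather than modifying ``tips`` in place:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"total_bill": [10.0, 20.0], "tip": [1.0, 5.0]})

# Equivalent of SELECT *, tip / total_bill AS tip_rate FROM tips
with_rate = tips.assign(tip_rate=tips["tip"] / tips["total_bill"])
print(with_rate)
```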

WHERE
@@ -86,7 +86,7 @@ Tips of more than $5 at Dinner meals:
    WHERE time = 'Dinner' AND tip > 5.00;

.. ipython:: python

    import pandas as pd
    tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

Tips by parties of at least 5 diners OR bill total was more than $45:
@@ -98,14 +98,14 @@ Tips by parties of at least 5 diners OR bill total was more than $45:
    WHERE size >= 5 OR total_bill > 45;

.. ipython:: python

    import pandas as pd
    tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]
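
A self-contained sketch of both filters on made-up rows. The parentheses around each comparison are required because Python's ``&`` and ``|`` bind more tightly than ``==``, ``>`` and ``>=``:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame(
    {
        "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
        "tip": [6.0, 2.0, 7.0, 1.0],
        "size": [2, 5, 2, 2],
        "total_bill": [30.0, 20.0, 50.0, 10.0],
    }
)

# AND: dinner meals with a tip above 5
dinner_big_tips = tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

# OR: parties of at least 5, or bills above 45
big_or_pricey = tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]
print(dinner_big_tips, big_or_pricey, sep="\n\n")
```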

NULL checking is done using the :meth:`~pandas.Series.notna` and :meth:`~pandas.Series.isna`
methods.

.. ipython:: python

    import pandas as pd
    frame = pd.DataFrame(
        {"col1": ["A", "B", np.nan, "C", "D"], "col2": ["F", np.nan, "G", "H", "I"]}
    )
@@ -133,7 +133,7 @@ Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series.
    WHERE col1 IS NOT NULL;

.. ipython:: python

    import pandas as pd
    frame[frame["col1"].notna()]
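
A runnable sketch pairing SQL's ``IS NULL`` and ``IS NOT NULL`` with :meth:`~pandas.Series.isna` and :meth:`~pandas.Series.notna`, using the same ``frame`` as above (``numpy`` is imported explicitly here so the block stands alone):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"col1": ["A", "B", np.nan, "C", "D"], "col2": ["F", np.nan, "G", "H", "I"]}
)

# WHERE col2 IS NULL
col2_null = frame[frame["col2"].isna()]

# WHERE col1 IS NOT NULL
col1_not_null = frame[frame["col1"].notna()]
print(col2_null, col1_not_null, sep="\n\n")
```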


@@ -170,14 +170,14 @@ Notice that in the pandas code we used :meth:`.DataFrameGroupBy.size` and not
the number of ``NOT NULL`` records within each.

.. ipython:: python

    import pandas as pd
    tips.groupby("sex").count()

Alternatively, we could have applied the :meth:`.DataFrameGroupBy.count` method
to an individual column:

.. ipython:: python

    import pandas as pd
    tips.groupby("sex")["total_bill"].count()
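
A small sketch of why ``size`` and ``count`` can disagree, using made-up data with one missing bill: ``size`` counts rows per group, while ``count`` counts only the non-null values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ``tips`` with a missing total_bill
tips = pd.DataFrame(
    {"sex": ["Female", "Female", "Male"], "total_bill": [10.0, np.nan, 20.0]}
)

# Like COUNT(*): every row in the group
sizes = tips.groupby("sex").size()

# Like COUNT(total_bill): NOT NULL values only
counts = tips.groupby("sex")["total_bill"].count()
print(sizes, counts, sep="\n\n")
```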

Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount
@@ -197,7 +197,7 @@ to your grouped DataFrame, indicating which functions to apply to specific colum
    */

.. ipython:: python

    import pandas as pd
    tips.groupby("day").agg({"tip": "mean", "day": "size"})

Grouping by more than one column is done by passing a list of columns to the
@@ -221,7 +221,7 @@ Grouping by more than one column is done by passing a list of columns to the
    */

.. ipython:: python

    import pandas as pd
    tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})
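
A self-contained sketch of grouping by two keys and applying a list of aggregations to one column, on made-up rows. The result has a row MultiIndex ``(smoker, day)`` and a column MultiIndex ``(tip, <function>)``:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame(
    {
        "smoker": ["No", "No", "Yes"],
        "day": ["Sun", "Sun", "Sat"],
        "tip": [1.0, 3.0, 2.0],
    }
)

# GROUP BY smoker, day with COUNT(*) and AVG(tip)
summary = tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})
print(summary)
```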

.. _compare_with_sql.join:
@@ -240,7 +240,7 @@ parameters allowing you to specify the type of join to perform (``LEFT``, ``RIGH
join behaviour and can lead to unexpected results.

.. ipython:: python

    import pandas as pd
    df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": np.random.randn(4)})
    df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": np.random.randn(4)})

@@ -258,15 +258,15 @@ INNER JOIN
    ON df1.key = df2.key;

.. ipython:: python

    import pandas as pd
    # merge performs an INNER JOIN by default
    pd.merge(df1, df2, on="key")

:meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's
column with another DataFrame's index.

.. ipython:: python

    import pandas as pd
    indexed_df2 = df2.set_index("key")
    pd.merge(df1, indexed_df2, left_on="key", right_index=True)
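
The frames above use ``np.random.randn``, so their values change on every run. A deterministic sketch with fixed integers (chosen here for illustration) makes the inner join easier to check: ``"D"`` appears twice in ``df2`` and therefore matches twice, and overlapping column names get the default ``_x``/``_y`` suffixes:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

# INNER JOIN on the shared "key" column
inner = pd.merge(df1, df2, on="key")

# Same join, but matching df1's "key" column against df2's index
indexed_df2 = df2.set_index("key")
by_index = pd.merge(df1, indexed_df2, left_on="key", right_index=True)
print(inner, by_index, sep="\n\n")
```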

@@ -283,7 +283,7 @@ Show all records from ``df1``.
    ON df1.key = df2.key;

.. ipython:: python

    import pandas as pd
    pd.merge(df1, df2, on="key", how="left")

RIGHT JOIN
@@ -299,7 +299,7 @@ Show all records from ``df2``.
    ON df1.key = df2.key;

.. ipython:: python

    import pandas as pd
    pd.merge(df1, df2, on="key", how="right")
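
A deterministic sketch contrasting ``how="left"`` and ``how="right"`` (fixed values chosen for illustration). Keys without a partner on the other side still appear, with ``NaN`` filling the missing side:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

# LEFT OUTER JOIN: every row of df1; "A" and "C" have no match in df2
left = pd.merge(df1, df2, on="key", how="left")

# RIGHT OUTER JOIN: every row of df2; "E" has no match in df1
right = pd.merge(df1, df2, on="key", how="right")
print(left, right, sep="\n\n")
```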

FULL JOIN
@@ -327,7 +327,7 @@ UNION
``UNION ALL`` can be performed using :meth:`~pandas.concat`.

.. ipython:: python

    import pandas as pd
    df1 = pd.DataFrame(
        {"city": ["Chicago", "San Francisco", "New York City"], "rank": range(1, 4)}
    )
@@ -353,7 +353,7 @@ UNION
    */

.. ipython:: python

    import pandas as pd
    pd.concat([df1, df2])

SQL's ``UNION`` is similar to ``UNION ALL``; however, ``UNION`` removes duplicate rows.
@@ -379,7 +379,7 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with
:meth:`~pandas.DataFrame.drop_duplicates`.

.. ipython:: python

    import pandas as pd
    pd.concat([df1, df2]).drop_duplicates()
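
A self-contained sketch of ``UNION ALL`` versus ``UNION``, assuming a second table that repeats the ``("Chicago", 1)`` row (the ``df2`` values here are chosen for illustration). ``concat`` keeps the original row labels, so passing ``ignore_index=True`` is an option when a fresh 0..n-1 index is preferred:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"city": ["Chicago", "San Francisco", "New York City"], "rank": range(1, 4)}
)
df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"], "rank": [1, 4, 5]})

# UNION ALL: all six rows, including the repeated Chicago row
union_all = pd.concat([df1, df2])

# UNION: drop rows that are exact duplicates across all columns
union = pd.concat([df1, df2]).drop_duplicates()
print(union)
```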


@@ -392,7 +392,7 @@ LIMIT
    LIMIT 10;

.. ipython:: python

    import pandas as pd
    tips.head(10)


@@ -410,7 +410,7 @@ Top n rows with offset
    LIMIT 10 OFFSET 5;

.. ipython:: python

    import pandas as pd
    tips.nlargest(10 + 5, columns="tip").tail(10)
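
The ``nlargest(limit + offset).tail(limit)`` pattern above can be checked on a tiny made-up column. With ``limit=2`` and ``offset=1``, the top three tips are taken and the first one is discarded:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"tip": [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]})

# ORDER BY tip DESC LIMIT 2 OFFSET 1
limit, offset = 2, 1
page = tips.nlargest(limit + offset, columns="tip").tail(limit)
print(page)
```

An equivalent formulation is ``tips.sort_values("tip", ascending=False).iloc[offset:offset + limit]``; ``nlargest`` avoids fully sorting the frame.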

Top n rows per group
@@ -430,7 +430,7 @@ Top n rows per group


.. ipython:: python

    import pandas as pd
    (
        tips.assign(
            rn=tips.sort_values(["total_bill"], ascending=False)
@@ -445,7 +445,7 @@ Top n rows per group
The same result can be obtained with the ``rank(method='first')`` function:

.. ipython:: python

    import pandas as pd
    (
        tips.assign(
            rnk=tips.groupby(["day"])["total_bill"].rank(
@@ -475,7 +475,7 @@ Notice that when using ``rank(method='min')`` function
(as Oracle's ``RANK()`` function)

.. ipython:: python

    import pandas as pd
    (
        tips[tips["tip"] < 2]
        .assign(rnk_min=tips.groupby(["sex"])["tip"].rank(method="min"))
@@ -494,7 +494,7 @@ UPDATE
    WHERE tip < 2;

.. ipython:: python

    import pandas as pd
    tips.loc[tips["tip"] < 2, "tip"] *= 2

DELETE
@@ -508,5 +508,5 @@ DELETE
In pandas we select the rows that should remain instead of deleting the rows that should be removed:

.. ipython:: python

    import pandas as pd
    tips = tips.loc[tips["tip"] <= 9]
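
A combined sketch of the UPDATE and DELETE patterns on made-up tips. The update modifies matching cells in place through ``.loc``. The delete is expressed as keeping the complement (``tip <= 9``), so the cutoff of 9 in the comment below mirrors the condition used above:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"tip": [1.5, 3.0, 10.0]})

# UPDATE: double every tip below 2, in place
tips.loc[tips["tip"] < 2, "tip"] *= 2

# DELETE rows with tip > 9 by keeping only the rows that should remain
tips = tips.loc[tips["tip"] <= 9]
print(tips)
```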
2 changes: 1 addition & 1 deletion doc/source/getting_started/comparison/includes/case.rst
@@ -2,7 +2,7 @@ The equivalent pandas methods are :meth:`Series.str.upper`, :meth:`Series.str.lo
:meth:`Series.str.title`.

.. ipython:: python

    import pandas as pd
    firstlast = pd.DataFrame({"string": ["John Smith", "Jane Cook"]})
    firstlast["upper"] = firstlast["string"].str.upper()
    firstlast["lower"] = firstlast["string"].str.lower()
@@ -3,7 +3,7 @@ pandas provides vectorized operations by specifying the individual ``Series`` in
a column from the ``DataFrame``.

.. ipython:: python

    import pandas as pd
    tips["total_bill"] = tips["total_bill"] - 2
    tips["new_bill"] = tips["total_bill"] / 2
    tips
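
A self-contained sketch of the vectorized column arithmetic above, on three made-up bills. Each expression applies to every row at once, with no explicit loop:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"total_bill": [10.0, 20.0, 30.0]})

# Overwrite an existing column, then derive a new one from it
tips["total_bill"] = tips["total_bill"] - 2
tips["new_bill"] = tips["total_bill"] / 2
print(tips)
```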
@@ -4,19 +4,19 @@ Keep certain columns
''''''''''''''''''''

.. ipython:: python

    import pandas as pd
    tips[["sex", "total_bill", "tip"]]

Drop a column
'''''''''''''

.. ipython:: python

    import pandas as pd
    tips.drop("sex", axis=1)

Rename a column
'''''''''''''''

.. ipython:: python

    import pandas as pd
    tips.rename(columns={"total_bill": "total_bill_2"})
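
A sketch of the keep, drop, and rename operations on a made-up frame. All three return new objects, so ``tips`` keeps its original columns. ``tips.drop(columns="sex")`` is an equivalent spelling of the ``axis=1`` form:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male"],
        "total_bill": [10.0, 20.0],
        "tip": [1.0, 2.0],
        "day": ["Sun", "Sat"],
    }
)

kept = tips[["sex", "total_bill", "tip"]]                      # keep certain columns
dropped = tips.drop("sex", axis=1)                             # drop a column
renamed = tips.rename(columns={"total_bill": "total_bill_2"})  # rename a column
```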
@@ -4,6 +4,6 @@ a Python dictionary, where the keys are the column names
and the values are the data.

.. ipython:: python

    import pandas as pd
    df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
    df
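
A short sketch of building a DataFrame from a dictionary: each key becomes a column label and each list becomes that column's values. The lists must all have the same length, otherwise pandas raises a ``ValueError``:

```python
import pandas as pd

# Keys -> column names, lists -> column data
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

# Mismatched list lengths are rejected
mismatch_error = ""
try:
    pd.DataFrame({"x": [1, 3, 5], "y": [2, 4]})
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)
```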
@@ -3,5 +3,5 @@ from a string by position locations. Keep in mind that Python
indexes are zero-based.

.. ipython:: python

    import pandas as pd
    tips["sex"].str[0:1]
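
A minimal sketch of position-based extraction on made-up values. Python slices are zero-based and exclude the end position, so ``[0:1]`` yields only the first character:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"sex": ["Female", "Male"]})

# First character of each value
first_letter = tips["sex"].str[0:1]
print(first_letter)
```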
4 changes: 2 additions & 2 deletions doc/source/getting_started/comparison/includes/filtering.rst
@@ -2,14 +2,14 @@ DataFrames can be filtered in multiple ways; the most intuitive of which is usin
:ref:`boolean indexing <indexing.boolean>`.

.. ipython:: python

    import pandas as pd
    tips[tips["total_bill"] > 10]

The above statement is simply passing a ``Series`` of ``True``/``False`` objects to the DataFrame,
returning all rows with ``True``.

.. ipython:: python

    import pandas as pd
    is_dinner = tips["time"] == "Dinner"
    is_dinner
    is_dinner.value_counts()
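
A self-contained sketch of the boolean ``Series`` described above, on made-up rows. The comparison yields one ``True``/``False`` per row, ``value_counts`` tallies them, and passing the mask to the DataFrame keeps only the ``True`` rows:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame(
    {"time": ["Dinner", "Lunch", "Dinner"], "total_bill": [12.0, 8.0, 30.0]}
)

is_dinner = tips["time"] == "Dinner"
counts = is_dinner.value_counts()

# Rows where the mask is True
dinners = tips[is_dinner]
print(counts, dinners, sep="\n\n")
```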
@@ -4,5 +4,5 @@ method returns its position. If not found, it returns ``-1``. Keep in mind that
zero-based.

.. ipython:: python

    import pandas as pd
    tips["sex"].str.find("ale")
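
A small sketch of ``str.find`` on made-up values (``"Other"`` is added here only to show the not-found case). The result is the zero-based position of the first match, or ``-1`` when the substring is absent:

```python
import pandas as pd

# Hypothetical stand-in for ``tips``
tips = pd.DataFrame({"sex": ["Female", "Male", "Other"]})

# "Female" -> 3, "Male" -> 1, "Other" -> -1
positions = tips["sex"].str.find("ale")
print(positions)
```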