BUG: np.nan is converted to pd.NA in nullable column. #56836
Comments
Thanks for the issue. This issue is being actively discussed in #32265 so happy to have your input there.
@mroeschke I don't think it makes sense to conflate an obvious bug with a four year discussion. As to why this is obviously a bug:
Frankly, the discussion you linked is too tolerant of fringe opinions in my opinion. Saying "having both NA and NaN in a nullable column could be confusing for users" (quote from there) is no different from saying "having both NULL and empty strings in a nullable string column is confusing for the user". It's right there in the name "nullable float": you take your floats (which everyone has agreed on since IEEE-754, and which do include NaN), and you add a NULL, which pandas calls NA. Everybody who doesn't want both NA and NaN in the same column could always not use a nullable float column. If IEEE had called it "invalid" instead of "NaN", the discussion in #32265 wouldn't exist. Imagine Microsoft decided that a ... I feel like all serious discussion in #32265 is about questions of how to treat legacy concerns, like ...
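To make the distinction drawn above concrete, here is a minimal sketch of how the two scalars behave on their own, outside any column. It only uses `np.nan` and `pd.NA` as they exist in NumPy and pandas; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NaN is an ordinary IEEE-754 float value; by the standard it is unequal to itself.
nan_is_float = isinstance(np.nan, float)
nan_equals_itself = np.nan == np.nan

# pd.NA is a separate singleton meaning "missing"; comparisons propagate it
# instead of returning True or False.
na_is_float = isinstance(pd.NA, float)
na_comparison = pd.NA == 1.0

print("NaN is a float:", nan_is_float)
print("NaN == NaN:", nan_equals_itself)
print("NA is a float:", na_is_float)
print("NA == 1.0:", na_comparison)
```

So NaN is a float value with defined comparison rules, while NA is not a float at all, which is the sense in which a "nullable float" would be floats plus one extra NULL.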
Actually, after just reading the entire discussion in #32265 I think I'll take to heart the advice given by @jbrockmendel: "Honestly, not really. I'm planning to make a push on this for 3.0. In the medium-term my advice would be to not use nullable-float dtypes". I still think it'd make sense to fix this issue independently of #32265 but I have no hope it would make the nullable float experience in pandas noticeably saner in the short term. I'll add my few cents at #32265 nonetheless.
FWIW I would also hope pandas would one day have ... As an alternative, you can use pyarrow to have ...
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
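A minimal sketch of the behavior named in the title, as an illustration rather than the reporter's exact snippet. It assumes the nullable `Float64` dtype and made-up input values, and it reports what your installed pandas does instead of asserting a fixed result, since this behavior is under discussion in #32265.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration: one IEEE-754 NaN and one ordinary float.
s = pd.Series([np.nan, 1.0], dtype="Float64")

# pd.NA is a singleton, so an identity check distinguishes it from a float NaN.
first_is_na = s.iloc[0] is pd.NA

print(s)
print("NaN was stored as pd.NA:", first_is_na)
```

If the last line prints `True`, the NaN passed in was replaced by a missing-value marker on construction, which is the conversion this report is about.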
Issue Description
Above returns
but shouldn't because np.nan != pd.NA, or IEEE-NotANumber != missing data (that was the whole point of having nullable columns).
Expected Behavior
polars does it right:
Installed Versions
INSTALLED VERSIONS
commit : a671b5a
python : 3.10.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-76-generic
Version : #83-Ubuntu SMP Thu Jun 15 19:16:32 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_GB.utf8
LANG : C.UTF-8
LOCALE : en_GB.UTF-8
pandas : 2.1.4
numpy : 1.24.4
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.14.0
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.2
numba : 0.57.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 12.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : 0.21.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None