Weird behavior of Pandas DataFrame with str.count

2024-05-21 • Q&A

I have the following Pandas DataFrame:

>>> sample_dataframe
        P
0  107.35
1   99.35
2   75.85
3   92.34

When I try the following, I get this output:

>>> sample_dataframe[sample_dataframe['P'].astype(str).str.count('.') == 1]

Empty DataFrame
Columns: [P]
Index: []

With a regex escape character, this happens instead:

>>> sample_dataframe[sample_dataframe['P'].astype(str).str.count('\.') == 1]

        P
0  107.35
1   99.35
2   75.85
3   92.34

The following reinforces this further:

>>> sample_dataframe['P'].astype(str).str.count('.')

0    6
1    5
2    5
3    5
Name: P, dtype: int64

vs.

>>> sample_dataframe['P'].astype(str).str.count('\.')

0    1
1    1
2    1
3    1
Name: P, dtype: int64

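So the unescaped . is treated as the regex wildcard, which matches any character except a newline, hence the counts 6, 5, 5, 5. The escaped \. counts only occurrences of the literal . character.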
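To see where the counts 6, 5, 5, 5 come from, here is a minimal sketch using Python's standard re module on the same values as strings. It only illustrates how the two patterns match; it is not a claim about how pandas is implemented internally.

```python
import re

values = ['107.35', '99.35', '75.85', '92.34']

# An unescaped '.' is the regex wildcard: it matches every character except a newline.
wildcard_matches = [re.findall('.', v) for v in values]

# An escaped dot matches only the literal '.' character.
literal_matches = [re.findall(r'\.', v) for v in values]

wildcard_counts = [len(m) for m in wildcard_matches]
literal_counts = [len(m) for m in literal_matches]

print(wildcard_matches[0])  # ['1', '0', '7', '.', '3', '5']
print(wildcard_counts)      # [6, 5, 5, 5]
print(literal_counts)       # [1, 1, 1, 1]
```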

However, the ordinary method called on the string itself seems to behave differently and does not need a regex escape for '.':

>>> '105.35'.count('.')
1

>>> '105.35'.count('\.')
0

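Edit: based on some of the answers, I'll try to clarify with the class-level function calls below (whereas the calls above are method calls on an instantiated object):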

>>> str.count('105.35','.')
1

>>> str.count('105.35','\.')
0

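I'm not sure whether the Pandas-related methods, which use CPython in the background (because of the NumPy operations), implement this as a regex (including df.apply), or whether it is related to a difference between the str class function count (i.e. str.count()) and the count method of an instantiated str object ('105.35' in the example above, i.e. '105.35'.count()). Is the difference between class and object functions/methods (and how they are implemented) the root cause, or is it caused by DataFrames being implemented on top of NumPy?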

I would really like to learn more about this so I can understand how it actually works.

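That is because pandas.Series.str.count and the string count method are different. You can see here (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.count.html#pandas.Series.str.count) that pandas.Series.str.count takes a regular expression as its argument. In a regex, "." means "any character", whereas str.count counts occurrences of the given substring (not a regex).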
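The difference can be mimicked without pandas. In the sketch below, regex_count and substring_count are hypothetical helper names introduced only for illustration: regex_count approximates the regex-based counting that the documentation describes for Series.str.count, while substring_count applies the built-in str.count to each element.

```python
import re

def regex_count(values, pat):
    """Count regex matches of `pat` in each string (hypothetical helper)."""
    compiled = re.compile(pat)
    return [len(compiled.findall(v)) for v in values]

def substring_count(values, sub):
    """Count literal occurrences of `sub` in each string (hypothetical helper)."""
    return [v.count(sub) for v in values]

values = ['107.35', '99.35', '75.85', '92.34']

print(regex_count(values, '.'))         # [6, 5, 5, 5]
print(regex_count(values, r'\.'))       # [1, 1, 1, 1]
print(substring_count(values, '.'))     # [1, 1, 1, 1]
print(substring_count(values, '\\.'))   # [0, 0, 0, 0]
```

The last line returns zeros because the two-character substring made of a backslash and a dot never occurs in these strings.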

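If you check Series.str.count, you will see that it works with regex patterns by default, so to count . you have to escape it as \.; otherwise the regex . matches, and therefore counts, every character.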
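When the text to count comes from a variable, escaping by hand is easy to forget. One option, sketched here, is re.escape from the standard library, which escapes regex metacharacters so that the pattern is matched literally. The DataFrame is rebuilt from the values shown in the question, and the variable names are illustrative.

```python
import re
import pandas as pd

sample_dataframe = pd.DataFrame({'P': [107.35, 99.35, 75.85, 92.34]})
as_text = sample_dataframe['P'].astype(str)

# re.escape('.') yields the pattern \. so the dot is matched literally.
literal_dot = re.escape('.')
dot_counts = as_text.str.count(literal_dot)

# Keep the rows whose string form contains exactly one dot.
one_dot = sample_dataframe[dot_counts == 1]

print(dot_counts.tolist())  # [1, 1, 1, 1]
print(len(one_dot))         # 4
```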

If you want to see how the function is implemented in pandas, see the pandas source code.

str.count in pure Python works differently: it matches substrings rather than regexes, which is why the output differs.
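As for the class-versus-instance question: calling str.count('105.35', '.') simply passes the string explicitly as the first argument to the same method that '105.35'.count('.') invokes, so the two forms cannot behave differently. A short check:

```python
text = '105.35'

# Calling through the class just passes the instance explicitly.
via_class = str.count(text, '.')
via_instance = text.count('.')

# Both forms do substring matching, so a backslash is just another character.
via_class_escaped = str.count(text, '\\.')
via_instance_escaped = text.count('\\.')

print(via_class, via_instance)                  # 1 1
print(via_class_escaped, via_instance_escaped)  # 0 0
```

The difference seen in the question therefore comes from Series.str.count treating its argument as a regex, not from how the method is called or from NumPy.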


zhuangting answered: Weird behavior of Pandas DataFrame with str.count
