Pandas "Int64" type is converted to "object" type after a merge

2024-05-06 • Q&A

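When using Int64, I noticed the following behavior. Is there a way to avoid the type conversion and preserve the Int64 type post-merge?

I should clarify that I am looking for an answer that does not involve an explicit conversion step after the merge. The following example reproduces the behavior: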

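import pandas as pd

df1 = pd.DataFrame(data={'col1': [1,2,3,4,5],'col2': [10,11,12,13,14]},dtype=pd.Int64Dtype())
# df2 values reconstructed from a garbled line; assumed to mirror the first rows of df1
df2 = pd.DataFrame(data={'col1': [1,2,3],'col2': [10,11,12]},dtype=pd.Int64Dtype())
df = df2.merge(df1,how='outer',indicator=True,suffixes=('_x',''))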

df1.dtypes
Out[8]: 
col1    Int64
col2    Int64
dtype: object

df2.dtypes
Out[9]: 
col1    Int64
col2    Int64
dtype: object

df.dtypes
Out[10]: 
col1        object
col2        object
_merge    category
dtype: object
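To check which columns changed type across a merge without reading the `dtypes` output by eye, a small helper can compare the two frames. This is a minimal sketch; `changed_dtypes` is a hypothetical name and not part of pandas, and the `df2` values are assumed as in the example above. On recent pandas versions the merge may keep Int64, in which case the helper returns an empty dict.

```python
import pandas as pd


def changed_dtypes(before, after):
    """Return {column: (dtype before, dtype after)} for columns whose dtype differs.

    Hypothetical helper for illustration; dtypes are compared by their string names.
    """
    return {
        col: (str(before[col].dtype), str(after[col].dtype))
        for col in before.columns
        if col in after.columns and str(before[col].dtype) != str(after[col].dtype)
    }


df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [10, 11, 12, 13, 14]},
                   dtype=pd.Int64Dtype())
# Assumed values for df2 (first three rows of df1).
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]},
                   dtype=pd.Int64Dtype())

merged = df2.merge(df1, how="outer", indicator=True, suffixes=("_x", ""))
# Which columns were converted depends on the installed pandas version.
print(changed_dtypes(df2, merged))
```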

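It comes from the need to reindex df2 (the base dataframe) so that it matches df1 (the dataframe being merged in). It should probably behave the way you expect, but using the pandas Int64Dtype type instead of the Python int type is an edge case.

When the merge runs, this reindexing step is called: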

> /home/tron/.local/lib/python3.7/site-packages/pandas/core/reshape/merge.py(840)_maybe_add_join_keys()
    838                     key_col = rvals
    839                 else:
--> 840                     key_col = Index(lvals).where(~mask,rvals)
    841 
    842                 if result._is_label_reference(name):

That in turn reaches this branch, which builds the array with the object dtype:

> /home/tron/.local/lib/python3.7/site-packages/pandas/core/indexes/base.py(359)__new__()
    357                 data = ea_cls._from_sequence(data,dtype=dtype,copy=False)
    358             else:
--> 359                 data = np.asarray(data,dtype=object)
    360 
    361             # coerce to the object dtype
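The effect of a call like `np.asarray(data, dtype=object)` can be seen in isolation. The following is a minimal sketch that uses only public pandas and NumPy calls; the variable names are illustrative and are not taken from the pandas source.

```python
import numpy as np
import pandas as pd

# A nullable-integer extension array, like the one backing an Int64 column.
values = pd.array([1, 2, 3], dtype="Int64")
print(values.dtype)

# Converting with dtype=object discards the Int64 extension dtype.
as_object = np.asarray(values, dtype=object)
print(as_object.dtype)

# A Series built from that object array stays object; pandas does not
# convert an object ndarray back to Int64 on its own.
print(pd.Series(as_object).dtype)
```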

You can explore this yourself by running the merge under the pdb debugger and stepping through it.

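import pandas as pd

df1 = pd.DataFrame(data={'col1': [1,2,3,4,5],'col2': [10,11,12,13,14]},dtype=pd.Int64Dtype())
# df2 values reconstructed from a garbled line; assumed to mirror the first rows of df1
df2 = pd.DataFrame(data={'col1': [1,2,3],'col2': [10,11,12]},dtype=pd.Int64Dtype())

def test():
    import pdb
    pdb.set_trace()  # execution pauses here; step into merge() with `s`
    df = df2.merge(df1,how='outer',indicator=True,suffixes=('_x',''))
    return df

df = test()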

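Some interesting notes:

If you use dtype=int instead of dtype=pd.Int64Dtype(), the resulting types are actually what you would expect. It should probably behave the same for both, but the int type takes a different logic path in pandas/core/indexes/base.py(359)__new__() (the "# index-like" branch). That said, you should likely default to the built-in int, float and bool types unless you have a specific use case.
df2.merge(df1,how='inner') preserves the types because no reindexing is needed.
df1.merge(df2,how='outer') preserves the types because df1 (the base dataframe) does not need to be reindexed to merge df2.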
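The three notes above can be checked side by side. This sketch assumes the same `df2` values as the earlier example, and `make_frames` and `merged_dtypes` are hypothetical helper names. The Int64 results depend on the installed pandas version, and newer releases may keep Int64 in all three cases.

```python
import pandas as pd


def make_frames(dtype):
    """Build the two example frames with the given dtype (df2 values are assumed)."""
    df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [10, 11, 12, 13, 14]},
                       dtype=dtype)
    df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 11, 12]}, dtype=dtype)
    return df1, df2


def merged_dtypes(dtype):
    """Return {merge variant: {column: dtype name}} for the three merges discussed."""
    df1, df2 = make_frames(dtype)
    variants = {
        "df2 outer df1": df2.merge(df1, how="outer"),
        "df2 inner df1": df2.merge(df1, how="inner"),
        "df1 outer df2": df1.merge(df2, how="outer"),
    }
    return {
        name: {col: str(dt) for col, dt in frame.dtypes.items()}
        for name, frame in variants.items()
    }


# Compare the plain NumPy integer dtype with the nullable Int64 dtype.
for label, dtype in (("int64", "int64"), ("Int64", pd.Int64Dtype())):
    print(label, merged_dtypes(dtype))
```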


casillas00 answered: Pandas "Int64" type is converted to "object" type after a merge
