pd.merge not working with converted data types

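I am trying to use pd.merge to merge two pandas DataFrames (one is a spatially enabled DataFrame, sdf, and the other is a plain DataFrame) on the common field GEOID. In the sdf, GEOID is a string; in the df it is an int. I used .astype('str') to convert the df GEOID field to a string as well. Even so, when I call pd.merge, the output is either empty or I get the error "You are trying to merge on object and int64 columns." I have confirmed via .dtypes that both are strings. Any idea why the merge is not working?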

I tried converting both fields to strings, and also converting both to int. I also tried DataFrame.join and pd.concat, but neither works properly.
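One possible cause, offered as a guess rather than a confirmed diagnosis: GEOID values such as 060372932023 begin with a zero, so reading them as integers drops that zero, and a later .astype(str) cannot bring it back. The sketch below uses a small in-memory CSV (hypothetical data, with the column name borrowed from the question) to show the effect and two ways of keeping the key as a 12-character string. The width of 12 is an assumption based on the sample GEOIDs.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file in the question.
csv_text = "GEOID10,Mining\n060372932023,6\n060372941201,4\n"

# Default parsing infers an integer column, which drops the leading zero.
as_int = pd.read_csv(io.StringIO(csv_text))
converted = as_int["GEOID10"].astype(str)

# Option 1: read the key column as a string from the start.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"GEOID10": str})

# Option 2: pad an already-converted column back to a fixed width
# (12 characters is assumed here).
padded = converted.str.zfill(12)

print(converted.tolist())
print(as_str["GEOID10"].tolist())
print(padded.tolist())
```

If the printed values from the converted column are one character shorter than those in the sdf, the keys cannot match even though both columns hold strings.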

import pandas as pd
#Read in CSV with updates (would already be df from socrata pull in real version)
updated_csv= r"C:\Users\mad10412\Desktop\active_Business_Data_Edited.csv"
updated_csv_df = pd.read_csv(updated_csv)
updated_csv_df.head(5)
updated_csv_df['GEOID10']=updated_csv_df['GEOID10'].astype(str)
updated_csv_df.dtypes
output_layer_name = 'Join_Features_Test5'
actbus=gis.content.search(output_layer_name)
activeBusinesses_item = actbus[0]
activeBusinesses_item
activeBusinesses_flayer = activeBusinesses_item.layers[0]
activeBusinesses_flayer
activeBusinesses_fset = activeBusinesses_flayer.query() #querying without any conditions returns all the features
activeBusinesses_fset.sdf.head()
activeBusinesses_fset.sdf.shape
activeBusinesses_fset.sdf.dtypes
##Attempt 1: Includes original data and Adds Column names but no data

overlap_rows = activeBusinesses_fset.sdf.join(updated_csv_df.set_index('GEOID10'),on='GEOID10',lsuffix='_left',rsuffix='_right')
overlap_rows.head(10)
overlap_rows.to_csv("C:\\Users\\mad10412\\Desktop\\concatDF.csv")

##Attempt 2: Only includes column name. no data at all
overlap_rows = pd.merge(left = activeBusinesses_fset.sdf,right = updated_csv_df,how='inner',on = 'GEOID10')
overlap_rows.head(5)
overlap_rows.to_csv("C:\\Users\\mad10412\\Desktop\\concatDF2.csv")

##Attempt 3: Includes all columns and all data,but GEOIDs don't match
result = pd.concat([activeBusinesses_fset.sdf,updated_csv_df],axis=1,join='inner')
result.head(5)
result.to_csv("C:\\Users\\mad10412\\Desktop\\concatDF3.csv")


##Attempt 4:  Only includes column name. no data at all
left=activeBusinesses_fset.sdf
right=updated_csv_df
result = pd.merge(left,right,on=['GEOID10','GEOID10'])
result.head(5)
result.to_csv("C:\\Users\\mad10412\\Desktop\\concatDF4.csv")
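When an inner merge returns only column names, it can help to check how many key values the two frames actually share. The following is a minimal sketch with small hypothetical frames named left and right standing in for the sdf and the CSV data; the mismatched keys are invented to reproduce the symptom. An outer merge with indicator=True adds a _merge column that labels each row as left_only, right_only, or both.

```python
import pandas as pd

# Hypothetical frames: both key columns hold strings, but the values differ.
left = pd.DataFrame({"GEOID10": ["060372932023", "060372941201"], "Mining": [6, 4]})
right = pd.DataFrame({"GEOID10": ["60372932023", "60372941201"], "Mining2": [8, 3]})

# Key values present on both sides; an empty set means an inner merge is empty.
shared = set(left["GEOID10"]) & set(right["GEOID10"])

# The outer merge keeps every row and records which side it came from.
check = pd.merge(left, right, on="GEOID10", how="outer", indicator=True)
matched = int((check["_merge"] == "both").sum())

print(shared)
print(check[["GEOID10", "_merge"]])
print(matched)
```

Separately, pd.concat with axis=1 lines rows up by index label rather than by GEOID10, which would be consistent with Attempt 3 returning data whose GEOIDs do not match.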

The data in the two DataFrames looks like this:

df=pd.DataFrame({'GEOID': ['060372932023','060372941201','060372932022'],'Mining': [6,4,2 ],'Agriculture': [10,12,4]})
df

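The only difference between the DataFrames is that one of them has a shape column containing geometry. Essentially, I am trying to merge these DataFrames to find instances where the values of fields such as Agriculture and Mining differ.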

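# NOTE: this line was garbled in the source. The GEOIDs are assumed to match the first frame,
# and one Agriculture2 value is missing; None marks the gap, and its position is a guess.
df=pd.DataFrame({'GEOID': ['060372932023','060372941201','060372932022'],'Mining2': [8,3,1],'Agriculture2': [14,None,6]})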
df

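For each GEOID, this should produce a single row containing the data from both DataFrames. For what the output actually looks like, see the comments in the last code snippet.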
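As a sketch of the intended result, assuming both frames carry identical GEOID strings: merge on GEOID, then keep the rows where the paired fields disagree. The names df_a, df_b, merged, and changed are hypothetical, and the values in df_b are illustrative rather than the asker's actual data.

```python
import pandas as pd

df_a = pd.DataFrame({
    "GEOID": ["060372932023", "060372941201", "060372932022"],
    "Mining": [6, 4, 2],
    "Agriculture": [10, 12, 4],
})
# Illustrative values only; the second row is deliberately identical to df_a.
df_b = pd.DataFrame({
    "GEOID": ["060372932023", "060372941201", "060372932022"],
    "Mining2": [8, 4, 1],
    "Agriculture2": [14, 12, 6],
})

# One row per GEOID, with columns from both frames side by side.
merged = pd.merge(df_a, df_b, on="GEOID", how="inner")

# Keep rows where either paired field differs between the two sources.
differs = (merged["Mining"] != merged["Mining2"]) | (
    merged["Agriculture"] != merged["Agriculture2"]
)
changed = merged[differs]

print(changed)
```

For the spatial frame, the same merge should carry the geometry column along, since pd.merge keeps all non-key columns from both sides.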

swf1014 answered: pd.merge not working with converted data types

No good solution has been posted yet.