How can I extract the rows of a dataframe in which a combination of specified column values is repeated?

2024-05-20 • Q&A

Say I have the following dataframe:

import pandas as pd
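# Assumption: all four lists need the same length for pd.DataFrame to accept them.
# Six rows are assumed here; the Year, Month and ID values are illustrative.
data = {'Year':[2018,2018,2018,2018,2018,2018],'Month':[1,1,1,2,3,3],'ID':['A','A','B','A','B','B'],'Fruit':['Apple','Banana','Apple','Pear','Mango','Mango']}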
df = pd.DataFrame(data,columns=['Year','Month','ID','Fruit'])
df = df.astype(str)
df

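I want to extract the rows whose combination of "Year", "Month" and "ID" is repeated. So, for the dataframe above, the expected result is the dataframe containing only those rows.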

My approach was to first do a groupby to count how many times each combination of Year, Month and ID occurs:

df2 = df.groupby(['Year','Month'])['ID'].value_counts().to_frame(name = 'Count').reset_index()
df2 = df2[df2.Count>1]
df2
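As an aside on the counting step, here is a minimal sketch of one possible shortcut: attaching the group size to every row with `transform('size')`, so the original rows can be filtered directly without a separate count table. The frame `sample` and its values are assumed for illustration, and `keys`, `group_size` and `repeated` are hypothetical names.

```python
import pandas as pd

# Illustrative sample data (assumed, six rows); values are strings,
# mirroring the astype(str) step in the question.
sample = pd.DataFrame({
    'Year':  ['2018'] * 6,
    'Month': ['1', '1', '1', '2', '3', '3'],
    'ID':    ['A', 'A', 'B', 'A', 'B', 'B'],
    'Fruit': ['Apple', 'Banana', 'Apple', 'Pear', 'Mango', 'Mango'],
})

keys = ['Year', 'Month', 'ID']

# Size of the (Year, Month, ID) group each row belongs to,
# aligned to the index of the original frame.
group_size = sample.groupby(keys)['Fruit'].transform('size')

# Keep only the rows whose key combination occurs more than once.
repeated = sample[group_size > 1]
print(repeated)
```

Because `group_size` shares the index of `sample`, it can be used as a boolean mask without any loop or merge.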

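Then my idea was to iterate over the Year, Month and ID combinations in the groupby dataframe and copy the rows of the original dataframe that match each combination into a new dataframe: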

df_new = pd.DataFrame(columns=df.columns,index=range(sum(df2.Count)))

count = 0
for i in df2.index:
    temp = df[(df.ID==df2.ID[i]) & (df.Year==df2.Year[i]) & (df.Month==df2.Month[i])]
    temp.reset_index(drop=True,inplace=True)
    for j in range(len(temp)):
        df_new.iloc[count] = temp.iloc[j]
        count+=1
df_new

But this produces the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-38-7f2d95d71270> in <module>()
      6     temp.reset_index(drop=True,inplace=True)
      7     for j in range(len(temp)):
----> 8         df_new.iloc[count] = temp.iloc[j]
      9         count+=1
     10 df_new

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexing.py in __setitem__(self,key,value)
    187         else:
    188             key = com.apply_if_callable(key,self.obj)
--> 189         indexer = self._get_setitem_indexer(key)
    190         self._setitem_with_indexer(indexer,value)
    191 

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexing.py in _get_setitem_indexer(self,key)
    173 
    174         try:
--> 175             return self._convert_to_indexer(key,is_setter=True)
    176         except TypeError as e:
    177 

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self,obj,axis,is_setter)
   2245 
   2246         try:
-> 2247             self._validate_key(obj,axis)
   2248             return obj
   2249         except ValueError:

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexing.py in _validate_key(self,axis)
   2068             return
   2069         elif is_integer(key):
-> 2070             self._validate_integer(key,axis)
   2071         elif isinstance(key,tuple):
   2072             # a tuple should already have been caught by this point

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexing.py in _validate_integer(self,axis)
   2137         len_axis = len(self.obj._get_axis(axis))
   2138         if key >= len_axis or key < -len_axis:
-> 2139             raise IndexError("single positional indexer is out-of-bounds")
   2140 
   2141     def _getitem_tuple(self,tup):

IndexError: single positional indexer is out-of-bounds

What is going wrong? I can't figure it out.
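One thing that can be checked in isolation: `.iloc` is purely positional and never adds rows, so assigning to a position at or beyond the length of the frame raises an `IndexError` (the exact message varies between pandas versions). A minimal sketch with a hypothetical two-row frame, where `small` and `error_message` are illustrative names and not part of the code above:

```python
import pandas as pd

# Hypothetical frame with exactly two rows (positions 0 and 1).
small = pd.DataFrame({'ID': ['A', 'B'], 'Fruit': ['Apple', 'Pear']})

# Assigning to an existing position works.
small.iloc[1] = ['C', 'Mango']

# Assigning one position past the end fails: iloc does not enlarge the frame.
try:
    small.iloc[2] = ['D', 'Banana']
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

Since the traceback points at `df_new.iloc[count] = ...`, this suggests `count` reached `len(df_new)`, meaning more rows were matched than `sum(df2.Count)` had allocated. Printing `count` and `len(df_new)` inside the loop should show where the two diverge.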

When I change the body of the inner for loop to the following, the error disappears and I get the desired result:

for j in range(len(temp)):
    df_new.ID[count] = temp.ID[j]
    df_new.Year[count] = temp.Year[j]
    df_new.Month[count] = temp.Month[j]
    df_new.Fruit[count] = temp.Fruit[j]
    count+=1

But this is a tedious workaround, since it means writing n lines for the n columns of the original dataframe.
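For what it's worth, the whole extraction can probably be done without any loop. The sketch below uses `DataFrame.duplicated` with `keep=False`, which marks every row whose key combination appears more than once, including the first occurrence. The frame `sample` is assumed illustration data, and `keys` and `mask` are hypothetical names.

```python
import pandas as pd

# Illustrative sample data (assumed, six rows); values are strings,
# mirroring the astype(str) step in the question.
sample = pd.DataFrame({
    'Year':  ['2018'] * 6,
    'Month': ['1', '1', '1', '2', '3', '3'],
    'ID':    ['A', 'A', 'B', 'A', 'B', 'B'],
    'Fruit': ['Apple', 'Banana', 'Apple', 'Pear', 'Mango', 'Mango'],
})

keys = ['Year', 'Month', 'ID']

# keep=False flags all members of a repeated combination,
# not just the second and later occurrences.
mask = sample.duplicated(subset=keys, keep=False)

# Select the flagged rows and renumber them from zero.
df_new = sample[mask].reset_index(drop=True)
print(df_new)
```

This works for any number of columns, because only the key columns are named and the remaining columns are carried along by the boolean selection.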


liutalent answered: How can I extract the rows of a dataframe in which a combination of specified column values is repeated?
