Fastest way to compute the shortest (Euclidean) distance between points in a pandas DataFrame

Consider the following pandas DataFrame:

print(df)

     Id      X      Y Type  X of Closest  Y of Closest
0   201  73.91  34.84    A           NaN           NaN
1   201  74.67  32.64    A           NaN           NaN
2   201  74.00  33.20    A           NaN           NaN
3   201  71.46  27.70    A           NaN           NaN
4   201  69.32  35.42    A           NaN           NaN
5   201  75.06  24.00    B           NaN           NaN
6   201  74.11  16.64    B           NaN           NaN
7   201  73.37  18.73    B           NaN           NaN
8   201  56.63  26.90    B           NaN           NaN
9   201  73.35  38.83    B           NaN           NaN
10  512  74.15  28.90    A           NaN           NaN
11  512  75.82  17.56    A           NaN           NaN
12  512  74.78  33.21    A           NaN           NaN
13  512  75.43  32.41    A           NaN           NaN
14  512  75.90  25.12    A           NaN           NaN
15  512  79.76  29.49    B           NaN           NaN
16  512  76.47  36.91    B           NaN           NaN
17  512  74.70  19.19    B           NaN           NaN
18  512  78.75  30.53    B           NaN           NaN
19  512  74.60  31.88    B           NaN           NaN

Note that each Id always has 10 rows: 5 of Type A and 5 of Type B.

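I want to fill in two columns, "X of Closest" and "Y of Closest". By that I mean: for each row, the X, Y pair of the opposite Type (within the same Id) that has the shortest Euclidean distance to that row's point.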

Example for the first row: the closest Type B pair to (73.91, 34.84) is (73.35, 38.83), with a Euclidean distance of 4.03.
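The 4.03 in the example above can be checked by hand. A minimal sketch using only the standard library; the variable names are illustrative:

```python
import math

p = (73.91, 34.84)  # first row, Type A
q = (73.35, 38.83)  # its closest Type B point

# Euclidean distance: sqrt((x1 - x2)^2 + (y1 - y2)^2)
distance = math.hypot(p[0] - q[0], p[1] - q[1])
print(round(distance, 2))  # 4.03
```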

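One (possible!?) approach is to build 10 columns holding the Euclidean distances between the points within each Id, and then pick the smallest distance among the points of the opposite Type. I'm sure there is a faster way.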

wokun688 answered: Fastest way to compute the shortest (Euclidean) distance between points in a pandas DataFrame

For a solution that is quick to code, we can use apply on a groupby:

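import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

def get_min_dist(x):
    # x holds the X, Y columns of one Id; rows 0-4 must be type A, rows 5-9 type B
    # 5x5 distance matrix: rows index type A points, columns index type B points
    tmp = distance_matrix(x.iloc[:5], x.iloc[5:])

    # positions (within the group) of the closest point of the opposite type
    idx = np.concatenate((np.argmin(tmp, 1) + 5,  # type A to type B
                          np.argmin(tmp, 0)))     # type B to type A

    return pd.DataFrame(x.iloc[idx].values, index=x.index, columns=[a + '_closest' for a in x.columns])

# group_keys=False keeps the original row index instead of adding an Id level
df.groupby('Id', group_keys=False)[['X', 'Y']].apply(get_min_dist)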

Output:

    X_closest  Y_closest
0       73.35      38.83
1       73.35      38.83
2       73.35      38.83
3       75.06      24.00
4       73.35      38.83
5       71.46      27.70
6       71.46      27.70
7       71.46      27.70
8       71.46      27.70
9       73.91      34.84
10      74.60      31.88
11      74.70      19.19
12      74.60      31.88
13      74.60      31.88
14      79.76      29.49
15      75.43      32.41
16      74.78      33.21
17      75.82      17.56
18      75.43      32.41
19      75.43      32.41
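The function above depends on each group listing its five A rows before its five B rows. A minimal sketch that drops that ordering assumption by selecting points through the Type column instead of by position; the helper name closest_opposite and the four-row sample frame are hypothetical and not part of the original answer:

```python
import numpy as np
import pandas as pd

def closest_opposite(group):
    """For each row of one Id group, return the X/Y of the nearest point of the other Type."""
    pts = group[['X', 'Y']].to_numpy(dtype=float)
    is_a = (group['Type'] == 'A').to_numpy()
    a, b = pts[is_a], pts[~is_a]
    # pairwise squared distances, shape (len(a), len(b))
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    out = np.empty_like(pts)
    out[is_a] = b[d2.argmin(axis=1)]   # each A row -> nearest B point
    out[~is_a] = a[d2.argmin(axis=0)]  # each B row -> nearest A point
    return pd.DataFrame(out, index=group.index, columns=['X_closest', 'Y_closest'])

# hypothetical sample whose rows are deliberately not sorted by Type
sample = pd.DataFrame({
    'Id':   [1, 1, 1, 1],
    'X':    [0.0, 5.0, 1.0, 4.0],
    'Y':    [0.0, 0.0, 0.0, 0.0],
    'Type': ['A', 'B', 'B', 'A'],
})
result = pd.concat([closest_opposite(g) for _, g in sample.groupby('Id')]).sort_index()
print(result)
```

Iterating over the groups and concatenating the pieces sidesteps the differences in how groupby.apply handles group keys across pandas versions.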

Here is my solution using NumPy broadcasting:

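import numpy as np
import pandas as pd

# same data as in the question: per Id, 5 rows of type A followed by 5 rows of type B
df = pd.DataFrame({
    'Id': [201]*10 + [512]*10,
    'X': [73.91, 74.67, 74.00, 71.46, 69.32, 75.06, 74.11, 73.37, 56.63, 73.35,
          74.15, 75.82, 74.78, 75.43, 75.90, 79.76, 76.47, 74.70, 78.75, 74.60],
    'Y': [34.84, 32.64, 33.20, 27.70, 35.42, 24.00, 16.64, 18.73, 26.90, 38.83,
          28.90, 17.56, 33.21, 32.41, 25.12, 29.49, 36.91, 19.19, 30.53, 31.88],
    'Type': (['A']*5 + ['B']*5)*2,
    'X-of-Closest': np.nan,
    'Y-of-Closest': np.nan,
})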

## assuming that df is sorted by Id and Type, we can create this 4-dimensional array where
## dim0 -> number of unique Ids, dim1 -> 2 (type A, B), dim2 -> 5 points of each type, dim3 -> X or Y
values = df[['X','Y']].values.reshape(-1,2,5,2).copy()

## values[:,0] holds the type A points of every Id and values[:,1] the type B points;
## the inserted axes make broadcasting pair each of the 5 A points with each of the 5 B points,
## which gives the 5x5=25 possible A/B pairs per Id
diff = values[:,0][:,:,np.newaxis,:] - values[:,1][:,np.newaxis,:,:]

## squared distances with shape (ids, A, B), then the index of the min distance along each axis
sq_dist = np.sum(diff**2,axis=-1)
ind1 = np.argmin(sq_dist,axis=-1)  # for each A point: index of the closest B point
ind2 = np.argmin(sq_dist,axis=-2)  # for each B point: index of the closest A point

## use the indices to pick the closest point of the other type within the same Id
rows = np.arange(values.shape[0])[:,np.newaxis]
closest_points = np.empty_like(values)
closest_points[:,0] = values[rows,1,ind1]
closest_points[:,1] = values[rows,0,ind2]

## assign result back to df
df[["X-of-Closest","Y-of-Closest"]] = closest_points.reshape(-1,2)
print(df)

Result:

     Id      X      Y Type  X-of-Closest  Y-of-Closest
0   201  73.91  34.84    A         73.35         38.83
1   201  74.67  32.64    A         73.35         38.83
2   201  74.00  33.20    A         73.35         38.83
3   201  71.46  27.70    A         75.06         24.00
4   201  69.32  35.42    A         73.35         38.83
5   201  75.06  24.00    B         71.46         27.70
6   201  74.11  16.64    B         71.46         27.70
7   201  73.37  18.73    B         71.46         27.70
8   201  56.63  26.90    B         71.46         27.70
9   201  73.35  38.83    B         73.91         34.84
10  512  74.15  28.90    A         74.60         31.88
11  512  75.82  17.56    A         74.70         19.19
12  512  74.78  33.21    A         74.60         31.88
13  512  75.43  32.41    A         74.60         31.88
14  512  75.90  25.12    A         79.76         29.49
15  512  79.76  29.49    B         75.43         32.41
16  512  76.47  36.91    B         74.78         33.21
17  512  74.70  19.19    B         75.82         17.56
18  512  78.75  30.53    B         75.43         32.41
19  512  74.60  31.88    B         75.43         32.41

For details on how broadcasting works, see the broadcasting section of this blog.
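As a small illustration of the broadcasting step on its own, here is a sketch with two tiny point sets; the arrays a and b are made-up values, not data from the question:

```python
import numpy as np

a = np.array([[0.0, 0.0], [4.0, 0.0]])              # 2 points of one type
b = np.array([[5.0, 0.0], [1.0, 0.0], [4.0, 3.0]])  # 3 points of the other type

# shapes (2, 1, 2) and (1, 3, 2) broadcast to (2, 3, 2):
# every point in a is paired with every point in b
diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]

dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distances, shape (2, 3)
nearest_b = dist.argmin(axis=1)           # per point in a: index of its nearest point in b
print(diff.shape, nearest_b)
```

No Python-level loop over pairs is needed; the size-1 axes are stretched implicitly without copying the inputs.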

Link to this article: https://www.f2er.com/3137261.html
