Clustering in a DataFrame using pandas

2024-05-19 • Q&A

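I need help with pandas and tables. Here is a table:

Col1    Col2
A   B
C   B
D   B
E   F
G   F
F   A
Z   Y
H   Y
L   P

From this table, I would like to create clusters and get a new table such as:

Cluster    Names
Cluster1   A
Cluster1   B
Cluster1   C
Cluster1   D
Cluster1   F
Cluster1   E
Cluster1   G
Cluster2   Z
Cluster2   Y
Cluster2   H
Cluster3   L
Cluster3   P

As you can see, the letters A, B, C, D, E, F and G are all linked to one another through the rows (directly or via other letters), so together they form Cluster1.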

Does anyone have an idea how to do this with pandas?

This is a graph problem known as finding connected components. I suggest using networkx.connected_components:

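import pandas as pd
import networkx as nx

# The edge list from the question: each row links Col1 to Col2
df = pd.DataFrame({'Col1': list('ACDEGFZHL'), 'Col2': list('BBBFFAYYP')})

# Build an undirected graph with one edge per row
g = nx.from_pandas_edgelist(df, source='Col1', target='Col2', create_using=nx.Graph)

# Each connected component is a set of letters linked by some chain of rows
for component in nx.connected_components(g):
    print(component)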

Output

{'E', 'G', 'C', 'D', 'F', 'A', 'B'}
{'Y', 'H', 'Z'}
{'L', 'P'}
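Because connected_components yields Python sets, the order of the letters inside each printed set can change from run to run. A minimal sketch that sorts each component for reproducible output; df is rebuilt from the question's table here, which is an assumption since the answer does not show how df was created:

```python
import pandas as pd
import networkx as nx

# Rebuild the question's edge list (assumed; not shown in the answer)
df = pd.DataFrame({'Col1': list('ACDEGFZHL'),
                   'Col2': list('BBBFFAYYP')})

# create_using defaults to an undirected nx.Graph
g = nx.from_pandas_edgelist(df, source='Col1', target='Col2')

# Sets print in arbitrary order; sorting each component makes the output stable
components = [sorted(c) for c in nx.connected_components(g)]
for component in components:
    print(component)
```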

Note that the components match your output groups. To convert them into a DataFrame, do the following:

# Number the components from 1 and emit one (cluster, name) row per letter
data = [[f'Cluster{i}', element]
        for i, component in enumerate(nx.connected_components(g), 1)
        for element in component]

result = pd.DataFrame(data=data, columns=['Cluster', 'Names'])
print(result)

Output

     Cluster Names
0   Cluster1     D
1   Cluster1     A
2   Cluster1     B
3   Cluster1     G
4   Cluster1     C
5   Cluster1     F
6   Cluster1     E
7   Cluster2     Z
8   Cluster2     Y
9   Cluster2     H
10  Cluster3     L
11  Cluster3     P
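If you would rather not add networkx as a dependency, the same grouping can be computed with a small union-find (disjoint-set) structure in plain Python and then turned into a DataFrame with pandas. This is a minimal sketch, not part of the original answer: the helpers find and connected_components are hypothetical names, and df is rebuilt from the question's table.

```python
import pandas as pd

# Same edge list as in the question: each row links Col1 to Col2
df = pd.DataFrame({'Col1': list('ACDEGFZHL'),
                   'Col2': list('BBBFFAYYP')})

def find(parent, x):
    # Follow parent pointers to the root, halving the path along the way
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def connected_components(edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two trees
    # Collect nodes by their root, keeping the order in which nodes first appeared
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), []).append(node)
    return list(groups.values())

components = connected_components(zip(df['Col1'], df['Col2']))
data = [[f'Cluster{i}', name]
        for i, component in enumerate(components, 1)
        for name in component]
result = pd.DataFrame(data, columns=['Cluster', 'Names'])
print(result)
```

Since the nodes are grouped in first-seen order, this version yields the clusters and the letters within them in a deterministic order, unlike iterating over sets.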


Answered by marish
