Create a new column based on matching elements between different columns containing arrays

2024-05-05 • Q&A

I have a dataframe in which each row of column col2 contains a list of numbers:

    col1   col2

1 . 54319    54319,54317
2 . 54317    37604,37603,37605
3 . 37603    123
4 . 37604    124
5 . 37605    1255

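I want to find matches between col1 and col2 and create a new column 'new' that holds, for each row, the col2 group containing the value that matches that row's col1.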

Expected result:

    col1   col2                    new

1 . 54319    54319,54317          54319,54317
2 . 54317    37604,37603,37605    54319,54317
3 . 37603    123                  37604,37603,37605
4 . 37604    124                  37604,37603,37605
5 . 37605    1255                 37604,37603,37605

Here is my code, but it crashes. I think I am not reading the lists contained in the rows of col2 correctly.

new = []
for val in col1:
    for i in col2:
        if val in i:
            new.append(i)
        else:
            continue

print(new)

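Use a list comprehension with next and iter to return the first value of col2 that contains the value from col1. As a general solution, a default value is returned when no match exists, here 'no match':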

f = lambda x: next(iter([y for y in df['col2'].tolist() if str(x) in y]),'no match')
df['new'] = df['col1'].apply(f)
print (df)
   col1              col2               new
1   123      123,562,7779      123,562,7779
2   456       456,111,123       456,111,123
3   789      667,1213,456  1011,444,909,789
4  1011      1213,445,123  1011,444,909,789
5  1213  1011,444,909,789      667,1213,456
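
To run the snippet above end to end, here is a minimal self-contained sketch. The sample values are an assumption, reconstructed from the printed output above, and the helper name first_group_containing is hypothetical; it only unpacks the lambda into a named function. Note that when col2 holds plain strings, `str(x) in y` is a substring test, so 123 would also match a cell such as '1234,5'.

```python
import pandas as pd

# Sample data (assumed); col2 holds comma-separated strings.
df = pd.DataFrame({
    "col1": [123, 456, 789, 1011, 1213],
    "col2": ["123,562,7779", "456,111,123", "667,1213,456",
             "1213,445,123", "1011,444,909,789"],
})

def first_group_containing(x, groups, default="no match"):
    """Return the first string in `groups` that contains str(x), else `default`."""
    # next(iterator, default) returns `default` instead of raising StopIteration
    return next((g for g in groups if str(x) in g), default)

groups = df["col2"].tolist()
df["new"] = df["col1"].apply(lambda x: first_group_containing(x, groups))
print(df)

# A value that appears nowhere in col2 falls back to the default
print(first_group_containing(99999, groups))
```

The generator expression stops at the first hit, whereas the list comprehension in the one-liner builds the full list of matches before taking the first element; the result is the same.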

The same approach also works when col2 holds lists of values:

df['new'] = df['col1'].apply(f)
print (df)
   col1                   col2                    new
1   123       [123, 562, 7779]       [123, 562, 7779]
2   456        [456, 111, 123]        [456, 111, 123]
3   789       [667, 1213, 456]  [1011, 444, 909, 789]
4  1011       [1213, 445, 123]  [1011, 444, 909, 789]
5  1213  [1011, 444, 909, 789]       [667, 1213, 456]
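
A minimal sketch of the list case, assuming each cell of col2 is a Python list of strings (the sample values are again an assumption based on the output above). If the lists held integers instead, `str(x) in y` would never match and the test would have to be `x in y`.

```python
import pandas as pd

# Assumption: col2 cells are lists of strings, e.g. as produced by str.split(',')
df = pd.DataFrame({
    "col1": [123, 456, 789, 1011, 1213],
    "col2": [["123", "562", "7779"], ["456", "111", "123"],
             ["667", "1213", "456"], ["1213", "445", "123"],
             ["1011", "444", "909", "789"]],
})

# On a list, `in` tests for an exact element rather than a substring
f = lambda x: next(iter([y for y in df["col2"].tolist() if str(x) in y]), "no match")
df["new"] = df["col1"].apply(f)
print(df)

# "12" is a substring of "123" but not an element of any list
print(f(12))
```

Because list membership is exact, this variant avoids the partial matches that the plain-string version can produce.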

EDIT:

Final solution, using split with ',' as the separator:

df['col2'] = df['col2'].str.split(',')
f = lambda x: next(iter([y for y in df['col2'].tolist() if str(x) in y]),'no match')
df['new'] = df['col1'].apply(f)
print (df)
    col1                   col2                    new
1  54319         [54319, 54317]         [54319, 54317]
2  54317  [37604, 37603, 37605]         [54319, 54317]
3  37603                  [123]  [37604, 37603, 37605]
4  37604                  [124]  [37604, 37603, 37605]
5  37605                 [1255]  [37604, 37603, 37605]
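
The lambda rescans every row of col2 for each value of col1. As an alternative sketch (not part of the original answer), one could build a lookup dictionary once, mapping each element to the first group that contains it, and then look up each col1 value. The names lookup, group and item are illustrative only, and the data is taken from the question.

```python
import pandas as pd

# Data from the question; col2 starts out as comma-separated strings
df = pd.DataFrame({
    "col1": [54319, 54317, 37603, 37604, 37605],
    "col2": ["54319,54317", "37604,37603,37605", "123", "124", "1255"],
})
df["col2"] = df["col2"].str.split(",")

# Map every element to the first group containing it;
# setdefault leaves an existing key untouched, so the first hit wins
lookup = {}
for group in df["col2"]:
    for item in group:
        lookup.setdefault(item, group)

# col1 holds integers, so convert to str before the lookup;
# values that appear in no group get the default
df["new"] = df["col1"].astype(str).map(lambda x: lookup.get(x, "no match"))
print(df)
```

This should give the same 'new' column as the split-based solution above while touching each element of col2 only once, which may matter for large frames.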

Answered by dongxingchao
