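How can I add a new column in pandas based on multiple string-contains conditions, other than with np.where?

I am trying to add a new column by passing multiple string-contains conditions to str.contains() combined with np.where(). This approach does give me the final result I want.

However, the code is very long. Is there a better way to reimplement it with pandas functions?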

df5['new_column'] = np.where(df5['sr_description'].str.contains('gross to net', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('gross up', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('net to gross', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('gross-to-net', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('gross-up', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('net-to-gross', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('gross 2 net', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('net 2 gross', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('gross net', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('net gross', case=False).fillna(False), 1,
    np.where(df5['sr_description'].str.contains('memo code', case=False).fillna(False), 1, 0)))))))))))

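The expected output is:

If any of these strings is contained in 'sr_description', new_column is 1; otherwise it is 0.

Perhaps the string conditions could be stored in a list, then read from it and applied through a function.

Edit:

Sample data: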

sr_description                  new_column
something with gross up.           1
without those words.               0
or with Net to gross               1
if not then we give a '0'          0
Answer by uyghun:

Here is what I came up with.

Code:

import re
import pandas as pd
import numpy as np

# list of the strings we want to check for
check_strs = ['gross to net','gross up','net to gross','gross-to-net','gross-up','net-to-gross','gross 2 net','net 2 gross','gross net','net gross','memo code']

# From the re.escape() docs: Escape special characters in pattern. 
# This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
check_strs_esc = [re.escape(curr_val) for curr_val in check_strs]

# join all the escaped strings as a single regex
check_strs_re = '|'.join(check_strs_esc)

# np.nan stands in for a missing description (the np.NaN alias was removed in NumPy 2.0)
test_col_1 = ['something with gross up.','without those words.',np.nan,'or with Net to gross','if not then we give a "0"']
df_1 = pd.DataFrame(data=test_col_1,columns=['sr_description'])

df_1['contains_str'] = df_1['sr_description'].str.contains(check_strs_re,case=False,na=False)

print(df_1)

Result:

              sr_description  contains_str
0   something with gross up.          True
1       without those words.         False
2                        NaN         False
3       or with Net to gross          True
4  if not then we give a "0"         False
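
The question asked for 1 and 0 rather than True and False. As a minimal sketch, assuming the same list of strings and the sample rows from the question, the boolean result of str.contains() can be cast with astype(int). The names pattern and df below are illustrative only and not part of the original answer.

```python
import re
import pandas as pd

check_strs = ['gross to net', 'gross up', 'net to gross', 'gross-to-net',
              'gross-up', 'net-to-gross', 'gross 2 net', 'net 2 gross',
              'gross net', 'net gross', 'memo code']

# Escape each literal string and join them into one alternation regex
pattern = '|'.join(re.escape(s) for s in check_strs)

# Sample rows from the question, plus one missing value (None)
df = pd.DataFrame({'sr_description': [
    'something with gross up.',
    'without those words.',
    None,
    'or with Net to gross',
    "if not then we give a '0'",
]})

# na=False turns missing descriptions into False; astype(int) maps True/False to 1/0
df['new_column'] = (
    df['sr_description']
    .str.contains(pattern, case=False, na=False)
    .astype(int)
)

print(df)
```

Casting at the end keeps the matching logic in a single str.contains() call, so no np.where is needed at all.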

Note that numpy is not required for the solution to work; I only used it to create a NaN value for testing.
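
To illustrate that point, and the question's idea of keeping the strings in a list and applying them through a function, here is a rough sketch that uses only pandas and re. The helper name add_contains_flag and its parameters are hypothetical, chosen for this example, and None stands in for a missing value.

```python
import re
import pandas as pd


def add_contains_flag(df, source_col, substrings, new_col):
    """Return a copy of df with new_col set to 1 where source_col contains
    any of the given substrings (case-insensitive), and 0 otherwise."""
    # Escape the literals so characters such as '-' or '.' are matched as-is
    pattern = '|'.join(re.escape(s) for s in substrings)
    out = df.copy()
    out[new_col] = (
        out[source_col]
        .str.contains(pattern, case=False, na=False)
        .astype(int)
    )
    return out


# Small made-up frame for demonstration; None represents a missing description
sample = pd.DataFrame({'sr_description': ['Memo Code attached', None, 'plain text']})
flagged = add_contains_flag(sample, 'sr_description', ['gross up', 'memo code'], 'new_column')

print(flagged)
```

Returning a copy leaves the input frame unchanged, which is a design choice of this sketch rather than something the original answer requires.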

Let me know if anything is unclear or if you have any questions! :)
