Searching for keywords in DataFrame cells

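I have a DataFrame in which one column contains words or characters, and I am trying to categorize each row by searching for keywords in that row's cell.

Example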

words               |  category
-----------------------------------
im a test email     |  email
here is my handout  |  handout

Here is what I have:

conditions = [
    df['words'].str.contains('flyer', case=False, regex=True),
    df['words'].str.contains('report', case=False, regex=True),
    df['words'].str.contains('form', case=False, regex=True),
    df['words'].str.contains('scotia', case=False, regex=True),
    df['words'].str.contains('news', case=False, regex=True),
    df_prt_copy['words'].str.contains(r'questions.*\.pdf', case=False, regex=True),
    ...
]
choices = ['open house flyer', 'report', 'form', 'news', 'question',
    ...
]
df['category'] = np.select(conditions, choices, default='others')

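This works fine, but I have a lot of keywords (probably more than 120 or so), so maintaining this keyword list is very difficult. Is there a better way to do this? By the way, I am using Python 3.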

Note: I am looking for an easier way to manage a large number of keywords, which is different from the approach of simply finding keywords here.
Answers:

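You could build the conditions list dynamically. If you have a list of keywords, say key_words, you can iterate over it with a for loop and, for each keyword, append a condition such as df['words'].str.contains(key_words[iter], case=False, regex=True) to the conditions list.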
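The loop described above might look like the following minimal sketch. The `keyword_to_category` dict and its three sample entries are assumptions made for illustration; they stand in for the full list of 120 or so keywords, which could just as well be loaded from a file.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"words": ["im a test email", "here is my handout",
                             "Open House FLYER", "nothing relevant"]})

# Hypothetical mapping: regex pattern -> category.
# Order matters: np.select uses the first condition that is True for a row.
keyword_to_category = {
    "flyer": "open house flyer",
    "email": "email",
    "handout": "handout",
}

# One boolean Series per pattern, built in the same order as the choices.
conditions = [
    df["words"].str.contains(pattern, case=False, regex=True)
    for pattern in keyword_to_category
]
choices = list(keyword_to_category.values())

df["category"] = np.select(conditions, choices, default="others")
print(df)
```

Keeping the patterns and categories in a single dict means each keyword is added or changed in one place, and the two lists passed to `np.select` cannot drift out of step.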


If no row contains more than one keyword, you can join all the keywords into a single pattern, use str.findall, and then map each match to its category:

import pandas as pd

df = pd.DataFrame({"words": ["im a test email", "here is my handout", "This is a flyer"]})

# keyword -> category
choices = {"flyer": "open house flyer", "email": "email from someone", "handout": "some handout"}

# Find every keyword in each row, join the matches into one string, then look up the category.
df["category"] = df["words"].str.findall("|".join(choices.keys())).str.join(",").map(choices)

print(df)

#
                words            category
0     im a test email  email from someone
1  here is my handout        some handout
2     This is a flyer    open house flyer
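With the findall approach, a row that contains two keywords produces a joined string such as "email,flyer", which is not a key in choices, so map returns NaN; a row with no keyword also ends up as NaN. One possible variation, sketched below, keeps only the first keyword found in each row and falls back to 'others'. It assumes the keywords are plain lowercase strings rather than regular expressions; the sample rows are made up for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({"words": ["im a test email",
                             "email with a FLYER attached",
                             "nothing relevant"]})

choices = {"flyer": "open house flyer",
           "email": "email from someone",
           "handout": "some handout"}

# Assumption: keywords are literal strings, so escape them before joining.
pattern = "(" + "|".join(re.escape(k) for k in choices) + ")"

# str.extract returns the first match per row (NaN when nothing matches).
first_match = df["words"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Lowercase the match so it lines up with the dict keys, then label the rest.
df["category"] = first_match.str.lower().map(choices).fillna("others")
print(df)
```

Here "first" means the keyword that appears earliest in the text, not the one listed first in choices; if priority between keywords matters, np.select with ordered conditions may be the better fit.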

You can use flashtext.


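A problem arises with run-together text such as "todayIgotAemailReport". For that case you can refer to "How to split text without spaces into list of words?", which may help you split any kind of unknown concatenated words.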

import pandas as pd
from flashtext import KeywordProcessor

# Map each category to the keywords that should trigger it.
keyword_dict = {
    'programming': ['python', 'pandas', 'java', 'java_football'],
    'sport': ['cricket', 'football', 'baseball'],
}

kp = KeywordProcessor()
kp.add_keywords_from_dict(keyword_dict)

df = pd.DataFrame(
    ['i love working in python', 'pandas is very popular library', 'i love playing football'],
    columns=['text'],
)

# With span_info=True, each match is a (category, start, end) tuple.
df['category'] = df['text'].apply(lambda x: kp.extract_keywords(x, span_info=True))