Searching for keywords in DataFrame cells

2024-05-02 • Q&A

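I have a DataFrame in which one column contains words or characters, and I am trying to categorize each row by searching for keywords in the corresponding cell.

Example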

  words             |   category
-----------------------------------
im a test email     |  email
here is my handout  |  handout

This is what I have:

conditions = [
    (df['words'].str.contains('flyer', case=False, regex=True)),
    (df['words'].str.contains('report', case=False, regex=True)),
    (df['words'].str.contains('form', case=False, regex=True)),
    (df['words'].str.contains('scotia', case=False, regex=True)),
    (df['words'].str.contains('news', case=False, regex=True)),
    (df_prt_copy['words'].str.contains(r'questions.*\.pdf', case=False, regex=True)),
    # ... (list continues)
]
choices = [
    'open house flyer', 'report', 'form', 'news', 'question',
    # ... (list continues)
]
df['category'] = np.select(conditions, choices, default='others')

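This works fine, but I have a lot of keywords (probably more than 120), so maintaining this keyword list is very difficult. Is there a better way? By the way, I am using Python 3.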

Note: I am looking for an easier way to manage a large number of keywords, which is a different problem from simply finding a keyword.

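You could build the conditions list dynamically. If you have a list of keywords, say key_words, you can loop over it with for and append a condition such as (df['words'].str.contains(key_words[iter],False,regex=True)) to the conditions list on each iteration.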
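A minimal sketch of that idea, assuming the keywords and their category labels are kept together in one dict (the name keyword_to_category and the sample rows are made up for illustration). Adding a keyword then means adding one dict entry instead of editing two parallel lists:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"words": ["im a test email", "here is my handout", "Open house FLYER", "nothing relevant"]}
)

# Hypothetical keyword -> category mapping; dict order decides which
# condition wins when several keywords match the same row.
keyword_to_category = {
    "flyer": "open house flyer",
    "email": "email",
    "handout": "handout",
}

# One case-insensitive boolean mask per keyword, built in a loop.
conditions = [
    df["words"].str.contains(keyword, case=False, regex=True)
    for keyword in keyword_to_category
]
choices = list(keyword_to_category.values())

df["category"] = np.select(conditions, choices, default="others")
print(df)
```

Because conditions and choices are derived from the same dict, they cannot drift out of sync, which is the usual failure mode with 120 or so hand-maintained entries.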

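If a row may contain any of several keywords, you can join all the keywords into one pattern, use str.findall, and then map the match to the corresponding choice: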

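import pandas as pd

df = pd.DataFrame({"words": ["im a test email", "here is my handout", "This is a flyer"]})

# Keyword -> category label
choices = {"flyer": "open house flyer", "email": "email from someone", "handout": "some handout"}

# Find every keyword in each cell, join the matches with commas, then look up the label
df["category"] = df["words"].str.findall("|".join(choices.keys())).str.join(",").map(choices)

print(df)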

#
                words            category
0     im a test email  email from someone
1  here is my handout        some handout
2     This is a flyer    open house flyer
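One caveat with the findall approach: when a cell matches two keywords, the joined string (for example "flyer,handout") is not a key in choices, so map returns NaN. The variant below is only a sketch of one way around that, not part of the original answer. It keeps the leftmost match in the text, ignores case, and falls back to 'others' like the np.select default in the question. The sample rows are invented, and re.escape assumes the keywords are literal strings rather than regular expressions.

```python
import re

import pandas as pd

df = pd.DataFrame({"words": ["im a test EMAIL", "flyer and handout", "nothing here"]})

choices = {"flyer": "open house flyer", "email": "email from someone", "handout": "some handout"}

# One capture group containing every keyword, escaped so it is matched literally.
pattern = "(" + "|".join(re.escape(keyword) for keyword in choices) + ")"

# str.extract returns the first (leftmost) match per cell, or NaN if none.
first_match = (
    df["words"].str.extract(pattern, flags=re.IGNORECASE, expand=False).str.lower()
)

# Unmatched rows stay NaN through map and are then labelled 'others'.
df["category"] = first_match.map(choices).fillna("others")
print(df)
```

Note that "first" here means first in the text, whereas np.select picks the first condition in list order, so the two approaches can disagree on rows with several keywords.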

You can also use flashtext.


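One problem that can come up is text with no spaces between the words, for example "todayIgotAemailReport". For that, see "How to split text without spaces into list of words?", which may help you split any kind of unknown concatenated words.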

import pandas as pd
from flashtext import KeywordProcessor

# Category -> keywords that belong to it
keyword_dict = {
    'programming': ['python', 'pandas', 'java', 'java_football'],
    'sport': ['cricket', 'football', 'baseball'],
}

kp = KeywordProcessor()
kp.add_keywords_from_dict(keyword_dict)

df = pd.DataFrame(
    ['i love working in python', 'pandas is very popular library', 'i love playing football'],
    columns=['text'],
)

# With span_info=True, each cell holds a list of (category, start, end) tuples
df['category'] = df['text'].apply(lambda x: kp.extract_keywords(x, span_info=True))
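For concatenated text such as "todayIgotAemailReport", here is a rough sketch of the splitting step using only the standard library. It is a plain dictionary-based word break, not the method from the linked question, and it assumes you can supply a vocabulary of expected words. The function split_words and the vocabulary below are hypothetical.

```python
from functools import lru_cache


def split_words(text, vocabulary):
    """Split `text` into words from `vocabulary`; return None if no split exists."""
    text = text.lower()
    vocab = {word.lower() for word in vocabulary}

    @lru_cache(maxsize=None)
    def solve(start):
        # Reached the end of the string: nothing left to split.
        if start == len(text):
            return ()
        # Try the longest candidate first, backtracking if the rest fails.
        for end in range(len(text), start, -1):
            word = text[start:end]
            if word in vocab:
                rest = solve(end)
                if rest is not None:
                    return (word,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None


# Hypothetical vocabulary; in practice it would come from your keyword list
# plus a general word list.
vocabulary = ["today", "i", "got", "a", "email", "report"]
print(split_words("todayIgotAemailReport", vocabulary))
```

The words can then be joined with spaces and passed to the keyword matcher, which otherwise only recognizes keywords that stand as separate words.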
