Compare 2 sentences and separate them into a data frame

2024-05-15 • Q&A

For example, there are 2 columns...

This is my original data frame

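quote: ['Originally Posted by Gilly Is it? Hmmmm. I enjoy randomness. It changes my mood. I guess not everyone responds the same way. ;)']

all_post: ["Originally Posted by Gilly Is it? Hmmmm. I enjoy randomness. It changes my mood. I guess not everyone responds the same way. ;) Ah ok.. maybe I didn't understand.. /animal crawls back in its hole"]

I want to separate the sentence

"Ah ok.. maybe I didn't understand.. /animal crawls back in its hole"

into another column.

This is what I tried: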

def extract_post(quote, all_post):
    # Keeps every character of all_post that does not occur anywhere in quote
    post = [x for x in all_post if x not in quote]
    return post

for j,row in confession.iterrows():
    if type(row['quote']) == float:
        continue
    else:
        print(extract_post(row['quote'],row['all_post']))

But the second result is ['A', 'k', "'", '/', 'k'] (it is basically comparing each character).
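The character-level result comes from iterating over a string: a Python string yields one character at a time, and `x not in quote` then tests whether that single character occurs anywhere in the quote. A minimal sketch with shortened, made-up strings (not the real rows) shows the effect:

```python
# Shortened stand-ins for one row of the data frame (illustrative values only).
quote = "I enjoy randomness. ;)"
all_post = "I enjoy randomness. ;) Ah ok"

# Iterating over a string yields single characters, so this keeps every
# character of all_post that does not occur anywhere in quote.
leftover = [ch for ch in all_post if ch not in quote]
print(leftover)
```

Only the characters that never appear in the quote survive, which is why the output is a handful of letters rather than the trailing sentence.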

You just want to remove the quote, and the space after it, from all_post. We don't need to define a function for this simple task.

        print(row['all_post'].replace(row['quote']+" ",""))
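If the quote text could also reappear later in the post, or some rows have no quote at all, a slightly more defensive variant may help. The sketch below assumes the quote always sits at the start of the post; `strip_quote` is a hypothetical helper name, and the sample rows are made up for illustration.

```python
def strip_quote(quote, all_post):
    """Return all_post without its leading quote (hypothetical helper)."""
    # Rows without a quote hold NaN (a float); return the post unchanged.
    if not isinstance(quote, str):
        return all_post
    # Remove the quote only when the post actually starts with it.
    if all_post.startswith(quote):
        return all_post[len(quote):].strip()
    return all_post

# Made-up (quote, all_post) pairs standing in for data frame rows.
rows = [
    ("I enjoy randomness. ;)",
     "I enjoy randomness. ;) Ah ok.. maybe I didn't understand.."),
    (float("nan"), "A post without a quote"),
]
comments = [strip_quote(q, p) for q, p in rows]
print(comments)
```

With the data frame from the question, the same helper could be applied as `confession['comment'] = [strip_quote(q, p) for q, p in zip(confession['quote'], confession['all_post'])]`.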

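The tricky bit is escaping any special characters you might encounter. If the comment you are trying to extract contains those special characters and you want to keep them, you may run into problems.

In the code below, the parentheses are escaped by replacing ')' with '\)' to make it work. If they are not escaped you may end up with, for example, unbalanced parentheses.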

import pandas as pd

# Build the example frame directly (DataFrame.append was removed in pandas 2.0).
confession = pd.DataFrame([{'quote': "Originally Posted by Gilly Is it? Hmmmm. I enjoy randomness. It changes my mood. I guess not everyone responds the same way. ;)", 'all_post': "Originally Posted by Gilly Is it? Hmmmm. I enjoy randomness. It changes my mood. I guess not everyone responds the same way. ;) Ah ok.. maybe I didn't understand.. /animal crawls back in its hole"}])

for index, row in confession.iterrows():
    # Escape ')' in both strings (raw strings avoid an invalid-escape warning).
    quote = row['quote'].replace(')', r'\)')
    post = row['all_post'].replace(')', r'\)')
    # Drop the quote from the post and trim the surrounding whitespace.
    comment = post.replace(quote, '').strip()
    confession.loc[index, 'comment'] = comment

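You will have to account for all special characters and escape them in a better way than I suggested in the example. The answers to "How to escape special characters of a string with single backslashes" may help identify any escape characters you might encounter in the various quotes you are processing.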
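One way to cover every special character at once, offered here as an assumption rather than something this answer uses, is the standard library's `re.escape`. It only matters when the quote is used as a regular expression; Python's plain `str.replace` already matches its argument literally. A small sketch with made-up strings:

```python
import re

# Made-up strings for illustration; the quote ends in ')' like the real data.
quote = "I guess not everyone responds the same way. ;)"
post = quote + " Ah ok.. maybe I didn't understand.. (really)"

# str.replace treats its first argument as literal text, so ')' needs no escaping.
literal = post.replace(quote, "").strip()

# In a regular expression ')' is special; re.escape backslash-escapes such
# characters so the whole quote is matched literally.
pattern = re.escape(quote)
via_regex = re.sub(pattern, "", post, count=1).strip()

print(literal)
print(via_regex)
```

Both approaches leave the parentheses inside the remaining comment untouched, since only the search pattern is escaped and never the post itself.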


heyongkun answered: Compare 2 sentences and separate them into a data frame
