How to loop .replace over a column to change multiple strings into one string?

2024-05-04 • Q&A

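I have a column in a dataframe in which I'm trying to fix all misspellings/typos to the correct string (for example, changing “ femake” to “female”). Is there a way to list all the misspellings in one line of code so that they all get changed to the correct value? I have the following code. If a loop would work, how do I create one?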

mh2014['Gender'] = mh2014['Gender'].replace('f' and 'woman' and 'femail' and 'cis-female/femme' and 'female (cis)' and 'cis female' and 'femake','female')

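Assuming this is a pandas DataFrame you're working with, you can simply reassign the column to a list comprehension that checks each entry for misspellings, for example: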

misspellings = {'f','woman','femail','cis-female/femme','female (cis)','cis female','femake'}
mh2014['Gender'] = ["female" if entry in misspellings else entry for entry in mh2014['Gender']]

We use a set to speed up the misspelling lookup, since set membership tests take O(1) time on average.
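The same check can also be written without a Python-level loop. The sketch below is an alternative the answer does not itself use: Series.isin builds a boolean mask from the set, and Series.mask swaps in "female" wherever that mask is True. The sample rows are made up for illustration, and the result is compared against the list comprehension above.

```python
import pandas as pd

# Hypothetical sample data standing in for mh2014
mh2014 = pd.DataFrame({'Gender': ['female', 'f', 'male', 'femake', 'woman', 'Male']})

misspellings = {'f', 'woman', 'femail', 'cis-female/femme',
                'female (cis)', 'cis female', 'femake'}

# List-comprehension version from the answer
by_comprehension = ["female" if entry in misspellings else entry
                    for entry in mh2014['Gender']]

# Vectorized version: isin() marks exact matches, mask() replaces them with 'female'
by_mask = mh2014['Gender'].mask(mh2014['Gender'].isin(misspellings), 'female')

print(by_mask.tolist())
```

Both versions match whole cell values only and are case-sensitive, so 'Male' is left untouched.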

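If you want to catch more misspellings, add them to the misspellings set; if the list becomes too long to hardcode comfortably, you can load it from a file instead.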
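A minimal sketch of loading the set from a file, assuming a plain-text file with one misspelling per line. The file name misspellings.txt and the helper load_misspellings are hypothetical. Only the trailing newline is removed from each line, so entries with a leading space (like “ femake”) are kept verbatim. A temporary file stands in for a real one here.

```python
import os
import tempfile

import pandas as pd


def load_misspellings(path):
    """Hypothetical helper: read one misspelling per line, skipping blank lines."""
    with open(path, encoding='utf-8') as fh:
        # Strip only the newline so leading/trailing spaces in entries survive
        return {entry for entry in (line.rstrip('\n') for line in fh) if entry}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'misspellings.txt')  # hypothetical file name
    with open(path, 'w', encoding='utf-8') as fh:
        fh.write('f\nwoman\nfemake\n\ncis female\n')
    misspellings = load_misspellings(path)

mh2014 = pd.DataFrame({'Gender': ['f', 'male', 'cis female', 'femake']})
mh2014['Gender'] = ["female" if entry in misspellings else entry
                    for entry in mh2014['Gender']]
print(sorted(misspellings))
```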

You just need to loop over the strings you want to replace:

misspellings = ['f','femake']
for s in misspellings:
    mh2014['Gender'] = mh2014['Gender'].replace(s,'female')
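The loop can also be collapsed into the single call the question asks for: Series.replace accepts a list for its to_replace argument together with a single replacement value. The sketch below uses made-up sample rows; note that matching is on whole values and case-sensitive, so 'female' is not affected by the 'f' entry and 'Woman' is not matched by 'woman'.

```python
import pandas as pd

# Hypothetical sample data standing in for mh2014
mh2014 = pd.DataFrame({'Gender': ['f', 'female', 'femake', 'male', 'Woman']})

misspellings = ['f', 'woman', 'femail', 'cis-female/femme',
                'female (cis)', 'cis female', 'femake']

# One call: every cell exactly equal to any list item becomes 'female'
mh2014['Gender'] = mh2014['Gender'].replace(misspellings, 'female')
print(mh2014['Gender'].tolist())
```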

The and operator doesn't do what you think it does. From the Python Tutorial:

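The Boolean operators and and or are so-called short-circuit operators: their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined. For example, if A and C are true but B is false, A and B and C does not evaluate the expression C. When used as a general value and not as a Boolean, the return value of a short-circuit operator is the last evaluated argument.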

For example:

>>> 'x' and 'y'  # 'x' is true so try next expression
'y'
>>> '' and 'y'  # '' is false so don't try next expression
''
>>> 'x' and ''  # Same as the first
''
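Applying that rule to the expression in the question: every operand is a non-empty, and therefore truthy, string, so the whole and chain evaluates to its last operand before replace() is even called. The sketch below shows that the question's code ends up replacing only 'femake'; the three-row Series is made-up sample data.

```python
import pandas as pd

# Every operand is truthy, so `and` keeps going and returns the last one
target = ('f' and 'woman' and 'femail' and 'cis-female/femme'
          and 'female (cis)' and 'cis female' and 'femake')
print(target)  # femake

# The question's code is therefore equivalent to replacing only 'femake'
gender = pd.Series(['f', 'woman', 'femake'])
print(gender.replace(target, 'female').tolist())  # ['f', 'woman', 'female']
```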

Since you used the word dataframe in your question, I assume you're working in pandas.

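import pandas as pd
df = pd.read_excel('location/to/myfile')
# set() takes a single iterable, so use a set literal instead of set('f', 'femake')
misspelled = {'f', 'femake'}
# Series.replace matches whole cell values; .str.replace expects one pattern, not a set
df['Gender'] = df['Gender'].replace(list(misspelled), 'female')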
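If the column has typos for more than one correct value, a single dict passed to Series.replace can handle them all at once. The sketch below is an extension beyond the answer: the variant groups and sample rows are hypothetical, and it also strips whitespace and lowercases first, since the question's example (“ femake”) suggests stray spaces may be present.

```python
import pandas as pd

# Hypothetical groups: correct value -> variants seen in the data
variants = {
    'female': ['f', 'woman', 'femail', 'femake', 'cis female'],
    'male': ['m', 'man', 'mail', 'cis male'],
}

# Invert into a flat {variant: correct value} dict for Series.replace
mapping = {v: correct for correct, vs in variants.items() for v in vs}

df = pd.DataFrame({'Gender': [' Femake', 'M', 'woman', 'male', 'other']})

# Normalise whitespace and case first so ' Femake' and 'M' can match
cleaned = df['Gender'].str.strip().str.lower()
df['Gender'] = cleaned.replace(mapping)
print(df['Gender'].tolist())
```

Values not listed in the mapping, such as 'other', pass through unchanged.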


jian123789 answered: How to loop .replace over a column to change multiple strings into one string?
