I'm currently trying to extract the countries from the rows of a data frame. This is the sample data I'm working with:
import pandas as pd
# Sample rows: [author/affiliation string, year]; long strings are split with implicit concatenation
l = [
    ['[Aydemir,Deniz\',\' Gunduz,Gokhan\',\' Asik,Nejla] Bartin '
     'Univ,Fac Forestry,Dept Forest Ind Engn,TR-74100 Bartin,Turkey\',\' '
     '[Wang,Alice] Lulea Univ Technol,Wood Technol,Skelleftea,Sweden', 1990],
    ['[Fang,Qun\',\' Cui,Hui-Wang] Zhejiang A&F Univ,Sch Engn,Linan '
     '311300,Peoples R China\',\' [Du,Guan-Ben] Southwest Forestry '
     'Univ,Kunming 650224,Yunnan,Peoples R China', 2005],
    ['[Blumentritt,Melanie\',\' Gardner,Douglas J.\',\' Shaler '
     'Stephen M.] Univ Maine,Sch Resources,Orono,ME USA\',\' '
     '[Cole,Barbara J. W.] Univ Maine,Dept Chem,ME 04469 USA', 2012],
    ['[Kyvelou,Pinelopi; Gardner,Leroy; Nethercot,David A.] Univ '
     'London Imperial Coll Sci Technol & Med,London SW7 2AZ,England', 1998],
]
dataf = pd.DataFrame(l, columns=['Authors', 'Year'])
That is the data frame. This is the code I have so far:
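from geotext import GeoText

# Normalize "USA" first, then let GeoText pull the country names out of each row
df = (dataf['Authors']
      .replace(r"\bUSA\b", "United States", regex=True)
      .apply(lambda x: GeoText(x).countries))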
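The problem was that geotext does not recognize "USA", which is why I added the replacement above. I now also see that I need to change "England", "Scotland", "Wales", and "Northern Ireland" to "United Kingdom".
How can I extend .replace to achieve this?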
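For reference, here is a minimal sketch of the kind of extension I have in mind, assuming `Series.replace` accepts a dict that maps regex patterns to replacement strings when `regex=True`. The names `country_aliases` and `sample` are made up for illustration, and the geotext step is left out so the snippet only depends on pandas.

```python
import pandas as pd

# Hypothetical alias table: regex pattern -> country name that geotext should recognize
country_aliases = {
    r"\bUSA\b": "United States",
    r"\b(?:England|Scotland|Wales|Northern Ireland)\b": "United Kingdom",
}

# Small illustrative sample shaped like the 'Authors' column
sample = pd.Series([
    "Univ Maine,Dept Chem,ME 04469 USA",
    "Imperial Coll,London SW7 2AZ,England",
    "Lulea Univ Technol,Skelleftea,Sweden",
])

# One replace call applies every pattern in the dict to each string
normalized = sample.replace(country_aliases, regex=True)
print(normalized.tolist())
```

If this works as I expect, the same dict could be passed to the `.replace` call on `dataf['Authors']` before the `.apply` step.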