I want to split a DataFrame as shown below. The DataFrame contains strings, and I want to split each string into 2 lists.
Input:
AGE
0 '1 years 2 days ago'
1 '3 days 4 mins ago'
2 ' 5 mins 6 secs ago'
OUTPUT:
AGE
0 [1,years],[2,days]
1 [3,days],[4,mins]
2 [5,mins],[6,secs]
Use str.extract:
>>> df['AGE'].str.extract(r'(\d+ \w+) (\d+ \w+)') \
.applymap(str.split).apply(lambda x: [x[0],x[1]],axis=1) \
.rename('AGE').to_frame()
AGE
0 [[1,years],[2,days]]
1 [[3,days],[4,mins]]
2 [[5,mins],[6,secs]]
Here is a solution that is easier to read:
import pandas as pd

# Create the DataFrame
df = pd.DataFrame({'AGE': ['1 years 2 days ago', '3 days 4 mins ago', '5 mins 6 secs ago']})
final_df = pd.DataFrame()
# Split on spaces into 5 columns: number, unit, number, unit, 'ago'
parts = df['AGE'].str.split(' ', n=4, expand=True)
# Group each number/unit pair of columns into a list
final_df['first'] = parts.iloc[:, [0, 1]].values.tolist()
final_df['second'] = parts.iloc[:, [2, 3]].values.tolist()
print(final_df)
Output:
first second
0 [1,years] [2,days]
1 [3,days] [4,mins]
2 [5,mins] [6,secs]
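One of the input strings in the question starts with a space (' 5 mins 6 secs ago'), which a plain split on ' ' would turn into an extra empty first column. As a possible alternative, here is a minimal sketch that uses str.findall with a regular expression, so leading whitespace and the trailing 'ago' are simply never matched. The variable names pairs and result are only illustrative, and the numbers stay as strings.

```python
import pandas as pd

df = pd.DataFrame({'AGE': ['1 years 2 days ago',
                           '3 days 4 mins ago',
                           ' 5 mins 6 secs ago']})

# Find every "<number> <unit>" pair in each string.
# With two capture groups, findall yields a list of (number, unit) tuples per row.
pairs = df['AGE'].str.findall(r'(\d+)\s+(\w+)')

# Turn each tuple into a list so every cell holds [[number, unit], [number, unit]].
result = pairs.apply(lambda row: [list(p) for p in row]).to_frame('AGE')

print(result)
```

Because the pattern only matches a number followed by a word, this also works when a string has more or fewer than two number/unit pairs; each row just gets a longer or shorter list.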