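I have a DataFrame df with a column col1 like the one below. From col1 I need to create two columns, one holding the numbers and one holding the words, i.e. split df['col1'] into df['code'] and df['name'].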
(index) col1
94 520XX
111 316aa
114 Entry
144 325 Sport
146 xColor35d
166 420 M Sport
167 XX
199 XX
225 645 Ai
This is what I have tried:
import pandas as pd
import numpy as np
# Split on a run of digits followed by a run of letters; the capture groups keep both pieces
result = df['col1'].str.split(r'(\d+)([A-Za-z]+)', expand=True)
result = result.loc[:, [0, 1, 2, 3]]
result.rename(columns={0: 'split_0', 1: 'split_1', 2: 'split_2', 3: 'split_3'}, inplace=True)
result['split_0'] = result['split_0'].fillna(value=np.nan)
# Turn empty or whitespace-only strings into NaN
result['split_0'] = result['split_0'].astype(str).replace(r'^\s*$', np.nan, regex=True)
result
The result is:
split_0 split_1 split_2 split_3
94 520 XX None None
111 NaN 316 aa
114 Entry None None None
144 325 Sport None None None
146 xColor 35 d
166 420 M Sport None None None
167 XX None None None
199 XX None None None
225 645 Ci None None
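My problem starts when I try to split the 'split_0' column into numbers and words and then concatenate everything, so that I end up with only two columns holding the numbers and the words from all the 'split_*' columns, with the index preserved, like this: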
code name
94 520 XX
111 316 aa
114 NaN Entry
144 325 Sport
146 35 xColor d
166 420 M Sport
167 NaN XX
199 NaN XX
225 645 Ci
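
To make the target concrete, here is a minimal sketch of the per-value rule I think I am after, written in plain Python. It assumes that the first run of digits becomes the code and that whatever is left, with whitespace collapsed, becomes the name. The function name split_code_name is just something I made up for illustration, and None stands in for NaN.

```python
import re

def split_code_name(text):
    """Return (code, name): the first digit run and the remaining words.

    Hypothetical helper; None is used where the desired output shows NaN.
    """
    match = re.search(r'\d+', text)
    code = match.group(0) if match else None
    # Replace every digit run with a space, then collapse repeated whitespace.
    name = ' '.join(re.sub(r'\d+', ' ', text).split())
    return code, name or None

# A few values taken from col1 above.
for value in ['520XX', '316aa', 'Entry', '325 Sport', 'xColor35d', '420 M Sport', 'XX']:
    print(value, '->', split_code_name(value))
```

With pandas, I assume something like applying this function to df['col1'] and assigning the two parts to df['code'] and df['name'] would keep the original index, but I have not found a clean vectorised way to do it.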