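You can try the following. It groups the sublists in lst into a defaultdict of lists keyed by their first element (the URL), then builds the new result by summing only the second element (the count) of each group.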
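from collections import defaultdict
from pprint import pprint

lst = ...  # your list of rows

# Group the rows by their first element (the page URL).
d = defaultdict(list)
for item in lst:
    d[item[0]].append(item)

# For each group, keep the URL, sum the counts (second element),
# and copy the remaining fields from the group's first row.
result = [[v[0][0], sum(x[1] for x in v)] + v[0][2:] for v in d.values()]
pprint(result)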
Output:
[['https://www.website.com/directory/link-1', 206, 'Long Text Field 1', 'String 1', {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
 ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2', {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
 ['https://www.website.com/directory/link-3', 354, 'Long Text Field 3', {'url': 'https://www.website.com/images/image-3.jpg'}, 388],
 ['https://www.website.com/directory/link-4', 606, 'Long Text Field 4', {'url': 'https://www.website.com/images/image-4.jpg'}, 624]]
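Because `lst` is elided above, here is a self-contained sketch of the same defaultdict grouping you can run directly. The sample rows are assumed for illustration (modeled on the output above), and `merge_by_url` is a hypothetical helper name wrapping the same logic.

```python
from collections import defaultdict

# Hypothetical sample rows modeled on the output above:
# [page_url, count, text, (optional string), {'url': image_url}, magic]
lst = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2',
     {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
]


def merge_by_url(rows):
    """Group rows by their first element and sum their second element.

    All other fields are copied from the first row of each group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    # dicts keep insertion order in Python 3.7+, so groups come out
    # in the order their URL first appears in `rows`
    return [
        [group[0][0], sum(r[1] for r in group)] + group[0][2:]
        for group in groups.values()
    ]


result = merge_by_url(lst)
for row in result:
    print(row[0], row[1])
```

With these rows the two link-1 counts (21 and 185) collapse into a single row with 206, while link-2 passes through unchanged.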
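If you want to use pandas, you can get something like this (the column order of the grouped table comes from dict ordering, so it may differ with your Python and pandas versions):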
                                       Page  Count               Text    String                                         Url  Magic
0  https://www.website.com/directory/link-1     21  Long Text Field 1  String 1  https://www.website.com/images/image-1.jpg    255
1  https://www.website.com/directory/link-1    185  Long Text Field 1  String 1  https://www.website.com/images/image-1.jpg    255
2  https://www.website.com/directory/link-2    296  Long Text Field 2      None  https://www.website.com/images/image-2.jpg    303
3  https://www.website.com/directory/link-3    354  Long Text Field 3      None  https://www.website.com/images/image-3.jpg    388
4  https://www.website.com/directory/link-4    606  Long Text Field 4      None  https://www.website.com/images/image-4.jpg    624
----
                                       Page  Count  Magic    String                                         Url               Text
0  https://www.website.com/directory/link-1    206    255  String 1  https://www.website.com/images/image-1.jpg  Long Text Field 1
1  https://www.website.com/directory/link-2    296    303      None  https://www.website.com/images/image-2.jpg  Long Text Field 2
2  https://www.website.com/directory/link-3    354    388      None  https://www.website.com/images/image-3.jpg  Long Text Field 3
3  https://www.website.com/directory/link-4    606    624      None  https://www.website.com/images/image-4.jpg  Long Text Field 4
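This was produced by running the following code. Note that because your data format is somewhat inconsistent, I had to insert dummy values for the missing strings.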
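import pandas as pd

# Rows reconstructed from the printed table above; rows without the
# optional string field have only 5 elements.
data = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1', 'String 1', {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1', 'String 1', {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2', {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
    ['https://www.website.com/directory/link-3', 354, 'Long Text Field 3', {'url': 'https://www.website.com/images/image-3.jpg'}, 388],
    ['https://www.website.com/directory/link-4', 606, 'Long Text Field 4', {'url': 'https://www.website.com/images/image-4.jpg'}, 624],
]
columns = ['Page', 'Count', 'Text', 'String', 'Url', 'Magic']

# Normalize each row in place: insert a dummy None where the string is
# missing, then replace the {'url': ...} dict with the URL itself.
for d in data:
    if len(d) != 6:
        d.insert(3, None)
    d[4] = d[4]['url']

df = pd.DataFrame(data, columns=columns)

# Take the first value of every column except Count, which is summed.
agg = dict.fromkeys(columns, 'first')
agg.update({'Count': 'sum'})
del agg['Page']  # Page is the grouping key
df2 = df.groupby(['Page'], as_index=False).agg(agg)

pd.options.display.width = 0
print(df)
print('\n----\n')
print(df2)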
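On Python 3 with pandas 0.25 or later, named aggregation lets you spell out the output columns and their order explicitly instead of relying on dict ordering. This is a minimal sketch on a small assumed sample modeled on the table above; the `normalize` helper is hypothetical and detects the missing string by checking whether index 3 already holds the `{'url': ...}` dict, rather than by row length.

```python
import pandas as pd

# Assumed sample rows modeled on the table above; rows without the
# optional 'String' field have only 5 elements.
data = [
    ['https://www.website.com/directory/link-1', 21, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-1', 185, 'Long Text Field 1', 'String 1',
     {'url': 'https://www.website.com/images/image-1.jpg'}, 255],
    ['https://www.website.com/directory/link-2', 296, 'Long Text Field 2',
     {'url': 'https://www.website.com/images/image-2.jpg'}, 303],
]
columns = ['Page', 'Count', 'Text', 'String', 'Url', 'Magic']


def normalize(row):
    """Return a 6-field copy of row: None for a missing 'String', URL unwrapped."""
    row = list(row)  # copy so the caller's data is not mutated
    if isinstance(row[3], dict):  # no 'String' field: the dict sits at index 3
        row.insert(3, None)
    row[4] = row[4]['url']
    return row


df = pd.DataFrame([normalize(r) for r in data], columns=columns)

# Named aggregation: one keyword per output column, in the order given.
df2 = (
    df.groupby('Page', sort=False)
      .agg(Count=('Count', 'sum'),
           Text=('Text', 'first'),
           String=('String', 'first'),
           Url=('Url', 'first'),
           Magic=('Magic', 'first'))
      .reset_index()
)
print(df2)
```

`sort=False` keeps the groups in the order each page first appears, and `reset_index()` turns the `Page` group key back into an ordinary first column.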
Article link: https://www.f2er.com/3148516.html