Reading a large multi-sheet Excel file and converting it to JSON with Python

2024-05-05 • Q&A

I have a large Excel file (about 100 MB) that contains multiple worksheets.

Sheet A

id | name | address
1  | joe  |  A
2  | gis  |  B
3  | leo  |  C

work_1

id| call
1 | 10
1 | 8
2 | 1
3 | 3

work_2

id| call
2 | 4  
3 | 8
3 | 7

The JSON I need for each id:

data = { id: 1, address: A, name: Joe, log: [{call: 10}, {call: 8}] }
data = { id: 2, address: B, name: Gis, log: [{call: 1}, {call: 4}] }
data = { id: 3, address: C, name: Leo, log: [{call: 3}, {call: 8}, {call: 7}] }

I tried pandas, but read_excel alone takes 5 minutes to run, before any processing is done. Is there a way to make this faster, and how can I produce the JSON I need?

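Perhaps the work could be split into chunks (although pandas removed chunksize from read_excel), with some threading on an interval so that progress can be printed for each batch.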
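One way to get batch-by-batch progress without chunksize is to stream rows with openpyxl's read-only mode instead of pandas. The sketch below is only an illustration of that idea, not a measured fix for the 5-minute load time; the helper names (iter_sheet_rows, batched, process_in_batches) are made up for this example, and it assumes openpyxl is installed.

```python
from itertools import islice


def iter_sheet_rows(path, sheet_name):
    """Yield each row of one sheet as a tuple of cell values, lazily."""
    # Imported here so the batching helpers below work without openpyxl.
    from openpyxl import load_workbook

    # read_only=True streams rows instead of loading the whole workbook.
    wb = load_workbook(path, read_only=True)
    try:
        yield from wb[sheet_name].iter_rows(values_only=True)
    finally:
        wb.close()


def batched(rows, size):
    """Group any iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(rows, size, handle):
    """Call handle(batch) for every batch and print running progress."""
    done = 0
    for batch in batched(rows, size):
        handle(batch)
        done += len(batch)
        print(f"processed {done} rows")
    return done
```

A call might look like process_in_batches(iter_sheet_rows("big.xlsx", "work_1"), 10000, handle), where "big.xlsx" and handle are placeholders. Note that the first row yielded for each sheet is the header row.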

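import pandas as pd

# dfa, work1 and work2 are assumed to be the DataFrames already read from
# sheets A, work_1 and work_2.
works = pd.concat([work1, work2], ignore_index=True)
# For each id, collect its calls as a list of {'call': ...} records.
mapper_works = works.groupby('id')[['call']].apply(lambda x: x.to_dict('records'))
dfa['log'] = dfa['id'].map(mapper_works)
data = dfa.reindex(columns=['id', 'address', 'name', 'log']).to_dict('records')
print(data)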

[{'id': 1, 'address': 'A', 'name': 'joe', 'log': [{'call': 10}, {'call': 8}]},
 {'id': 2, 'address': 'B', 'name': 'gis', 'log': [{'call': 1}, {'call': 4}]},
 {'id': 3, 'address': 'C', 'name': 'leo', 'log': [{'call': 3}, {'call': 8}, {'call': 7}]}]

dfa['dicts'] = data
print(dfa)

   id name address                                      log  \
0   1  joe       A              [{'call': 10}, {'call': 8}]
1   2  gis       B               [{'call': 1}, {'call': 4}]
2   3  leo       C  [{'call': 3}, {'call': 8}, {'call': 7}]

               dicts
0  {'id': 1,'log'...
1  {'id': 2,'log'...
2  {'id': 3,'log'...
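If the rows are streamed rather than loaded into DataFrames, the same grouping can be done with plain dictionaries. The following is a minimal sketch under that assumption: build_records is a hypothetical helper, and the tuples simply repeat the sample data from the question, with the header rows already skipped.

```python
import json
from collections import defaultdict


def build_records(people, *call_sheets):
    """Combine person rows with call rows from any number of sheets.

    people: iterable of (id, name, address) tuples.
    call_sheets: iterables of (id, call) tuples, one per work sheet.
    """
    # Collect every call under its id, in sheet order.
    logs = defaultdict(list)
    for sheet in call_sheets:
        for person_id, call in sheet:
            logs[person_id].append({"call": call})

    # An id with no calls gets an empty log.
    return [
        {"id": pid, "address": address, "name": name, "log": logs.get(pid, [])}
        for pid, name, address in people
    ]


# Sample rows from the question.
people = [(1, "joe", "A"), (2, "gis", "B"), (3, "leo", "C")]
work_1 = [(1, 10), (1, 8), (2, 1), (3, 3)]
work_2 = [(2, 4), (3, 8), (3, 7)]

records = build_records(people, work_1, work_2)
for record in records:
    # One JSON document per id.
    print(json.dumps(record))
```

Because the call sheets are read one after another, the log for each id keeps work_1 entries before work_2 entries, which matches the order produced by pd.concat in the answer.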


Answered by duoqing321
