我在数据帧方面苦苦挣扎,因此我不得不用熊猫从txt创建CSV文件,然后通过“ id2”将其合并为内部联接,并希望在数据库上进行操作(想将某些行打印为sql命令“ SELECT *”。 ..”这是我的代码示例: Python
df = pd.read_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracks.txt',delimiter='<SEP>',engine='python',names=["id","id2","artysta","track"])
print(df.head(5))
sv = df.to_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracks.csv',index=None,header=True)
df1 = pd.read_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracksV1.txt',names=["id3","data"])
sv1 = df1.to_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracksV1.csv',header=True)
print(df1.head(5))
merged_inner = pd.merge(left=df,right=df1,left_on='id2',right_on='id2')
print(merged_inner.head(5))
engine = create_engine('sqlite:///C:/Users/Maticz/PycharmProjects/zadanie/artists.db')
merged_inner.to_sql('tabela',con=engine,if_exists='replace',index=False)
result = engine.execute('SELECT * FROM tabela ')
for r in result:
print(r)
if r == 5:
pass
输出:
id id2 artysta track
0 TRMMMYQ128F932D901 SOQMMHC12AB0180CB8 Faster Pussy cat Silent Night
1 TRMMMKD128F425225D SOVFVAK12A8C1350D9 Karkkiautomaatti Tanssi vaan
2 TRMMMRX128F93187D9 SOGTUKN12AB017F4F1 Hudson Mohawke No One Could Ever
3 TRMMMCH128F425532C SOBNYVR12A8C13558C Yerba Brava Si Vos Querés
4 TRMMMWA128F426B589 SOHSBXH12A8C13B0DF Der Mystic Tangle Of Aspens
id3 id2 data
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBBMDR12A8C13253B 1203083335
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOBXALG12A8C13C108 984663773
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 1275071044
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e SODDNQT12A6D4F5F7E 1097509573
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e SOEOBYG12A6D4F8AE2 1009874043
id ... data
0 TRMMMYQ128F932D901 ... 1124439104
1 TRMMMYQ128F932D901 ... 1052628295
2 TRMMMYQ128F932D901 ... 1068116905
3 TRMMMCH128F425532C ... 999485493
4 TRMMMNS128F93548E1 ... 1039819697
[5 rows x 6 columns]
Traceback (most recent call last):
File "C:/Users/Maticz/PycharmProjects/zadanie/main.py",line 46,in <module>
merged_inner.to_sql('tabela',index=False)
File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py",line 2702,in to_sql
sql.to_sql(
File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 509,in to_sql
pandas_sql.to_sql(
File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 1320,in to_sql
table.insert(chunksize,method=method)
File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 756,in insert
exec_insert(conn,keys,chunk_iter)
File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 669,in _execute_insert
data = [dict(zip(keys,row)) for row in data_iter]
File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",in <listcomp>
data = [dict(zip(keys,row)) for row in data_iter]
MemoryError
那么我应该做一个“块化”吗?或我该如何操作?一个文件大约80mb,第二个文件大约2gb,它的编译时间大约为20分钟,这就是为什么即时通讯寻求帮助的原因: