在Pandas,Python3中创建合并表

我在数据帧方面苦苦挣扎,因此我不得不用熊猫从txt创建CSV文件,然后通过“ id2”将其合并为内部联接,并希望在数据库上进行操作(想将某些行打印为sql命令“ SELECT *”。 ..”这是我的代码示例: Python

df = pd.read_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracks.txt',delimiter='<SEP>',engine='python',names=["id","id2","artysta","track"])
print(df.head(5))
sv = df.to_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracks.csv',index=None,header=True)
df1 = pd.read_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracksV1.txt',names=["id3","data"])
sv1 = df1.to_csv(r'C:\Users\Maticz\PycharmProjects\zadanie\tracksV1.csv',header=True)
print(df1.head(5))

merged_inner = pd.merge(left=df,right=df1,left_on='id2',right_on='id2')
print(merged_inner.head(5))

engine = create_engine('sqlite:///C:/Users/Maticz/PycharmProjects/zadanie/artists.db')
merged_inner.to_sql('tabela',con=engine,if_exists='replace',index=False)

result = engine.execute('SELECT * FROM tabela  ')
for r in result:
    print(r)
    if r == 5:
        pass

输出:


                   id                 id2           artysta              track
0  TRMMMYQ128F932D901  SOQMMHC12AB0180CB8  Faster Pussy cat       Silent Night
1  TRMMMKD128F425225D  SOVFVAK12A8C1350D9  Karkkiautomaatti        Tanssi vaan
2  TRMMMRX128F93187D9  SOGTUKN12AB017F4F1    Hudson Mohawke  No One Could Ever
3  TRMMMCH128F425532C  SOBNYVR12A8C13558C       Yerba Brava      Si Vos Querés
4  TRMMMWA128F426B589  SOHSBXH12A8C13B0DF        Der Mystic   Tangle Of Aspens
                                        id3                 id2        data
0  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBBMDR12A8C13253B  1203083335
1  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOBXALG12A8C13C108   984663773
2  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SODDNQT12A6D4F5F7E  1275071044
3  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SODDNQT12A6D4F5F7E  1097509573
4  b80344d063b5ccb3212f76538f3d9e43d87dca9e  SOEOBYG12A6D4F8AE2  1009874043
                   id  ...        data
0  TRMMMYQ128F932D901  ...  1124439104
1  TRMMMYQ128F932D901  ...  1052628295
2  TRMMMYQ128F932D901  ...  1068116905
3  TRMMMCH128F425532C  ...   999485493
4  TRMMMNS128F93548E1  ...  1039819697

[5 rows x 6 columns]
Traceback (most recent call last):
  File "C:/Users/Maticz/PycharmProjects/zadanie/main.py",line 46,in <module>
    merged_inner.to_sql('tabela',index=False)
  File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py",line 2702,in to_sql
    sql.to_sql(
  File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 509,in to_sql
    pandas_sql.to_sql(
  File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 1320,in to_sql
    table.insert(chunksize,method=method)
  File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 756,in insert
    exec_insert(conn,keys,chunk_iter)
  File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",line 669,in _execute_insert
    data = [dict(zip(keys,row)) for row in data_iter]
  File "C:\Users\Maticz\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\sql.py",in <listcomp>
    data = [dict(zip(keys,row)) for row in data_iter]
MemoryError

那么我应该做一个“块化”吗?或我该如何操作?一个文件大约80mb,第二个文件大约2gb,它的编译时间大约为20分钟,这就是为什么即时通讯寻求帮助的原因:

baichichangchun 回答:在Pandas,Python3中创建合并表

暂时没有好的解决方案,如果你有好的解决方案,请发邮件至:iooj@foxmail.com
本文链接:https://www.f2er.com/3098221.html

大家都在问