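dask to_dataframe duplicate index

I am loading a list of text files into a dask dataframe. Each text file contains multiple dictionaries, one per line (newline-delimited). For each line of a text file, I do some minor processing, defined in `remove_escapes`, and return a list. I call `flatten` to make sure I end up with a single list rather than a list of lists.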

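    import json
    import dask.bag as db

    # Fragment from inside a method of the same class as remove_escapes:
    input_file_list = self.get_file_list()
    posts_db = db.from_sequence(input_file_list)
    # Each file yields a list of dicts; flatten() merges them into one bag of dicts.
    posts_db = posts_db.map(self.remove_escapes).flatten()
    posts_df = posts_db.to_dataframe()
    posts_df = posts_df.compute()

    def remove_escapes(self, chunk_file):
        """Parse one file of newline-delimited JSON into a list of dicts."""
        json_list = []
        with open(chunk_file, 'r') as fp:
            for line in fp:
                line = line.strip()
                if line:
                    # Collapse doubled backslashes before parsing the line.
                    line = line.replace("\\\\", "\\")
                    json_data = json.loads(line)
                    json_list.append(json_data)
        return json_list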

This is what I get:

                                                 Body                                           Comments     Id                                              Title
0   <p>It depends on the context:</p>&#xA;&#xA;<ol...  side note: Hash#fetch is not exactly Hash#[]. ...  13935                                               None
1   <p>It depends on the context:</p>&#xA;&#xA;<ol...                            @tokland `:c` not found  13935                                               None
2   <p>It depends on the context:</p>&#xA;&#xA;<ol...  "There is also a convention that it is used as...  13935                                               None
3   <p>I'd like to have a python program alert me ...  `import os; os.system('say "Beer time."'); pri...  13941                              Python Sound ("Bell")
4   <p>I'd like to have a python program alert me ...  the question is answered but... you do need qu...  13941                              Python Sound ("Bell")
5   <p>I'd like to have a python program alert me ...       Does not seem to be working for me on Mojave  13941                              Python Sound ("Bell")
6   <p>Have you tried :</p>&#xA;&#xA;<pre><code>im...   I'm on ubuntu, it doesn't work for me. Any idea?  13949                                               None
7   <p>Have you tried :</p>&#xA;&#xA;<pre><code>im...  @kecske it's common [to disable the audible-be...  13949                                               None
8   <p>Have you tried :</p>&#xA;&#xA;<pre><code>im...    Works on Windows XP as well (in a console app).  13949                                               None
9   <p>I had to turn off the "Silence terminal bel...               Seems to work with python 2 only....  13959                                               None
0   <p>I want to use a track-bar to change a form'...  Also, Decimal can't represent as wide a value ...      4                         Convert Decimal to Double?
1   <p>Given a <code>DateTime</code> representing ...  what all of the answers so far have missed is ...      9            How do I calculate someone's age in C#?
2   <p>Given a <code>DateTime</code> representing ...  No one has considered leap years? or checking ...      9            How do I calculate someone's age in C#?
3   <p>Given a <code>DateTime</code> representing ...  Note that for someone less than one year old, ...      9            How do I calculate someone's age in C#?
4   <p>Given a <code>DateTime</code> representing ...                      why nobody is using TimeSpan?      9            How do I calculate someone's age in C#?
5   <p>Given a specific <code>DateTime</code> valu...  What if you want to calculate a relative time ...     11                      Calculate relative time in C#
6   <p>Given a specific <code>DateTime</code> valu...  moment.js is a very nice date parsing library....     11                      Calculate relative time in C#
7   <p>Given a specific <code>DateTime</code> valu...  There is the .net package https://github.com/N...     11                      Calculate relative time in C#
8   <p>Here's how I do it</p>&#xA;&#xA;<pre class=...  "< 48*60*60s" is a rather unconventional defin...     12                                               None
9   <p>Here's how I do it</p>&#xA;&#xA;<pre class=...  Since all those If..else are just timeslabs, y...     12                                               None
0   <p>Best solution is to let IIS do it.</p>&#xA;...  Jeff Atwood List some problems he’s run into...  17068                                               None
1   <p>use <code>system.xml.Linq.XElement</code> a...                           I'm working with NET 2.0  17093                                               None
2   <p>We are developing an application that invol...  I fail to see answers for this questions which...  17106  How to generate sample XML documents from thei...
3   <p><a href="http://netbeans.org" rel="nofollow...                            That era is now over...  17110                                               None
4   <p><a href="http://www.altova.com/xmlspy.html"...  XMLSpy looked good but generated xml that then...  17114                                               None
5   <p>How do you run an external program and pass...  I think you need to rewrite your question - op...  17140             How do you spawn another process in C?
6   <pre><code>#include &lt;stdlib.h&gt;&#xA;&#xA;...  Never use system. It is far from multithreadin...  17148                                               None
7   <p>I know that IList is the interface and List...  If anyone is still wondering, I find the best ...  17170             When to use IList and when to use List
8   <p>I don't think there are hard and fast rules...  why not make it a just a List in the first pla...  17177                                               None
9   <p>Here's how I do it</p>&#xA;&#xA;<pre class=...  But currently SO only show the "Time ago" form...     12                                               None
..                                                ...                                                ...    ...                                                ...
0   <p>I'm going to continue my habit of going aga...  No, I'm not talking about apps that are that s...  10448                                               None
1   <p>I'm going to continue my habit of going aga...  I don't see how moving business logic into sto...  10448                                               None
2   <p>If you were on Windows, I'd tell you to use...  +1  I've used this named pipe methodology seve...  10450                                               None
3   <p>The 'click sound' in question is actually a...  I had a problem with this line:  isEnabled = v...  10456  HowTo Disable WebBrowser 'Click Sound' in your...
4   <p>Ideally, I'm looking for a templated logica...  @d03boy: Well it has HashSet<T> now, but after...  10458           Is there a "Set" data structure in .Net?
5   <p>Ideally, I'm looking for a templated logica...  See [this question](https://stackoverflow.com/...  10458           Is there a "Set" data structure in .Net?
6   <p>Ideally, I'm looking for a templated logica...  Possible duplicate of [C# Set collection?](htt...  10458           Is there a "Set" data structure in .Net?
7   <p><a href="http://msdn.microsoft.com/en-us/li...  Matt, +1. That sounds like exactly what he ask...  10459                                               None
8   <p>I've noticed that if you use WebBrowser.Doc...  your suggested solution prevents the control f...  10463                                               None

As shown above, the index values are repeated. Is there a way to make sure the index is ordered correctly and keeps increasing?

sky43420024420 answered: dask to_dataframe duplicate index

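This is inherent to partitioning: each partition is its own pandas dataframe, so each one gets its own index starting at 0.
For other differences between pandas and dask dataframes, see the dask dataframe examples.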
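
To make the partitioning point concrete, here is a minimal sketch that uses plain pandas to imitate what happens. The three small frames are hypothetical stand-ins for dask partitions (the `Id` values are borrowed from the output in the question), and `pd.concat` stands in for the step that glues partitions together on `compute()`; it is an illustration, not dask's actual code path.

```python
import pandas as pd

# Hypothetical stand-ins for three dask partitions. Each one is an ordinary
# pandas DataFrame built independently, so each gets its own 0-based index.
partitions = [
    pd.DataFrame({"Id": [13935, 13941]}),
    pd.DataFrame({"Id": [4, 9]}),
    pd.DataFrame({"Id": [17068, 17093]}),
]

# Concatenating without touching the index keeps every partition's labels,
# which is why the same index values show up again and again.
combined = pd.concat(partitions)
index_labels = combined.index.tolist()
print(index_labels)  # [0, 1, 0, 1, 0, 1]
```

The repeated labels are therefore a property of how the pieces were built, not a sign that rows were duplicated.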


From https://github.com/dask/dask/issues/3788, I understand that this behavior is expected.
