How to deserialize a large JSON file (~300 MB)

I want to parse a JSON file of about 300 MB. I'm using the Jackson library with ObjectMapper. Is it normal that I run into memory problems?

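I first tried a BufferedReader, and it crashed the application. Then I switched to this library. How long should parsing the file and saving it to an SQLite database take? It is taking a very long time. Is that expected?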

Answer from chinawr: How to deserialize a large JSON file (~300 MB)

Jackson

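You can combine Jackson's Streaming API (JsonParser) with a regular ObjectMapper: the parser walks through the top-level JSON array token by token, and the mapper binds one array element at a time, so only the current record has to be held in memory instead of the whole 300 MB document. Wrapping this in an Iterator gives a convenient class. Here the input is streamed directly from a URL and passed to that iterator. Example code: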

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.math.BigDecimal;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class JsonPathApp {

    public static void main(String[] args) throws Exception {
        // The original answer called a custom SSLUtilities helper here to trust all
        // hostnames and certificates. That class is not part of the JDK or Jackson,
        // and disabling certificate checks is unsafe, so it is left out.

        URL url = new URL("https://data.opendatasoft.com/explore/dataset/vehicules-commercialises@public/download/?format=json&timezone=Europe/Berlin");
        // Stream the response instead of loading it into memory; decode it as UTF-8.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
            FieldsJsonIterator fieldsJsonIterator = new FieldsJsonIterator(reader);
            while (fieldsJsonIterator.hasNext()) {
                Fields fields = fieldsJsonIterator.next();
                System.out.println(fields);
                // Save object to DB
            }
        }
    }
}

// Iterates over the elements of a top-level JSON array, binding one element
// at a time, so memory use does not grow with the size of the input.
class FieldsJsonIterator implements Iterator<Fields> {

    private final ObjectMapper mapper;
    private final JsonParser parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new ObjectMapper();
        // Ignore the other properties of each record; only "fields" is mapped.
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);

        parser = mapper.getFactory().createParser(reader);
        skipStart();
    }

    // Advance past the opening '[' to the first element's START_OBJECT,
    // stopping at end of input so an empty stream cannot loop forever.
    private void skipStart() throws IOException {
        JsonToken token = parser.nextToken();
        while (token != null && token != JsonToken.START_OBJECT) {
            token = parser.nextToken();
        }
    }

    @Override
    public boolean hasNext() {
        try {
            // readValue() consumes (clears) the current token after binding an
            // element, so step once: START_OBJECT, END_ARRAY, or null at EOF.
            if (parser.currentToken() == null) {
                parser.nextToken();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }

        return parser.currentToken() == JsonToken.START_OBJECT;
    }

    @Override
    public Fields next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            // Binds only the current array element and returns its "fields" member.
            return mapper.readValue(parser, FieldsWrapper.class).fields;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Each array element looks like {"fields": {...}, ...}; only "fields" is kept.
    private static final class FieldsWrapper {
        public Fields fields;
    }
}

class Fields {

    private String cnit;

    @JsonProperty("puissance_maximale")
    private BigDecimal maximumPower;

    @JsonProperty("champ_v9")
    private String fieldV9;

    @JsonProperty("boite_de_vitesse")
    private String gearbox;

    // add other required properties

    // getters,setters,toString
}

The code above prints:

Fields{cnit='MMB76K3BQJ41',maximumPower=110.0,fieldV9='70/220*2006/96EURO4',gearbox='A 5'}
Fields{cnit='M10MCDVPF15Z219',maximumPower=95.0,fieldV9='"715/2007*566/2011EURO5',gearbox='A 7'}
Fields{cnit='M10MCDVP027V654',maximumPower=150.0,fieldV9='715/2007*692/2008EURO5',gearbox='A 7'}
Fields{cnit='M10MCDVPG137264',maximumPower=120.0,gearbox='M 6'}
Fields{cnit='MVV4912QN718',maximumPower=210.0,fieldV9='null',gearbox='A 6'}
Fields{cnit='MMB76K3B2K88',gearbox='A 5'}
Fields{cnit='M10MCDVP012N140',maximumPower=80.0,gearbox='M 6'}
Fields{cnit='MJN5423PU123',maximumPower=88.0,gearbox='M 6'}
Fields{cnit='M10MCDVP376T303',fieldV9='"715/2007*692/2008EURO5',gearbox='M 6'}
Fields{cnit='MMB53H3B5Z93',gearbox='M 6'}
Fields{cnit='MPE1403E4834',maximumPower=81.0,gearbox='M 5'}
Fields{cnit='M10MCDVP018J905',gearbox='M 6'}
Fields{cnit='M10MCDVPG112904',maximumPower=100.0,gearbox='M 6'}
Fields{cnit='M10MCDVP015R723',gearbox='A 5'}
...
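On the SQLite part of the question: the loop above leaves "// Save object to DB" as a placeholder. With SQLite, inserting one row per statement in auto-commit mode means one transaction per row, which is usually what makes saving slow; grouping rows into batches and committing once per batch is a common remedy. Below is a minimal sketch using only the JDK (java.sql). BatchWriter, Sink, Binder, drain and jdbcSink are hypothetical names, and the SQL statement, batch size and connection setup are left to the caller (the connection is assumed to have auto-commit turned off).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper (not part of the original answer): drains an iterator
// in fixed-size batches so each batch can be written in one DB transaction.
final class BatchWriter {

    /** Receives one batch; may throw, e.g. SQLException from JDBC. */
    @FunctionalInterface
    interface Sink<T> {
        void write(List<T> batch) throws Exception;
    }

    /** Copies one item's values into the statement's parameters. */
    @FunctionalInterface
    interface Binder<T> {
        void bind(PreparedStatement ps, T item) throws SQLException;
    }

    private BatchWriter() {
    }

    /** Calls sink once per batch of at most batchSize items; returns the number of batches. */
    static <T> int drain(Iterator<T> items, int batchSize, Sink<T> sink) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (items.hasNext()) {
            batch.add(items.next());
            if (batch.size() == batchSize) {
                sink.write(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list; the sink may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            sink.write(batch); // final, partially filled batch
            batches++;
        }
        return batches;
    }

    /**
     * JDBC sink: adds every item of a batch to the prepared INSERT, sends the
     * batch with executeBatch() and commits once. Assumes conn has auto-commit off.
     */
    static <T> Sink<T> jdbcSink(Connection conn, PreparedStatement insert, Binder<T> binder) {
        return batch -> {
            for (T item : batch) {
                binder.bind(insert, item);
                insert.addBatch();
            }
            insert.executeBatch();
            conn.commit(); // one commit per batch instead of one per row
        };
    }
}
```

A hypothetical way to wire it to the iterator: call conn.setAutoCommit(false), prepare an INSERT such as INSERT INTO vehicles(cnit) VALUES (?) (table and column are assumptions), then call BatchWriter.drain(fieldsJsonIterator, 1000, BatchWriter.jdbcSink(conn, insert, (ps, f) -> ps.setString(1, f.getCnit()))). Here getCnit() stands for one of the getters the answer elides, and 1000 is an arbitrary batch size to tune.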

Gson

The same can be done with Gson. An example implementation:

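import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.annotations.SerializedName;
import com.google.gson.stream.JsonReader;

import java.io.IOException;
import java.io.Reader;
import java.math.BigDecimal;
import java.util.Iterator;
import java.util.NoSuchElementException;

class FieldsJsonIterator implements Iterator<Fields> {

    private final Gson mapper;
    private final JsonReader parser;

    public FieldsJsonIterator(Reader reader) throws IOException {
        mapper = new GsonBuilder().create();

        parser = mapper.newJsonReader(reader);
        skipStart();
    }

    // Consume the opening '[' of the top-level array.
    private void skipStart() throws IOException {
        parser.beginArray();
    }

    @Override
    public boolean hasNext() {
        try {
            // True while another element follows before the closing ']'.
            return parser.hasNext();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public Fields next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // fromJson(JsonReader, Type) reads only the next value (one array element).
        FieldsWrapper wrapper = mapper.fromJson(parser, FieldsWrapper.class);
        return wrapper.fields;
    }

    // Each array element looks like {"fields": {...}, ...}; only "fields" is kept.
    private static final class FieldsWrapper {
        public Fields fields;
    }
}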

class Fields {

    private String cnit;

    @SerializedName("puissance_maximale")
    private BigDecimal maximumPower;

    @SerializedName("champ_v9")
    private String fieldV9;

    @SerializedName("boite_de_vitesse")
    private String gearbox;

    // getters,toString
}

Usage and output should be the same as for the Jackson version.


Article link: https://www.f2er.com/3153531.html
