How do I split a CSV file into separate CSV files by line count in Java?

I have a class that reads a CSV file, but when the file is large the program throws an error, so I need to split the file into several parts and move the rows into other files according to a line count.

For example, I have a file with 500,000 lines and I want to divide it into 5 files of 100,000 lines each, so that I can then read those 5 files one at a time.

I couldn't find a way to do this, so some sample code would be great.

Answer from laosizhxy:

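import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.Scanner;

// Place this method inside a class.
// Splits <fileName>.<extension> into <fileName>_0.<extension>, <fileName>_1.<extension>, ...
// with at most maxLines lines per output file.
public static void splitLargeFile(final String fileName, final String extension, final int maxLines, final boolean deleteOriginalFile) {

    try (Scanner s = new Scanner(new FileReader(String.format("%s.%s", fileName, extension)))) {
        int file = 0;
        int cnt = 0;
        BufferedWriter writer = new BufferedWriter(new FileWriter(String.format("%s_%d.%s", fileName, file, extension)));

        // Read whole lines; next() would split on any whitespace instead.
        while (s.hasNextLine()) {
            writer.write(s.nextLine() + System.lineSeparator());
            // Start a new output file once the current one is full and input remains.
            if (++cnt == maxLines && s.hasNextLine()) {
                writer.close();
                writer = new BufferedWriter(new FileWriter(String.format("%s_%d.%s", fileName, ++file, extension)));
                cnt = 0;
            }
        }
        writer.close();
    } catch (Exception e) {
        e.printStackTrace();
    }

    if (deleteOriginalFile) {
        try {
            File f = new File(String.format("%s.%s", fileName, extension));
            f.delete();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}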

If you are on Linux and can run the CSV through a script first, you can use "split":

$ split -l 100000 big.csv small-

This produces files named small-aa, small-ab, small-ac, ... which you can rename to .csv if needed:

$ for a in small-*; do
    mv "$a" "$a.csv";                # rename split files to .csv
    java MyCSVProcessor "$a.csv";    # or just process them anyway
done

For other options, try:

$ split --help

-a, --suffix-length=N   use suffixes of length N (default 2)
-b, --bytes=SIZE        put SIZE bytes per output file
-C, --line-bytes=SIZE   put at most SIZE bytes of lines per output file
-d, --numeric-suffixes  use numeric suffixes instead of alphabetic
-l, --lines=NUMBER      put NUMBER lines per output file

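However, this works around your problem rather than solving it. Your CSV reader module runs out of memory either because it reads the whole file into memory before splitting it into rows, or because it keeps the processed output in memory. To make your code more portable and general, consider processing one line at a time, and then split the input line by line yourself. (From https://stackabuse.com/reading-and-writing-csvs-in-java/)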

// pathToCsv is the path of the input file, e.g. "test.csv"
BufferedReader csvReader = new BufferedReader(new FileReader(pathToCsv));
String row;
while ((row = csvReader.readLine()) != null) {
    String[] data = row.split(",");
    // do something with the data
}
csvReader.close();

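One caveat with the code above: a comma inside a quoted field is treated as a column separator. If your CSV data contains quoted commas, you will need some additional handling.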
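To illustrate the kind of extra handling the quoted-comma caveat calls for, here is a minimal sketch of a quote-aware line splitter. The class name QuotedCsvLine and its parse method are hypothetical names made up for this example, and it assumes the common convention that fields are wrapped in double quotes and a literal quote is written as two quotes. It does not handle quoted fields that span several lines, which would also break any line-based splitting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the original answer): splits one CSV line
// into fields, treating commas inside double quotes as ordinary characters.
final class QuotedCsvLine {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote means a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas land here while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // The second field contains a comma but stays one field.
        for (String field : parse("1,\"Smith, John\",London")) {
            System.out.println(field);
        }
    }
}
```

In the reading loop, row.split(",") could then be swapped for QuotedCsvLine.parse(row). For anything beyond simple data, a dedicated CSV parsing library is probably the safer choice.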

Of course, if you really want to keep your existing code and only need to split the file, you can adapt the reading loop above:

import java.io.*;

public class split {

    static String CSVFile = "test.csv";

    public static void main(String[] args) throws IOException {
        PrintWriter csvWriter = null;

        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader csvReader = new BufferedReader(new FileReader(CSVFile))) {
            String row;
            int line = 0;
            while ((row = csvReader.readLine()) != null) {
                if (line % 100000 == 0) {  // maximum lines per file
                    if (csvWriter != null) { csvWriter.close(); }
                    // output files are named cut-0test.csv, cut-100000test.csv, ...
                    csvWriter = new PrintWriter("cut-" + line + CSVFile);
                }
                csvWriter.println(row);
                // String[] data = row.split(",");
                // do something with the data
                line++;
            }
        } finally {
            // csvWriter is still null when the input file is empty
            if (csvWriter != null) { csvWriter.close(); }
        }
    }
}

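I chose PrintWriter over FileWriter or BufferedWriter because it prints the appropriate line separator automatically, and I believe it is buffered... I haven't written any Java in 20 years, so I bet you can improve on the above.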
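One possible refinement, offered as a sketch rather than a definitive version: the same line-by-line split written as a reusable method with java.nio.file and an explicit charset. The class name CsvLineSplitter, the part-N- naming scheme, and the choice of UTF-8 are assumptions made for this example and do not come from the answer above. Path.of requires Java 11 or later.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration: splits a text file into parts of at
// most maxLines lines, holding only one line in memory at a time.
final class CsvLineSplitter {

    /** Returns the part files created, in order; an empty input yields none. */
    static List<Path> split(Path input, int maxLines) throws IOException {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        List<Path> parts = new ArrayList<>();
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String row;
            int linesInPart = 0;
            while ((row = reader.readLine()) != null) {
                if (writer == null) {
                    // Assumed naming scheme: part-0-big.csv, part-1-big.csv, ...
                    Path part = input.resolveSibling("part-" + parts.size() + "-" + input.getFileName());
                    writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    parts.add(part);
                }
                writer.write(row);
                writer.newLine();            // platform line separator, like println
                if (++linesInPart == maxLines) {
                    writer.close();          // part is full; the next row opens a new one
                    writer = null;
                    linesInPart = 0;
                }
            }
        } finally {
            if (writer != null) {
                writer.close();              // close the last, partially filled part
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CsvLineSplitter big.csv 100000
        for (Path part : split(Path.of(args[0]), Integer.parseInt(args[1]))) {
            System.out.println(part);
        }
    }
}
```

Opening a part only when a row is about to be written avoids creating an empty trailing file when the line count is an exact multiple of maxLines. Note that every part after the first will lack the header row if the CSV has one.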
