Iterating 2D data column by column, processing and storing column headers in Java

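My text file is large. I want to iterate through the columns, comparing the previous value with the next one, and store the column headers associated with them in a list for later use. Could you give me some advice on how to solve this efficiently? Below is what I have so far, attempting it with for loops. Thanks.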

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class Projections {

    public static void main(String[] args) {
        String fileName= "study_panel.csv";
        File file= new File(fileName);

        // 2-dimensional list of strings
        List<List<String>> lines = new ArrayList<>();
        Scanner inputStream;
        try{
            inputStream = new Scanner(file);

            while(inputStream.hasNext()){
                String line= inputStream.next();
                String[] values = line.split(",");
                // Adds the currently parsed line to the 2-dimensional string list
                lines.add(Arrays.asList(values));
            }

            //Compare specific elements in the list
            String svalue = lines.get(3).get(1);
            String svalue2 = lines.get(3).get(2);
            if(svalue.equals(svalue2)){
                System.out.println("No recombination");
                //store column's header in list
            }
            else{
                System.out.println("Recombination");
                //store column's header in list
            }

            inputStream.close();
        }catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        // Iterate through the 2-dimensional data and store column headers
        int lineNo = 0;
        for(List<String> line: lines) {
            int columnNo = 0;
            String previousValue = null; // no value seen yet in this row

            for (String value: line) {
                String newValue = value;

                //Compare column elements in the 2-dimensional data
                // (skipped for the first column, which has no previous value)
                if (previousValue != null) {
                    if(previousValue.equals(newValue)){
                        System.out.println("No recombination");
                        //store column's header in list
                    }
                    else{
                        System.out.println("Recombination");
                        //store column's header in list
                    }
                }
                previousValue = newValue;
                // System.out.println("Individual " + lineNo + " Site " + columnNo + ": " + value);
                columnNo++;
            }
            lineNo++;
        }


    }
}

1. Sample study data

ID,S1_577905,S1_1066894,S1_1293038,S1_1491834
ind1,A,A
ind2,B,B
ind3,A
ind4,B
ind5,H,A
ind6,-,B
ind7,H

2. Sample reference data
ID,S1_570493,S1_592115,S1_604416,S1_614892,S1_618220,S1_636801,S1_654822,S1_655362,S1_723787,S1_723892,S1_858753,S1_867194,S1_923829,S1_925667,S1_1009779,S1_1009843,S1_1010052,S1_1010123,S1_1010298,S1_1010403,S1_1029733,S1_1039046,S1_1040024,S1_1044174,S1_1044355,S1_1049540,S1_1049657,S1_1050097,S1_1050995,S1_1126726,S1_1166956,S1_1177001,S1_1185437,S1_1188610,S1_1191450,S1_1195593,S1_1195669,S1_1195782,S1_1197394,S1_1207757,S1_1207893,S1_1211271,S1_1211343,S1_1223120,S1_1223377,S1_1237046,S1_1251020,S1_1280051,S1_1280124,S1_1284151,S1_1308043,S1_1340776,S1_1341385,S1_1363675,S1_1363753,S1_1407704,S1_1410354,S1_1431655,S1_1433696,S1_1490941,S1_1507081
A,T,C,G,G
B,T

3. Sample expected result
ID,S1_1507081
ind1,G
ind2,T
ind3,G
ind4,G
ind5,-
ind6,T
ind7,-

Answer by wangdaren1:

Assuming you don't want to use a CSV library (your CSV looks simple enough anyway), I tried to update your code.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class Projections {

    public static void main(String[] args) {
        String fileName= "study_panel.csv";
        File file= new File(fileName);

        // 2-dimensional list of strings
        List<List<String>> lines = new ArrayList<>();
        List<String> header = null; // Store the header in a separate list
        // Key: data row index (0 = first row after the header); value: header names.
        // TreeMap keeps the entries sorted by row index when they are printed.
        Map<Integer,List<String>> recombinationM = new TreeMap<>();
        Map<Integer,List<String>> noRecombinationM = new TreeMap<>();

        // try-with-resources closes the Scanner even if an exception is thrown
        try (Scanner inputStream = new Scanner(file)) {
            // nextLine() returns whole lines, so values containing spaces are not split
            while(inputStream.hasNextLine()){
                String line= inputStream.nextLine();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] values = line.split(",");

                if (header == null){
                    header= Arrays.asList(values);
                    continue; // go to the next line, as the header has been read
                }
                // Adds the currently parsed line to the 2-dimensional string list
                lines.add(Arrays.asList(values));
            }
        }catch (FileNotFoundException e) {
            e.printStackTrace();
            return; // nothing to compare without the input file
        }

        // Iterate through the 2-dimensional data and store column headers
        for (int i=0; i<lines.size(); i++) {
            List<String> recombinationHdr = new ArrayList<>();
            List<String> noRecombinationHdr = new ArrayList<>();
            // Compare column j with column j + 1 in the same row
            for (int j=0; j<lines.get(i).size()-1; j++) {
                //Comparison
                if (lines.get(i).get(j).equals(lines.get(i).get(j + 1))) {
                    System.out.println("No recombination");
                    noRecombinationHdr.add(header.get(j)); // To store the current header
                    //noRecombinationHdr.add(header.get(j+1)); // To store the next header
                } else {
                    System.out.println("Recombination");
                    recombinationHdr.add(header.get(j)); // To store the current header
                    //recombinationHdr.add(header.get(j+1)); // To store the next header
                }
            }
            recombinationM.put(i,recombinationHdr);
            noRecombinationM.put(i,noRecombinationHdr);
        }
        //Print maps
        System.out.println("== No Recombination ==");
        for (Map.Entry<Integer,List<String>> entry : noRecombinationM.entrySet()){
            System.out.println("Line: " + entry.getKey() + " - " + entry.getValue());
        }

        System.out.println("== Recombination ==");
        for (Map.Entry<Integer,List<String>> entry : recombinationM.entrySet()){
            System.out.println("Line: " + entry.getKey() + " - " + entry.getValue());
        }
    }
}

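I introduced a separate header list that holds the first line of the CSV (the column names), so it is kept apart from the remaining rows, which are stored in the lines list. I also introduced two output maps, one for the recombination headers and one for the no-recombination headers. The map key is the row index, and the map value is the list of header strings.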
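Since the question mentions a large file, here is a hedged sketch (not part of the original answer) of the reading step using BufferedReader.readLine(), which returns one whole line at a time regardless of spaces in the data. It assumes plain comma-separated values without quoted fields. The class name CsvTable and its read method are hypothetical; reading from any Reader makes the method easy to try out with a StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvTable {
    final List<String> header;       // first non-empty line (null if the input is empty)
    final List<List<String>> rows;   // every later non-empty line

    CsvTable(List<String> header, List<List<String>> rows) {
        this.header = header;
        this.rows = rows;
    }

    // Reads simple comma-separated text (no quoted fields) line by line.
    static CsvTable read(Reader source) throws IOException {
        List<String> header = null;
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // limit -1 keeps trailing empty fields, e.g. "ind3,A," -> [ind3, A, ""]
                List<String> values = Arrays.asList(line.split(",", -1));
                if (header == null) {
                    header = values;
                } else {
                    rows.add(values);
                }
            }
        }
        return new CsvTable(header, rows);
    }

    public static void main(String[] args) throws IOException {
        String sample = "ID,S1_577905,S1_1066894\nind1,A,A\nind3,A\n";
        CsvTable table = read(new StringReader(sample));
        System.out.println(table.header); // [ID, S1_577905, S1_1066894]
        System.out.println(table.rows);   // [[ind1, A, A], [ind3, A]]
    }
}
```

For the real file, CsvTable.read(Files.newBufferedReader(Paths.get("study_panel.csv"))) should work, since a BufferedReader is itself a Reader. If the file is too large to keep in memory, the body of the while loop is where each row could be compared and then discarded instead of being added to rows.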

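The main part of the code has two sections. The first is the Scanner section, which reads the CSV and puts it into two lists (header and lines). The second iterates over the lists and performs the checks. I am not sure I understood correctly what you mean by comparing values based on the next/previous value; I assumed you mean comparing, within the same row, the value at the current column index with the value at the next index:

if (lines.get(i).get(j).equals(lines.get(i).get(j + 1))) {

So for row i, it compares value j with the next value j + 1.

Based on that comparison, the row index and the header header.get(j) are stored in the recombination or noRecombination map.

For your sample data, the program prints, per row index, the headers collected in the no-recombination map, followed by those collected in the recombination map.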
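To make the row comparison easier to check in isolation, the inner loop can be pulled out into a method that handles a single row. This is a sketch, not the answer's code: RowRecombination, Split and classifyRow are hypothetical names, and the firstColumn parameter is an addition (0 matches the loop in the answer, 1 skips the ID column). The header convention is the same as in the answer: a match or mismatch between columns j and j + 1 is recorded under header.get(j).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowRecombination {

    // Hypothetical result holder for one row.
    static final class Split {
        final List<String> noRecombination = new ArrayList<>(); // column j equals column j + 1
        final List<String> recombination = new ArrayList<>();   // column j differs from column j + 1
    }

    // Compares each value with the next one in the same row, starting at firstColumn.
    static Split classifyRow(List<String> header, List<String> row, int firstColumn) {
        Split split = new Split();
        for (int j = firstColumn; j < row.size() - 1; j++) {
            if (row.get(j).equals(row.get(j + 1))) {
                split.noRecombination.add(header.get(j));
            } else {
                split.recombination.add(header.get(j));
            }
        }
        return split;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("ID", "S1_577905", "S1_1066894", "S1_1293038");
        Split a = classifyRow(header, Arrays.asList("ind1", "A", "A"), 1);
        Split b = classifyRow(header, Arrays.asList("ind5", "H", "A"), 1);
        System.out.println(a.noRecombination + " " + a.recombination); // [S1_577905] []
        System.out.println(b.noRecombination + " " + b.recombination); // [] [S1_577905]
    }
}
```

With firstColumn = 0, the ID value is compared with the first data value, so "ID" usually ends up in the recombination list, which is exactly what the answer's loop does.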

If you don't want to compare the first column (ID), you can start the second loop at j = 1.
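The answer compares neighbouring columns within one row. If the question's "iterate through the columns" instead means walking down each column and comparing each value with the one in the previous row, a minimal sketch under that assumption could look like this. ColumnWalk and changesPerColumn are hypothetical names; column 0 (ID) is skipped, and a row that is shorter than the header is treated as having no value in the missing columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnWalk {

    // For each column after ID, walks down the rows and records the row indices
    // where the value differs from the last value seen above it in that column.
    static Map<String, List<Integer>> changesPerColumn(List<String> header, List<List<String>> rows) {
        Map<String, List<Integer>> changes = new LinkedHashMap<>(); // keeps header order
        for (int col = 1; col < header.size(); col++) {
            List<Integer> changedRows = new ArrayList<>();
            String previous = null;
            for (int row = 0; row < rows.size(); row++) {
                List<String> values = rows.get(row);
                if (col >= values.size()) {
                    continue; // missing cell: skip it, so the next value is compared with the last one seen
                }
                String current = values.get(col);
                if (previous != null && !previous.equals(current)) {
                    changedRows.add(row);
                }
                previous = current;
            }
            changes.put(header.get(col), changedRows);
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("ID", "S1_577905", "S1_1066894");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("ind1", "A", "A"),
                Arrays.asList("ind2", "B", "B"),
                Arrays.asList("ind3", "A"),
                Arrays.asList("ind5", "H", "A"));
        System.out.println(changesPerColumn(header, rows));
        // {S1_577905=[1, 2, 3], S1_1066894=[1, 3]}
    }
}
```

Here the result is keyed by column header rather than by row, which may be closer to "store the column headers for later use"; whether a missing cell should be skipped or counted as a change is a choice you would need to make based on what "-" and blank values mean in your data.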

Article link: https://www.f2er.com/3154037.html
