Convert a large CSV into multiple JSON arrays of a fixed record count (e.g. 100-record JSON arrays) using shell

2024-05-03 • Q&A

How can I convert a large CSV into JSON arrays of a fixed record count (JSON arrays of 100 records each) with a shell script or from the command line?

For example, the input CSV file:

identifier,type,locale
91617676848,msisdn,es_ES
91652560975,msisdn,es_ES
91636563675,msisdn,es_ES

Expected output:

1.json  (JSON array containing 100 records)
  [
  {
    "identifier": "91617676848","type": "msisdn","locale": "es_ES"
  },
  .
  .
  .
  {
    "identifier": "91652560975","type": "msisdn","locale": "es_ES"
  }
  ]


  2.json (JSON array containing 100 records)
  [
  {
    "identifier": "91636563675","type": "msisdn","locale": "es_ES"
  },
  .
  .
  .
  {
    "identifier": "91636563999","type": "msisdn","locale": "es_ES"
  }
  ]

I wrote a simple PHP script (which I call converter.php).

You can invoke it like this: php converter.php test.csv, where test.csv contains ordinary CSV data with the header in the first row.

<?php
    // Open the CSV file passed as the first argument.
    // Example: php converter.php test.csv
    // where test.csv contains the data from the question.
    if (($handle = fopen($argv[1], 'r')) !== false) {
            $count = 0;
            $lines = [];
            // Length 0 means no line-length limit; ',' is the delimiter, '\'' the enclosure.
            while (($data = fgetcsv($handle, 0, ',', '\'')) !== false) {
                    if ($count == 0) {
                        // The first row holds the column names.
                        $headers = $data;
                    } else {
                        // Map each header name to the value in the same column.
                        $lines[] = array_combine($headers, $data);
                    }
                    $count++;
            }
            fclose($handle);
            // Split the records into groups of 100 elements each.
            // While testing I passed 2 as the second argument of array_chunk to try it with your toy data.
            $groups = array_chunk($lines, 100);
            foreach ($groups as $key => $group) {
                    file_put_contents('json_data-' . $key . '.json', json_encode($group));
            }
    }

I ran it locally with a chunk size of two for testing, which produced two files saved locally, named json_data-<key>.json.

The results are:

json_data-0.json:

[{"identifier":"91617676848","type":"MSISDN","locale":"es_ES"},{"identifier":"91652560975","type":"MSISDN","locale":"es_ES"}]

json_data-1.json:

[{"identifier":"91636563675","type":"MSISDN","locale":"es_ES"}]

Please try this awk solution:

awk -v bs=10 '
    # Header line: remember the column names.
    NR == 1 {
        cols = split($0, header, ",")
        next
    }
    {
        if ((NR - 2) % bs == 0) {
            # First record of a block: start a new file.
            file = sprintf("%d.json", ++n)
            print "[\n  {" > file
        } else {
            print ",\n  {" >> file
        }
        split($0, a, ",")
        for (i = 1; i <= cols; i++) {
            printf("    \"%s\": \"%s\"", header[i], a[i]) >> file
            print ((i < cols) ? "," : "") >> file
        }
        printf "%s", "  }" >> file
    }
    # Last record of a full block: close the array and the file.
    (NR - 1) % bs == 0 {
        print "\n]" >> file
        close(file)
    }
    END {
        # Close the final, partially filled block.
        if ((NR - 1) % bs != 0) print "\n]" >> file
    }
' input.csv

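The variable bs sets the number of records per file (10 here; pass bs=100 for 100-record arrays).
It processes the input file line by line and needs quite a few conditional branches to produce well-formed JSON files. Sigh.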
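
One thing to keep in mind with the awk approach: field values are written verbatim, so a value containing a double quote or a backslash would yield invalid JSON. A minimal sketch of an escaping step, in case your data needs it. The names json_escape and json_escape_line are made up for illustration, and control characters such as tabs are not handled here:

```shell
#!/bin/sh
# json_escape_line: hypothetical helper; reads lines on stdin and prints each
# one with backslashes and double quotes escaped for use inside a JSON string.
json_escape_line() {
    awk '
    function json_escape(s,    i, c, out) {
        out = ""
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c == "\\")      out = out "\\\\"   # backslash becomes two backslashes
            else if (c == "\"") out = out "\\\""   # quote becomes backslash + quote
            else                out = out c
        }
        return out
    }
    { print json_escape($0) }
    '
}

printf '%s\n' 'say "hi"' | json_escape_line
# prints: say \"hi\"
```

Inside the awk script, the same json_escape function could wrap a[i] in the printf call. The string is built character by character rather than with gsub, because backslash handling in gsub replacements differs between awk implementations.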

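With bash, the task can be done by repeatedly slicing line ranges out of the file (2-101, 102-201, ...) until the end of the file. The code below uses sed to extract the lines and csvjson to format each block as JSON.

You can substitute any tool you like (there are a few alternatives for CSV to JSON).

The required code is a little verbose.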

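#!/bin/bash
# The CSV path is passed as the first argument.
csv=$1
# Total line count, including the header line.
lines=$(wc -l < "$csv")
# Number of 100-record blocks, rounded up (the header is not a record).
blocks=$(( (lines - 1 + 99) / 100 ))
for (( i=1 ; i <= blocks ; i++ )) ; do
    # Print the header (line 1) plus the i-th range of 100 data lines.
    sed -ne "1p;$((i*100-98)),$((i*100+1))p" "$csv" | csvjson -i2 > "$i.json"
done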
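
To make the line arithmetic concrete, here is a small sketch that only prints the line ranges such a loop would hand to sed for a given total line count (header included). It uses ceiling division so that an exact multiple of 100 records does not produce an extra, empty block. The function name print_ranges is made up for illustration, and it uses a POSIX while loop rather than bash's for (( )):

```shell
#!/bin/sh
# print_ranges TOTAL_LINES [BLOCK_SIZE]
# Prints "block first-last" for each slice of data lines; line 1 is the header.
print_ranges() {
    lines=$1
    size=${2:-100}
    records=$((lines - 1))
    blocks=$(( (records + size - 1) / size ))   # ceiling division
    i=1
    while [ "$i" -le "$blocks" ]; do
        first=$(( (i - 1) * size + 2 ))
        last=$(( i * size + 1 ))
        echo "$i $first-$last"
        i=$((i + 1))
    done
}

print_ranges 251
# 1 2-101
# 2 102-201
# 3 202-301
```

The last range may extend past the end of the file (251 lines here); sed simply stops at the last line, so that is harmless.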

Assuming the file size is reasonable, re-reading the input file for each block does not add much overhead.
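
If the file is too large to re-read comfortably, one single-pass alternative is to let split cut the data lines into 100-line pieces and then prepend the header to each piece. A sketch using only POSIX tools; the function name split_csv and the part_ prefix are illustrative choices, and each resulting CSV still needs a CSV-to-JSON step of your choice:

```shell
#!/bin/sh
# split_csv FILE [RECORDS_PER_PART] [OUTPUT_DIR]
# Writes OUTPUT_DIR/part_aaaa.csv, part_aaab.csv, ... each holding the header
# line followed by up to RECORDS_PER_PART data lines.
split_csv() {
    csv=$1
    size=${2:-100}
    outdir=${3:-.}
    header=$(head -n 1 "$csv")
    # Skip the header, then cut the remaining lines into fixed-size pieces.
    # -a 4 gives four-letter suffixes, enough for several hundred thousand parts.
    tail -n +2 "$csv" | split -a 4 -l "$size" - "$outdir/part_"
    for piece in "$outdir"/part_*; do
        [ -f "$piece" ] || continue
        # Leave alone any .csv parts that already exist from an earlier run.
        case $piece in *.csv) continue ;; esac
        { printf '%s\n' "$header"; cat "$piece"; } > "$piece.csv"
        rm "$piece"
    done
}

# Example call (paths are placeholders): split_csv input.csv 100 out
```

Each part is a valid CSV on its own, so it can be fed to the same converter used for the whole file.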
