How do I attach an index value to nested JSON generated from a Spark RDD of a complex type?

I have a complex RDD variable called receipts, of type RDD[Array[Test]].

case class Test(
    header: TestHeader, number: String, amount: Double, description: String)
case class TestHeader(
    id: Long, description: String, barcode: Option[String], date: LocalDate,
    currency: String, vendorId: Long, vendorSiteId: Long, source: String,
    payGroup: String, creationDate: LocalDate)

Both Test and TestHeader define a toJSON function. The one for Test is:

def toJSON: JSONObject = {
  new JSONObject()
    .put("header", header.toJSON)
    .put("number", number)
    .put("amount", amount)
    .put("description", description)
}

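I am able to output JSON, but right now the lines of all receipts are written out together, so I cannot tell which receipt each line came from. I need to attach an index number.

Code used to output the JSON: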

receipts
  .flatMap(receipt => receipt.map(test => test.toJSON.toString))
  .saveAsTextFile("s3://test/")

Current output:

{"number":"2","amount":100,"header":{"date":"2019-09-30","vendorSiteId":12345,"description":"Some text","vendorId":123,"source":"Manual Entry","creationDate":"2019-10-15","payGroup":"ABCD","number":"B201909","currency":"JPY","id":999999,"barcode":"1111111"},"description":"some text"}
{"number":"1","amount":200,"vendorSiteId":768,"description":"some text","vendorId":345,"id":99999,"barcode":"11111"},"amount":300,"header":{"date":"2019-10-12","vendorSiteId":567,"vendorId":987,"source":"test","creationDate":"2019-10-12","payGroup":"KDP","number":"b1935b6859a196d6b5e7d68b95c209d4649d645f","currency":"USD","id":951574663,"barcode":"None"},"description":"some text"}

Expected output:

{"blocks":{"1":[{"number":"2","description":"some text"},{"number":"1","description":"some text"}],"2":{"number":"1","description":"some text"}}}

Any pointers would be greatly appreciated. Thanks in advance!
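
One possible approach, as a minimal sketch rather than a confirmed answer: RDD.zipWithIndex() pairs every element with a Long index, so each receipt (one Array[Test]) can be written as its own JSON object keyed by that index instead of being flattened with flatMap. The object IndexedReceipts and its methods block and indexed are hypothetical names introduced for illustration; the sketch assumes the org.json library already used in the question and a Spark version built against Scala 2.12 or later.

```scala
import org.apache.spark.rdd.RDD
import org.json.{JSONArray, JSONObject}

// Hypothetical helper object; not part of Spark or org.json.
object IndexedReceipts {

  // Builds {"<index>": [line, line, ...]} for a single receipt.
  def block(index: Long, lines: Seq[JSONObject]): JSONObject = {
    val arr = new JSONArray()
    lines.foreach(line => arr.put(line))
    new JSONObject().put(index.toString, arr)
  }

  // Pairs each receipt with its position in the RDD and emits one JSON
  // string per receipt. zipWithIndex starts at 0, so 1 is added to get
  // the 1-based keys shown in the expected output.
  def indexed[T](receipts: RDD[Array[T]])(toJson: T => JSONObject): RDD[String] =
    receipts
      .zipWithIndex()
      .map { case (receipt, i) =>
        block(i + 1, receipt.iterator.map(toJson).toSeq).toString
      }
}
```

With the classes from the question, usage might look like IndexedReceipts.indexed(receipts)(_.toJSON).saveAsTextFile("s3://test/"). This writes one line per receipt, for example {"1":[...]}, rather than a single {"blocks":{...}} document; producing one combined document would mean collecting the per-receipt objects on the driver, which only works if the data fits in driver memory. Note also that zipWithIndex assigns indices by partition order and position within each partition, and it triggers a Spark job when the RDD has more than one partition.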
