Generating recursive structures in ScalaCheck

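I am trying to write a generator for a recursive data type called Row. A Row is a list of named Vals, where a Val is either an atomic Bin or a nested Row.

Here is my code: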

package com.dtci.data.anonymize.parquet

import java.nio.charset.StandardCharsets
import org.scalacheck.Gen

object TestApp extends App {

  sealed trait Val
  case class Bin(bytes: Array[Byte]) extends Val
  object Bin {
    def from_string(str: String): Bin = Bin(str.getBytes(StandardCharsets.UTF_8))
  }
  case class Row(flds: List[(String,Val)]) extends Val

  val gen_bin = Gen.alphaStr.map(Bin.from_string)
  val gen_field_name = Gen.alphaLowerStr
  val gen_field = Gen.zip(gen_field_name,gen_val)
  val gen_row = Gen.nonEmptyListOf(gen_field).map(Row.apply)
  def gen_val: Gen[Val] = Gen.oneOf(gen_bin,gen_row)

  gen_row.sample.get.flds.foreach( fld => println(s"${fld._1} --> ${fld._2}"))
}

It crashes with the following stack trace:

Exception in thread "main" java.lang.NullPointerException
    at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
    at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
    at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
    at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
    at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
    at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
    at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
    at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
    at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
    at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
    at org.scalacheck.Gen$.$anonfun$sequence$2(Gen.scala:492)
    at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:168)
    at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:164)
    at scala.collection.immutable.List.foldLeft(List.scala:79)
    at org.scalacheck.Gen$.$anonfun$sequence$1(Gen.scala:490)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
    at org.scalacheck.Gen.$anonfun$flatMap$2(Gen.scala:84)
    at org.scalacheck.Gen$R.flatMap(Gen.scala:243)
    at org.scalacheck.Gen$R.flatMap$(Gen.scala:240)
    at org.scalacheck.Gen$R$$anon$3.flatMap(Gen.scala:228)
    at org.scalacheck.Gen.$anonfun$flatMap$1(Gen.scala:84)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen$.$anonfun$sized$1(Gen.scala:551)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen$$anon$1.$anonfun$doApply$1(Gen.scala:110)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$1.doApply(Gen.scala:109)
    at org.scalacheck.Gen.$anonfun$map$1(Gen.scala:79)
    at org.scalacheck.Gen$Parameters.useInitialSeed(Gen.scala:318)
    at org.scalacheck.Gen$$anon$5.doApply(Gen.scala:255)
    at org.scalacheck.Gen.sample(Gen.scala:154)

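What is wrong with my code, and what is the best way for me to diagnose this kind of problem myself?

As a side note, I have seen comments saying that Gen.oneOf is strict and that recursive structures need Gen.lzy. However, if I wrap the definition of gen_val in Gen.lzy(...), I get a stack overflow instead of the current NullPointerException.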

Answer:

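First of all, be careful with object Main extends App. I find its field initialization semantics less obvious than those of a plain main method, whose body simply runs line by line: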

object Main {
  def main(args: Array[String]): Unit = {...}
}

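The NullPointerException is most likely an initialization-order problem: gen_field calls gen_val while gen_row is still uninitialized (null), so Gen.oneOf receives a null generator.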

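This kind of problem can usually be fixed by carefully checking the order in which the fields are initialized and marking some (or all) of the vals as lazy.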
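To make the initialization-order point concrete, here is a minimal sketch in plain Scala, without ScalaCheck. The names Box, Eager and Deferred are hypothetical and only stand in for the generators in the question; the sketch assumes the same shape of forward reference (a val that calls a def, which in turn reads a val declared further down).

```scala
object InitOrderDemo {
  // Hypothetical stand-in for a generator; not part of ScalaCheck.
  final case class Box(label: String)

  class Eager {
    val first: Box = Box("first")
    // `pair` is initialized here and calls `pick`, which reads `second`
    // before its initializer has run, so the tuple captures null.
    val pair: (Box, Box) = (first, pick)
    val second: Box = Box("second")
    def pick: Box = second
  }

  class Deferred {
    val first: Box = Box("first")
    val pair: (Box, Box) = (first, pick)
    // `lazy` defers the initializer until the first read, so `pick`
    // triggers it on demand and never observes null.
    lazy val second: Box = Box("second")
    def pick: Box = second
  }

  def main(args: Array[String]): Unit = {
    println(new Eager().pair)
    println(new Deferred().pair)
  }
}
```

As for diagnosing this yourself: with Scala 2, compiling with -Xcheckinit should turn such a read into an UninitializedFieldError at the point of access, which tends to be easier to trace than a NullPointerException thrown later inside the library.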

The StackOverflowError occurs because the generated data structure is too deep.

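In general, whenever you deal with any kind of recursion, always think about the base case (where the recursion stops) and the step (which must eventually lead to the base case).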
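One way to make the base case and the step explicit is to pass the remaining depth as a parameter. The following is only a sketch under that assumption, not the approach used for Row further down; Tree, Leaf, Node, genTree and depthOf are hypothetical names, and the limit of 3 children per node is arbitrary.

```scala
import org.scalacheck.Gen

object DepthBoundedGen {
  // Hypothetical recursive type used only for illustration.
  sealed trait Tree
  case object Leaf extends Tree
  final case class Node(children: List[Tree]) extends Tree

  // Base case: at depth 0 only leaves are produced.
  // Step: children are generated with depth - 1, so every path reaches 0.
  def genTree(depth: Int): Gen[Tree] =
    if (depth <= 0) Gen.const(Leaf)
    else
      Gen.oneOf[Tree](
        Gen.const(Leaf),
        Gen.choose(1, 3)
          .flatMap(n => Gen.listOfN(n, genTree(depth - 1)))
          .map(Node.apply)
      )

  // Nesting depth: 0 for a leaf, 1 + the deepest child for a node.
  // Nodes built by genTree always have at least one child, so max is safe.
  def depthOf(t: Tree): Int = t match {
    case Leaf           => 0
    case Node(children) => 1 + children.map(depthOf).max
  }
}
```

Here genTree(4) can never produce a tree nested more than 4 levels deep, regardless of the size parameter ScalaCheck passes in.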

In your particular case, we can use Gen.sized and Gen.resize to control how "large" the generated elements are (check the docs and Google for more information).

package com.dtci.data.anonymize.parquet

import java.nio.charset.StandardCharsets
import org.scalacheck.Gen

object Main extends App {

  sealed trait Val
  case class Bin(bytes: Array[Byte]) extends Val
  object Bin {
    def from_string(str: String): Bin = Bin(str.getBytes(StandardCharsets.UTF_8))
  }
  case class Row(flds: List[(String,Val)]) extends Val

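  val gen_bin = Gen.alphaStr.map(Bin.from_string)
  val gen_field_name = Gen.alphaLowerStr
  // gen_val returns Gen.sized(...), so gen_row is not read until sampling time
  val gen_field = Gen.zip(gen_field_name,gen_val)
  // Step: the fields of a row are generated at half the current size
  val gen_row = Gen.sized(size => Gen.resize(size / 2,Gen.nonEmptyListOf(gen_field).map(Row.apply)))

  // Base case: once the size reaches 0, only atomic Bin values are generated
  def gen_val: Gen[Val] = Gen.sized { size =>
    if (size <= 0) {
      gen_bin
    } else {
      Gen.oneOf(gen_bin,gen_row)
    }
  }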

  gen_row.sample.get.flds.foreach(fld => println(s"${fld._1} --> ${fld._2}"))
}
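
For comparison, here is a sketch of the same generators combined with the two earlier suggestions: a plain main method instead of extends App, and lazy vals so that declaration order stops mattering. The object name RowGenSketch, the camelCase generator names and the depth helper are my own additions, and the starting size of 16 is an arbitrary choice.

```scala
import java.nio.charset.StandardCharsets
import org.scalacheck.Gen

object RowGenSketch {
  sealed trait Val
  final case class Bin(bytes: Array[Byte]) extends Val
  final case class Row(flds: List[(String, Val)]) extends Val

  lazy val genBin: Gen[Val] =
    Gen.alphaStr.map(s => Bin(s.getBytes(StandardCharsets.UTF_8)))

  lazy val genField: Gen[(String, Val)] = Gen.zip(Gen.alphaLowerStr, genVal)

  // Step: the fields of a row are generated at half the current size.
  lazy val genRow: Gen[Row] = Gen.sized { size =>
    Gen.resize(size / 2, Gen.nonEmptyListOf(genField).map(Row.apply))
  }

  // Base case: at size 0 only atomic values are produced.
  lazy val genVal: Gen[Val] = Gen.sized { size =>
    if (size <= 0) genBin else Gen.oneOf(genBin, genRow)
  }

  // Nesting depth: 0 for a Bin, 1 + the deepest field for a Row.
  // Rows come from nonEmptyListOf, so max is never called on an empty list.
  def depth(v: Val): Int = v match {
    case Bin(_)    => 0
    case Row(flds) => 1 + flds.map(p => depth(p._2)).max
  }

  def main(args: Array[String]): Unit = {
    // Starting from size 16, the size halves at each level: 8, 4, 2, 1, 0.
    val row = Gen.resize(16, genRow).sample
    println(row.map(depth))
  }
}
```

Because the size is halved at every level of nesting, a row sampled at size 16 can be nested at most 5 levels deep, which is what keeps the recursion from overflowing the stack.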