Error raised when running foreachPartition

I have built a list of ALTER TABLE statements. Executing them one by one in a loop takes a long time.

I want to improve the performance of these queries, so I created an RDD from the list and called rdd.foreachPartition, intending to run each ALTER statement in parallel on the worker nodes. But I am getting the error below:

Caused by: java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_16_piece0 of broadcast_16

Is my approach correct, or am I missing something?
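(Note on the approach, not from the original post: `SparkContext` and `SQLContext` are driver-side objects. They cannot be created or used inside `foreachPartition`, which runs on executors. If the goal is just to overlap the latency of many independent DDL statements, plain driver-side concurrency is usually enough. A minimal sketch using Scala `Future`s, where `runDdl` is a hypothetical stand-in for a call like `spark.sql(stmt)`:)

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Run every DDL statement concurrently on the driver and wait for all of them.
// `runDdl` stands in for `spark.sql(stmt)` in a real session; here it is
// pluggable so the parallelism can be exercised without a cluster.
def runAll(statements: Seq[String])(runDdl: String => Unit): Unit = {
  val futures = statements.map(s => Future(runDdl(s)))
  Await.result(Future.sequence(futures), 10.minutes)
}

// Demo with a dummy executor that just records what it was asked to run.
val executed = new ConcurrentLinkedQueue[String]()
val stmts = (1 to 5).map(i => s"ALTER TABLE t ADD IF NOT EXISTS PARTITION (p=$i)")
runAll(stmts)(s => { executed.add(s); () })
println(executed.size)
```

The statements still execute through the single driver-side session, so this does not overload the metastore the way hundreds of Spark tasks would, while still hiding per-statement round-trip latency.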

Below is the code.

scala> list1.foreach(println)
ALTER TABLE stk000h.csd_item_visit_dly_spark ADD IF NOT EXISTS PARTITION (rpt_dt='2018-01-01',op_cmpny_cd='WMT.COM',tenant_ste_cd='US_ANDROID') LOCATION '/user/hive/userdbs/stk000h.db/csd_item_visit_dly_spark/rpt_dt=2018-01-01/op_cmpny_cd=WMT.COM/tenant_ste_cd=US_ANDROID'
ALTER TABLE stk000h.csd_item_visit_dly_spark ADD IF NOT EXISTS PARTITION (rpt_dt='2018-01-01',tenant_ste_cd='US_CORESITE') LOCATION '/user/hive/userdbs/stk000h.db/csd_item_visit_dly_spark/rpt_dt=2018-01-01/op_cmpny_cd=WMT.COM/tenant_ste_cd=US_CORESITE'
...

scala> val rdd = spark.sparkContext.parallelize(list1).repartition(200)

scala> rdd.foreachPartition({ x=>
     | x.foreach {e=>
     | println(e)
     | val conf = new SparkConf().setMaster("local[2]").set("spark.cleaner.ttl","10000").set("spark.scheduler.mode","FAIR")
     | val ctx = new SparkContext(conf)
     | val sqlContext = new SQLContext(ctx)
     | sqlContext.sql(e)
     | }
     | })
warning: there was one deprecation warning; re-run with -deprecation for details
20/03/31 06:38:20 INFO SparkContext: Starting job: foreachPartition at <console>:68
20/03/31 06:38:20 INFO DAGScheduler: Registering RDD 31 (repartition at <console>:63)
20/03/31 06:38:20 INFO DAGScheduler: Got job 15 (foreachPartition at <console>:68) with 200 output partitions
20/03/31 06:38:20 INFO DAGScheduler: Final stage: ResultStage 27 (foreachPartition at <console>:68)
20/03/31 06:38:20 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 26)
20/03/31 06:38:20 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 26)
20/03/31 06:38:20 INFO DAGScheduler: Submitting ShuffleMapStage 26 (MapPartitionsRDD[31] at repartition at <console>:63),which has no missing parents
20/03/31 06:38:20 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 3.0 KB,free 912.3 MB)
20/03/31 06:38:20 WARN DFSClient: Slow ReadProcessor read fields took 46978ms (threshold=30000ms); ack: seqno: 104 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 611418 flag: 0 flag: 0 flag: 0,targets: [DatanodeInfoWithStorage[10.72.153.11:1019,DS-3c71a15e-03e6-4642-ac2b-77fd71f3d5cb,DISK],DatanodeInfoWithStorage[10.72.150.27:1019,DS-b6c0678b-e097-4faa-bb24-ddc6cda17b65,DatanodeInfoWithStorage[10.72.150.24:1019,DS-bfd85c92-2bb6-4a65-bf59-b661f01605bd,DISK]]
20/03/31 06:38:20 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 1928.0 B,free 912.3 MB)
20/03/31 06:38:20 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 10.24.84.156:40819 (size: 1928.0 B,free: 912.3 MB)
20/03/31 06:38:20 INFO SparkContext: Created broadcast 17 from broadcast at DAGScheduler.scala:1006
20/03/31 06:38:20 INFO DAGScheduler: Submitting 140 missing tasks from ShuffleMapStage 26 (MapPartitionsRDD[31] at repartition at <console>:63) (first 15 tasks are for partitions Vector(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14))
20/03/31 06:38:20 INFO YarnScheduler: Adding task set 26.0 with 140 tasks
20/03/31 06:38:20 INFO TaskSetManager: Starting task 0.0 in stage 26.0 (TID 1969,osec400684.wal-mart.com,executor 21,partition 0,PROCESS_LOCAL,4949 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 1.0 in stage 26.0 (TID 1970,osec400925.wal-mart.com,executor 9,partition 1,4950 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 2.0 in stage 26.0 (TID 1971,osec401251.wal-mart.com,executor 27,partition 2,4948 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 3.0 in stage 26.0 (TID 1972,osec400288.wal-mart.com,executor 23,partition 3,4947 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 4.0 in stage 26.0 (TID 1973,osec401186.wal-mart.com,executor 3,partition 4,5070 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 5.0 in stage 26.0 (TID 1974,osec400442.wal-mart.com,executor 5,partition 5,4948 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 6.0 in stage 26.0 (TID 1975,osec401331.wal-mart.com,executor 7,partition 6,4947 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 7.0 in stage 26.0 (TID 1976,osec400899.wal-mart.com,executor 12,partition 7,4949 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 8.0 in stage 26.0 (TID 1977,oser405773.wal-mart.com,executor 14,partition 8,4950 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 9.0 in stage 26.0 (TID 1978,osec400603.wal-mart.com,executor 11,partition 9,5066 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 10.0 in stage 26.0 (TID 1979,osec400223.wal-mart.com,executor 32,partition 10,4949 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 11.0 in stage 26.0 (TID 1980,oser405320.wal-mart.com,executor 10,partition 11,4950 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 12.0 in stage 26.0 (TID 1981,partition 12,4950 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 13.0 in stage 26.0 (TID 1982,partition 13,5065 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 14.0 in stage 26.0 (TID 1983,partition 14,4952 bytes)
20/03/31 06:38:20 INFO TaskSetManager: Starting task 15.0 in stage 26.0 (TID 1984,partition 15,4951 bytes)
20/03/31 06:38:20 WARN TaskSetManager: Lost task 13.0 in stage 26.0 (TID 1982,executor 5): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_17_piece0 of broadcast_17
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1310)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:86)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_17_piece0 of broadcast_17
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:178)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:150)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:150)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:150)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:222)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
    ... 12 more
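(Not part of the original post, but worth noting: the broadcast failure is a symptom rather than the root cause. Each task above tries to construct a new `SparkContext` inside an executor, which is not supported; only the driver may own a `SparkContext`, and the extra contexts' cleaners can invalidate the running job's broadcast blocks. Separately, Hive DDL accepts several `PARTITION` clauses in a single `ALTER TABLE ... ADD` statement, so the whole list can often be collapsed into a handful of statements executed on the driver. A hedged sketch, assuming each entry reduces to a partition spec plus a location:)

```scala
// Collapse many (partitionSpec, location) pairs into one ALTER TABLE statement.
// Hive (and Spark SQL) allow:
//   ALTER TABLE t ADD IF NOT EXISTS PARTITION (...) LOCATION '...' PARTITION (...) LOCATION '...'
def combinedAlter(table: String, parts: Seq[(String, String)]): String = {
  val clauses = parts
    .map { case (spec, loc) => s"PARTITION ($spec) LOCATION '$loc'" }
    .mkString(" ")
  s"ALTER TABLE $table ADD IF NOT EXISTS $clauses"
}

// Illustrative values only; paths are abbreviated.
val stmt = combinedAlter(
  "stk000h.csd_item_visit_dly_spark",
  Seq(
    ("rpt_dt='2018-01-01',tenant_ste_cd='US_ANDROID'", "/user/hive/.../tenant_ste_cd=US_ANDROID"),
    ("rpt_dt='2018-01-01',tenant_ste_cd='US_CORESITE'", "/user/hive/.../tenant_ste_cd=US_CORESITE")
  )
)
println(stmt)
```

A single statement registering many partitions is one metastore round trip instead of one per partition; if the partition directories already exist on disk, `MSCK REPAIR TABLE` is another option to consider.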
