ERROR PythonUDFRunner: Python worker exited unexpectedly (crashed)

I am running a PySpark job that calls UDFs. I know UDFs are memory-hungry and slow because of serialization/deserialization, but given our circumstances we have to use them.

The dataset is 60 GB and well partitioned, and the cluster has 240 GB of memory. The job reads the data and runs native Spark operations just fine, but it consistently fails with the error below as soon as it starts calling the Python UDFs. At first I assumed it was a memory problem, so I increased the node and executor memory, but the problem persists. What does this error mean, and how do I fix it?
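For context, the memory increase mentioned above typically looks like the spark-submit flags below. This is only an illustrative sketch with placeholder values, not the poster's actual command; note that Python UDF workers run outside the JVM heap, so raising `--executor-memory` alone may not reach them, while the overhead and Python-worker settings do.

```shell
# Illustrative spark-submit flags (placeholder values, not a recipe).
# Python workers live outside the JVM heap, so their memory comes out of
# memoryOverhead (spark.yarn.executor.memoryOverhead on Spark 2.x/YARN).
spark-submit \
  --executor-memory 16g \
  --conf spark.executor.memoryOverhead=4g \
  --conf spark.python.worker.memory=2g \
  my_job.py
```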

Executor log

19/11/03 05:05:52 ERROR PythonUDFRunner: Python worker exited unexpectedly (crashed)
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:71)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/11/03 05:05:52 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
19/11/03 05:05:52 ERROR PythonUDFRunner: This may have been caused by a prior exception:
java.net.SocketException: Broken pipe (Write failed)
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
    at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:212)
    at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
    at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$2.writeIteratorToStream(PythonUDFRunner.scala:50)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:345)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:194)
19/11/03 05:05:52 ERROR Executor: Exception in task 479.0 in stage 2.0 (TID 1177)
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$3.applyOrElse(PythonRunner.scala:486)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator$$anonfun$3.applyOrElse(PythonRunner.scala:475)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:86)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:71)
    ... 20 more
19/11/03 05:05:52 ERROR Executor: Exception in task 478.0 in stage 2.0 (TID 1176)
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:71)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/11/03 05:05:52 ERROR Executor: Exception in task 381.0 in stage 2.0 (TID 1079)
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:71)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/11/03 05:05:52 ERROR Executor: Exception in task 472.0 in stage 2.0 (TID 1170)
java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:210)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:71)
    at org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.read(PythonUDFRunner.scala:64)
    at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:187)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/11/03 05:05:52 INFO Executor: Not reporting error to driver during JVM shutdown.
19/11/03 05:05:52 INFO Executor: Not reporting error to driver during JVM shutdown.
19/11/03 05:05:52 INFO Executor: Not reporting error to driver during JVM shutdown.
19/11/03 05:05:52 INFO Executor: Not reporting error to driver during JVM shutdown.
dongshaobo answered:

This error can be caused by a variety of underlying problems, so you first need to find the root cause.

First, make sure the top level of your UDF is wrapped in a try/except block, and log the exception inside it.
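A minimal sketch of that pattern is below. The decorator name `safe_udf` and the example function `parse_amount` are hypothetical; the point is that any exception raised inside the UDF body gets logged to stderr (and thus shows up in the executor logs) instead of silently taking the Python worker down.

```python
import logging
import traceback

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger("udf")

def safe_udf(fn):
    """Wrap a UDF body so every exception is logged with its input
    and full traceback, then swallowed by returning None."""
    def wrapper(*args):
        try:
            return fn(*args)
        except Exception:
            logger.error("UDF failed on input %r:\n%s",
                         args, traceback.format_exc())
            return None  # or re-raise after logging, if you prefer to fail fast
    return wrapper

@safe_udf
def parse_amount(s):
    # Hypothetical UDF body: raises ValueError on non-numeric input.
    return float(s)
```

The wrapped function can then be registered as usual, e.g. `udf(parse_amount, DoubleType())`; a bad row now produces a log line identifying the offending input rather than an opaque "Python worker exited unexpectedly".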

Second, register a handler such as `import faulthandler; faulthandler.enable()` as early as possible in both the driver and the worker code (the UDF). If the cause is a segmentation fault, the handler will print a stack trace that will be visible in the logs.
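Concretely, that is just two lines placed at module import time, so they run inside each Python worker process as soon as the module containing the UDF is imported:

```python
import sys
import faulthandler

# Enable at import time, in the driver script and in any module the UDF
# imports, so the handler is installed inside every Python worker process.
faulthandler.enable(file=sys.stderr)

# If the worker later dies from a segfault (e.g. inside a native
# extension such as numpy or a C library), a Python-level traceback is
# written to stderr and appears in the executor logs right next to the
# "Python worker exited unexpectedly" error.
```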

Both approaches should help you understand the underlying problem.

Source: https://www.f2er.com/3127787.html
