PySpark topic modeling job fails; error log is uninterpretable

Here are the relevant lines of code. I can include more, but I suspect the error comes from my environment rather than the code itself. I am following this tutorial, but with different data and a different version of Spark.

from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors

def topic_render(topic, vocabArray):
    # topic is a (termIndices, termWeights) pair from describeTopics();
    # map the top-5 term indices back to vocabulary words
    terms = topic[0]
    result = []
    for i in range(5):
        term = vocabArray[terms[i]]
        result.append(term)
    return result

lda_model = LDA.train(result_tfidf[['index', 'features']]
            .rdd.mapValues(Vectors.fromML)
            .map(list), k=10, maxIterations=100)
topicIndices = spark.sparkContext.parallelize(lda_model.describeTopics(maxTermsPerTopic=5))
# The above line passes
topics_final = topicIndices.map(lambda topic: topic_render(topic, vocabArray)).collect()
# Crashes on this line; error log incomprehensible
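
For context, `describeTopics(maxTermsPerTopic=5)` in `pyspark.mllib` returns one `(termIndices, termWeights)` pair per topic, so `topic_render` just maps the first five term indices through the vocabulary. A plain-Python illustration of what it is expected to do (the vocabulary and topic data here are made up, not from my real run):

```python
# Hypothetical vocabulary and topic, for illustration only.
vocabArray = ["spark", "data", "model", "topic", "word", "term"]
# describeTopics() yields one (termIndices, termWeights) pair per topic.
fake_topic = ([3, 0, 5, 1, 2], [0.30, 0.25, 0.20, 0.15, 0.10])

def topic_render(topic, vocabArray):
    terms = topic[0]                      # term indices for this topic
    return [vocabArray[t] for t in terms[:5]]

print(topic_render(fake_topic, vocabArray))
# ['topic', 'spark', 'term', 'data', 'model']
```

This part works fine on plain Python data, which is part of why I suspect the environment rather than the logic.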

Below are a few lines of the log output (it is very long, and most of it just repeats this section). It is hard to tell what went wrong. I don't think I need the winutils binary on the Hadoop binary path, or the native-hadoop library stuff, because I see those warnings every time I run anything in Spark and they have never caused a problem before.

19/11/03 16:21:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/03 16:21:14 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:21 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
[Stage 0:>                                                         (0 + 4) / 56]19/11/03 16:21:32 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
        at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
        at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/03 16:21:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
[Stage 0:>                                                         (0 + 4) / 56]19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
[Stage 0:>                                                         (0 + 4) / 56]