Folder structure of the PySpark project in PyCharm:
TEST
    TEST  (marked as sources root)
        com
            earl
                test
                    pyspark
                        utils
                            utilities.py
                        test_main.py
test_main.py contains:
from _ast import arg
__author__ = "earl"
from pyspark.sql.functions import to_json,struct,lit
from com.earl.test.pyspark.utils.utilities import *
import sys
utilities.py contains:
__author__ = "earl"
from py4j.protocol import Py4JJavaError
from pyspark.sql import SparkSession
import sys
In PyCharm, I execute the code by running test_main.py, and it works absolutely fine: functions from utilities.py are called and execute perfectly. In PyCharm I set Run -> Edit Configurations -> Parameters to D:\Users\input\test.json localhost:9092 and read the two values with sys.argv[1] and sys.argv[2] respectively.
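For reference, the argument handling described above amounts to something like the following minimal sketch (the helper name parse_args is mine, not from the original code):

```python
import sys

def parse_args(argv):
    # argv[0] is the script path; positional arguments start at index 1,
    # matching the PyCharm Parameters field "D:\Users\input\test.json localhost:9092"
    input_path = argv[1]   # e.g. the JSON input file
    broker = argv[2]       # e.g. the Kafka bootstrap server host:port
    return input_path, broker

if __name__ == "__main__":
    # Demo with an explicit argv list mirroring the PyCharm run configuration
    demo = ["test_main.py", r"D:\Users\input\test.json", "localhost:9092"]
    print(parse_args(demo))
```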
spark-submit command:
spark-submit --master local --conf spark.sparkContext.setLogLevel=WARN --name test D:\Users\earl\com\earl\test\pyspark\test_main.py --files D:\Users\earl\com\test\pyspark\utils\utilities.py D:\Users\input\test.json localhost:9092
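(Note that in spark-submit, everything after the primary application script is passed to the script as application arguments, so a --files flag placed there is not interpreted by Spark. The usual way to ship a package like com so that imports keep working is to zip it from its parent directory and pass the archive via --py-files before the script path. A minimal sketch of building such an archive; the paths, the zip_package helper, and the deps.zip name are illustrative, not from the original setup:)

```python
# Sketch: build a deps.zip whose entries preserve the com/... package path,
# so that "spark-submit --py-files deps.zip ... test_main.py <args>" can
# resolve "from com.earl.test.pyspark.utils.utilities import *".
# In the question's layout the package root would be D:\Users\earl (hypothetical here).
import os
import tempfile
import zipfile

def zip_package(package_root, package_name, out_zip):
    """Zip package_name (found under package_root), keeping paths relative to package_root."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        pkg_dir = os.path.join(package_root, package_name)
        for dirpath, _dirnames, filenames in os.walk(pkg_dir):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Archive name relative to the root, e.g. com/earl/test/pyspark/utils/utilities.py
                arcname = os.path.relpath(full, package_root)
                zf.write(full, arcname)
    return out_zip

if __name__ == "__main__":
    # Demo with a throwaway tree mirroring the question's layout
    root = tempfile.mkdtemp()
    utils_dir = os.path.join(root, "com", "earl", "test", "pyspark", "utils")
    os.makedirs(utils_dir)
    open(os.path.join(utils_dir, "utilities.py"), "w").close()
    out = zip_package(root, "com", os.path.join(root, "deps.zip"))
    with zipfile.ZipFile(out) as zf:
        print(zf.namelist())
```

It is also safest to give each directory in the chain (com, earl, test, pyspark, utils) an __init__.py so the chain is importable as regular packages; PyCharm's sources-root setting can mask that requirement when running inside the IDE.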
Error:
Traceback (most recent call last):
  File "D:\Users\earl\com\earl\test\pyspark\test_main.py", line 5, in <module>
from com.earl.test.pyspark.utils.utilities import *
ModuleNotFoundError: No module named 'com'