Spark RDD.foreach() fails to upload files to Azure Blob Storage with BlobClient: authentication error

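I've run into a very strange problem. I have a Spark RDD (key-value) and want to store each entry of the RDD in a separate file on Azure Blob Storage. The code looks like this: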

from azure.storage.blob import (
    BlobServiceClient, BlobClient
)

def save_blob(kv):
    CONNECTION_STRING = 'DefaultEndpointsProtocol=https;AccountName=example-account;AccountKey=very-very-secret;EndpointSuffix=core.windows.net'
    CLIENT = 'client'
    INDEX_DIR = f'{CLIENT}/index/'
    CONTAINER_NAME = 'rawdata'

    # kv is a (key, value) pair: the key names the blob, the value is its content
    blob_service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob_client = blob_service_client.get_blob_client(CONTAINER_NAME, f'{INDEX_DIR}{kv[0]}.json')
    return blob_client.upload_blob(kv[1], blob_type="BlockBlob")

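I moved all variables, as raw strings, into the save_blob() function to make absolutely sure there is no problem with variables being unavailable on the workers. If I understand correctly, all values should then be set there.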

Running the following works perfectly fine and creates a blob file:

kv = (0,'{"id": "whatever"}')
save_blob(kv)

However, as soon as I run it on the RDD, whose entries have the type Tuple[int, Dict[str, Any]], with the following code:

my_rdd.foreach(lambda kv: save_blob(kv))

I get an authentication error of the following form:

azure.storage.blob._generated.models._models_py3.StorageErrorException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
....
azure.core.exceptions.ClientAuthenticationError: Server failed to authenticate the request. 
Make sure the value of Authorization header is formed correctly including the signature.
RequestId:5a6d54c5-701e-00fa-46b8-953d31000000
Time:2019-11-07T22:11:57.3636552Z
ErrorCode:AuthenticationFailed
Error:None
AuthenticationErrorDetail:The MAC signature found in the HTTP request '0p+pmV5/FyugC0RJKnKwn+DwBOMEFXJ6BuGMiVNU38k=' is not the same as any computed signature. Server used following string to sign: 'PUT


971

application/octet-stream



*


x-ms-blob-type:BlockBlob
x-ms-client-request-id:9cffe814-01ab-11ea-aa40-00163e786b5f
x-ms-date:Thu, 07 Nov 2019 22:11:57 GMT
x-ms-version:2019-02-02
/example-container/path/client_20d%2Findex%2F0.json'.
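The error says the MAC sent in the request does not match the one the server computed over its own string-to-sign. With Shared Key authorization, the client signs that string as Base64(HMAC-SHA256(key, UTF-8 string)), where the key is the base64-decoded account key, and sends it as "SharedKey <account>:<signature>". The minimal sketch below reproduces that computation with the standard library only; the function names are hypothetical and not part of the Azure SDK. One way to use it: sign the server's string-to-sign from the log with your key. If the result differs from the MAC in the request even though the same key works locally, that suggests the client signed a different string than the one the server reconstructed (for example a different resource path or Content-Length). The empty lines in the logged string are empty header fields and have to be reproduced exactly.

```python
import base64
import hashlib
import hmac


def shared_key_signature(account_key_b64: str, string_to_sign: str) -> str:
    """Compute Base64(HMAC-SHA256(decoded_key, UTF-8(string_to_sign)))."""
    key = base64.b64decode(account_key_b64)  # the account key is stored base64-encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(account_name: str, account_key_b64: str, string_to_sign: str) -> str:
    """Build the value of the Authorization header for Shared Key auth."""
    signature = shared_key_signature(account_key_b64, string_to_sign)
    return f"SharedKey {account_name}:{signature}"
```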

Does anyone know why the code seems to run fine as long as it isn't executed inside an RDD? I'm at a loss...
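One difference between the working local call and the failing RDD run is the value type: the local test uploads a str, while the RDD values are dicts, and upload_blob is meant to receive the blob content itself (str, bytes, an iterable of those, or a stream). Whether this explains the signature mismatch is not established here. The minimal sketch below simply removes it as a variable by serializing each value explicitly, and it also opens one service client per partition (for use with RDD.foreachPartition) rather than one per record. The names to_payload, save_partition and make_service_client are hypothetical; the client factory is injected so the logic can be exercised without Azure or Spark.

```python
import json
from typing import Any, Callable, Iterable, Tuple

# Values taken from the question; adjust as needed.
CONTAINER_NAME = "rawdata"
INDEX_DIR = "client/index/"


def to_payload(value: Any) -> bytes:
    """Turn a record value into bytes: bytes pass through, str is UTF-8 encoded,
    anything else (e.g. a dict) is serialized as JSON first."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    if isinstance(value, str):
        return value.encode("utf-8")
    return json.dumps(value).encode("utf-8")


def save_partition(rows: Iterable[Tuple[Any, Any]],
                   make_service_client: Callable[[], Any]) -> int:
    """Upload each (key, value) row of one partition as <INDEX_DIR><key>.json.

    make_service_client is a hypothetical zero-argument factory, called once
    per partition on the executor, so one client serves the whole partition.
    Returns the number of blobs uploaded.
    """
    service = make_service_client()
    uploaded = 0
    for key, value in rows:
        blob_client = service.get_blob_client(CONTAINER_NAME, f"{INDEX_DIR}{key}.json")
        blob_client.upload_blob(to_payload(value), blob_type="BlockBlob")
        uploaded += 1
    return uploaded
```

On a cluster this could be wired up as my_rdd.foreachPartition(lambda rows: save_partition(rows, lambda: BlobServiceClient.from_connection_string(CONNECTION_STRING))), assuming azure-storage-blob is installed on every executor and CONNECTION_STRING is defined in that scope.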

Thanks! - Tom

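PS: Converting the RDD to a DataFrame with rdd.toDF(["column1", "column2"]) and then writing it with df.write.partitionBy("column1").json(...) doesn't work, because the nested dictionaries don't seem to be converted correctly.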
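A possible workaround for the nested-dict problem in the PS, not tested here, is to serialize each value to a JSON string before building the DataFrame, so Spark never has to infer a nested schema. to_json_row is a hypothetical helper:

```python
import json
from typing import Any, Dict, Tuple


def to_json_row(kv: Tuple[int, Dict[str, Any]]) -> Tuple[int, str]:
    """Turn (key, nested_dict) into (key, json_string) so Spark only sees flat types."""
    key, value = kv
    # sort_keys makes the output deterministic for dicts with the same content
    return key, json.dumps(value, sort_keys=True)
```

It would be applied as my_rdd.map(to_json_row).toDF(["column1", "column2"]). The trade-off is that column2 then holds a JSON string rather than a nested struct, so writing it with .json(...) produces an escaped string field.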
