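I am trying to reproduce the architecture from the following GitHub repository: https://github.com/Azure/cortana-intelligence-price-optimization
The problem is the part that involves ADF, because the guide uses the old version of ADF: I don't know how to map the "inputs" and "outputs" properties of an individual activity in ADF v2 so that they point to datasets.
The Spark activity that the pipeline runs does nothing more than execute a Python script, so I think it should write its data to the dataset I have already defined.
This is the JSON for the ADF v1 pipeline from the guide, which I have not been able to replicate: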
"activities": [
  {
    "type": "HDInsightSpark",
    "typeProperties": {
      "rootPath": "adflibs",
      "entryFilePath": "Sales_Data_Aggregation_2.0_blob.py",
      "arguments": [ "modelsample" ],
      "getDebugInfo": "Always"
    },
    "outputs": [
      {
        "name": "BlobStoreAggOutput"
      }
    ],
    "policy": {
      "timeout": "00:30:00",
      "concurrency": 1,
      "retry": 1
    },
    "scheduler": {
      "frequency": "Hour",
      "interval": 1
    },
    "name": "AggDataSparkJob",
    "description": "Submits a Spark Job",
    "linkedServiceName": "HDInsightLinkedService"
  },