How to run a pipeline except for a few nodes?

2024-05-21 • Q&A

I want to run a pipeline for different files, but some of them do not need all of the defined nodes. How can I skip those nodes?

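Would modular pipelines help here? You can build two pipelines, one consisting only of the two "optional" nodes and another that leaves them out, and then return a default pipeline that is the sum of the two. Something like this: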

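from kedro.pipeline import Pipeline, node

def create_pipelines(**kwargs):
    # node(...) is a placeholder for a real node(func, inputs, outputs) definition
    two_node_pipeline = Pipeline([node(...), node(...)])
    rest_of_pipeline = Pipeline([node(...), node(...), node(...)])

    return {
        "rest_of_pipeline": rest_of_pipeline,
        "__default__": two_node_pipeline + rest_of_pipeline,
    }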

You can then run kedro run --pipeline rest_of_pipeline to run the pipeline without those two nodes, or kedro run to run the pipeline with the two extra nodes included.

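Otherwise, I think that if you modify kedro_cli or ProjectContext or run.py, whichever it turns out to be, it should be fairly easy to add an --except feature yourself. I might look into doing that...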

Kedro orders nodes automatically using toposort; see the earlier answer: How to run the nodes in sequence as declared in kedro pipeline?
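To make the ordering point concrete, here is a minimal sketch that uses Python's standard-library graphlib rather than Kedro itself. The node and dataset names are made up for illustration. The only idea it shows is that run order follows from which node consumes another node's output, not from the order of declaration.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes, deliberately declared out of order: name -> (inputs, outputs)
nodes = {
    "train_model": ({"features"}, {"model"}),
    "clean_data": ({"raw"}, {"clean"}),
    "build_features": ({"clean"}, {"features"}),
}

# Map each dataset to the node that produces it
producer = {out: name for name, (_, outs) in nodes.items() for out in outs}

# A node depends on whichever nodes produce its inputs ("raw" has no producer)
dependencies = {
    name: {producer[i] for i in ins if i in producer}
    for name, (ins, _) in nodes.items()
}

# static_order() yields every node after all of its dependencies
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because removing a node also removes the datasets it produces, any remaining node that reads those datasets would need them supplied some other way.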

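To filter a few nodes out of a pipeline, you can simply filter the pipeline's node list from inside Python. My favorite way is to use a list comprehension.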

By name

nodes_to_run = [node for node in pipeline.nodes if 'dont_run_me' not in node.name]
run(nodes_to_run, io)

By tag

nodes_to_run = [node for node in pipeline.nodes if 'dont_run_tag' not in node.tags]
run(nodes_to_run, io)

You can filter on any attribute attached to a pipeline node (name, inputs, outputs, short_name, tags).
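The same comprehension pattern works for the other attributes. The sketch below is self-contained and does not import Kedro: FakeNode is a hypothetical stand-in that carries only the attributes being filtered on, and the node and dataset names are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeNode:
    """Hypothetical stand-in for a Kedro node; not the real class."""
    name: str
    inputs: tuple = ()
    outputs: tuple = ()
    tags: frozenset = field(default_factory=frozenset)


all_nodes = [
    FakeNode("load", (), ("raw",), frozenset({"io"})),
    FakeNode("plot_report", ("raw",), ("figure",), frozenset({"optional"})),
    FakeNode("train", ("raw",), ("model",), frozenset({"core"})),
]

# Skip every node that writes the "figure" dataset
by_output = [n for n in all_nodes if "figure" not in n.outputs]

# Skip every node that carries the "optional" tag
by_tag = [n for n in all_nodes if "optional" not in n.tags]

print([n.name for n in by_output])
print([n.name for n in by_tag])
```

Note that the name filter in the answer is a substring test on a string, while the tag and output filters here are membership tests on a collection.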

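If you need to run the pipeline this way in production or from the command line, you can either tag the nodes and run the pipeline by tag, or add a custom click.option to the run function inside kedro_cli.py and apply this filter when the flag is True.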
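A rough sketch of the flag idea follows. To stay runnable without a Kedro project it uses argparse from the standard library as a stand-in for click, and both the --skip-optional flag and the "optional" tag are hypothetical names. In a real project the same logic would live in the run command of kedro_cli.py.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="run")
    # Hypothetical boolean flag; it is False unless passed on the command line
    parser.add_argument("--skip-optional", action="store_true")
    return parser


def select_nodes(nodes, skip_optional):
    """Return node names, dropping nodes tagged "optional" only when the flag is set.

    nodes is a list of (name, tags) pairs standing in for pipeline.nodes.
    """
    if not skip_optional:
        return [name for name, _ in nodes]
    return [name for name, tags in nodes if "optional" not in tags]


# Hypothetical (name, tags) pairs
NODES = [("load", {"io"}), ("plot_report", {"optional"}), ("train", {"core"})]

# Simulate: run --skip-optional
args = build_parser().parse_args(["--skip-optional"])
selected = select_nodes(NODES, args.skip_optional)
print(selected)
```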

Note: this assumes you have your pipeline loaded into memory as pipeline and your catalog loaded as io.

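You can also use the --to-nodes CLI option: kedro run --to-nodes node1,node2. Internally, this calls pipeline.to_nodes("node1", "node2") (see the method docs). Note that you still need to identify the "final" list of nodes that must be run.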
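As I understand it, to_nodes keeps the named nodes plus everything they depend on, directly or transitively. The sketch below imitates that selection on a plain dictionary. It is not Kedro's implementation, and the graph and the nodes_up_to helper are hypothetical.

```python
def nodes_up_to(dependencies, targets):
    """Return the targets plus every node they depend on, directly or transitively."""
    needed = set()
    stack = list(targets)
    while stack:
        current = stack.pop()
        if current in needed:
            continue  # already visited
        needed.add(current)
        stack.extend(dependencies.get(current, ()))
    return needed


# Hypothetical graph: node -> nodes it depends on
dependencies = {
    "node1": {"load"},
    "node2": {"node1"},
    "report": {"node2"},
    "export": {"load"},
}

selected = nodes_up_to(dependencies, ["node2"])
print(sorted(selected))
```

Here "report" and "export" are left out because nothing on the path to node2 needs them, which is why this option only helps when the nodes to skip sit downstream of, or off to the side of, the ones you name.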
