From the documentation:
You can use windowing with fixed-size data sets in bounded PCollections. However, note that windowing considers only the implicit timestamps attached to each element of a PCollection, and data sources that create fixed data sets (such as TextIO) assign the same timestamp to every element. This means that all the elements are by default part of a single, global window.
To use windowing with fixed data sets, you can assign your own timestamps to each element. To assign timestamps to elements, use a ParDo transform with a DoFn that outputs each element with a new timestamp (for example, the WithTimestamps transform in the Beam SDK for Java).
Here is an example of how windowing is defined for a bounded data set: https://beam.apache.org/get-started/wordcount-example/#unbounded-and-bounded-datasets
Also from the documentation, a bounded data set is read once and assigned to the global window, as described below:
The bounded (or unbounded) nature of your PCollection affects how Beam processes your data. A bounded PCollection can be processed using a batch job, which might read the entire data set once, and perform processing in a job of finite length. An unbounded PCollection must be processed using a streaming job that runs continuously, as the entire collection can never be available for processing at any one time.
I believe that to solve your problem, you can try setting a window on the bounded collection and then running it on Cloud Dataflow to see whether it works. You can also refer to the Beam capability matrix to see which features are supported from the Spark Runner's perspective.