从包含CSV文件的文件夹中过滤Power Query中的多个条件的方法

我需要纠正或建议查询的帮助,我正在从CSV格式的文件夹中获取数据。 请提前警告:我不知道如何在短时间内写这篇文章。

首先要提供的信息是

  • 工具仅限于Power Query,Excel,VBA

  • 数据查询每月运行一次,因此加载时间较长不是一个大问题,尽管通常更短的时间更可取

  • 我选择了Power Query方法,因为必须在另一个Excel文件中使用源数据,但是要使用不同的规则集(这是我当前问题的一部分)。
  • 我的代码的基本问题是它可以运行很长时间,需要满足大量条件,并且出于其他原因/工具/文件,我必须使用类似的方法。我希望人们只需按“刷新”即可获取所需的信息。

说明:

我在一个文件夹中的CSV文件中有数据源。命名约定不存在,因为有多个人负责从系统中导出数据。因此,我在PQ中使用了文件夹选项。

数据大小当前约为400-600 MB。列的名称可能正在更改,这是M代码中要绕过的第一行。

我的主要奋斗是:

有几个条件需要实现。我不想编写多个if语句,因为代码会变得很丑陋,而且条件的数量只有十分之一,而且跨越多列。因此,我实现了一个转换表(我们称其为TT),该表具有所有可以使用过滤的列,该TT的最后一列是所有列的串联。如果在某种情况下我不在乎其中一列,则用通配符“ *”填充它。

所以TT可能看起来像这样:

| PC | CLIENT | FN  | TC | STRING      |

|----|--------|-----|----|-------------|

| 11 | *      | NEW | AC | 11*NEWAC    |

| 47 | 000001 | NEW | *  | 47000001NEW*|

等...

PC是PoC,FN是FUNCTION,TC是交易代码(在下面的代码中)。

然后在代码中,我将通配符替换为来自PQ的适当列的值,并检查TT中是否包含来自PQ中相同列的串联字符串(将最后一列制成列表)。 下面的代码可用于更简单的解决方案,但是它经过了硬编码,因为我想知道是否有可能。

数据更新后,我运行VBA宏以将数据附加到“数据库”表中(ofc检查是否存在现有值),以便可以最大程度地减少数据负载。因此,使用了第一部分代码。 基本上,我可以将代码分为三部分:

  1. 基本转换:从文件夹加载,删除非常规名称,并检查其他文件夹是否包含相同的命名文件以最小化加载。
  2. 过滤数据:包括将PQ表与TT表合并,用正确的列替换通配符,然后创建过滤字符串以检查串联的PQ表中的文本是否至少包含TT列表中的一个值。
  3. 对使用过的数据进行最终转换以获取我需要的信息(主要是关于从市场上延迟清算)

带有注释的整个M代码

let
    /*Here starts basic data transformation to limit errors in CSV files due to 
    different conventions */
    Source = Folder.Files(source),#"Uppercased Text1" = Table.TransformColumns(Source,{{"Name",Text.Upper,type text}}),#"Merged Queries2" = Table.nestedJoin(#"Uppercased Text1",{"Name"},q_Archive,"q_Archive",JoinKind.LeftAnti),#"Added Custom" = Table.AddColumn(#"Merged Queries2","Data",each Csv.Document(File.Contents([Folder Path] & "\" & [Name]),[Delimiter=";",Encoding = 1252,QuoteStyle = QuoteStyle.None])),#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Data"}),#"Added Custom1" = Table.AddColumn(#"Removed Other Columns","Table",each Table.PromoteHeaders([Data])),#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Table"}),#"Added Custom2" = Table.AddColumn(#"Removed Other Columns1","Upper",each Table.TransformColumnNames([Table],Text.Upper)),#"Removed Other Columns2" = Table.SelectColumns(#"Added Custom2",{"Upper"}),#"Expanded Upper" = Table.ExpandTableColumn(#"Removed Other Columns2",{"19A AMOUNT","19A CURRENCY CODE","35B ISIN","CLIENT","EXP.SETTL.DATE","FUNCTION","INSTR.ID","MESSAGE FUNCTION","POC","RECEPTION DATE","SETTL.AMOUNT","SETTL.CUR.","TRANSactION CODE"},"TRANSactION CODE"}),#"Renamed Columns1" = Table.RenameColumns(#"Expanded Upper",{{"SETTL.AMOUNT","SETTL.AMOUNT2"},{"SETTL.CUR.","SETTL.CUR.2"},{"19A CURRENCY CODE","19A CURRENCY CODE2"},"19A AMOUNT2"}}),#"Added Custom10" = Table.AddColumn(#"Renamed Columns1","19A AMOUNT",each if[SETTL.AMOUNT2]=null then [19A AMOUNT2] else [SETTL.AMOUNT2]),#"Added Custom11" = Table.AddColumn(#"Added Custom10",each if [SETTL.CUR.2] = null then [19A CURRENCY CODE2] else [SETTL.CUR.2]),#"Renamed Columns" = Table.RenameColumns(#"Added Custom11",{{"FUNCTION","FUNCTION2"}}),#"Added Custom8" = Table.AddColumn(#"Renamed Columns",each if[FUNCTION2]=null then [MESSAGE FUNCTION] else[FUNCTION2]),#"Removed Other Columns3" = Table.SelectColumns(#"Added Custom8",{"35B ISIN","TRANSactION CODE","FUNCTION"}),#"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns3",{"POC","19A CURRENCY CODE"}),#"Replaced Value" = Table.ReplaceValue(#"Reordered Columns","""","",Replacer.ReplaceText,"35B ISIN"}),#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","=",#"Uppercased Text" = Table.TransformColumns(#"Replaced Value1",{{"POC",type text},{"CLIENT",{"FUNCTION",{"TRANSactION CODE",#"Filtered Rows" = Table.SelectRows(#"Uppercased Text",each ([FUNCTION] = "NEWM")),#"Merged Queries" = Table.nestedJoin(#"Filtered Rows",{"POC"},tbl_setup_pocList,{"PocList"},"tbl_setup_pocList",JoinKind.Inner),#"Removed Columns" = Table.RemoveColumns(#"Merged Queries",{"tbl_setup_pocList"}),/* Here ends the data transformation part
   and the part for list transformations start*/
        #"Added condition" = Table.AddColumn(#"Removed Columns","COND",each (
            ((Table.FromRecords({
                [PC = List.ReplaceValue(Table.Column(tbl_filtering_string,"POC"),"*",[POC],Replacer.ReplaceText),CL = List.ReplaceValue(Table.Column(tbl_filtering_string,"CLIENT"),[CLIENT],FN = List.ReplaceValue(Table.Column(tbl_filtering_string,"FUNCTION"),[FUNCTION],TC = List.ReplaceValue(Table.Column(tbl_filtering_string,"TRANSactION CODE"),[TRANSactION CODE],Replacer.ReplaceText)]}
            ))))),#"Expanded COND" = Table.ExpandTableColumn(#"Added condition",{"PC","CL","FN","TC"},"TC"}),#"Added Custom3" = Table.AddColumn(#"Expanded COND","Test",each (List.Combine(
            {
                {_[PC]},{_[CL]},{_[FN]},{_[TC]}
            }
        ))),#"Expanded Test" = Table.AddColumn(#"Added Custom3","Test2",each (Table.FromColumns(_[Test],null))),#"Removed Columns2" = Table.RemoveColumns(#"Expanded Test","TC","Test"}),#"Added Custom4" = Table.AddColumn(#"Removed Columns2","String",each Table.ToList([Test2],Combiner.CombineTextByDelimiter(""))),#"Removed Columns3" = Table.RemoveColumns(#"Added Custom4",{"Test2"}),#"Added Custom6" = Table.AddColumn(#"Removed Columns3","CONTAIN_STR",each [POC]&[CLIENT]&[FUNCTION]&[TRANSactION CODE]),#"Added Custom5" = Table.AddColumn(#"Added Custom6","Cond",each List.Contains(_[String],[CONTAIN_STR])),#"Filtered Rows1" = Table.SelectRows(#"Added Custom5",each ([Cond] = false)),/*Here the code for filtering ends and final transformations occur */

        #"Removed Columns4" = Table.RemoveColumns(#"Filtered Rows1",{"String","Cond"}),#"Merged Queries1" = Table.nestedJoin(#"Removed Columns4",tbl_setup_exotics,{"Exotic_PoC"},"tbl_setup_exotics",JoinKind.LeftOuter),#"Expanded tbl_setup_exotics" = Table.ExpandTableColumn(#"Merged Queries1",{"Exotic_PoC"}),#"Replaced Value2" = Table.ReplaceValue(#"Expanded tbl_setup_exotics",null,"Non Exotic",Replacer.ReplaceValue,#"Removed Errors" = Table.RemoveRowsWithErrors(#"Replaced Value2",{"EXP.SETTL.DATE","RECEPTION DATE"}),#"Changed Type" = Table.TransformColumnTypes(#"Removed Errors",{{"EXP.SETTL.DATE",type date},{"RECEPTION DATE",type date}}),#"Added Custom7" = Table.AddColumn(#"Changed Type","RD",each (if [Exotic_PoC] <> "Non Exotic" then Date.AddDays([RECEPTION DATE],1)else [RECEPTION DATE])),#"Filtered Rows2" = Table.AddColumn(#"Added Custom7","LB",each if [RD]>=[EXP.SETTL.DATE] then "Late" else "Not"),#"Added Custom9" = Table.AddColumn(#"Filtered Rows2","DAYS_LATE",each [RD]-[EXP.SETTL.DATE]),#"Inserted Year" = Table.AddColumn(#"Added Custom9","Year",each Date.Year([EXP.SETTL.DATE]),Int64.Type),#"Inserted Month" = Table.AddColumn(#"Inserted Year","Month",each Date.Month([EXP.SETTL.DATE]),#"Changed Type1" = Table.TransformColumnTypes(#"Inserted Month",{{"19A AMOUNT",type number}}),#"Grouped Rows" = Table.Group(#"Changed Type1",{"Year","LB"},{{"Count",each Table.RowCount(_),type number},{"Countervalue",each List.Sum([19A AMOUNT]),{"ISIN",each Text.Combine([35B ISIN],";"),{"INSTR.ID",each Text.Combine([INSTR.ID],#"Merged Queries3" = Table.nestedJoin(#"Grouped Rows","19A CURRENCY CODE"},q_Xrates,"Currency"},"q_Xrates",#"Expanded q_Xrates" = Table.ExpandTableColumn(#"Merged Queries3",{"Rate"},{"Rate"}),#"Replaced Value3" = Table.ReplaceValue(#"Expanded q_Xrates",1,#"Added Col" = Table.AddColumn(#"Replaced Value3","CV",each [Countervalue]/[Rate]),#"Remove Countervalue" = Table.RemoveColumns(#"Added Col",{"Countervalue"})
    in
        #"Remove Countervalue"

问题

  1. 我知道这种方法听起来过于复杂,但可以使其正常工作(不幸的是,刷新需要很长时间)。但是真的好吗?考虑到开始提到的有限的工具使用方法,还有其他选择吗?
  2. 如何使这段代码更好?我相信它可能会部分重现功能,但是由于我是PQ的初学者,所以我无法想象如何。
  3. 对于相同的源数据,如何使用相同的方法却具有更大的复杂性?您可以理解,随着更多列添加到过滤字符串中。
  4. 您还有其他建议吗?

结束评论

  • 我现在非常绝望,有时我的书面文字可能令人困惑。
  • 提供任何一种Visio图表来以更图形化的方式显示我的逻辑(我对此比较熟悉)以及关系概述,我没有任何问题。
  • 我也没有提供匿名数据的问题(因为它可能是部分机密的)。如果您需要该服务,请参阅首选服务。
  • 如果我被推向正确的方向,我不介意编写代码。对于该问题,#1是优先事项。因此,从根本上讲,这种好方法是否可以针对另一个相同但更复杂的目的轻松调整?

非常感谢您的光临。

*/ MK */
coder002 回答:从包含CSV文件的文件夹中过滤Power Query中的多个条件的方法

如果要执行此操作,我将编写一个将过滤条件表编译为函数的函数,然后将其与Table.SelectRows一起应用。

// Compile the condition table into a function that can be applied in row filtering.
filterCondition = compileFilterConditionTable(tbl_filtering_string),#"Filtered Rows" = Table.SelectRows(#"Table after Preceding Steps",filterCondition)

这看起来更容易追踪步骤吗?

下面是将条件表编译成逻辑函数的函数示例代码。我不确定这对您的情况是否正确,因为我不完全了解该要求。

compileFilterConditionTable =

    let compileFilterConditionTable = (filterConditionTable as table) as function =>
            let recordConditions = List.Transform(
                    Table.ToRecords(filterConditionTable),compileFilterConditionRecord)
            in applyCombine(recordConditions,List.AnyTrue),compileFilterConditionRecord = (cond as record) as function =>
            let fieldNameValues = List.Transform(
                    Record.FieldNames(cond),each [Name = _,Value = Record.Field(cond,Name)]
                ),fieldConditions = List.Transform(fieldNameValues,compileFieldCondition)
            in applyCombine(fieldConditions,List.AllTrue),compileFieldCondition = (fieldNameValue as record) as function =>
            let name = fieldNameValue[Name],value = fieldNameValue[Value]
            in
                if value = "*" then (record as record) as logical => true
                else (record as record) as logical => Record.Field(record,name) = value,applyCombine = (functions as list,combiner as function) as function =>
            (value) => combiner(List.Transform(functions,(f) => f(value)))

    in compileFilterConditionTable

无论如何,M是一种功能编程语言,因此以功能方式思考和编码它会有所帮助。将整个逻辑分为几个小部分,这样每个小部分都将很容易理解。将您的代码编写为可重用的小函数,然后将它们组合以构建整体。

本文链接:https://www.f2er.com/3084520.html

大家都在问