具有各种列条件的Pandas Dataframe sum函数

此函数基于5个不同的标准(开始日期,结束日期,资金,帐户和分析)对数据框中的行求和:

df = pd.DataFrame(
    [
    ['02-09-2019',20190902,20.00,'F1','B1','I2'],['23-09-2019',20190923,237.36,'I1'],['15-11-2019',20191115,200.00,['16-11-2019',20191116,2045.00,['05-05-2020',20200505,205.00,'F2','B2',],columns= ['Datestr','Datenum','Cost','Fund','account','Analysis'])


def per_sum(startdate,enddate,fund,account,analysis):
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              (df.Fund == fund) &
              (df.account == account) &
              (df.Analysis == analysis)
              ].Cost.sum()


per_sum(20190000,20200000,'I1')

如果未提供资金,帐户或分析数据,我该如何调整此功能以继续总成本。

例如: 如果我想在所有资金和帐户中找到总的分析“ I2”。

这种事情不起作用:

per_sum(20190000,'','I2')

谢谢

liqzone 回答:具有各种列条件的Pandas Dataframe sum函数

想法由|链接为bitwise OR个新概念,以按空格进行比较:

def per_sum(startdate,enddate,fund,account,analysis):
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              ((df.Fund == fund) | (fund == '')) &
              ((df.Account == account) | (account == '')) &
              ((df.Analysis == analysis) | (analysis == ''))
              ].Cost.sum()

print(per_sum(20190000,20200000,'',''))
2502.36

print(per_sum(20190000,'I2'))
2065.0

编辑:

如果还希望使用日期时间进行过滤,一种可能的解决方案是添加if-else语句以更改日期开始和结束日期时间:

def per_sum(startdate,analysis):
    startdate = -np.inf if startdate == '' else startdate
    enddate = np.inf if enddate == '' else enddate
    return df[(df.Datenum > startdate) &
              (df.Datenum < enddate) &
              ((df.Fund == fund) | (fund == '')) &
              ((df.Account == account) | (account == '')) &
              ((df.Analysis == analysis) | (analysis == ''))
              ].Cost.sum()

print(per_sum('',''))
2707.36
,

这可能不是很优雅,但透明且万无一失:

def per_sum_2(startdate,fund = None,account=None,analysis=None):

    df2 = df[(df.Datenum > startdate) &
              (df.Datenum < enddate) ]
    if not fund is None:
        df2 = df2[df2.Fund == fund]
    if not account is None:
        df2 = df2[df2.Account == account]
    if not analysis is None:
        df2 = df2[df2.Analysis == analysis]

    return df2.Cost.sum()

per_sum_2(20190000,analysis='I2')

2065.0
,

此:

per_sum(20190000,'I2') 

不起作用,因为“”不是适合该列所有案例/值的通配符。您可以使用正则表达式来匹配列值的所有值。

您可以将函数声明更改为包含各列的默认值,因此,当您要忽略它们时,在调用函数时不要给它提供参数。

def per_sum(startdate,fund='somepattern',account='otherpattern',analysis):
本文链接:https://www.f2er.com/2796785.html

大家都在问