I have a DataFrame of the following form:
date TICKER x1 x2 ... Z Y month x3
0 1999-12-31 A UN Equity 52.1330 51.9645 ... 0.0052 NaN 12 NaN
1 1999-12-31 AA UN Equity 92.9415 92.8715 ... 0.0052 NaN 12 NaN
2 1999-12-31 ABC UN Equity 3.6843 3.6539 ... 0.0052 NaN 12 NaN
3 1999-12-31 ABF UN Equity 22.0625 21.9375 ... 0.0052 NaN 12 NaN
4 1999-12-31 ABM UN Equity 10.2188 10.1250 ... 0.0052 NaN 12 NaN
I want to run an OLS regression with the formula 'Y ~ x1 + x2:x3', using statsmodels.formula.api as smf, by the groups ['TICKER','year','month'] (year is a column not shown here). So I use:
    data.groupby(['TICKER','month']).apply(lambda x: smf.ols(formula='Y ~ x1 + x2:x3', data=x))
However, I get the following error:
    IndexError: tuple index out of range
The full traceback is:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 894, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\pandas\core\groupby\groupby.py", line 928, in _python_apply_general
    keys, values, mutated = self.grouper.apply(f, data, self.axis)
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\pandas\core\groupby\ops.py", line 238, in apply
    res = f(group)
  File "<input>", in <lambda>
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 195, in from_formula
    mod = cls(endog, exog, *args, **kwargs)
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\regression\linear_model.py", line 872, in __init__
    super(OLS, self).__init__(endog, missing=missing,
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\regression\linear_model.py", line 703, in __init__
    super(WLS,
  File "...", line 190, in __init__
    super(RegressionModel, **kwargs)
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 237, in __init__
    super(LikelihoodModel,
  File "...", line 77, in __init__
    self.data = self._handle_data(endog, missing, hasconst,
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\model.py", line 101, in _handle_data
    data = handle_data(endog, **kwargs)
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\data.py", line 672, in handle_data
    return klass(endog, exog=exog, hasconst=hasconst,
  File "C:\Users\xxxx\PycharmProjects\non_parametric\venv\lib\site-packages\statsmodels\base\data.py", line 71, in __init__
    arrays, nan_idx = self.handle_missing(endog,
  File "...", line 247, in handle_missing
    if combined_nans.shape[0] != nan_mask.shape[0]:
IndexError: tuple index out of range
Any idea why?
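For reference, here is a minimal, self-contained sketch of the per-group fit with a guard for groups that have no complete rows. The traceback ends in statsmodels' missing-data handling, and the sample rows have NaN in both Y and x3, so one possible cause (a guess, not confirmed by the traceback alone) is that some groups are left with no usable observations. The synthetic data, the helper name fit_group and the min_obs threshold are assumptions made for illustration and are not part of the original post; the sketch also calls .fit(), which the original lambda does not.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

FORMULA = "Y ~ x1 + x2:x3"
MODEL_COLS = ["Y", "x1", "x2", "x3"]  # every column the formula uses


def fit_group(group, min_obs=4):
    """Fit FORMULA on one group, or return None if too few complete rows.

    min_obs is a hypothetical threshold chosen for this sketch.
    """
    complete = group.dropna(subset=MODEL_COLS)
    if len(complete) < min_obs:
        return None
    return smf.ols(formula=FORMULA, data=complete).fit()


# Synthetic stand-in for the real data (all values are made up).
rng = np.random.default_rng(0)
n = 20
data = pd.DataFrame({
    "TICKER": ["A UN Equity"] * n + ["AA UN Equity"] * n,
    "month": 12,
    "x1": rng.normal(size=2 * n),
    "x2": rng.normal(size=2 * n),
    "x3": rng.normal(size=2 * n),
})
noise = rng.normal(scale=0.1, size=2 * n)
data["Y"] = 1.0 + 2.0 * data["x1"] + 0.5 * data["x2"] * data["x3"] + noise
# Mimic the sample rows, where Y is NaN for every row of a group.
data.loc[data["TICKER"] == "AA UN Equity", "Y"] = np.nan

# Iterate over the groups explicitly so each key maps to a fitted result or None.
results = {key: fit_group(group) for key, group in data.groupby(["TICKER", "month"])}
skipped = [key for key, res in results.items() if res is None]
print(skipped)  # groups that had too few complete rows to fit
```

If skipped is non-empty on the real data, those are the groups worth inspecting first. Building a dict from the groupby iterator, rather than using apply, keeps the skipped groups visible instead of leaving it to apply to combine None values with result objects.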