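My scikit-learn pipeline ends in a linear regression, but in my problem domain the output should never exceed a certain value, so I want to cap the predictions. I know I can run my model first and then do y = np.where(y > MAX_VALUE, MAX_VALUE, y), but I may need more custom logic steps like this later, so I would like to encapsulate it in the pipeline. Can sklearn do this?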
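For reference, the manual post-processing I would like to avoid looks roughly like the sketch below. The `MAX_VALUE` of 5.0 and the toy data are assumptions made up for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

MAX_VALUE = 5.0  # hypothetical ceiling, chosen only for this illustration

# Toy data: the target equals the single feature.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel()

# A pipeline whose last step is the regressor; this part works fine.
pipe = Pipeline([("scaler", StandardScaler()), ("model", LinearRegression())])
pipe.fit(X, y)

# The clipping happens outside the pipeline, after predict().
raw = pipe.predict(X)
clipped = np.where(raw > MAX_VALUE, MAX_VALUE, raw)
```

This works, but the clipping logic lives outside the fitted model object, which is exactly what I want to change.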
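I tried writing my own custom estimator, shown below, but it fails with the error All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough', so it seems a pipeline does not allow any step after the LinearRegression step. Is there a way around this?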
import numpy as np
import sklearn.base
import sklearn.linear_model
import sklearn.pipeline
import sklearn.preprocessing

class ClipoutputToMaxValue(sklearn.base.BaseEstimator, sklearn.base.RegressorMixin):
    '''custom step to put a ceiling on the output 'y' value'''
    def fit(self, X, y):
        # Remember the largest target seen during training.
        self.max_y = y.max()
        return self
    def predict(self, y):
        # Cap every prediction at the stored maximum.
        y = np.where(y > self.max_y, self.max_y, y)
        return y

model = sklearn.pipeline.Pipeline([
    ('scaler', sklearn.preprocessing.StandardScaler()),
    ('model', sklearn.linear_model.LinearRegression()),
    ('clipper', ClipoutputToMaxValue()),  # THIS IS WHAT I WANT, BUT SKLEARN DOESN'T ALLOW ME TO DO THIS
])
x = np.arange(10).reshape(10, -1)
y = x
model.fit(x, y)
model.predict(x)
But it fails with the following error:
TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LinearRegression(copy_X=True,fit_intercept=True,n_jobs=None,normalize=False)' (type <class 'sklearn.linear_model.base.LinearRegression'>) doesn't
Is there another way to do this?
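One direction that might work, given that the error only complains about intermediate steps: keep the regressor as the final step, but wrap it so that the clipping happens inside its own predict(). The sketch below is an assumption about how this could look, not an established sklearn recipe. `ClippedRegressor` is a hypothetical name I made up; only `BaseEstimator`, `RegressorMixin`, `clone`, `Pipeline`, `StandardScaler` and `LinearRegression` come from sklearn.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ClippedRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: fits an inner regressor and caps its predictions."""

    def __init__(self, regressor=None):
        self.regressor = regressor

    def fit(self, X, y):
        y = np.asarray(y)
        base = LinearRegression() if self.regressor is None else self.regressor
        # Fit a clone so the estimator passed to __init__ is left untouched.
        self.regressor_ = clone(base).fit(X, y)
        # Ceiling learned from the training targets, as in the question's fit().
        self.max_y_ = y.max()
        return self

    def predict(self, X):
        # Predict with the inner model, then cap at the learned ceiling.
        return np.minimum(self.regressor_.predict(X), self.max_y_)


model = Pipeline([
    ("scaler", StandardScaler()),
    # The wrapper is the final step, so it does not need a transform() method.
    ("model", ClippedRegressor(LinearRegression())),
])

X = np.arange(10, dtype=float).reshape(-1, 1)
y = X.ravel()
model.fit(X, y)

# 20.0 lies far above the training range, so its prediction gets capped.
preds = model.predict(np.array([[3.0], [20.0]]))
```

Further custom output logic could then be added inside predict() of the wrapper, so everything still travels with the single fitted pipeline object. Whether this is the intended sklearn way is something I am not sure about.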