Python Pipeline Custom Transformer

2024-05-20 • Q&A

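I am trying to write a custom transformer to use in a pipeline for preprocessing data.

Here is the code I am using (taken from another source, not written by me). It takes a dataframe, scales the features, and returns a dataframe: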

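import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class DFStandardScaler(BaseEstimator, TransformerMixin):
    """StandardScaler wrapper that returns a DataFrame instead of an array."""

    def __init__(self):
        self.ss = None

    def fit(self, X, y=None):
        # Learn the per-column mean and standard deviation
        self.ss = StandardScaler().fit(X)
        return self

    def transform(self, X):
        Xss = self.ss.transform(X)
        # Rebuild a DataFrame so the index and column names are preserved
        Xscaled = pd.DataFrame(Xss, index=X.index, columns=X.columns)
        return Xscaled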

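My data has both categorical and continuous features. Obviously, the transformer cannot transform the categorical feature ("sex"). When I use this pipeline with the dataframe below, it raises an error because it tries to scale the categorical labels in "sex":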

     sex  length  diameter  height  whole_weight  shucked_weight  \
0      M   0.455     0.365   0.095        0.5140          0.2245   
1      M   0.350     0.265   0.090        0.2255          0.0995   
2      F   0.530     0.420   0.135        0.6770          0.2565   
3      M   0.440     0.365   0.125        0.5160          0.2155   
4      I   0.330     0.255   0.080        0.2050          0.0895   
5      I   0.425     0.300   0.095        0.3515          0.1410
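For reference, the failure described above can be reproduced with a minimal sketch like the following. It uses three rows and a subset of the columns from the table; the variable names are illustrative only. The exact exception type and message may vary across scikit-learn and pandas versions, so both `ValueError` and `TypeError` are caught here.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# A few rows copied from the table above (subset of columns).
df = pd.DataFrame({
    "sex": ["M", "M", "F"],
    "length": [0.455, 0.350, 0.530],
    "diameter": [0.365, 0.265, 0.420],
})

try:
    # StandardScaler tries to convert every column to float,
    # which fails on the string labels in "sex".
    StandardScaler().fit(df)
    error_message = None
except (ValueError, TypeError) as exc:
    error_message = str(exc)

print(error_message)
```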

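How can I pass a list of categorical/continuous features into the transformer so that it scales only the appropriate ones? Or would it be better to code a feature-type check inside the transformer somehow?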
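One way to read the second option is a transformer that inspects column dtypes itself. The following is a minimal sketch, not code from the original source: `SelectiveDFScaler` and its `columns` parameter are hypothetical names introduced here. It assumes that, when no list is given, every numeric column should be scaled, and that all other columns should pass through unchanged.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class SelectiveDFScaler(BaseEstimator, TransformerMixin):
    """Hypothetical variant of DFStandardScaler that scales only some columns."""

    def __init__(self, columns=None):
        # columns=None means "detect the numeric columns during fit"
        self.columns = columns

    def fit(self, X, y=None):
        if self.columns is None:
            # Feature-type check: keep only numeric dtypes
            self.columns_ = list(X.select_dtypes(include="number").columns)
        else:
            self.columns_ = list(self.columns)
        self.ss_ = StandardScaler().fit(X[self.columns_])
        return self

    def transform(self, X):
        X_out = X.copy()
        # Overwrite only the scaled columns; the rest stay as they were
        X_out[self.columns_] = self.ss_.transform(X[self.columns_])
        return X_out


df = pd.DataFrame({
    "sex": ["M", "M", "F", "M"],
    "length": [0.455, 0.350, 0.530, 0.440],
    "diameter": [0.365, 0.265, 0.420, 0.365],
})

scaled = SelectiveDFScaler().fit_transform(df)
print(scaled)
```

Passing an explicit list, for example `SelectiveDFScaler(columns=["length"])`, would restrict scaling to those columns instead of relying on dtype detection.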

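from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Select a fixed list of columns from a DataFrame."""

    def __init__(self, columns: list):
        # The attribute name must match the __init__ argument,
        # otherwise get_params() and clone() fail inside a pipeline
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X.loc[:, self.columns]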

from sklearn import pipeline

selector = ColumnSelector(['length', 'diameter', 'height', 'whole_weight', 'shucked_weight'])

pipe = pipeline.make_pipeline(
    selector,
    DFStandardScaler()
)

pipe2 = pipeline.make_pipeline(
    # some steps for the sex column
)

full_pipeline = pipeline.make_pipeline(
    pipeline.make_union(
        pipe,
        pipe2
    ),
    # some other step
)
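As an alternative to a hand-written selector plus a feature union, scikit-learn's built-in `ColumnTransformer` can route column lists to different transformers. The sketch below is a swapped-in technique rather than the approach above, and it makes one assumption the original leaves open: that the "sex" column should be one-hot encoded with `OneHotEncoder`. The data is the six rows shown in the question. Note that the result is an array, not a dataframe.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["length", "diameter", "height", "whole_weight", "shucked_weight"]
categorical_cols = ["sex"]

# The six rows from the question
df = pd.DataFrame({
    "sex": ["M", "M", "F", "M", "I", "I"],
    "length": [0.455, 0.350, 0.530, 0.440, 0.330, 0.425],
    "diameter": [0.365, 0.265, 0.420, 0.365, 0.255, 0.300],
    "height": [0.095, 0.090, 0.135, 0.125, 0.080, 0.095],
    "whole_weight": [0.5140, 0.2255, 0.6770, 0.5160, 0.2050, 0.3515],
    "shucked_weight": [0.2245, 0.0995, 0.2565, 0.2155, 0.0895, 0.1410],
})

preprocess = ColumnTransformer([
    # Continuous features are standardized
    ("num", StandardScaler(), numeric_cols),
    # Assumption: the categorical feature is one-hot encoded
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

features = preprocess.fit_transform(df)
print(features.shape)
```

Here the five scaled columns are followed by one indicator column per distinct value of "sex". The `preprocess` object can itself be used as a step in `make_pipeline`, ahead of whatever estimator comes next.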


wushuangbaoying answered: Python Pipeline Custom Transformer
