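I'm training a simple CrossValidatorModel with logistic regression in a spark-ml pipeline. I can make predictions on new data, but I'd like to go beyond the black box and do some analysis of the coefficients.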
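import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler

// maxIter and alpha are defined elsewhere in the asker's code
val lr = new LogisticRegression()
  .setFitIntercept(true)
  .setMaxIter(maxIter)
  .setElasticNetParam(alpha)
  .setStandardization(true)
  .setFamily("binomial")
  .setWeightCol("weight")
  .setFeaturesCol("features")
  .setLabelCol("response")

val assembler = new VectorAssembler()
  .setInputCols(Array("feat1", "feat2"))
  .setOutputCol("features")

val modelPipeline = new Pipeline()
  .setStages(Array(assembler, lr))

val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("response")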
I then define a parameter grid and train over it to obtain the best model and its AUC:
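import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// lambdas (candidate regParam values), nfolds and train are defined elsewhere
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, lambdas)
  .build()

val pipeline = new CrossValidator()
  .setEstimator(modelPipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(nfolds)

val cvModel = pipeline.fit(train)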
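The question also asks for the AUC. `CrossValidatorModel.avgMetrics` holds one cross-validated metric per param map, in the same order as `getEstimatorParamMaps`, and `BinaryClassificationEvaluator` uses area under ROC by default, where larger is better. A minimal pure-Scala sketch of pairing the two follows; `CvSummary` and `bestByMetric` are hypothetical names, and the numbers in `main` are made up for illustration, not real results.

```scala
// Hypothetical helper: pair each regParam value with its cross-validated metric
// and return the best (regParam, metric) pair.
object CvSummary {
  def bestByMetric(
      lambdas: Array[Double],
      avgMetrics: Array[Double],
      largerIsBetter: Boolean = true): (Double, Double) = {
    require(lambdas.length == avgMetrics.length, "one metric per param map expected")
    require(lambdas.nonEmpty, "no param maps")
    val pairs = lambdas.zip(avgMetrics)
    // AUC: larger is better; for error-type metrics pass largerIsBetter = false
    if (largerIsBetter) pairs.maxBy(_._2) else pairs.minBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative numbers only, not real results
    val lambdas = Array(0.001, 0.01, 0.1)
    val aucs = Array(0.81, 0.84, 0.79)
    val (bestLambda, bestAuc) = bestByMetric(lambdas, aucs)
    println(s"best regParam = $bestLambda, mean CV AUC = $bestAuc")
  }
}
```

With a fitted model you would pass `cvModel.getEstimatorParamMaps.map(pm => pm(lr.regParam))` and `cvModel.avgMetrics`, with `evaluator.isLargerBetter` as the third argument.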
How can I get the coefficients (betas) of the best logistic regression model?
Solution
Extract the best model:
import org.apache.spark.ml.PipelineModel

val bestModel: Option[PipelineModel] = cvModel.bestModel match {
  case pm: PipelineModel => Some(pm)
  case _ => None
}
Find the logistic regression model among the pipeline stages:
import org.apache.spark.ml.classification.LogisticRegressionModel

val lrm: Option[LogisticRegressionModel] = bestModel
  .map(_.stages.collect { case m: LogisticRegressionModel => m })
  .flatMap(_.headOption)
Extract the intercept and coefficients:
// Option[(Double, Vector)]: (intercept, coefficients)
lrm.map(m => (m.intercept, m.coefficients))
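For the analysis itself, it helps to label each coefficient. `VectorAssembler` concatenates its input columns in order, so with the two scalar columns `feat1` and `feat2` the coefficient vector should line up with `Array("feat1", "feat2")` (a vector-valued input column would occupy several slots). Spark's documentation for `setStandardization` states that coefficients are always returned on the original scale, so `exp(beta)` can be read as the odds ratio per unit increase of the raw feature. The sketch below assumes the values have already been pulled out with `m.coefficients.toArray`; `CoefficientTable` and `Row` are hypothetical names and the numbers are illustrative.

```scala
// Hypothetical helper: label each coefficient with its feature name and odds ratio.
object CoefficientTable {
  final case class Row(feature: String, beta: Double, oddsRatio: Double)

  def table(featureNames: Seq[String], coefficients: Array[Double]): Seq[Row] = {
    require(featureNames.length == coefficients.length,
      "expected one coefficient per assembled feature slot")
    featureNames.zip(coefficients).map { case (name, beta) =>
      // exp(beta): multiplicative change in the odds per unit increase of the feature
      Row(name, beta, math.exp(beta))
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values; in practice use m.intercept and m.coefficients.toArray
    val intercept = -1.2
    val rows = table(Seq("feat1", "feat2"), Array(0.5, -0.3))
    println(f"intercept: $intercept%.4f")
    rows.foreach(r => println(f"${r.feature}%-8s beta=${r.beta}%.4f OR=${r.oddsRatio}%.4f"))
  }
}
```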
A quick-and-dirty equivalent:
// Assumes the best model is a PipelineModel whose last stage is the logistic regression
val lrm: LogisticRegressionModel = cvModel
  .bestModel.asInstanceOf[PipelineModel]
  .stages
  .last.asInstanceOf[LogisticRegressionModel]

(lrm.intercept, lrm.coefficients)
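As a sanity check on the extracted values, the binomial model's probability can be recomputed by hand as 1 / (1 + exp(-(intercept + beta · x))). For the same assembled feature row this should agree with the second element of the model's `probability` column, up to floating-point error. A minimal pure-Scala sketch, with the hypothetical names `ManualPredict`, `margin` and `probability` and made-up inputs:

```scala
// Recompute the binomial logistic-regression probability by hand:
// P(y = 1 | x) = 1 / (1 + exp(-(intercept + beta . x)))
object ManualPredict {
  def margin(intercept: Double, beta: Array[Double], x: Array[Double]): Double = {
    require(beta.length == x.length, "feature vector length must match coefficients")
    intercept + beta.zip(x).map { case (b, xi) => b * xi }.sum
  }

  def probability(intercept: Double, beta: Array[Double], x: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-margin(intercept, beta, x)))

  def main(args: Array[String]): Unit = {
    // Illustrative numbers; in practice use lrm.intercept, lrm.coefficients.toArray
    // and one row of the assembled "features" column
    val p = probability(-1.2, Array(0.5, -0.3), Array(2.0, 1.0))
    println(s"P(response = 1) = $p")
  }
}
```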