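I cannot generate the confusion matrix for a one-class classification in R

I am trying to understand and implement one-class classification in R on a dataset from Kaggle (https://www.kaggle.com/uciml/breast-cancer-wisconsin-data).

When I try to print the confusion matrix, I get this error:

Error in !all.equal(nrow(data), ncol(data)) : invalid argument type

What am I doing wrong?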

library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)
library(data.table)

ds = read.csv('C:/Users/hugos/Desktop/FS Dataset/Health/data_cancer.csv',header = TRUE)

mycols <- c("id","diagnosis","radius_mean","texture_mean","perimeter_mean","area_mean","smoothness_mean","compactness_mean","concavity_mean","concave.points_mean","symmetry_mean","fractal_dimension_mean","radius_se","texture_se","perimeter_se","area_se","smoothness_se","compactness_se","concavity_se","concave.points_se","symmetry_se","fractal_dimension_se","radius_worst","texture_worst","perimeter_worst","area_worst","smoothness_worst","compactness_worst","concavity_worst","concave.points_worst","symmetry_worst","fractal_dimension_worst")

#Convert to numeric
setDT(ds)[,(mycols) := lapply(.SD,as.numeric),.SDcols = mycols]

#Convert classification to logical
data <- ds[,.(id,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave.points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave.points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave.points_worst,symmetry_worst,fractal_dimension_worst,diagnosis = ds$diagnosis == "TRUE")]

dataclean <- na.omit(data)

#Separating train and test
inTrain<-createDataPartition(1:nrow(dataclean),p=0.7,list=FALSE)
train<- dataclean[inTrain]
test <- dataclean[-inTrain]


svm.model<-svm(diagnosis ~ id+radius_mean+texture_mean+perimeter_mean+area_mean+smoothness_mean+compactness_mean+concavity_mean+concave.points_mean+symmetry_mean+fractal_dimension_mean+radius_se+texture_se+perimeter_se+area_se+smoothness_se+compactness_se+concavity_se+concave.points_se+symmetry_se+fractal_dimension_se+radius_worst+texture_worst+perimeter_worst+area_worst+smoothness_worst+compactness_worst+concavity_worst+concave.points_worst+symmetry_worst+fractal_dimension_worst,data = train,type='one-classification',trControl = fitControl,nu=0.10,scale=TRUE,kernel="radial",metric = "ROC")

#Perform predictions 
svm.predtrain<-predict(svm.model,train)
svm.predtest<-predict(svm.model,test)

confTrain <- table(Predicted=svm.predtrain,Reference=train$diagnosis[as.integer(names(svm.predtrain))])
confTest <- table(Predicted=svm.predtest,Reference=test$diagnosis[as.integer(names(svm.predtest))])

confusionMatrix(confTest,positive='TRUE')

print(confTrain)
print(confTest)
lijing_966 answered: I cannot generate the confusion matrix for a one-class classification in R

Your problem is in this line:

#Convert classification to logical
data <- ds[,.(id,radius_mean,...,diagnosis = ds$diagnosis == "TRUE")]

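I assume you are using R version 4.0, because the default behaviour of read.csv is now to no longer convert character columns to factors. The command: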

#Convert to numeric
setDT(ds)[,(mycols) := lapply(.SD,as.numeric),.SDcols = mycols]

then converts every diagnosis value to NA, because the values are either "M" or "B", for malignant and benign respectively.
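A minimal base-R sketch of that effect, using three made-up diagnosis values rather than the real file. It also shows why the later `== "TRUE"` comparison and `na.omit` from the question leave nothing to work with:

```r
# Made-up sample values standing in for the diagnosis column
diagnosis_chr <- c("M", "B", "M")

# as.numeric() cannot parse "M" or "B", so every element becomes NA
# (R also emits a coercion warning, suppressed here)
as_num <- suppressWarnings(as.numeric(diagnosis_chr))
print(as_num)                      # NA NA NA

# Comparing NA with "TRUE" gives NA again, not FALSE
flag <- as_num == "TRUE"
print(flag)                        # NA NA NA

# na.omit() then drops every row, because each row contains an NA
df <- data.frame(id = 1:3, diagnosis = flag)
print(nrow(na.omit(df)))           # 0
```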

So make sure strings are converted to factors when you import the data.

ds = read.csv('.../data_cancer.csv',header = TRUE,stringsAsFactors = TRUE)
str(ds)
'data.frame':   569 obs. of  33 variables:
 $ id                     : int  842302 842517 84300903 84348301 84358402 843786 844359 ...
 $ diagnosis              : Factor w/ 2 levels "B","M": 2 2 2 2 2 2 2 2 2 2 ...
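As a small illustration of how the factor levels shown in the `str()` output map to numbers, here is a sketch on made-up values (not the real data). Levels are sorted alphabetically by default, so "B" gets code 1 and "M" gets code 2:

```r
# Made-up sample values; factor() sorts the levels alphabetically
diagnosis_fct <- factor(c("M", "B", "M"))
print(levels(diagnosis_fct))            # "B" "M"

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(diagnosis_fct)
print(codes)                            # 2 1 2

# Comparing against 2 therefore flags the "M" (malignant) rows
is_malignant <- codes == 2
print(is_malignant)                     # TRUE FALSE TRUE

# Comparing the labels directly gives the same result and does not
# depend on the order of the levels
same <- identical(is_malignant, as.vector(diagnosis_fct == "M"))
print(same)                             # TRUE
```

Comparing against the label is arguably easier to read, but it has to happen before the column is converted to numeric.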

I guess this new behaviour of R will take some getting used to. The command that converts the classification to logical should be:

data <- ds[,.(id,radius_mean,...,diagnosis = diagnosis == 2)] # 2 = "M" given the levels above; use == 1 if "B" should be TRUE

After that, all the remaining commands work.

confusionMatrix(confTest,positive='TRUE')

Confusion Matrix and Statistics

         Reference
Predicted FALSE TRUE
    FALSE    10    8  # Note these numbers may change
    TRUE    100   50

               Accuracy : 0.3571          
                 95% CI : (0.2848,0.4346)
    No Information Rate : 0.6548          
    P-Value [Acc > NIR] : 1               

                  Kappa : -0.0342         

 Mcnemar's Test P-Value : <2e-16          

            Sensitivity : 0.86207         
            Specificity : 0.09091         
         Pos Pred Value : 0.33333         
         Neg Pred Value : 0.55556         
             Prevalence : 0.34524         
         Detection Rate : 0.29762         
   Detection Prevalence : 0.89286         
      Balanced Accuracy : 0.47649         

       'Positive' Class : TRUE
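As the call in the error message shows, the original error comes from negating the result of `all.equal(nrow(data), ncol(data))`. The base-R sketch below reproduces that mechanism, assuming the table passed in was not square, which is what you would expect when one side has no usable values. The vectors `pred` and `ref` are made up for illustration, and fixing the factor levels is only one possible safeguard:

```r
# all.equal() returns TRUE on a match, but a character description
# of the difference otherwise
res <- all.equal(2L, 1L)
print(is.character(res))                # TRUE

# Negating a character string is what raises the error
err <- tryCatch(!res, error = function(e) conditionMessage(e))
print(err)                              # "invalid argument type"

# Made-up predictions and reference labels; pred never takes FALSE
pred <- c(TRUE, TRUE, TRUE)
ref  <- c(TRUE, FALSE, TRUE)

# Without explicit levels the table has one row and two columns
tab_raw <- table(Predicted = pred, Reference = ref)
print(dim(tab_raw))                     # 1 2

# Fixing the levels on both sides keeps the table square
lv <- c(FALSE, TRUE)
tab_sq <- table(Predicted = factor(pred, levels = lv),
                Reference = factor(ref, levels = lv))
print(dim(tab_sq))                      # 2 2
```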