What is the correct number of clusters for gene expression data when different methods disagree?

2024-05-19 • Q&A

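I have normalized gene expression data and want to find the optimal number of clusters for it. After preparing the data, I am using the mclust package.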

I rank the data with the following call.

ranked.exprs <- probe_ranking(input = exp_file, probe_number = 2000,
                              probe_num_selection = "Fixed_Probe_Num",
                              data.exp = genes, method = "SD_Rank")
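To make the ranking step concrete, here is a minimal base R sketch of selecting a fixed number of probes by standard deviation. It assumes that `method = "SD_Rank"` means "keep the probes with the highest standard deviation across samples", which the question does not confirm. The matrix `expr` and the cutoff `top_n` are hypothetical stand-ins for `genes` and `probe_number`.

```r
set.seed(3)
# Hypothetical expression matrix: 50 probes (rows) by 6 samples (columns).
expr <- matrix(rnorm(50 * 6), nrow = 50,
               dimnames = list(paste0("probe", 1:50), paste0("sample", 1:6)))

# Standard deviation of each probe across samples.
probe_sd <- apply(expr, 1, sd)

# Keep the top_n most variable probes (assumed meaning of SD ranking).
top_n <- 10
top_probes <- names(sort(probe_sd, decreasing = TRUE))[seq_len(top_n)]
expr_top <- expr[top_probes, , drop = FALSE]

dim(expr_top)
```

Any clustering run on `expr_top` sees a different matrix than one run on `expr`, so the two can legitimately suggest different numbers of clusters.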

Then I estimate the number of clusters in several ways:

# Calculate the number of clusters with the gap statistic
cluster_num <- number_clusters(data.exp = genes, Fixed = NULL, gap_statistic = TRUE)
# I get 8

# Check silhouette values for k-means
resukm <- fviz_nbclust(ranked.exprs, FUNcluster = kmeans, method = "silhouette",
                       diss = NULL, k.max = 10, nboot = 10, verbose = interactive(),
                       barfill = "steelblue", barcolor = "steelblue",
                       linecolor = "steelblue", print.summary = TRUE)
# Result: 2 clusters

# Run the gap statistic using hierarchical clustering
gap_stat <- clusGap(genes, FUNcluster = hcut, K.max = 10, B = 50)
# Result: 2 clusters
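One thing that can shift a gap-statistic estimate is the rule used to read k off the gap curve. The sketch below runs `clusGap` on synthetic data and applies each selection rule offered by `cluster::maxSE` to the same result. The data and the wrapper `hc_cut` are hypothetical, and the question does not say which rule `number_clusters` applies, so treat this as an illustration rather than a diagnosis.

```r
library(cluster)

set.seed(1)
# Synthetic stand-in for the expression matrix: 60 observations in 3 groups.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 5),
           matrix(rnorm(100, mean = 4), ncol = 5),
           matrix(rnorm(100, mean = 8), ncol = 5))

# Hypothetical wrapper: clusGap expects a function(x, k) that returns
# a list with a `cluster` component.
hc_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x), method = "ward.D2"), k = k))
}

gap <- clusGap(x, FUNcluster = hc_cut, K.max = 6, B = 20, verbose = FALSE)
tab <- gap$Tab

# Apply every selection rule to the same gap curve.
rules <- c("firstSEmax", "Tibs2001SEmax", "globalSEmax", "firstmax", "globalmax")
k_by_rule <- sapply(rules, function(rule) {
  maxSE(f = tab[, "gap"], SE.f = tab[, "SE.sim"], method = rule)
})
print(k_by_rule)
```

If the rules disagree on real data, the gap curve is probably flat or noisy, and a single "best" k from it should be read with caution.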

# Check silhouette values for hierarchical clustering
resuhie <- fviz_nbclust(genes, FUNcluster = hcut, print.summary = TRUE)
# Result: 2 clusters
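Note that the k-means check runs on `ranked.exprs`, while the other calls run on `genes`, so the methods are not all looking at the same matrix. A fairer comparison computes the average silhouette width for both algorithms on one shared input. The following is a minimal sketch on synthetic data using only the `cluster` and `stats` packages. The helper `avg_sil` is hypothetical.

```r
library(cluster)

set.seed(7)
# Synthetic stand-in: 40 observations in 2 well-separated groups.
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 3),
           matrix(rnorm(60, mean = 5), ncol = 3))
d <- dist(x)

# Hypothetical helper: average silhouette width for a vector of labels.
avg_sil <- function(labels) mean(silhouette(labels, d)[, "sil_width"])

ks <- 2:6
# k-means on the shared matrix.
sil_km <- sapply(ks, function(k) avg_sil(kmeans(x, centers = k, nstart = 10)$cluster))
# Hierarchical clustering on the same distances.
hc <- hclust(d, method = "ward.D2")
sil_hc <- sapply(ks, function(k) avg_sil(cutree(hc, k = k)))

comparison <- data.frame(k = ks, kmeans = sil_km, hclust = sil_hc)
print(comparison)
```

With the input held fixed, any remaining disagreement comes from the algorithm or the criterion rather than from the data passed in.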

What could explain getting two different numbers, 8 and 2? My data also contain missing values.
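Because the data contain missing values, it may help to handle them explicitly before comparing methods. `kmeans` stops with an error on `NA` input, whereas `dist` silently skips missing pairs, so the methods can otherwise end up working on effectively different data. Below is a minimal sketch of two common options, on a hypothetical matrix `expr`. Row-mean imputation is only one possible choice and is an assumption here, not something the question prescribes.

```r
set.seed(42)
# Hypothetical expression matrix with a few missing values.
expr <- matrix(rnorm(200), nrow = 20,
               dimnames = list(paste0("g", 1:20), paste0("s", 1:10)))
expr[sample(length(expr), 8)] <- NA

# How many values are missing?
n_missing <- sum(is.na(expr))

# Option A: drop every row that has at least one missing value.
expr_complete <- expr[complete.cases(expr), , drop = FALSE]

# Option B: replace each missing value with the mean of its row.
expr_imputed <- expr
row_means <- rowMeans(expr, na.rm = TRUE)
na_idx <- which(is.na(expr_imputed), arr.ind = TRUE)
expr_imputed[na_idx] <- row_means[na_idx[, "row"]]

c(missing = n_missing,
  rows_complete = nrow(expr_complete),
  rows_imputed = nrow(expr_imputed))
```

Whichever option is chosen, passing the same cleaned matrix to every method keeps the estimates comparable.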


Answer from q00120994:
