R: sample until a condition is met

2024-06-02 • Q&A

I have the following data frame:

structure(list(V1 = c(45L,17L,28L,26L,18L,41L,20L,23L,31L,48L,32L,30L,11L,26L)),.Names = "V1",row.names = c("24410","26526","26527","43264","63594","125630","148318","245516","269500","293171","301217","400294","401765","520084","545501","564914","742654"),class = "data.frame")

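The row names identify parcels, and V1 gives the number of examples that can be drawn from each parcel. What I want is to draw a sample from each parcel that is proportional to the number of examples available there, ending up with 400 examples in total. The idea is to avoid oversampling one parcel relative to another.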
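To make the goal concrete, here is a minimal sketch of proportional allocation, assuming the per-parcel counts are known up front. The helper name `allocate_proportional` and the five parcel counts are made up for illustration and are not the data from the question. Fractional quotas are rounded with the largest-remainder rule so that the quotas add up to exactly the requested total.

```r
# Hypothetical helper: split `total` draws across parcels in proportion
# to the number of examples available in each parcel.
allocate_proportional <- function(available, total) {
  stopifnot(total <= sum(available))
  exact <- available * total / sum(available)  # ideal, fractional quotas
  quota <- floor(exact)
  # Hand the leftover draws to the parcels with the largest fractional parts
  leftover <- total - sum(quota)
  if (leftover > 0) {
    top <- order(exact - quota, decreasing = TRUE)[seq_len(leftover)]
    quota[top] <- quota[top] + 1
  }
  quota
}

# Made-up counts for five parcels (assumption, not the question's data)
available <- c(p1 = 450, p2 = 170, p3 = 280, p4 = 260, p5 = 180)
quota <- allocate_proportional(available, total = 400)
print(quota)
print(sum(quota))  # 400
```

Because each quota is at most the parcel's available count, every parcel can then be sampled without replacement.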

The dataset being sampled from is here.

The code so far:

df <- read.csv('/data/samplefrom.csv')
df.training <- data.frame()
n <- 400  # examples to draw per crop + BBCH stage combination

for (crop in sort(unique(df$code_surveyed))) {
  for (bbch_stage in sort(unique(df$bbch))) {
    # Rows for this combination, dropping rows without a name
    df.int <- df[df$bbch == bbch_stage & df$code_surveyed == crop, ]
    df.int <- df.int[!is.na(df.int$name), ]
    # Only sample combinations that have at least n rows available
    if (nrow(df.int) >= n) {
      df.sampled <- df.int[sample(nrow(df.int), n), ]
      df.training <- rbind(df.training, df.sampled)
    }
  }
}

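This draws 400 random examples for each crop + bbch_stage combination (think of it as a composite variable). That all works fine, but I would like to control which parcel (variable objectid) the examples come from. Essentially, an extra filtering step is needed when sampling.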

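I have made several attempts with while and repeat statements, and with the stratified function obtained through devtools, but none of them seems to produce what I am after.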

df.training <- data.frame()
for (crop in unique(df$code)) {
  df.crop.slected <- df[df$code == crop, ]
  df.crop.slected.sampled <- data.frame()
  while (nrow(df.crop.slected.sampled) < 400) {
    for (parcel in 1:length(unique(df.crop.slected$objectid))) {
      df.crop.slected.pacel <- df.crop.slected[df.crop.slected$objectid == unique(df.crop.slected$objectid)[parcel], ]
      df.crop.slected.pacel <- df.crop.slected.pacel[sample(nrow(df.crop.slected.pacel), 1), ]
      if (!df.crop.slected.pacel$name %in% df.crop.slected.sampled$name) {
        df.crop.slected.sampled <- rbind(df.crop.slected.sampled, df.crop.slected.pacel)
      }
    }
  }
  df.training <- rbind(df.training, df.crop.slected.sampled)
}
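As far as I can tell, the loop above takes one random row per parcel on each pass, so it can overshoot 400 within a pass and never terminates when a crop has fewer than 400 distinct names. A loop-free sketch of the same round-robin idea follows. The data frame `df.crop` is a made-up stand-in that only borrows the column names objectid and name from the question. Note that this is equal allocation per pass, not proportional allocation: small parcels are exhausted first and the larger ones fill the remainder.

```r
set.seed(42)  # reproducible shuffle

# Made-up stand-in for the rows of one crop (assumption, not the real data)
df.crop <- data.frame(
  objectid = rep(c("a", "b", "c"), times = c(600, 300, 100)),
  name     = sprintf("img_%04d", 1:1000),
  stringsAsFactors = FALSE
)
n <- 400

# Shuffle all rows, then number the rows within each parcel: 1, 2, 3, ...
shuffled <- df.crop[sample.int(nrow(df.crop)), ]
pass <- ave(seq_len(nrow(shuffled)), shuffled$objectid, FUN = seq_along)

# Pass 1 holds one random row per parcel, pass 2 a second one, and so on.
# Taking rows in pass order mimics the round-robin loop; head() stops at n
# and simply returns fewer rows if the pool runs out.
df.sampled <- head(shuffled[order(pass), ], n)

print(table(df.sampled$objectid))
```

Since every row appears at most once in `shuffled`, no duplicate check is needed, and the result always has `min(n, nrow(df.crop))` rows.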

