Subtracting values based on multiple grouping factors

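I have a dataset with phosphorus concentrations for 17 days (concentrations are cumulative, so they increase from Day 1 through Day 102 in all cases). There are 22 different treatments (column = Trmt). Each Trmt has 3 levels (Level = X, Y, Z). There are 2 measurements per level, so 6 in total per Trmt.

My goal is to use ggplot2 to draw a line plot with 3 lines, with day on the x-axis (numeric) and concentration on the y-axis. The data should be grouped by Trmt, Level and day, for 51 measurements in total (3 lines x 17 days).

My data looks like this: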

structure(list(Trmt = structure(c(2L,2L,1L,4L,3L,6L,5L,8L,7L,10L,9L,12L,11L,14L,13L,16L,15L,18L,17L,20L,19L,22L,21L,21L),.Label = c("A01nF","A01yT","A02nF","A02yT","A03nF","A03yT","A04nF","A04yT","A05nF","A05yT","A06nF","A06yT","A07nF","A07yT","A08nF","A08yT","A10nF","A10yT","A11nF","A11yT","A13nF","A13yT"),class = "factor"),Level = structure(c(1L,3L),.Label = c("X","Y","Z"),Day1 = c(3L,4L),Day2 = c(10L,7L),Day4 = c(11L,14L),Day7 = c(19L,20L),Day10 = c(24L,23L,25L,24L,Day13 = c(29L,29L,26L,27L,30L,28L,26L),Day18 = c(32L,31L,32L,34L,33L,35L,35L),Day23 = c(39L,40L,38L,37L,36L,39L,38L),Day28 = c(42L,43L,44L,42L,45L,41L,44L),Day35 = c(50L,50L,48L,46L,49L,47L,46L),Day42 = c(52L,51L,53L,54L,55L,52L,52L),Day52 = c(59L,57L,56L,58L,59L,60L,60L),Day62 = c(67L,65L,68L,69L,70L,66L,67L,70L),Day72 = c(74L,74L,71L,75L,72L,73L,75L),Day82 = c(76L,78L,79L,77L,80L,76L,77L),Day92 = c(85L,84L,85L,83L,82L,81L,85L),Day102 = c(89L,88L,90L,87L,89L,86L,88L)),class = "data.frame",row.names = c(NA,-132L))

Libraries needed: tidyr, plyr, ggplot2

The steps I have taken so far are:

Convert the data to long format (df = name of the dataset):

    Fig1 <- gather(df,day,phosphorus,Day1:Day102,factor_key=TRUE)

Change the factor day to numeric:

    df$day2 <- revalue(df$day, c("Day1"="1","Day2"="2","Day4"="4","Day7"="7","Day10"="10","Day13"="13","Day18"="18","Day23"="23","Day28"="28","Day35"="35","Day42"="42","Day52"="52","Day62"="62","Day72"="72","Day82"="82","Day92"="92","Day102"="102"))

    df$day3 <- as.numeric(as.character(df$day2))
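The same numeric column can probably be obtained without listing every label by hand. The following is a minimal base R sketch, assuming every label is the text "Day" followed by digits; the `day` vector is a made-up stand-in for the real column.

```r
# Hypothetical labels mirroring the column names in the question
day <- factor(c("Day1", "Day2", "Day4", "Day102"))

# Strip the "Day" prefix, then convert the remaining digits to numbers
day_num <- as.numeric(sub("^Day", "", as.character(day)))

print(day_num)
```

Converting through `as.character()` first matters because calling `as.numeric()` directly on a factor returns the internal level codes rather than the labels.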

Group by Trmt, Level and day3:

    GroupedDF <- df %>% group_by(Trmt,Level,day3)
    GroupedCO2M <- GroupedDF %>% summarise(disp = mean(phosphorus))

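I now want to reduce the number of rows from 102 to 51 by subtracting values while taking Trmt and Level into account. For each Level (X, Y and Z), I want to subtract the 'yT' Trmt value from the corresponding 'nF' value. For example, subtract A01yT_X from A01nF_X, A01yT_Y from A01nF_Y, A01yT_Z from A01nF_Z, and so on. This should give 51 points in total, 17 per Level.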

Here is what I have in mind:

[image]

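Thank you very much for any advice.

eywe0686 answered: Subtracting values based on multiple grouping factors

Thanks for sharing the data. What you posted is rather long, so it may not have copied and pasted completely.

Your data is in wide format, and you need the mean of each measurement within matching groups (defined by day, level and treatment). We can do this while staying in wide format: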

    tmp <- Data %>% group_by(Trmt,Level) %>% summarise_all(mean)

    > head(tmp)
    # A tibble: 6 x 19
    # Groups:   Trmt [2]
      Trmt  Level  Day1  Day2  Day4  Day7 Day10 Day13 Day18 Day23 Day28 Day35 Day42
      <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 A01nF X       3.5   8    12    19    23    29.5  32.5  36.5  42      50  53  
    2 A01nF Y       4.5   9.5  13    17.5  21    28    32.5  36    43.5    48  54.5
    3 A01nF Z       1     8.5  13.5  18.5  22.5  28.5  33    37.5  43      49  51.5
    4 A01yT X       2.5   8.5  11    19.5  22.5  28    31.5  38    43      50  52.5
    5 A01yT Y       2.5   7.5  13.5  17    22    29.5  31    38.5  43.5    49  52.5
    6 A01yT Z       3     7    14.5  18    23    28    33    38    43.5    48  54 

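This gives you the mean for each Trmt and Level, with every column (day) averaged separately. The next step is to define the 2 subgroups within Trmt (nF and yT for A01, A02, ...). For this we can introduce a grouping variable called "site", which is Trmt with the nF or yT suffix removed. Once the data.frame is grouped by this "site" and Level, the first row of each group will always be nF and the second row will always be yT, so we take the difference of all the "Day" columns within that grouping. So we do: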

    # need to ungroup Trmt to remove it later
    tmp <- tmp%>% ungroup(Trmt) %>% 
    mutate(site = sub("[yn][TF]","",Trmt)) %>% 
    select(-Trmt) %>% 
    group_by(site,Level) %>% 
    summarize_all(diff)

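Now you have the difference between nF and yT for every site, every level and every day. Note that `diff()` returns the second row minus the first, so the values below are yT - nF; negate them (for example with `summarize_all(~ -diff(.))`) if you need nF - yT.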

    > head(tmp)
    # A tibble: 6 x 19
    # Groups:   site [2]
      site  Level  Day1  Day2  Day4  Day7 Day10 Day13 Day18 Day23 Day28 Day35 Day42
      <chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
    1 A01   X      -1     0.5  -1     0.5  -0.5  -1.5  -1     1.5   1     0    -0.5
    2 A01   Y      -2    -2     0.5  -0.5   1     1.5  -1.5   2.5   0     1    -2  
    3 A01   Z       2    -1.5   1    -0.5   0.5  -0.5   0     0.5   0.5  -1     2.5
    4 A02   X       1.5   1     1.5   1    -1    -1.5   2    -1.5  -1.5  -1     2  
    5 A02   Y       0.5   0    -1.5  -1     0.5   1.5  -0.5  -3    -1.5   0     1  
    6 A02   Z       4     2     1     0.5   1.5   0     2.5   0.5   0.5   1.5   0 
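If relying on row order inside each group feels fragile, the subtraction can also be written out by name. The sketch below is one possible alternative, not the approach used above: it assumes dplyr and tidyr (>= 1.0.0) are installed, uses a toy long-format table whose values are the A01 Day1 means printed earlier, and introduces the hypothetical column names `mean_p` and `type`.

```r
library(dplyr)
library(tidyr)

# Toy long-format data; values are the A01 Day1 means shown earlier
toy <- data.frame(
  Trmt   = c("A01nF", "A01yT", "A01nF", "A01yT"),
  Level  = c("X", "X", "Y", "Y"),
  day    = c(1, 1, 1, 1),
  mean_p = c(3.5, 2.5, 4.5, 2.5),
  stringsAsFactors = FALSE
)

result <- toy %>%
  mutate(
    site = sub("(nF|yT)$", "", Trmt),        # "A01nF" -> "A01"
    type = sub("^.*(nF|yT)$", "\\1", Trmt)   # "A01nF" -> "nF"
  ) %>%
  select(-Trmt) %>%
  # One row per site/Level/day, with nF and yT side by side
  pivot_wider(names_from = type, values_from = mean_p) %>%
  mutate(Diff = nF - yT)                     # explicit direction

print(result)

# For comparison: diff() subtracts the first element from the second
diff(c(3.5, 2.5))   # nF first, yT second, so this is yT - nF
```

Naming the two columns makes the direction of the subtraction visible in the code, and a site that is missing either treatment shows up as `NA` instead of silently shifting rows.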

The last part is plotting. We convert the data to long format and also create "Day", the numeric form of day.

    plotdf <- gather(tmp,day,Diff,Day1:Day102,factor_key=TRUE) %>%
      mutate(Day=as.numeric(sub("Day","",day)))

    # and plot
    ggplot(plotdf,aes(x=Day,y=Diff,col=Level,shape=Level)) + geom_line() + geom_point() + facet_wrap(~site) + scale_color_manual(values=c("grey10","grey40","grey80"))


This plot shows the difference for each site. For the difference averaged over all sites:

    meandf <- plotdf %>% group_by(Level,Day) %>% summarize(Diff=mean(Diff))
    ggplot(meandf,aes(x=Day,y=Diff,col=Level,shape=Level)) + geom_line() + geom_point() + scale_color_manual(values=c("grey10","grey40","grey80"))
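As a side note on the reshaping step, tidyr now marks `gather()` as superseded in favour of `pivot_longer()`. The following is a rough sketch of the equivalent call, assuming tidyr >= 1.1.0 (for `names_transform`); the `wide` table is a toy stand-in for `tmp`, with values copied from the A01 rows printed earlier.

```r
library(tidyr)

# Toy wide table shaped like `tmp` (first three Day columns only)
wide <- data.frame(
  site  = c("A01", "A01"),
  Level = c("X", "Y"),
  Day1  = c(-1, -2),
  Day2  = c(0.5, -2),
  Day4  = c(-1, 0.5)
)

long <- pivot_longer(
  wide,
  cols            = starts_with("Day"),      # all Day* columns
  names_to        = "Day",
  names_prefix    = "Day",                   # drop the "Day" text
  names_transform = list(Day = as.numeric),  # "1" -> 1
  values_to       = "Diff"
)

print(long)
```

This folds the prefix stripping and the numeric conversion into the reshape itself, so no separate `mutate()` is needed afterwards.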

Example dataset, subset to Day1, Day2 and Day4:

Data <- structure(list(Trmt = structure(c(2L,2L,1L,4L,3L,6L,5L,8L,7L,10L,9L,12L,11L,14L,13L,16L,15L,18L,17L,20L,19L,22L,21L,21L
),.Label = c("A01nF","A01yT","A02nF","A02yT","A03nF","A03yT","A04nF","A04yT","A05nF","A05yT","A06nF","A06yT","A07nF","A07yT","A08nF","A08yT","A10nF","A10yT","A11nF","A11yT","A13nF","A13yT"),class = "factor"),Level = structure(c(1L,3L),.Label = c("X","Y","Z"),Day1 = c(3L,4L),Day2 = c(10L,7L),Day4 = c(11L,14L)),class = "data.frame",row.names = c(NA,-132L))