Can I visualize the mean of another variable in a boxplot?

I have the following data, showing the percentage of credits achieved for different high-school backgrounds:

My code is as follows:

ggplot(fulldata, aes(x = fct_reorder(gymnasiegrov, PERC_CREDIT, .fun = median, na.rm = TRUE), y = PERC_CREDIT)) +
  geom_boxplot() +
  coord_flip()

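Since age may be a confounding variable, I have been asked to add information about the mean age of each group/boxplot.

Can this actually be done (using geom_text or something similar), or do I have to visualize that information some other way?

The mean age values should be shown in connection with each group. They don't have to be overlaid on the plot. As long as they appear in the correct order, showing the values next to the plot (for example, if I can persuade R Markdown to display a table and the boxplot on the same page) would be perfectly acceptable.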

Data excerpt:

structure(list(start_date = structure(c(17776,17776,17776),class = "Date"),PERC_CREDIT = c(56.2962962962963,69.6296296296296,1.48148148148148,60,0),gymnasiegrov = structure(c(11L,9L,6L,13L,4L),.Label = c("medieprogrammet/medieproduktion","Hotell- och Restaurang","komvux","teknikprogrammet","specialutformat program","naturvetenskapliga programmet","ekonomiprogrammet/ ekonomi","bygg,el,fordon,hantverk,sjöfart,industriteknik","ekonomiprogrammet/ juridik","Oklart","samhällsvetenskapliga programmet","Handels- och administrationsprogrammet","estetiska programmet","friskoleprogram","samhälls- och ekonomiprogrammet"
),class = c("ordered","factor")),ålder = structure(c(20,20,19,32,27,26),class = "difftime",units = "days")),row.names = c(NA,-6L),groups = structure(list(start_date = structure(17776,.rows = list(1:6)),-1L),class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))

Larger data excerpt:

structure(list(start_date = structure(c(17776,16.2962962962963,93.3333333333333,45.1851851851852,71.1111111111111,5.18518518518519,65.1851851851852,86.6666666666667,84.4444444444444,97.037037037037,85.1851851851852,83.7037037037037,80,57.037037037037,61.4814814814815,80.7407407407407,34.8148148148148,44.4444444444444,70.3703703703704,76.2962962962963,14.0740740740741,94.8148148148148,94.0740740740741,95.5555555555556,100,79.2592592592593,28.1481481481481,55.5555555555556,22.962962962963,47.4074074074074,50.3703703703704,51.8518518518518,88.1481481481482,82.2222222222222,45.9259259259259,37.7777777777778,6.66666666666667,25.9259259259259,34.0740740740741,8.88888888888889,102.222222222222,33.3333333333333,48.8888888888889,97.7777777777778,78.5185185185185,27.4074074074074,82.962962962963,72.5925925925926,68.8888888888889,60.7407407407407,46.6666666666667,85.9259259259259,77.7777777777778,53.3333333333333,12.5925925925926,23.7037037037037,77.7777777777778),4L,3L,8L,7L,5L,12L,11L,2L,14L,10L,1L,15L,10L),"samhälls- och ekonomiprogrammet"
    ),26,23,22,25,24,21,29,39,34,33,30,28,47,21),-154L),.rows = list(1:154)),"data.frame"))

Answer by qq22643008:

You can simply plot the mean age next to the boxplot.

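library(ggplot2)
library(forcats) # for fct_reorder
library(ggpubr)  # for ggarrange

# ålder is stored as a difftime; convert it to plain numbers
fulldata$age <- as.numeric(fulldata$ålder)

# your plot
g1 <- ggplot(fulldata, aes(x = fct_reorder(gymnasiegrov, PERC_CREDIT, .fun = median, na.rm = TRUE), y = PERC_CREDIT)) +
  geom_boxplot() + coord_flip()

# mean age plot (mean and standard error), with the groups in the same order as in g1
g2 <- ggplot(fulldata) +
  stat_summary(aes(x = fct_reorder(gymnasiegrov, PERC_CREDIT, .fun = median, na.rm = TRUE), y = age), fun.data = "mean_se") +
  coord_flip() +
  theme(axis.text.y = element_blank(),  # remove the y-axis labels since they're long
        axis.title.y = element_blank()) # and the same as in the first plot

# place the two plots side by side
ggarrange(g1, g2, ncol = 2, widths = c(.65, .35))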
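
If a single figure is preferred, one possible alternative is to write the mean age as a text label next to each box with geom_text, as the question suggests. The following is only a minimal sketch under stated assumptions: the `toy` data frame and its values are invented stand-ins for `fulldata`, and the names `grp`, `age_means`, `label` and `label_y` are illustrative, not part of the original code.

```r
library(ggplot2)
library(forcats)

# Hypothetical toy data standing in for `fulldata` (values invented for illustration)
toy <- data.frame(
  gymnasiegrov = c("komvux", "komvux", "teknikprogrammet", "teknikprogrammet",
                   "estetiska programmet", "estetiska programmet"),
  PERC_CREDIT  = c(20, 40, 60, 80, 90, 100),
  age          = c(30, 34, 20, 22, 19, 21)
)

# Fix the group order once, so the boxes and the labels share it
toy$grp <- fct_reorder(toy$gymnasiegrov, toy$PERC_CREDIT, .fun = median, na.rm = TRUE)

# One row per group with its mean age
# (rows with a missing age are dropped by aggregate()'s default na.action)
age_means <- aggregate(age ~ grp, data = toy, FUN = mean)
age_means$label <- sprintf("mean age: %.1f", age_means$age)

# Put every label at the same position, just past the largest observed value
age_means$label_y <- max(toy$PERC_CREDIT, na.rm = TRUE) + 5

p <- ggplot(toy, aes(x = grp, y = PERC_CREDIT)) +
  geom_boxplot() +
  geom_text(data = age_means,
            aes(x = grp, y = label_y, label = label),
            hjust = 0, size = 3, inherit.aes = FALSE) +
  # Leave extra room beyond the labels so the text is not clipped
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.35))) +
  coord_flip()

print(p)
```

Because the labels are drawn from a summary that uses the same reordered factor as the boxes, each label lines up with its own group. The offset of 5 and the expansion of 0.35 are arbitrary choices that would likely need tuning for the real data and label length.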
