pivot_wider issue: "Values in `values_from` are not uniquely identified; output will contain list-cols"

My data looks like this:

# A tibble: 6 x 4
  name          val time          x1
  <chr>       <dbl> <date>     <dbl>
1 C Farolillo     7 2016-04-20  51.5
2 C Farolillo     3 2016-04-21  56.3
3 C Farolillo     7 2016-04-22  56.3
4 C Farolillo    13 2016-04-23  57.9
5 C Farolillo     7 2016-04-24  58.7
6 C Farolillo     9 2016-04-25  59.0

I am trying to use the pivot_wider function to widen the data based on the name column. I use the following code:

yy <- d %>% 
  pivot_wider(.,names_from = name,values_from = val)

This gives me the following warning message:

Warning message:
Values in `val` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(val = list)` to suppress this warning.
* Use `values_fn = list(val = length)` to identify where the duplicates arise
* Use `values_fn = list(val = summary_fun)` to summarise duplicates

The output looks like this:

       time       x1        out1    out2 
    2016-04-20  51.50000    <dbl>   <dbl>
2   2016-04-21  56.34615    <dbl>   <dbl>
3   2016-04-22  56.30000    <dbl>   <dbl>
4   2016-04-23  57.85714    <dbl>   <dbl>
5   2016-04-24  58.70968    <dbl>   <dbl>
6   2016-04-25  58.96774    <dbl>   <dbl>

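I know this problem is mentioned here, where the suggested fix is to use summary statistics. However, I have time series data and do not want to use summary statistics, because there is one value per day (not multiple values).

I know the problem arises because there are duplicates in the val column (i.e. in the example above, 7 occurs 3 times).

Any suggestions on how to pivot_wider and overcome this issue?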

Data:

    d <- structure(list(name = c("C Farolillo","C Farolillo","Plaza Eliptica","Plaza Eliptica"),val = c(7,3,7,13,9,20,19,4,5,2,6,16,10,11,8,14,32,25,31,34,26,33,35,43,22,21,48,47,27,23,17,24,28,25),time = structure(c(16911,16912,16913,16914,16915,16916,16917,16918,16919,16920,16921,16922,16923,16924,16925,16926,16927,16928,16929,16930,16931,16932,16933,16934,16935,16936,16937,16938,16939,16940,16941,16942,16943,16944,16945,16946,16947,16948,16949,16950,16951,16952,16953,16954,16955,16956,16957,16958,16959,16960,16911,16960),class = "Date"),x1 = c(51.5,56.3461538461538,56.3,57.8571428571429,58.7096774193548,58.9677419354839,64.4615384615385,61.9310344827586,60.3214285714286,59.4137931034483,59.5806451612903,57.3448275862069,64.0333333333333,70.15625,71.3636363636364,62.8125,56.4375,56.4516129032258,51.741935483871,52.84375,53.09375,52.969696969697,54,54.3870967741936,60.3870967741936,64.4516129032258,66.2903225806452,68.2333333333333,69.7741935483871,70.5806451612903,73.8275862068966,72.8181818181818,64.6764705882353,64.4838709677419,68.7741935483871,62.1764705882353,68.969696969697,70.1935483870968,59.6774193548387,59.9677419354839,63.125,67.5882352941177,71.4705882352941,73.8529411764706,76.1935483870968,72.6451612903226,76.0645161290323,76.4193548387097,81.7741935483871,85.0645161290323,51.5,85.0645161290323)),class = c("tbl_df","tbl","data.frame"),row.names = c(NA,-102L))
litongfei1208 answered: pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols"

Create a unique identifier row for each name, then use pivot_wider:

library(dplyr)

d %>%
  group_by(name) %>%
  mutate(row = row_number()) %>%
  tidyr::pivot_wider(names_from = name,values_from = val) %>%
  select(-row)

# A tibble: 51 x 4
#   time          x1 `C Farolillo` `Plaza Eliptica`
#   <date>     <dbl>         <dbl>            <dbl>
# 1 2016-04-20  51.5             7               32
# 2 2016-04-21  56.3             3               25
# 3 2016-04-22  56.3             7               31
# 4 2016-04-23  57.9            13               34
# 5 2016-04-24  58.7             7               26
# 6 2016-04-25  59.0             9               33
# 7 2016-04-26  64.5            20               35
# 8 2016-04-27  61.9            19               43
# 9 2016-04-28  60.3             4               22
#10 2016-04-29  59.4             5               22
# … with 41 more rows

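Typically, the warning

Warning message:
Values in `val` are not uniquely identified; output will contain list-cols.

is caused by duplicate rows in the data (after excluding the val column), not by duplicate values in the val column itself.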

which(duplicated(d))
# [1] 14 65

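The OP's data appears to have two duplicated rows, which is what causes this issue. Removing the duplicated rows also gets rid of the warning.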

yy <- d %>% distinct() %>% pivot_wider(.,names_from = name,values_from = val)
yy
# A tibble: 50 x 4
   time          x1 `C Farolillo` `Plaza Eliptica`
   <date>     <dbl>         <dbl>            <dbl>
 1 2016-04-20  51.5             7               32
 2 2016-04-21  56.3             3               25
 3 2016-04-22  56.3             7               31
 4 2016-04-23  57.9            13               34
 5 2016-04-24  58.7             7               26
 6 2016-04-25  59.0             9               33
 7 2016-04-26  64.5            20               35
 8 2016-04-27  61.9            19               43
 9 2016-04-28  60.3             4               22
10 2016-04-29  59.4             5               22
# ... with 40 more rows
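Before dropping anything, it can help to see which identifier combinations are actually repeated. Below is a minimal sketch on a small made-up table (`toy` is hypothetical, not the OP's data), assuming the identifiers are every column except `val`. It counts rows per key with `count()`, and then uses the `values_fn = list(val = length)` form suggested by the warning to show how many values land in each cell.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the OP's): station "A" has 2016-04-20 entered twice.
toy <- tibble(
  name = c("A", "A", "A", "B", "B"),
  time = as.Date(c("2016-04-20", "2016-04-20", "2016-04-21",
                   "2016-04-20", "2016-04-21")),
  x1   = c(51.5, 51.5, 56.3, 51.5, 56.3),
  val  = c(7, 7, 3, 32, 25)
)

# Rows per identifier combination (every column except `val`);
# anything with n > 1 would end up as a list-col cell.
dup_keys <- toy %>%
  count(name, time, x1) %>%
  filter(n > 1)
dup_keys

# Alternative: let pivot_wider() report how many values fall into each cell.
cell_counts <- toy %>%
  pivot_wider(names_from = name, values_from = val,
              values_fn = list(val = length))
cell_counts
```

Any cell count above 1 marks a date/station pair that needs attention in the source data.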

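The problem is caused by the fact that the data you want to spread / pivot wider has duplicate identifiers. Both suggestions above (creating a unique artificial ID from row numbers with mutate(row = row_number()), or keeping only distinct rows) will let you pivot wider, but they change the structure of your table, which is likely to hide a logical, organizational problem that will surface the next time you try to join anything to it.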
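To illustrate how an artificial row ID can change the structure, here is a minimal sketch on made-up data (`toy` is hypothetical, not the OP's table) in which only one station has a duplicated day. The row numbers of the two stations then stop lining up, so a single date is split across several output rows padded with NA. The `ungroup()` call is my own addition to keep the pivot independent of the grouping.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: only station "A" has 2016-04-20 entered twice.
toy <- tibble(
  name = c("A", "A", "A", "B", "B"),
  time = as.Date(c("2016-04-20", "2016-04-20", "2016-04-21",
                   "2016-04-20", "2016-04-21")),
  x1   = c(51.5, 51.5, 56.3, 51.5, 56.3),
  val  = c(7, 7, 3, 32, 25)
)

# Artificial per-station row ID, then pivot; `row` becomes part of the key.
by_row <- toy %>%
  group_by(name) %>%
  mutate(row = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = name, values_from = val) %>%
  select(-row)

# Two distinct dates in the input, but more than two rows in the output,
# because A's rows are numbered 1, 2, 3 while B's are numbered 1, 2.
by_row
```

The warning is gone, but the result no longer has one row per date, which is exactly the kind of problem that tends to show up later in a join.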

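It is much better practice to use the id_cols parameter explicitly, so that you state what you actually want to be unique after pivoting wider, and, if you run into problems, to reorganize the original table first. Of course, you may find a reason to filter to distinct rows or to add a new ID, but most likely you will want to avoid the duplication earlier in your code.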
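A minimal sketch of that workflow, again on made-up data (`toy` and `clean` are hypothetical names). It assumes that `x1` depends only on `time`, as it appears to in the OP's printed output, so `time` and `x1` together are declared as the identifier. The guard before the pivot is one possible way to fail early rather than get list-cols.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: station "A" has 2016-04-20 entered twice.
toy <- tibble(
  name = c("A", "A", "A", "B", "B"),
  time = as.Date(c("2016-04-20", "2016-04-20", "2016-04-21",
                   "2016-04-20", "2016-04-21")),
  x1   = c(51.5, 51.5, 56.3, 51.5, 56.3),
  val  = c(7, 7, 3, 32, 25)
)

# Fix the source table first (here the duplicate is an exact copy).
clean <- toy %>% distinct()

# Guard: each (name, time) pair must occur once before pivoting.
if (anyDuplicated(clean[c("name", "time")]) > 0) {
  stop("duplicate (name, time) pairs remain; fix the source data first")
}

# State explicitly which columns identify a row of the wide table.
wide <- clean %>%
  pivot_wider(id_cols = c(time, x1), names_from = name, values_from = val)
wide
```

Columns that are not listed in `id_cols`, `names_from` or `values_from` are dropped from the result, so the choice of `id_cols` also documents what the wide table is meant to contain.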


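I suppose the duplication in your dataset happened unintentionally. Rows 13 and 14 are exactly the same observation. Just correct your dataset. You can look at the d and yy datasets to see the problematic observations.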
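One way to look at the offending observations side by side is to pull out every copy of a duplicated row, not just the later ones that `duplicated()` flags by default. This is a small sketch on made-up data (`toy` is hypothetical), not the OP's table.

```r
library(dplyr)

# Hypothetical toy data: the first two rows are exact copies of each other.
toy <- tibble(
  name = c("A", "A", "A", "B", "B"),
  time = as.Date(c("2016-04-20", "2016-04-20", "2016-04-21",
                   "2016-04-20", "2016-04-21")),
  x1   = c(51.5, 51.5, 56.3, 51.5, 56.3),
  val  = c(7, 7, 3, 32, 25)
)

# duplicated() marks only the second and later copies; combining it with
# fromLast = TRUE also keeps the first copy, so both rows are shown.
dups <- toy[duplicated(toy) | duplicated(toy, fromLast = TRUE), ]
dups
```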
