How to add a count of unique values by group to an R data.frame

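I'd like to count the number of unique values, grouped by a second variable, and then add the count to the existing data.frame as a new column. For example, if the existing data frame looks like this: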

    color  type
 1  black chair
 2  black chair
 3  black  sofa
 4  green  sofa
 5  green  sofa
 6    red  sofa
 7    red plate
 8   blue  sofa
 9   blue plate
 10  blue chair

I want to add, for each color, the number of unique types present in the data:

    color  type unique_types
 1  black chair            2
 2  black chair            2
 3  black  sofa            2
 4  green  sofa            1
 5  green  sofa            1
 6    red  sofa            2
 7    red plate            2
 8   blue  sofa            3
 9   blue plate            3
 10  blue chair            3

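I was hoping to use ave, but I can't seem to find a simple way to do it that doesn't take many lines. I have more than 100,000 rows, so I'm also not sure how much efficiency matters.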

This question is somewhat similar to this one: Count number of observations/rows per group and add result to data frame

Using ave (since you asked for it specifically):

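 within(df, {
   # ave() returns the same type as `type` (character), so convert the counts back to integer
   count <- as.integer(ave(type, color, FUN = function(x) length(unique(x))))
 })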

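Make sure type is a character vector and not a factor: ave() returns a vector of the same class as its first argument, and assigning the integer counts into a factor produces NAs (with an "invalid factor level" warning).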
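A variation on the ave() approach (not part of the original answer): pass the row indices to ave() instead of type itself. The indices are integer, so the result stays integer and factor columns need no conversion. This is a sketch that rebuilds the example data with factor columns on purpose:

```r
# Example data from the question, deliberately stored as factors
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = TRUE
)

# ave() splits the row indices by color; for each group we count the
# distinct types in those rows. Integer in, integer out.
df$unique_types <- ave(seq_len(nrow(df)), df$color,
                       FUN = function(i) length(unique(df$type[i])))

df$unique_types
# [1] 2 2 2 1 1 2 2 3 3 3
```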

Since you also mention that your data is large, so that speed/performance may matter, I'd also suggest a data.table solution.

 require(data.table)
 setDT(df)[, count := uniqueN(type), by = color]  # v1.9.6+
 # if you don't want df to be modified by reference
 ans = as.data.table(df)[, count := uniqueN(type), by = color]

uniqueN was implemented in v1.9.6 and is a faster equivalent of length(unique(.)). It also works on data.frames/data.tables.

Other solutions

Using plyr:

 require(plyr)
 ddply(df, .(color), mutate, count = length(unique(type)))

Using aggregate:

 agg <- aggregate(data=df, type ~ color, function(x) length(unique(x)))
 names(agg)[2] <- "count"  # otherwise merge() returns columns type.x and type.y
 merge(df, agg, by="color", all=TRUE)
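Note that merge() returns the rows sorted by color rather than in the original order. If the order matters, one option (a base-R sketch, not the answer's own code) is to keep the aggregate() table and look the counts up with match():

```r
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# One row per color; the `type` column holds the number of distinct types
agg <- aggregate(type ~ color, data = df, FUN = function(x) length(unique(x)))

# match() finds each row's color in agg, so df keeps its original row order
df$unique_types <- agg$type[match(df$color, agg$color)]

df$unique_types
# [1] 2 2 2 1 1 2 2 3 3 3
```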

Here is a solution using the dplyr package, which provides n_distinct() as a wrapper for length(unique()).

 require(dplyr)
 df %>%
   group_by(color) %>%
   mutate(unique_types = n_distinct(type))

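This can also be done in a vectorized way, without any group-by operation, by combining unique with table or tabulate. Note that unique(df) deduplicates whole rows, so the snippets below assume df contains only the color and type columns.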
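The three steps of this approach, spelled out on the example data with character columns. This is a sketch; restricting unique() to the two relevant columns is an added safeguard for data frames with extra columns, not part of the original answer:

```r
df <- data.frame(
  color = c("black", "black", "black", "green", "green",
            "red", "red", "blue", "blue", "blue"),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# Step 1: drop repeated (color, type) pairs, leaving one row per distinct type per color
pairs <- unique(df[c("color", "type")])

# Step 2: the number of remaining rows per color is the number of distinct types
per_color <- table(pairs$color)

# Step 3: look up each original row's color by name; as.vector() drops the table class and names
df$unique_types <- as.vector(per_color[df$color])

df$unique_types
# [1] 2 2 2 1 1 2 2 3 3 3
```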

If df$color is a factor, then either

 table(unique(df)$color)[as.character(df$color)]
 # black black black green green   red   red  blue  blue  blue
 #     2     2     2     1     1     2     2     3     3     3

or

 tabulate(unique(df)$color)[as.integer(df$color)]
 # [1] 2 2 2 1 1 2 2 3 3 3

If df$color is a character vector, then simply

 table(unique(df)$color)[df$color]

If df$color is an integer vector, then simply

 tabulate(unique(df)$color)[df$color]
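A minimal sketch of the integer case, with colors coded as positive integers (the coding 1 = black, 2 = green, 3 = red, 4 = blue is hypothetical). tabulate() silently ignores values outside 1..nbins, so this only works when the codes are positive integers:

```r
# Hypothetical integer coding of the colors: 1 = black, 2 = green, 3 = red, 4 = blue
df <- data.frame(
  color = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  type  = c("chair", "chair", "sofa", "sofa", "sofa",
            "sofa", "plate", "sofa", "plate", "chair"),
  stringsAsFactors = FALSE
)

# unique(df) keeps one row per (color, type) pair; tabulate() then counts
# how often each code 1..max occurs, i.e. the distinct types per color
per_code <- tabulate(unique(df)$color)

# Index the per-code counts by each row's code
df$unique_types <- per_code[df$color]

df$unique_types
# [1] 2 2 2 1 1 2 2 3 3 3
```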
