Assign to multiple columns by group using := in data.table
What is the best way to assign to multiple columns using data.table? For example:
f <- function(x) {c("hi", "hello")}
x <- data.table(id = 1:10)
I would like to do something like this (of course this syntax is incorrect):
x[ , (col1, col2) := f(), by = "id"]
And to extend that, I may have many columns whose names are stored in a variable (say col_names), and I would like to do:
x[ , col_names := another_f(), by = "id", with = FALSE]
What is the correct way to do something like this?
This now works in v1.8.3 on R-Forge. Thanks for highlighting it!
library(data.table)

x <- data.table(a = 1:3, b = 1:6)
# Note: f returns a list (one element per new column), not a character vector
f <- function(x) {list("hi", "hello")}

x[ , c("col1", "col2") := f(), by = a][]
#    a b col1  col2
# 1: 1 1   hi hello
# 2: 2 2   hi hello
# 3: 3 3   hi hello
# 4: 1 4   hi hello
# 5: 2 5   hi hello
# 6: 3 6   hi hello

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
#    a b col1  col2 mean sum
# 1: 1 1   hi hello  2.5   5
# 2: 2 2   hi hello  3.5   7
# 3: 3 3   hi hello  4.5   9
# 4: 1 4   hi hello  2.5   5
# 5: 2 5   hi hello  3.5   7
# 6: 3 6   hi hello  4.5   9

# Column names held in a variable: wrap the LHS in parentheses
mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27
# Older form using the with argument (deprecated in later data.table releases;
# the parenthesized LHS, (mynames) := ..., is preferred)
x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27
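For the "many columns with names in a variable" part of the question, here is a minimal sketch that may help. It assumes a reasonably recent data.table where the parenthesized LHS is supported. The helper group_stats and the column names b_mean, b_sum, in_cols and out_cols are made up for illustration and are not part of the original answer. The first call assigns the list returned by a function to names held in a character vector; the second uses .SD with .SDcols to apply one function to several input columns per group.

```r
library(data.table)

# Toy data shaped like the answer's example: three groups in `a`, values 1..6 in `b`.
x <- data.table(a = rep(1:3, 2), b = 1:6)

# Hypothetical helper: returns a list with one element per new column.
group_stats <- function(v) list(mean(v), sum(v))

# Names live in a character vector; the parentheses on the LHS make
# data.table use the vector's contents instead of a column called "col_names".
col_names <- c("b_mean", "b_sum")
x[, (col_names) := group_stats(b), by = a]

# Same idea for several input columns: apply one function to every column
# listed in .SDcols and assign the results to generated output names.
in_cols  <- c("b", "b_sum")
out_cols <- paste0(in_cols, "_max")
x[, (out_cols) := lapply(.SD, max), by = a, .SDcols = in_cols]

print(x)
```

The key point in both calls is that the RHS is a list whose length matches the number of names on the LHS. A plain character vector such as c("hi", "hello") is not treated as one value per column, which is why f in the answer returns list("hi", "hello") instead.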
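- Why does data.table update names(DT) by reference, even if I assign to another variable?
- How to efficiently calculate the distance between pairs of coordinates using data.table :=
- Is it possible to use the R data.table function foverlaps to find the intersection of overlapping ranges in two tables?
- How to reorder data.table columns (without copying)
- Why does the X[Y] join of data.tables not allow a full outer join or a left join?
- Filtering out duplicated/non-unique rows in data.table
- Why are pandas merges in Python faster than data.table merges in R?
- Subsetting a data.table using != <some non-NA> also excludes NA
- How to select R data.table rows based on a substring match (like SQL LIKE)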