Identify groups of linked episodes which chain together

Take this simple data frame of linked IDs:

    test <- data.frame(id1=c(10,10,1,1,24,8),id2=c(1,36,24,45,300,11))
    > test
      id1 id2
    1  10   1
    2  10  36
    3   1  24
    4   1  45
    5  24 300
    6   8  11

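I now want to group together all the IDs which link. By "link", I mean following the chain of links so that all IDs in one group are labelled together. A kind of branching structure, i.e.: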

    Group 1
    10 --> 1,   1 --> (24,45)
                     24 --> 300
                            300 --> NULL
                     45 --> NULL
    10 --> 36, 36 --> NULL,
    Final group members: 10,1,24,36,45,300

    Group 2
    8 --> 11
          11 --> NULL
    Final group members: 8,11

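Now I roughly know the logic I want, but I don't know how to implement it elegantly. I am thinking of a recursive use of match or %in% to go down each branch, but I am truly stumped this time.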

The final result I am after is:

    result <- data.frame(group=c(1,1,1,1,1,1,2,2),id=c(10,1,24,36,45,300,8,11))
    > result
      group  id
    1     1  10
    2     1   1
    3     1  24
    4     1  36
    5     1  45
    6     1 300
    7     2   8
    8     2  11

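The Bioconductor package RBGL (an R interface to the BOOST graph library) contains a function, connectedComp(), which identifies the connected components in a graph, which is just what you are after.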

(To use the function, you will first need to install the graph and RBGL packages, both available from Bioconductor.)

    library(RBGL)
    test <- data.frame(id1=c(10,10,1,1,24,8),id2=c(1,36,24,45,300,11))

    ## Convert your 'from-to' data to a 'node and edge-list' representation
    ## used by the 'graph' & 'RBGL' packages
    g <- ftM2graphNEL(as.matrix(test))

    ## Extract the connected components
    cc <- connectedComp(g)

    ## Massage results into the format you're after
    ld <- lapply(seq_along(cc),
                 function(i) data.frame(group = names(cc)[i], id = cc[[i]]))
    do.call(rbind, ld)
    #   group  id
    # 1     1  10
    # 2     1   1
    # 3     1  24
    # 4     1  36
    # 5     1  45
    # 6     1 300
    # 7     2   8
    # 8     2  11

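Here's another answer I found after Josh's nudge in the right direction. This answer uses the igraph package. For those who are searching and come across this answer, my test dataset is what graph theory calls an "edge list" or "adjacency list" ( http://en.wikipedia.org/wiki/Graph_theory ).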

    library(igraph)
    test <- data.frame(id1=c(10,10,1,1,24,8 ),id2=c(1,36,24,45,300,11))
    gr.test <- graph.data.frame(test)
    links <- data.frame(id=unique(unlist(test)),group=clusters(gr.test)$membership)
    links[order(links$group),]

    #   id group
    #1  10     1
    #2   1     1
    #3  24     1
    #5  36     1
    #6  45     1
    #7 300     1
    #4   8     2
    #8  11     2

Without using packages:

    # 2 sets of test data
    mytest <- data.frame(id1=c(10,10,3,1,1,24,8,11,32,11,45),id2=c(1,36,50,24,45,300,11,8,32,12,49))
    test <- data.frame(id1=c(10,10,1,1,24,8),id2=c(1,36,24,45,300,11))

    grouppairs <- function(df){

      # from wide to long format; assumes df is 2 columns of related id's
      test <- data.frame(group = 1:nrow(df),val = unlist(df))

      # keep moving to next pair until all same values have same group
      i <- 0
      while(any(duplicated(unique(test)$val))){
        i <- i+1

        # get group of matching values
        matches <- test[test$val == test$val[i],'group']

        # change all groups with matching values to same group
        test[test$group %in% matches,'group'] <- test$group[i]
      }

      # renumber starting from 1 and show only unique values in group order
      test$group <- match(test$group, sort(unique(test$group)))
      unique(test)[order(unique(test)$group), ]
    }

    # test
    grouppairs(test)
    grouppairs(mytest)
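If you want another package-free route, the same grouping can also be sketched with a union-find (disjoint-set) structure, which merges the two groups on either side of every link. This is only a rough sketch, not code from any of the answers here: the function name group_links and the variable links_example are made up for illustration, and it assumes a two-column data frame of IDs like the one in the question.

```r
# Hypothetical helper (name invented for this sketch): label connected groups
# of ids with a simple union-find, base R only.
# Assumes df has exactly two columns of ids, one row per link.
group_links <- function(df) {
  ids <- unique(unlist(df, use.names = FALSE))  # distinct ids, in order of first appearance
  parent <- seq_along(ids)                      # every id starts as the root of its own group

  # follow parent pointers until reaching the root of node i
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  from <- match(df[[1]], ids)
  to   <- match(df[[2]], ids)
  for (k in seq_along(from)) {
    ra <- find_root(from[k])
    rb <- find_root(to[k])
    # merge the two groups by pointing the larger root index at the smaller one
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  # number the groups 1, 2, ... in order of first appearance
  roots <- vapply(seq_along(ids), find_root, integer(1))
  out <- data.frame(group = match(roots, unique(roots)), id = ids)
  out[order(out$group), ]
}

links_example <- data.frame(id1 = c(10, 10, 1, 1, 24, 8),
                            id2 = c(1, 36, 24, 45, 300, 11))
group_links(links_example)
```

Each link is processed once, so nothing has to be rescanned in a loop until the groups stop changing. The sketch leaves out path compression to stay short, so very long chains could make find_root slow.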
