Split a data frame column into multiple columns

I'd like to take data of the form

    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
      attr          type
    1    1   foo_and_bar
    2   30 foo_and_bar_2
    3    4   foo_and_bar
    4    6 foo_and_bar_2

and use split() on the "type" column from above to get something like this:

      attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2

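I came up with something unbelievably complex that did work, but I've since misplaced it. It seemed far too complicated to be the best way. I can use strsplit as below, but then it's unclear how to get the pieces back into 2 columns in the data frame.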

    > strsplit(as.character(before$type),'_and_')
    [[1]]
    [1] "foo" "bar"

    [[2]]
    [1] "foo"   "bar_2"

    [[3]]
    [1] "foo" "bar"

    [[4]]
    [1] "foo"   "bar_2"

Thanks for any pointers. I haven't quite got the hang of R lists yet.

Use stringr::str_split_fixed

    library(stringr)
    str_split_fixed(before$type, "_and_", 2)

Another option is to use the new tidyr package.

    library(dplyr)
    library(tidyr)

    before <- data.frame(
      attr = c(1, 30, 4, 6),
      type = c('foo_and_bar', 'foo_and_bar_2')
    )

    before %>%
      separate(type, c("foo", "bar"), "_and_")

    ##   attr foo   bar
    ## 1    1 foo   bar
    ## 2   30 foo bar_2
    ## 3    4 foo   bar
    ## 4    6 foo bar_2

Yet another approach: use rbind on the strsplit output:

    before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    out <- strsplit(as.character(before$type),'_and_')
    do.call(rbind, out)

         [,1]  [,2]
    [1,] "foo" "bar"
    [2,] "foo" "bar_2"
    [3,] "foo" "bar"
    [4,] "foo" "bar_2"

And to combine:

    data.frame(before$attr, do.call(rbind, out))

5 years later, adding the obligatory data.table solution

    library(data.table) ## v 1.9.6+
    setDT(before)[, paste0("type", 1:2) := tstrsplit(type, "_and_")]
    before
    #    attr          type type1 type2
    # 1:    1   foo_and_bar   foo   bar
    # 2:   30 foo_and_bar_2   foo bar_2
    # 3:    4   foo_and_bar   foo   bar
    # 4:    6 foo_and_bar_2   foo bar_2

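We could also make sure that the resulting columns have the correct types, and improve performance, by adding the type.convert and fixed arguments (since "_and_" isn't really a regular expression)

    setDT(before)[, paste0("type", 1:2) := tstrsplit(type, "_and_", type.convert = TRUE, fixed = TRUE)]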
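To make the two arguments concrete, here is a small base R sketch. It does not use data.table, and the vector x is made up for illustration. fixed = TRUE treats the separator as a literal string instead of a regular expression, and type.convert() from utils turns character pieces into a more specific type where it can. I'm assuming tstrsplit does a comparable per-column conversion when type.convert = TRUE; check the data.table documentation before relying on that.

```r
# Hypothetical input: a number, the separator, then a label containing a dot
x <- c("1_and_a.b", "30_and_c.d")

# fixed = TRUE: "_and_" is matched literally, not compiled as a regex
parts <- strsplit(x, "_and_", fixed = TRUE)

first  <- vapply(parts, `[`, character(1), 1)  # pieces before the separator
second <- vapply(parts, `[`, character(1), 2)  # pieces after the separator

# type.convert() picks a more specific type; as.is = TRUE avoids factors
first_num <- type.convert(first, as.is = TRUE)
class(first_num)

# When the separator is a regex metacharacter such as ".", fixed = TRUE
# is what makes strsplit() split on the literal character
dot_parts <- strsplit(second, ".", fixed = TRUE)
dot_parts[[1]]
```

For a separator like "_and_" the result is the same with or without fixed = TRUE; the argument only changes how the pattern is interpreted, which is where the speed-up comes from.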

Notice that sapply with "[" can be used to extract either the first or second items in those lists, so:

    before$type_1 <- sapply(strsplit(as.character(before$type),'_and_'), "[", 1)
    before$type_2 <- sapply(strsplit(as.character(before$type),'_and_'), "[", 2)
    before$type <- NULL
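A small variation on the same idea, offered as a sketch and not as the answerer's own code: split once, keep the result in a variable (here called parts, a name I picked), and use vapply() so the type of each extracted piece is checked.

```r
before <- data.frame(attr = c(1, 30, 4, 6),
                     type = c("foo_and_bar", "foo_and_bar_2"))

# Split once and reuse the list for both new columns
parts <- strsplit(as.character(before$type), "_and_", fixed = TRUE)

# vapply() insists that every extracted piece is a single string;
# a row without a second piece would yield NA instead of an error
before$type_1 <- vapply(parts, `[`, character(1), 1)
before$type_2 <- vapply(parts, `[`, character(1), 2)
before$type <- NULL

before
```

Compared with calling strsplit() twice, this only does the splitting work once, which may matter on long columns.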

Here's a gsub method:

    before$type_1 <- gsub("_and_.+$", "", before$type)
    before$type_2 <- gsub("^.+_and_", "", before$type)
    before$type <- NULL
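One thing that may be worth checking with the gsub patterns is what happens when the separator appears more than once in a value. The sketch below uses a made-up string with two separators; the variable names are mine.

```r
x <- "foo_and_bar_and_baz"  # hypothetical value containing two separators

# Removes everything from the first separator to the end of the string
left <- gsub("_and_.+$", "", x)

# ".+" is greedy, so this removes everything up to the LAST separator
right <- gsub("^.+_and_", "", x)

# The middle piece ("bar") is in neither result. To keep everything after
# the first separator, a lazy quantifier is one option:
rest <- sub("^.*?_and_", "", x, perl = TRUE)

c(left = left, right = right, rest = rest)
```

With exactly one separator per value, as in the question's data, the two gsub calls give the expected split.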

Here is a one-liner along the same lines as aniko's solution, but using hadley's stringr package:

    do.call(rbind, str_split(before$type, '_and_'))

To add to the options, you could also use my splitstackshape::cSplit function like this:

    library(splitstackshape)
    cSplit(before, "type", "_and_")
    #    attr type_1 type_2
    # 1:    1    foo    bar
    # 2:   30    foo  bar_2
    # 3:    4    foo    bar
    # 4:    6    foo  bar_2

An easy way is to use sapply() and the [ function:

    before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    out <- strsplit(as.character(before$type),'_and_')

For example:

    > data.frame(t(sapply(out, `[`)))
       X1    X2
    1 foo   bar
    2 foo bar_2
    3 foo   bar
    4 foo bar_2

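sapply()'s result is a matrix, which needs transposing and converting back to a data frame. After that, a few simple manipulations yield the result you wanted:

    after <- with(before, data.frame(attr = attr))
    after <- cbind(after, data.frame(t(sapply(out, `[`))))
    names(after)[2:3] <- paste("type", 1:2, sep = "_")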

At this point, after is what you wanted

    > after
      attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2
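To see why the transpose is needed, it can help to look at the dimensions directly. This is a sketch with a hand-written list of three split results, so that rows and columns are easy to tell apart.

```r
# Three split results, two pieces each (made-up stand-in for strsplit output)
out <- list(c("foo", "bar"), c("foo", "bar_2"), c("foo", "bar"))

# sapply() puts each list element into its own COLUMN
m <- sapply(out, `[`)
dim(m)   # pieces in rows, input strings in columns

# t() flips it so each input string becomes a row again
tm <- t(m)
dim(tm)

df <- data.frame(tm, stringsAsFactors = FALSE)
df
```

The default column names X1 and X2 come from data.frame(), which is why the answer renames them afterwards.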

Here is a base R one-liner that overlaps a number of the previous solutions, but returns a data.frame with the proper names.

    out <- setNames(data.frame(before$attr,
                      do.call(rbind, strsplit(as.character(before$type),
                                              split="_and_"))),
                    c("attr", paste0("type_", 1:2)))
    out
      attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2

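It uses strsplit to break up the variable, and data.frame with do.call/rbind to put the data back into a data.frame. The additional incremental improvement is the use of setNames to add variable names to the data.frame.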

If you want to stick with strsplit(), another approach is to use the unlist() command. Here's a solution along those lines.

    tmp <- matrix(unlist(strsplit(as.character(before$type), '_and_')),
                  ncol=2, byrow=TRUE)
    after <- cbind(before$attr, as.data.frame(tmp))
    names(after) <- c("attr", "type_1", "type_2")

Since R version 3.4.0 you can use strcapture() from the utils package (included with base R installs), binding the output onto the other column(s).

    out <- strcapture(
      "(.*)_and_(.*)",
      as.character(before$type),
      data.frame(type_1 = character(), type_2 = character())
    )

    cbind(before["attr"], out)
    #   attr type_1 type_2
    # 1    1    foo    bar
    # 2   30    foo  bar_2
    # 3    4    foo    bar
    # 4    6    foo  bar_2

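The subject is almost exhausted, but I'd like to offer a solution to a slightly more general version of the problem, where you don't know the number of output columns a priori. So for example you have

    before = data.frame(attr = c(1,30,4,6),
                        type=c('foo_and_bar','foo_and_bar_2',
                               'foo_and_bar_2_and_bar_3', 'foo_and_bar'))
      attr                    type
    1    1             foo_and_bar
    2   30           foo_and_bar_2
    3    4 foo_and_bar_2_and_bar_3
    4    6             foo_and_bar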

We can't use tidyr's separate() because we don't know the number of result columns before the split, so I created a function that uses stringr to split a column, given the pattern and a name prefix for the generated columns. I hope the coding patterns used are correct.

    library(stringr)
    library(tibble)

    split_into_multiple <- function(column, pattern = ", ", into_prefix){
      cols <- str_split_fixed(column, pattern, n = Inf)
      # Replace the ""'s produced by padding the matrix to the right with NAs,
      # which are more useful
      cols[which(cols == "")] <- NA
      # name the columns 'into_prefix_1', 'into_prefix_2', ..., 'into_prefix_m'
      # where m = number of columns of 'cols'
      m <- dim(cols)[2]
      colnames(cols) <- paste(into_prefix, 1:m, sep = "_")
      # as_tibble() replaces the deprecated as.tibble()
      cols <- as_tibble(cols)
      return(cols)
    }

We can then use split_into_multiple in a dplyr pipe as follows:

    after <- before %>%
      bind_cols(split_into_multiple(.$type, "_and_", "type")) %>%
      # selecting those that start with 'type_' will remove the original 'type' column
      select(attr, starts_with("type_"))

    > after
      attr type_1 type_2 type_3
    1    1    foo    bar   <NA>
    2   30    foo  bar_2   <NA>
    3    4    foo  bar_2  bar_3
    4    6    foo    bar   <NA>

And then we can use gather to tidy up...

    after %>%
      gather(key, val, -attr, na.rm = T)

       attr    key   val
    1     1 type_1   foo
    2    30 type_1   foo
    3     4 type_1   foo
    4     6 type_1   foo
    5     1 type_2   bar
    6    30 type_2 bar_2
    7     4 type_2 bar_2
    8     6 type_2   bar
    11    4 type_3 bar_3
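For comparison, the same generalisation to an unknown number of columns can be sketched in base R alone. This is not part of the answer above, just one possible approach: pad every split result with NA up to the longest length, then rbind. The names parts, n and padded are mine.

```r
before <- data.frame(attr = c(1, 30, 4, 6),
                     type = c("foo_and_bar", "foo_and_bar_2",
                              "foo_and_bar_2_and_bar_3", "foo_and_bar"))

parts <- strsplit(as.character(before$type), "_and_", fixed = TRUE)

# Longest split decides how many columns are needed
n <- max(lengths(parts))

# `length<-` extends a vector to length n, filling with NA
padded <- lapply(parts, `length<-`, n)

# Now every element has the same length, so rbind() cannot recycle values
m <- do.call(rbind, padded)
colnames(m) <- paste0("type_", seq_len(n))

after <- cbind(before["attr"], as.data.frame(m, stringsAsFactors = FALSE))
after
```

Without the padding step, do.call(rbind, parts) would recycle the shorter vectors and silently repeat values in the shorter rows, which is easy to miss.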

This question is pretty old, but I'll add the solution I found to be the simplest at present.

    library(reshape2)

    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    newColNames <- c("type1", "type2")
    newCols <- colsplit(before$type, "_and_", newColNames)
    after <- cbind(before, newCols)
    after$type <- NULL
    after

    # example strings with two or three "-"-separated parts
    tp <- c("a-c", "d-e-f", "g-h-i", "m-n")
    temp <- strsplit(as.character(tp), '-')

    x <- c(); y <- c(); z <- c()
    for (i in seq_along(temp)) {
      l <- length(temp[[i]])
      x <- c(x, temp[[i]][1])
      if (l == 2) {
        # two parts: leave the middle column empty
        y <- c(y, NA)
        z <- c(z, temp[[i]][2])
      } else {
        y <- c(y, temp[[i]][2])
        z <- c(z, temp[[i]][3])
      }
    }
    # build the data frame once, after the loop
    df <- as.data.frame(cbind(x, y, z))
