R: strsplit with multiple unordered split arguments?

Given the strings

    test_1 <- "abc def,ghi klm"
    test_2 <- "abc, def ghi klm"

I would like to obtain

    "abc" "def" "ghi"

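However, with strsplit one has to know the order in which the split values occur in the string, because strsplit uses the first value for the first split, the second for the second, and so on, and then recycles.

So none of these work:

    strsplit(test_1, c(",", " "))
    strsplit(test_2, c(" ", ","))
    strsplit(test_2, split=c("[:punct:]","[:space:]"))[[1]]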

I am looking for a way to split the string wherever any of my split values occurs, in a single step.
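For reference, here is a minimal sketch of what a vector-valued split does, based on the documented behaviour that split is recycled along the elements of x. The two-element input x is a made-up illustration, not part of the question, and fixed = TRUE is only there to keep the delimiters literal.

```r
# Hypothetical two-element input: each element is paired with one split value.
x <- c("abc def,ghi", "abc def,ghi")
res <- strsplit(x, c(",", " "), fixed = TRUE)
res[[1]]  # first element is split on "," only
res[[2]]  # second element is split on " " only

# With a single string, only the first split value is ever used.
single <- strsplit("abc def,ghi klm", c(",", " "), fixed = TRUE)[[1]]
single
```

So the split values are matched to the elements of x, not applied one after another within the same string, which is why the calls in the question leave some delimiters in place.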

Actually, strsplit uses grep-style (regular expression) patterns:

    > strsplit(test_1, "\\, |\\,| ")
    [[1]]
    [1] "abc" "def" "ghi" "klm"

    > strsplit(test_2, "\\, |\\,| ")
    [[1]]
    [1] "abc" "def" "ghi" "klm"

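Without both "\\, " and "\\," in the pattern (note the extra space in the first alternative, which SO does not show), you would get some empty-string ("") values. It might have been clearer if I had written:

    > strsplit(test_2, "\\,\\s|\\,|\\s")
    [[1]]
    [1] "abc" "def" "ghi" "klm"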

@Fojtasek is so right: using a character class often simplifies the task, because it creates an implicit logical OR:

    > strsplit(test_2, "[, ]+")
    [[1]]
    [1] "abc" "def" "ghi" "klm"

    > strsplit(test_1, "[, ]+")
    [[1]]
    [1] "abc" "def" "ghi" "klm"
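The "+" quantifier matters here. As a small sketch (the variable names no_plus and with_plus are mine), compare the character class with and without it on test_2, where a comma is immediately followed by a space:

```r
test_2 <- "abc, def ghi klm"

# Without "+", the comma and the following space are two separate matches,
# so an empty string appears between them.
no_plus <- strsplit(test_2, "[, ]")[[1]]
no_plus
# [1] "abc" ""    "def" "ghi" "klm"

# With "+", a run of delimiters is consumed as a single match.
with_plus <- strsplit(test_2, "[, ]+")[[1]]
with_plus
# [1] "abc" "def" "ghi" "klm"
```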

You could go with strsplit(test_1, "\\W")
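A hedged sketch of how that behaves on the other test string: I am using "\\W+" rather than "\\W" so that the comma-plus-space in test_2 counts as one delimiter, and the underscore input is a made-up example to show that "\\W" treats "_" as a word character.

```r
test_2 <- "abc, def ghi klm"

# One or more non-word characters act as a single delimiter.
words <- strsplit(test_2, "\\W+")[[1]]
words
# [1] "abc" "def" "ghi" "klm"

# Caveat: "_" is a word character, so it is not split on.
underscore <- strsplit("abc_def ghi", "\\W+")[[1]]
underscore
# [1] "abc_def" "ghi"
```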

If you don't like regular expressions, you can call strsplit() multiple times:

    strsplits <- function(x, splits, ...) {
      # Apply each split value in turn, flattening the result after every pass
      for (split in splits) {
        x <- unlist(strsplit(x, split, ...))
      }
      x[x != ""]  # Remove empty values
    }

    strsplits(test_1, c(" ", ","))
    # "abc" "def" "ghi" "klm"
    strsplits(test_2, c(" ", ","))
    # "abc" "def" "ghi" "klm"

Updated example:

    strsplits(test_1, c("[[:punct:]]","[[:space:]]"))
    # "abc" "def" "ghi" "klm"
    strsplits(test_2, c("[[:punct:]]","[[:space:]]"))
    # "abc" "def" "ghi" "klm"
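Since strsplits() forwards "..." to strsplit(), one way to really avoid regular expressions is to pass fixed = TRUE, so that delimiters such as "." or "|" are taken literally. This is only a sketch: the function is repeated to keep the block self-contained, and the input "a.b|c" is a made-up example.

```r
# Same helper as above, repeated so the example runs on its own.
strsplits <- function(x, splits, ...) {
  for (split in splits) {
    x <- unlist(strsplit(x, split, ...))
  }
  x[x != ""]  # Remove empty values
}

# fixed = TRUE is passed through to strsplit(), so "." and "|" are
# treated as plain characters rather than regex metacharacters.
literal <- strsplits("a.b|c", c(".", "|"), fixed = TRUE)
literal
# [1] "a" "b" "c"
```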

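However, if you are going to use regular expressions anyway, you may as well use @Dinin's approach:

    strsplit(test_1, "[[:punct:][:space:]]+")[[1]]
    # "abc" "def" "ghi" "klm"
    strsplit(test_2, "[[:punct:][:space:]]+")[[1]]
    # "abc" "def" "ghi" "klm"

Alternatively, with the stringr package you can extract the key words directly instead of splitting:

    library(stringr)

    test_1 <- "abc def,ghi klm"
    test_2 <- "abc, def ghi klm"

    key_words <- c("abc", "def", "ghi")
    # Build a single alternation pattern: "abc|def|ghi"
    matches <- str_c(key_words, collapse = "|")

    str_extract_all(test_1, matches)
    str_extract_all(test_2, matches)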
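If you would rather not depend on stringr, the same extraction can be sketched in base R with gregexpr() and regmatches(). The names pattern, found_1 and found_2 are mine, and note that the key words are interpreted as regular expressions, so anything containing metacharacters would need escaping first.

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

key_words <- c("abc", "def", "ghi")
# Alternation pattern matching any of the key words: "abc|def|ghi"
pattern <- paste(key_words, collapse = "|")

# gregexpr() locates every match; regmatches() pulls out the matched text.
found_1 <- regmatches(test_1, gregexpr(pattern, test_1))[[1]]
found_2 <- regmatches(test_2, gregexpr(pattern, test_2))[[1]]
found_1
# [1] "abc" "def" "ghi"
found_2
# [1] "abc" "def" "ghi"
```

This returns exactly the three values asked for in the question, leaving out "klm" because it is not among the key words.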