How to trim leading and trailing whitespace in R?

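I am having some trouble with leading and trailing whitespace in a data.frame. For example, I'd like to look at a specific row in a data.frame based on a certain condition: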

    > myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
    [1] codeHelper     country        dummyLI        dummyLMI       dummyUMI
    [6] dummyHInonOECD dummyHIOECD    dummyOECD
    <0 rows> (or 0-length row.names)

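I was wondering why I didn't get the expected output, since Austria obviously exists in my data.frame. Looking through my code history and trying to figure out what went wrong, I tried: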

    > myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
       codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
    18        AUT Austria        0        0        0              0           1
       dummyOECD
    18         1

All I changed in the command is an additional space after Austria.

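Further annoying problems obviously arise. For example, when I want to merge two frames based on the country column: one data.frame uses "Austria " while the other uses "Austria", so the matching doesn't work.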

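Is there a nice way to "show" the whitespace on my screen so that I'm aware of the problem?
And can I remove the leading and trailing whitespace in R?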

So far I have written a simple Perl script that removes the whitespace, but it would be nice if I could do it inside R.

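Probably the best way is to handle the trailing whitespace when you read your data file. If you use read.csv or read.table, you can set the parameter strip.white=TRUE, which strips leading and trailing whitespace from unquoted character fields.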
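A minimal sketch of the effect, assuming the data come from a small inline CSV (the csv_text string below is made up for illustration). The same text is read once without and once with strip.white:

```r
# Hypothetical CSV content with stray spaces around the country names
csv_text <- "country,value\nAustria ,1\n Spain,2"

raw   <- read.csv(text = csv_text, stringsAsFactors = FALSE)
clean <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)

raw$country    # "Austria " " Spain"  (whitespace kept)
clean$country  # "Austria"  "Spain"   (whitespace stripped on import)
```

Quoted fields such as "Austria " (with the quotes in the file) are left alone by strip.white, so those still need one of the trimming functions below.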

If you want to clean strings afterwards, you could use one of these functions:

    # returns string w/o leading whitespace
    trim.leading <- function (x)  sub("^\\s+", "", x)

    # returns string w/o trailing whitespace
    trim.trailing <- function (x) sub("\\s+$", "", x)

    # returns string w/o leading or trailing whitespace
    trim <- function (x) gsub("^\\s+|\\s+$", "", x)
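Why trim() needs gsub() while the one-sided helpers get away with sub(): sub() replaces only the first match of the pattern, so on a string padded on both sides it strips just the leading run. A quick check with a made-up input:

```r
x <- "  Austria  "

one_side  <- sub("^\\s+|\\s+$", "", x)   # only the first match (the leading run) is removed
both_ends <- gsub("^\\s+|\\s+$", "", x)  # every match is removed

one_side   # "Austria  "
both_ends  # "Austria"
```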

To use one of these functions on myDummy$country:

  myDummy$country <- trim(myDummy$country)
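To see how this fixes the merge problem from the question, here is a small sketch with two hypothetical frames, a and b, standing in for the real data; trim() is the gsub() version from above, repeated so the block runs on its own:

```r
# Same gsub()-based trim() as above
trim <- function(x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical frames mimicking the question: a has "Austria " with a trailing space
a <- data.frame(country = c("Austria ", "Spain"), gdp  = c(1, 2), stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Spain"),  oecd = c(1, 1), stringsAsFactors = FALSE)

before <- merge(a, b, by = "country")   # only "Spain" matches
a$country <- trim(a$country)
after  <- merge(a, b, by = "country")   # "Austria" and "Spain" both match

nrow(before)  # 1
nrow(after)   # 2
```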

To "show" the whitespace you could use:

  paste(myDummy$country)

which will print the strings surrounded by quotes ("), making the whitespace easier to spot.
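A few other base R ways to make stray whitespace visible, sketched on a made-up vector x (bracketing with sprintf(), counting characters with nchar(), and flagging padded entries with grepl()):

```r
x <- c("Austria ", "Austria", "Spain\t")

print(x)                # quoted output; the tab is printed as "\t"
sprintf("[%s]", x)      # brackets make the string boundaries obvious
nchar(x)                # 8 7 6: the extra characters give the whitespace away
grepl("^\\s|\\s$", x)   # TRUE FALSE TRUE: flags leading or trailing whitespace
```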

As of R 3.2.0, a new function was introduced for removing leading/trailing whitespace:

 trimws()

See: http://stat.ethz.ch/R-manual/R-patched/library/base/html/trimws.html
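trimws() also has a which argument for trimming only one side. A short sketch on a made-up vector:

```r
x <- c("  leading", "trailing  ", "  both  ")

trimws(x)                   # "leading"   "trailing"   "both"
trimws(x, which = "left")   # "leading"   "trailing  " "both  "
trimws(x, which = "right")  # "  leading" "trailing"   "  both"
```

By default trimws() treats only space, tab, carriage return and newline as whitespace; since R 3.6.0 a whitespace argument lets you widen that set (see the help page above).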

To manipulate the whitespace, use str_trim() from the stringr package. The package's manual, dated Feb 15, 2013, is on CRAN. The function also works on character vectors.

    install.packages("stringr", dependencies=TRUE)
    require(stringr)
    example(str_trim)
    d4$clean2<-str_trim(d4$V2)

(Credit goes to commenter: R. Cotton)
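Like trimws(), str_trim() can trim one side only through its side argument. A minimal sketch on a made-up vector, assuming stringr is installed:

```r
library(stringr)

x <- c("  leading", "trailing  ", "  both  ")

str_trim(x)                  # "leading"   "trailing"   "both"
str_trim(x, side = "left")   # "leading"   "trailing  " "both  "
str_trim(x, side = "right")  # "  leading" "trailing"   "  both"
```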

A simple function to remove leading and trailing whitespace:

    trim <- function( x ) {
      gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
    }

Usage:

    > text = " foo bar baz 3 "
    > trim(text)
    [1] "foo bar baz 3"
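Because [[:space:]] is a whole character class, this trim() also removes tabs and newlines at the ends while leaving interior spaces alone. A quick sketch on made-up strings:

```r
trim <- function(x) gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)

y <- c("\tfoo bar\n", "  baz  ", "qux")
trim(y)   # "foo bar" "baz" "qux"
```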

ad1) To see the whitespace, you could call print.data.frame directly with modified arguments:

    print(head(iris), quote=TRUE)
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
    # 1        "5.1"       "3.5"        "1.4"       "0.2" "setosa"
    # 2        "4.9"       "3.0"        "1.4"       "0.2" "setosa"
    # 3        "4.7"       "3.2"        "1.3"       "0.2" "setosa"
    # 4        "4.6"       "3.1"        "1.5"       "0.2" "setosa"
    # 5        "5.0"       "3.6"        "1.4"       "0.2" "setosa"
    # 6        "5.4"       "3.9"        "1.7"       "0.4" "setosa"

See also ?print.data.frame for other options.

Use grep or grepl to find observations with trailing whitespace, and sub to get rid of it.

    names<-c("Ganga Din\t","Shyam Lal","Bulbul ")
    grep("[[:space:]]+$",names)
    [1] 1 3
    grepl("[[:space:]]+$",names)
    [1]  TRUE FALSE  TRUE
    sub("[[:space:]]+$","",names)
    [1] "Ganga Din" "Shyam Lal" "Bulbul"
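The same grepl() test scales to a whole data frame. A sketch that reports, for each character column, which rows carry leading or trailing whitespace; df and the helper has_ws() are made up for illustration:

```r
# Hypothetical data frame with whitespace in some cells
df <- data.frame(name = c("Ganga Din\t", "Shyam Lal", "Bulbul "),
                 city = c("Delhi", " Agra", "Pune"),
                 n    = 1:3,
                 stringsAsFactors = FALSE)

# TRUE where a value starts or ends with whitespace
has_ws <- function(col) grepl("^[[:space:]]|[[:space:]]$", col)

# Check only character columns; numeric ones cannot hold padded strings
chr_cols <- vapply(df, is.character, logical(1))
res <- lapply(df[chr_cols], function(col) which(has_ws(col)))
res   # $name: 1 3   $city: 2
```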

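I would rather have added this as a comment to user56, but I can't yet, so I'm writing it as a separate answer. Removing leading and trailing blanks can also be done with the trim() function from the gdata package: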

    require(gdata)
    example(trim)

Usage example:

    > trim(" Remove leading and trailing blanks ")
    [1] "Remove leading and trailing blanks"

Another option is the stri_trim function from the stringi package, which by default removes both leading and trailing whitespace:

    > library(stringi)
    > x <- c(" leading space","trailing space ")
    > stri_trim(x)
    [1] "leading space"  "trailing space"

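To remove only leading whitespace, use stri_trim_left. To remove only trailing whitespace, use stri_trim_right. To trim other leading or trailing characters, use the pattern = argument; note that pattern describes the class of characters to keep, and trimming stops at the first character from each end that matches it.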

See also ?stri_trim for more information.
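A minimal sketch of the one-sided variants and of pattern, assuming stringi is installed; the inputs are made up:

```r
library(stringi)

x <- c("  leading", "trailing  ", "  both  ")

stri_trim(x)        # "leading"   "trailing"   "both"
stri_trim_left(x)   # "leading"   "trailing  " "both  "
stri_trim_right(x)  # "  leading" "trailing"   "  both"

# pattern is the class of characters to KEEP: here anything except "x",
# so the runs of "x" at both ends are trimmed
stri_trim_both("xxabcxx", pattern = "[^x]")   # "abc"
```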

A related problem occurs if there are multiple spaces between the inputs:

 > a <- " a string with lots of starting, inter mediate and trailing whitespace "

You can then split the string into its "real" tokens by passing a regular expression as the split argument:

    > strsplit(a, split=" +")
    [[1]]
     [1] ""           "a"          "string"     "with"       "lots"
     [6] "of"         "starting,"  "inter"      "mediate"    "and"
    [11] "trailing"   "whitespace"

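Note that if there is a match at the beginning of a (non-empty) string, the first element of the output is "", but if there is a match at the end of the string, the output is the same as if the match had been removed.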
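To avoid the empty first token, trim the ends before splitting. A small sketch; the padded string below is made up, with deliberate runs of spaces:

```r
a <- "  a string with lots of   starting, inter mediate and trailing   whitespace   "

first  <- strsplit(a, split = " +")[[1]][1]        # "" because the string starts with a match
tokens <- strsplit(trimws(a), split = " +")[[1]]   # no empty token once the ends are trimmed
tokens
```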

I created a trim.strings() function to trim leading and/or trailing whitespace:

    # Arguments:    x - character vector
    #            side - side(s) on which to remove whitespace
    #                   default : "both"
    #                   possible values: c("both", "leading", "trailing")

    trim.strings <- function(x, side = "both") {
      if (is.na(match(side, c("both", "leading", "trailing")))) {
        side <- "both"
      }
      if (side == "leading") {
        sub("^\\s+", "", x)
      } else if (side == "trailing") {
        sub("\\s+$", "", x)
      } else {
        gsub("^\\s+|\\s+$", "", x)
      }
    }

To illustrate:

    a <- c(" ABC123 456 ", " ABC123DEF ")

    # returns string without leading and trailing whitespace
    trim.strings(a)
    # [1] "ABC123 456" "ABC123DEF"

    # returns string without leading whitespace
    trim.strings(a, side = "leading")
    # [1] "ABC123 456 " "ABC123DEF "

    # returns string without trailing whitespace
    trim.strings(a, side = "trailing")
    # [1] " ABC123 456" " ABC123DEF"
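A variant of the same helper that uses base R's match.arg() to validate side: it raises an error on an unknown value instead of silently falling back to "both". The name trim_strings2 is made up here:

```r
trim_strings2 <- function(x, side = c("both", "leading", "trailing")) {
  side <- match.arg(side)   # "both" when side is omitted; errors on unknown values
  switch(side,
         both     = gsub("^\\s+|\\s+$", "", x),
         leading  = sub("^\\s+", "", x),
         trailing = sub("\\s+$", "", x))
}

a <- c(" ABC123 456 ", " ABC123DEF ")
trim_strings2(a)                     # "ABC123 456"  "ABC123DEF"
trim_strings2(a, side = "leading")   # "ABC123 456 " "ABC123DEF "
trim_strings2(a, side = "trailing")  # " ABC123 456" " ABC123DEF"
```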

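Another option is to replace the padded value directly. If country is a factor, convert it to character first; otherwise assigning a value that is not yet a level produces NA:

    myDummy$country <- as.character(myDummy$country)
    myDummy$country[myDummy$country == "Austria "] <- "Austria"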

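After this, rebuild the factor so that "Austria " is no longer one of its levels (any value not listed in levels becomes NA). Suppose you also have the levels "USA" and "Spain":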

 myDummy$country = factor(myDummy$country, levels=c("Austria", "USA", "Spain"))

This is a little less intimidating than the top-voted answer, but it should still work.
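If country is a factor, a sketch that avoids the round trip through character is to rename the padded level itself, so every row using it changes at once. The small factor below is made up for illustration:

```r
country <- factor(c("Austria ", "USA", "Spain", "Austria "))

# Rename the padded level in place; all rows that use it follow automatically
levels(country)[levels(country) == "Austria "] <- "Austria"

levels(country)        # "Austria" "Spain" "USA"
as.character(country)  # "Austria" "USA" "Spain" "Austria"
```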

The best way is trimws().

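The following code applies this function to the entire data frame. Note that trimws() coerces its input to character, so numeric columns are converted to character as well: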

    mydataframe <- data.frame(lapply(mydataframe, trimws), stringsAsFactors = FALSE)
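If the frame also holds numeric columns, a sketch that trims only the character columns keeps their types intact. The small mydataframe below is made up for illustration:

```r
mydataframe <- data.frame(country = c("Austria ", " Spain"),
                          value   = c(1.5, 2),
                          stringsAsFactors = FALSE)

# Apply trimws() only to character columns so numeric columns keep their type
chr <- vapply(mydataframe, is.character, logical(1))
mydataframe[chr] <- lapply(mydataframe[chr], trimws)

mydataframe$country       # "Austria" "Spain"
class(mydataframe$value)  # "numeric"
```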
