Nested ifelse statement in R
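I'm new here and a beginner in R. I'm using the latest R 3.0.1 on Windows 7.
I'm still learning how to translate SAS code into R, and I get warnings. I need to understand where I'm going wrong. What I want to do is create a variable that summarizes and distinguishes three origins of a population: mainland, overseas, foreigner. I have a database with 2 variables: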
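- nationality id: idnat (french, foreigner),
if idnat is french then:
- birthplace id: idbp (mainland, colony, overseas)
I want to combine the information from idnat and idbp into a new variable called idnat2:
- status: k (mainland, overseas, foreigner)
All of these variables are of character type.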
Expected result in column idnat2:
    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign
Here is the SAS code I want to translate into R:
if idnat = "french" then do;
  if idbp in ("overseas","colony") then idnat2 = "overseas";
  else idnat2 = "mainland";
end;
else idnat2 = "foreigner";
run;
Here is my attempt in R:
if(idnat=="french"){
    idnat2 <- "mainland"
} else if(idbp=="overseas"|idbp=="colony"){
    idnat2 <- "overseas"
} else {
    idnat2 <- "foreigner"
}
I get this warning:
Warning message:
In if (idnat=="french") { :
  the condition has length > 1 and only the first element will be used
I was advised to use a "nested ifelse" instead because it is simpler, but I get more warnings:
idnat2 <- ifelse (idnat=="french", "mainland",
        ifelse (idbp=="overseas"|idbp=="colony", "overseas")
      )
     else (idnat2 <- "foreigner")
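According to the warning message, the length is greater than 1, so only what is between the first brackets is taken into account. Sorry, but I don't understand what this length has to do with anything here. Does anyone know where I went wrong?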
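If you have used any spreadsheet application, you know its basic function if() with the syntax:
if(<condition>, <yes>, <no>)
The syntax is exactly the same for ifelse() in R:
ifelse(<condition>, <yes>, <no>)
The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Consider the following comparison of a spreadsheet formula and an equivalent R example that returns 1 if a > b and 0 otherwise.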
In a spreadsheet:
  A B C
1 3 1 =if(A1 > B1, 1, 0)
2 2 2 =if(A2 > B2, 1, 0)
3 1 3 =if(A3 > B3, 1, 0)
In R:
> a <- 3:1; b <- 1:3
> ifelse(a > b, 1, 0)
[1] 1 0 0
ifelse() can be nested in many ways:
ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

ifelse(<condition>,
       ifelse(<condition>, <yes>, <no>),
       ifelse(<condition>, <yes>, <no>)
      )

ifelse(<condition>, <yes>,
       ifelse(<condition>, <yes>,
              ifelse(<condition>, <yes>, <no>)
             )
      )
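Here is the second pattern applied to the question's data, as a minimal sketch. It assumes idnat and idbp are plain character vectors holding the four example rows (hypothetical sample values). It uses "foreigner" as in the SAS code.

```r
# Hypothetical sample vectors mirroring the four rows of the question's table
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Pattern: ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)
# The inner ifelse() produces a value for every element; the outer one keeps
# that value only where idnat is "french" and uses "foreigner" elsewhere.
idnat2 <- ifelse(idnat == "french",
                 ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
                 "foreigner")
print(idnat2)
```

The inner condition uses %in% rather than idbp == "overseas" | idbp == "colony". This keeps it short, and %in% returns FALSE rather than NA for missing values.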
To calculate column idnat2 you can do:
df <- read.table(header=TRUE, text="
 idnat     idbp   idnat2
 french mainland mainland
 french   colony overseas
 french overseas overseas
 foreign  foreign  foreign"
)

with(df,
     ifelse(idnat=="french",
       ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),"foreign")
     )
R documentation
What does "the condition has length > 1 and only the first element will be used" mean? Let's see:
> # What is first condition really testing?
> with(df, idnat=="french")
[1]  TRUE  TRUE  TRUE FALSE
> # This is result of vectorized function - equality of all elements in idnat and
> # string "french" is tested.
> # Vector of logical values is returned (has the same length as idnat)
> df$idnat2 <- with(df,
+   if(idnat=="french"){
+   idnat2 <- "xxx"
+   }
+   )
Warning message:
In if (idnat == "french") { :
  the condition has length > 1 and only the first element will be used
> # Note that the first element of comparison is TRUE and that's why we get:
> df
    idnat     idbp idnat2
1  french mainland    xxx
2  french   colony    xxx
3  french overseas    xxx
4 foreign  foreign    xxx
> # There is really logic in it, you have to get used to it
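Sometimes you really do want a single decision for the whole vector. In that case you can collapse the logical vector yourself with any() or all() before passing it to if(). The sketch below assumes idnat is a character vector like the one in df (hypothetical sample values). The names cond and msg are illustrative only.

```r
# Hypothetical sample vector, same values as df$idnat
idnat <- c("french", "french", "french", "foreign")

# Comparing a vector with a single string gives one TRUE/FALSE per element
cond <- idnat == "french"
print(cond)  # length 4, too long for if()

# any() and all() reduce the vector to a single TRUE or FALSE
if (all(cond)) {
  msg <- "everyone is french"
} else if (any(cond)) {
  msg <- "some are french"
} else {
  msg <- "nobody is french"
}
print(msg)
```

This answers a different question from the per-row recoding, which still needs ifelse().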
Can I still use if()? Yes, you can, but the syntax is not so cool :)
test <- function(x) {
  if(x=="french") {
    "french"
  } else{
    "not really french"
  }
}

apply(array(df[["idnat"]]),MARGIN=1, FUN=test)
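The sketch below keeps plain if/else but handles both columns at once. A hypothetical helper, classify(), decides for one person at a time. mapply() then calls it once per pair of elements, so every if() sees a length-1 condition. The helper name and sample vectors are assumptions for illustration, not part of the answer above.

```r
# Hypothetical helper: classifies ONE person from two scalar values with plain
# if/else, following the SAS logic in the question
classify <- function(nat, bp) {
  if (nat == "french") {
    if (bp %in% c("overseas", "colony")) "overseas" else "mainland"
  } else {
    "foreigner"
  }
}

# Hypothetical sample vectors mirroring the question's table
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# mapply() calls classify() once per (idnat, idbp) pair, so each if() gets a
# length-1 condition; USE.NAMES = FALSE returns an unnamed character vector
idnat2 <- mapply(classify, idnat, idbp, USE.NAMES = FALSE)
print(idnat2)
```

This is still element-by-element work under the hood, so the vectorized ifelse() remains the better choice for large data.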
If you are familiar with SQL, you can also use a CASE statement in the sqldf package.
Try something like the following:
# some sample data
idnat <- sample(c("french","foreigner"),100,TRUE)
idbp <- rep(NA,100)
idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

# recoding
out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
              ifelse(idbp %in% c("overseas","colony"),"overseas",
                     "foreigner"))

cbind(idnat,idbp,out) # check result
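Your confusion comes from how SAS and R handle if-else constructs. In R, if and else are not vectorized. They check whether a single condition is true (e.g., if("french"=="french") works) and cannot handle a vector of logical values (e.g., if(c("french","foreigner")=="french") does not work), which is why R gives you the warning you received. Since R 4.2.0, a condition of length greater than 1 is an error rather than a warning.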
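By contrast, ifelse is vectorized. It can take your vectors (aka input variables) and test the logical condition on each element, as you are used to doing in SAS. An alternative would be to build a loop with if and else statements (as you have started to do here), but the vectorized ifelse approach will be more efficient and usually involves less code.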
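For comparison, the loop alternative mentioned above might look like the following minimal sketch. It assumes idnat and idbp are character vectors of equal length (hypothetical sample values from the question's table). It tests one element per iteration, so each if() receives a length-1 condition.

```r
# Hypothetical sample vectors mirroring the question's table
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Pre-allocate the result, then fill it one element at a time
idnat2 <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] == "french") {
    if (idbp[i] %in% c("overseas", "colony")) {
      idnat2[i] <- "overseas"
    } else {
      idnat2[i] <- "mainland"
    }
  } else {
    idnat2[i] <- "foreigner"
  }
}
print(idnat2)
```

Pre-allocating idnat2 avoids growing the vector inside the loop. The nested ifelse() version does the same work in one vectorized call.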
You can create the vector idnat2 without if and ifelse.
The function replace can be used to replace all occurrences of "colony" with "overseas":
idnat2 <- replace(idbp, idbp == "colony", "overseas")
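Here is a small check of the replace() approach. It assumes idbp is a character vector and that foreigners are already coded "foreign" in idbp, as in the question's table. If idbp were a factor, "overseas" would need to be one of its levels; otherwise replace() produces NA with a warning.

```r
# Hypothetical character vector with the question's idbp values
idbp <- c("mainland", "colony", "overseas", "foreign")

# replace(x, list, values) returns a copy of x in which the positions selected
# by `list` (here: every "colony") are set to `values`; other elements are kept
idnat2 <- replace(idbp, idbp == "colony", "overseas")
print(idnat2)
```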
If the data set contains many rows, it may be more efficient to join with a lookup table using data.table instead of using nested ifelse().
Given the following lookup table, lookup:
     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign
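and a sample data set (the rows printed below come from the sampler used before R 3.6.0; newer versions of R draw different rows from the same seed):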
library(data.table)
n_row <- 10L
set.seed(1L)
DT <- data.table(idnat = "french",
                 idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
DT[idbp == "foreign", idnat := "foreign"][]
      idnat     idbp
 1:  french   colony
 2:  french   colony
 3:  french overseas
 4: foreign  foreign
 5:  french mainland
 6: foreign  foreign
 7: foreign  foreign
 8:  french overseas
 9:  french overseas
10:  french mainland
then we can update while joining:
DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]
      idnat     idbp   idnat2
 1:  french   colony overseas
 2:  french   colony overseas
 3:  french overseas overseas
 4: foreign  foreign  foreign
 5:  french mainland mainland
 6: foreign  foreign  foreign
 7: foreign  foreign  foreign
 8:  french overseas overseas
 9:  french overseas overseas
10:  french mainland mainland
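Without data.table, the same two-column lookup can be sketched in base R with match() on a composite key. This is a different technique (matching on pasted keys), not the data.table join itself. The data frame dat and its values are hypothetical, and rows with no matching key would get NA.

```r
# Lookup table with the same content as `lookup` above
lookup <- data.frame(idnat  = c("french", "french", "french", "foreign"),
                     idbp   = c("mainland", "colony", "overseas", "foreign"),
                     idnat2 = c("mainland", "overseas", "overseas", "foreign"),
                     stringsAsFactors = FALSE)

# Hypothetical data to recode
dat <- data.frame(idnat = c("french", "foreign", "french"),
                  idbp  = c("colony", "foreign", "mainland"),
                  stringsAsFactors = FALSE)

# Composite key from both columns; "|" is assumed not to occur in the values
key_dat    <- paste(dat$idnat, dat$idbp, sep = "|")
key_lookup <- paste(lookup$idnat, lookup$idbp, sep = "|")

# match() gives the lookup row for each row of dat (NA when there is no match)
dat$idnat2 <- lookup$idnat2[match(key_dat, key_lookup)]
print(dat)
```

Unlike merge(), which sorts by the key columns by default, this keeps the rows of dat in their original order.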
Using an SQL CASE statement with the sqldf package, or the equivalent case_when() in dplyr:
Data
df <- structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign", "french"), class = "factor"),
                     idbp = structure(c(3L, 1L, 4L, 2L), .Label = c("colony", "foreign", "mainland", "overseas"), class = "factor")),
                .Names = c("idnat", "idbp"), class = "data.frame", row.names = c(NA, -4L))
sqldf
library(sqldf)
sqldf("SELECT idnat, idbp,
        CASE
          WHEN idbp IN ('colony', 'overseas') THEN 'overseas'
          ELSE idbp
        END AS idnat2
       FROM df")
dplyr
library(dplyr)
df %>%
  mutate(idnat2 = case_when(idbp == "mainland" ~ "mainland",
                            idbp %in% c("colony", "overseas") ~ "overseas",
                            TRUE ~ "foreign"))
Output
    idnat     idbp   idnat2
1  french mainland mainland
2  french   colony overseas
3  french overseas overseas
4 foreign  foreign  foreign
With data.table, the solution is:
DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign",
                      ifelse(idbp %in% c("colony", "overseas"), "overseas",
                             "mainland"))]
ifelse is vectorized; if-else is not. Here, DT is:
    idnat     idbp
1  french mainland
2  french   colony
3  french overseas
4 foreign  foreign
which gives:
     idnat     idbp   idnat2
1:  french mainland mainland
2:  french   colony overseas
3:  french overseas overseas
4: foreign  foreign  foreign