Add a column to a data.frame

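I have the data.frame below. I want to add a column that classifies my data according to column 1 (h_no): the first run of h_no values 1, 2, 3, 4 is class 1, the second run of h_no (1 to 7) is class 2, and so on, as shown in the last column.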

    h_no  h_freq   h_freqsq     class
    1     0.09091  0.008264628  1
    2     0.00000  0.000000000  1
    3     0.04545  0.002065702  1
    4     0.00000  0.000000000  1
    1     0.13636  0.018594050  2
    2     0.00000  0.000000000  2
    3     0.00000  0.000000000  2
    4     0.04545  0.002065702  2
    5     0.31818  0.101238512  2
    6     0.00000  0.000000000  2
    7     0.50000  0.250000000  2
    1     0.13636  0.018594050  3
    2     0.09091  0.008264628  3
    3     0.40909  0.167354628  3
    4     0.04545  0.002065702  3

You can add a column to your data using various techniques. The quotes below come from the "Details" section of the relevant help text, [[.data.frame.

Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list.

    my.dataframe["new.col"] <- a.vector
    my.dataframe[["new.col"]] <- a.vector

The data.frame method for $ treats x as a list.

    my.dataframe$new.col <- a.vector

When [ and [[ are used with two indices (x[i, j] and x[[i, j]]), they act like indexing a matrix.

    my.dataframe[ , "new.col"] <- a.vector

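With a single index, the data.frame method assumes you mean columns, so if you don't specify whether you are working with rows or columns, it treats the index as a column.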
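To make the indexing modes concrete, here is a minimal sketch that adds the same vector four times, once per syntax. The data and the column names col.a to col.d are placeholders invented for illustration, not part of the question.

```r
# Hypothetical example data; my.dataframe, a.vector and the new
# column names are placeholders, not taken from the question.
my.dataframe <- data.frame(h_no = c(1, 2, 3, 1, 2))
a.vector <- c(1, 1, 1, 2, 2)

my.dataframe["col.a"]   <- a.vector   # list-style, single bracket
my.dataframe[["col.b"]] <- a.vector   # list-style, double bracket
my.dataframe$col.c      <- a.vector   # $ method
my.dataframe[, "col.d"] <- a.vector   # matrix-style, column index

# Inspect the structure: one original column plus four new ones
str(my.dataframe)
```

All four assignments create a new column holding a.vector; which one to use is mostly a matter of taste.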

For your example, this should work:

    # make some fake data
    your.df <- data.frame(no = c(1:4, 1:7, 1:5), h_freq = runif(16), h_freqsq = runif(16))

    # find where each sequence starts (no == 1) and where it ends
    from <- which(your.df$no == 1)
    to <- c((from - 1)[-1], nrow(your.df))  # up to which point the sequence runs

    # generate a sequence (len) and, based on its length, repeat a consecutive number len times
    # SIMPLIFY = FALSE keeps the result a list even if all sequences have equal length
    get.seq <- mapply(from, to, seq_along(from), FUN = function(x, y, z) {
      len <- length(seq(from = x[1], to = y[1]))
      return(rep(z, times = len))
    }, SIMPLIFY = FALSE)

    # when we unlist, we get a vector, which we append to the original data.frame
    your.df$group <- unlist(get.seq)

    # since this is designating a group, it makes sense to make it a factor
    your.df$group <- as.factor(your.df$group)

The result looks like this (the h_freq and h_freqsq values are random):

       no     h_freq   h_freqsq group
    1   1 0.40998238 0.06463876     1
    2   2 0.98086928 0.33093795     1
    3   3 0.28908651 0.74077119     1
    4   4 0.10476768 0.56784786     1
    5   1 0.75478995 0.60479945     2
    6   2 0.26974011 0.95231761     2
    7   3 0.53676266 0.74370154     2
    8   4 0.99784066 0.37499294     2
    9   5 0.89771767 0.83467805     2
    10  6 0.05363139 0.32066178     2
    11  7 0.71741529 0.84572717     2
    12  1 0.10654430 0.32917711     3
    13  2 0.41971959 0.87155514     3
    14  3 0.32432646 0.65789294     3
    15  4 0.77896780 0.27599187     3
    16  5 0.06100008 0.55399326     3
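The same from/to idea can probably be written more compactly with rep(), which repeats each group number by the length of its run. This is an alternative sketch of the technique, not the answer's own code; it reuses the fake no pattern from above.

```r
# Same fake pattern as the answer's `no` column
no <- c(1:4, 1:7, 1:5)

from <- which(no == 1)               # index where each run starts
to <- c(from[-1] - 1, length(no))    # index where each run ends

# Repeat group number k once for every element of run k
group <- rep(seq_along(from), times = to - from + 1)
print(group)
```

Like the original, this assumes that every run starts with the value 1.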

Easy: say your data frame is A.

    b <- A[,1]
    b <- b==1
    b <- cumsum(b)

Then you get the column b.
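As a quick check of this approach, the sketch below applies it to the h_no values from the question. It relies on the assumption that every run begins with 1, since the counter only advances when a 1 is seen.

```r
# h_no column from the question: runs 1..4, 1..7, 1..4
h_no <- c(1:4, 1:7, 1:4)

# h_no == 1 is TRUE at the start of each run; cumsum counts the runs so far
b <- cumsum(h_no == 1)
print(b)
# [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3
```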

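If I understand the question correctly, you want to detect when h_no doesn't increase and then increment class. (I'm going to walk through how I solved this problem; there is a self-contained function at the end.)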

Working

We only care about the h_no column for the moment, so we can extract it from the data frame:

    > h_no <- data$h_no

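We want to detect when h_no doesn't go up, which we can do by working out when the difference between successive elements is negative or zero. R provides the diff function, which gives us the vector of differences:

    > d.h_no <- diff(h_no)
    > d.h_no
     [1]  1  1  1 -3  1  1  1  1  1  1 -6  1  1  1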

Once we have that, it is a simple matter to find the ones that are non-positive:

    > nonpos <- d.h_no <= 0
    > nonpos
     [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
    [13] FALSE FALSE

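In R, TRUE and FALSE are basically the same as 1 and 0, so if we take the cumulative sum of nonpos, it will increase by 1 at (almost) the appropriate spots. The cumsum function (which is basically the opposite of diff) can do this.

    > cumsum(nonpos)
     [1] 0 0 0 1 1 1 1 1 1 1 2 2 2 2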

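But there are two problems: the numbers are one too small, and we are missing the first element (there should be four elements in the first class).

The first problem is simply solved: 1+cumsum(nonpos). The second just requires adding a 1 to the front of the vector, since the first element is always in class 1:

    > classes <- c(1, 1 + cumsum(nonpos))
    > classes
     [1] 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3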

Now we can attach it back onto our data frame with cbind (by using the class= syntax, we can give the column the heading class):

    > data_w_classes <- cbind(data, class=classes)

data_w_classes now contains the result.
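For anyone who wants to reproduce the walkthrough as a script, here is a sketch that rebuilds part of the question's data and runs the same steps. Only h_no and h_freq are included; leaving out h_freqsq is a shortcut taken here for brevity, not something the answer does.

```r
# The question's h_no and h_freq columns (h_freqsq omitted for brevity)
data <- data.frame(
  h_no = c(1:4, 1:7, 1:4),
  h_freq = c(0.09091, 0, 0.04545, 0,
             0.13636, 0, 0, 0.04545, 0.31818, 0, 0.5,
             0.13636, 0.09091, 0.40909, 0.04545)
)

# TRUE wherever h_no fails to increase, i.e. where a new run begins
nonpos <- diff(data$h_no) <= 0

# Shift to start at 1 and prepend the class of the first row
classes <- c(1, 1 + cumsum(nonpos))

data_w_classes <- cbind(data, class = classes)
print(data_w_classes)
```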

Final result

We can compress the lines together and wrap it all up into a function to make it easier to use:

    classify <- function(data) {
      cbind(data, class=c(1, 1 + cumsum(diff(data$h_no) <= 0)))
    }

Or, since it makes sense for class to be a factor:

    classify <- function(data) {
      cbind(data, class=factor(c(1, 1 + cumsum(diff(data$h_no) <= 0))))
    }

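You use either function like this:

    > classified <- classify(data) # doesn't overwrite data
    > data <- classify(data) # data now has the "class" column

(This method of solving the problem is good because it avoids explicit iteration, which is the generally recommended style in R, and it avoids generating lots of intermediate vectors, lists and so on. It's also kind of neat that it can be written on one line :)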
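One practical difference between detecting non-increasing steps and counting 1s shows up when a run does not start at 1. The sketch below uses made-up data (the vector in d is hypothetical, not from the question) to compare the two.

```r
classify <- function(data) {
  cbind(data, class = c(1, 1 + cumsum(diff(data$h_no) <= 0)))
}

# Hypothetical data whose second run starts at 2 rather than 1
d <- data.frame(h_no = c(1, 2, 3, 2, 3, 4, 1, 2))

by_diff <- classify(d)$class     # new class whenever h_no fails to increase
by_ones <- cumsum(d$h_no == 1)   # new class only when h_no equals 1

print(by_diff)  # 1 1 1 2 2 2 3 3
print(by_ones)  # 1 1 1 1 1 1 2 2
```

If every run in the real data is guaranteed to begin at 1, both give the same classes; otherwise the diff-based version is the safer choice.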

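In addition to Roman's answer, something like this could be even simpler. Note that I haven't tested it because I don't have access to R right now.

    # Note that I use a global variable here
    # normally not advisable, but I liked the
    # use here to make the code shorter
    index <- 0
    new_column <- sapply(df$h_no, function(x) {
      # <<- updates the global index; a plain assignment would only create a local copy
      if (x == 1) index <<- index + 1
      return(index)
    })

The function iterates over the values of h_no and always returns the class that the current value belongs to. If a value of 1 is detected, we increment the global variable index and continue.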
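One detail worth checking in a function like this is which variable an assignment inside the function actually touches. The minimal sketch below, with hypothetical names counter, bump_local and bump_global, contrasts a plain assignment with the superassignment operator <<-.

```r
counter <- 0

# Plain assignment creates a new local variable; the outer counter is untouched
bump_local <- function() {
  counter <- counter + 1
  counter
}

# <<- assigns to the variable in the enclosing environment
bump_global <- function() {
  counter <<- counter + 1
  counter
}

bump_local()
after_local <- counter    # still 0

bump_global()
after_global <- counter   # now 1
```

So a counter that must persist across calls to an anonymous function inside sapply needs <<- (or a different design that avoids shared state altogether).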

    Data.frame[, 'h_new_column'] <- as.integer(Data.frame[, 'h_no'], breaks=c(1, 4, 7))
