Selecting rows where a column contains a string like "hsa.." (partial string match)

I have a 371MB text file containing microRNA data. Essentially, I only want to select the rows that contain information about human microRNAs.

I read the file in with read.table. Normally I would do what I want with sqldf, if it had a "LIKE" syntax (select * from <> where miRNA like 'hsa'). Unfortunately, sqldf doesn't support that syntax.

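How can I do this in R? I've looked around Stack Overflow and haven't found an example of how to do a partial string match. I even installed the stringr package, but it doesn't quite do what I need.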

What I'd like to do is something like this: select all rows matching hsa-*.

selectedRows <- conservedData[, conservedData$miRNA %like% "hsa-"]

Which, of course, is not valid syntax.

Can anyone help? Thanks very much for reading.


Use grep() to search for the string you want to match. Here is an example with the mtcars dataset, where we match all rows whose row name contains "Merc":

    mtcars[grep("Merc", rownames(mtcars)), ]
    #              mpg cyl  disp  hp drat   wt qsec vs am gear carb
    # Merc 240D   24.4   4 146.7  62 3.69 3.19 20.0  1  0    4    2
    # Merc 230    22.8   4 140.8  95 3.92 3.15 22.9  1  0    4    2
    # Merc 280    19.2   6 167.6 123 3.92 3.44 18.3  1  0    4    4
    # Merc 280C   17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
    # Merc 450SE  16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
    # Merc 450SL  17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
    # Merc 450SLC 15.2   8 275.8 180 3.07 3.78 18.0  0  0    3    3
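grep() returns the integer positions of the matches, while its sibling grepl() returns a logical vector as long as the input. Both work as row indexes. A minimal sketch showing that they pick the same mtcars rows, plus one pitfall when *excluding* rows: if nothing matches, -grep(...) is -integer(0), which selects zero rows rather than all of them, whereas !grepl(...) behaves as expected. The pattern "Tesla" is just an arbitrary non-matching string.

```r
# Integer positions of row names containing "Merc"
idx <- grep("Merc", rownames(mtcars))

# Logical vector: TRUE where the row name contains "Merc"
hit <- grepl("Merc", rownames(mtcars))

by_index   <- mtcars[idx, ]
by_logical <- mtcars[hit, ]
nrow(by_index)  # 7

# Pitfall when excluding rows: with no match, -integer(0) selects nothing
none <- grep("Tesla", rownames(mtcars))                   # integer(0)
nrow(mtcars[-none, ])                                     # 0
nrow(mtcars[!grepl("Tesla", rownames(mtcars)), ])         # 32
```

For plain selection either form is fine; for exclusion, the logical form is the safer habit.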

Another example, using the iris dataset and searching for the string "osa":

    irisSubset <- iris[grep("osa", iris$Species), ]
    head(irisSubset)
    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    # 1          5.1         3.5          1.4         0.2  setosa
    # 2          4.9         3.0          1.4         0.2  setosa
    # 3          4.7         3.2          1.3         0.2  setosa
    # 4          4.6         3.1          1.5         0.2  setosa
    # 5          5.0         3.6          1.4         0.2  setosa
    # 6          5.4         3.9          1.7         0.4  setosa

For your problem, try:

 selectedRows <- conservedData[grep("hsa-", conservedData$miRNA), ]
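grep("hsa-", ...) matches the pattern anywhere in the string, not only at the start. A minimal sketch, assuming a hypothetical toy conservedData with a character miRNA column (the real file's layout isn't shown in the question, and the name "xhsa-miR-1" is made up to illustrate a mid-string hit), that anchors the match with "^" or uses base R's startsWith() for a literal prefix test:

```r
# Hypothetical stand-in for the question's conservedData
conservedData <- data.frame(
  miRNA = c("hsa-let-7a", "mmu-let-7a", "hsa-miR-21", "xhsa-miR-1"),
  score = c(0.91, 0.85, 0.77, 0.40),
  stringsAsFactors = FALSE
)

# Unanchored: also matches "xhsa-miR-1"
loose <- conservedData[grep("hsa-", conservedData$miRNA), ]

# Regex anchored at the start: only names beginning with "hsa-"
human <- conservedData[grep("^hsa-", conservedData$miRNA), ]

# Literal prefix test, no regex involved (base R >= 3.3.0)
human2 <- conservedData[startsWith(conservedData$miRNA, "hsa-"), ]

human$miRNA  # "hsa-let-7a" "hsa-miR-21"
```

If read.table returned the column as a factor (its default before R 4.0.0), grep() still works, but startsWith() requires character input, so wrap the column in as.character() first.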

Try str_detect() from the stringr package, which detects the presence or absence of a pattern in a string.

Here is an approach that also uses the %>% pipe and filter() from the dplyr package:

    library(stringr)
    library(dplyr)

    CO2 %>% filter(str_detect(Treatment, "non"))
    #   Plant   Type  Treatment conc uptake
    # 1   Qn1 Quebec nonchilled   95   16.0
    # 2   Qn1 Quebec nonchilled  175   30.4
    # 3   Qn1 Quebec nonchilled  250   34.8
    # 4   Qn1 Quebec nonchilled  350   37.2
    # 5   Qn1 Quebec nonchilled  500   35.3
    # ...

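This filters the sample CO2 dataset (which ships with R) down to the rows whose Treatment variable contains the substring "non". You can control whether str_detect looks for a fixed (literal) match or interprets the pattern as a regular expression; see the stringr documentation.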
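The fixed-versus-regex choice matters once the pattern contains a metacharacter such as ".". So that this sketch needs no extra packages, it swaps in base R's grepl(), whose fixed = TRUE argument plays the same role; it is an analogue of the stringr call, not the stringr API itself. The strings in x are made up for illustration.

```r
# Base R analogue of CO2 %>% filter(str_detect(Treatment, "non"));
# grepl() coerces the factor column to its character labels
nonchilled <- CO2[grepl("non", CO2$Treatment, fixed = TRUE), ]
nrow(nonchilled)  # 42 of the 84 rows

# Regex vs literal: "." matches any character unless fixed = TRUE
x <- c("hsa-1", "hsa.1")          # made-up example strings
grepl("hsa.1", x)                  # TRUE TRUE
grepl("hsa.1", x, fixed = TRUE)    # FALSE TRUE
```

A literal match is also typically faster on a large file, since no regex engine is involved.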

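LIKE should work in SQLite, which sqldf uses by default. Use % as the wildcard, e.g. LIKE 'hsa-%' for names starting with "hsa-":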

    require(sqldf)
    df <- data.frame(name = c('bob', 'robert', 'peter'), id = c(1, 2, 3))
    sqldf("select * from df where name LIKE '%er%'")
    #     name id
    # 1 robert  2
    # 2  peter  3
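If you would rather keep the `%like%` syntax from the question without going through SQL, a small infix operator can be written in base R. This is a hypothetical helper: the name `%like%` and its SQL-style wildcard translation are assumptions of this sketch, not part of base R or sqldf. It escapes regex metacharacters, maps % to any run of characters and _ to exactly one character, and anchors the pattern to the whole string, as LIKE does.

```r
# Hypothetical helper: SQL-LIKE-style matching in base R.
# "%" -> any run of characters, "_" -> exactly one character.
`%like%` <- function(x, pattern) {
  # Escape regex metacharacters so they are matched literally
  rx <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", pattern)
  # Translate the LIKE wildcards into their regex equivalents
  rx <- gsub("%", ".*", rx, fixed = TRUE)
  rx <- gsub("_", ".", rx, fixed = TRUE)
  # LIKE matches the whole string, so anchor both ends
  grepl(paste0("^", rx, "$"), as.character(x))
}

df <- data.frame(name = c("bob", "robert", "peter"), id = c(1, 2, 3))
df[df$name %like% "%er%", ]

# The question's case would then read (conservedData not defined here):
# selectedRows <- conservedData[conservedData$miRNA %like% "hsa-%", ]
```

Unlike SQLite's LIKE, which ignores ASCII case by default, this sketch is case-sensitive.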
