R: no applicable method for 'meta' applied to an object of class "character"
I am trying to run this code (Ubuntu 12.04, R 3.1.1):
# Load requisite packages
library(tm)
library(ggplot2)
library(lsa)

# Place Enron email snippets into a single vector.
text <- c(
  "To Mr. Ken Lay, I'm writing to urge you to donate the millions of dollars you made from selling Enron stock before the company declared bankruptcy.",
  "while you netted well over a $100 million, many of Enron's employees were financially devastated when the company declared bankruptcy and their retirement plans were wiped out",
  "you sold $101 million worth of Enron stock while aggressively urging the company's employees to keep buying it",
  "This is a reminder of Enron's Email retention policy. The Email retention policy provides as follows . . .",
  "Furthermore, it is against policy to store Email outside of your Outlook Mailbox and/or your Public Folders. Please do not copy Email onto floppy disks, zip disks, CDs or the network.",
  "Based on our receipt of various subpoenas, we will be preserving your past and future email. Please be prudent in the circulation of email relating to your work and activities.",
  "We have recognized over $550 million of fair value gains on stocks via our swaps with Raptor.",
  "The Raptor accounting treatment looks questionable. a. Enron booked a $500 million gain from equity derivatives from a related party.",
  "In the third quarter we have a $250 million problem with Raptor 3 if we don't “enhance” the capital structure of Raptor 3 to commit more ENE shares.")
view <- factor(rep(c("view 1", "view 2", "view 3"), each = 3))
df <- data.frame(text, view, stringsAsFactors = FALSE)

# Prepare mini-Enron corpus
corpus <- Corpus(VectorSource(df$text))
corpus <- tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, function(x) removeWords(x, stopwords("english")))
corpus <- tm_map(corpus, stemDocument, language = "english")
corpus  # check corpus
# Mini-Enron corpus with 9 text documents

# Compute a term-document matrix that contains occurrence of terms in each email
# Compute distance between pairs of documents and scale the multidimensional semantic space (MDS) onto two dimensions
td.mat <- as.matrix(TermDocumentMatrix(corpus))
dist.mat <- dist(t(as.matrix(td.mat)))
dist.mat  # check distance matrix

# Compute distance between pairs of documents and scale the multidimensional semantic space onto two dimensions
fit <- cmdscale(dist.mat, eig = TRUE, k = 2)
points <- data.frame(x = fit$points[, 1], y = fit$points[, 2])
ggplot(points, aes(x = x, y = y)) +
  geom_point(data = points, aes(x = x, y = y, color = df$view)) +
  geom_text(data = points, aes(x = x, y = y - 0.2, label = row.names(df)))
However, when I run it, I get this error (on the line td.mat <- as.matrix(TermDocumentMatrix(corpus))):
Error in UseMethod("meta", x) :
  no applicable method for 'meta' applied to an object of class "character"
In addition: Warning message:
In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
I don't know where to look; all the packages load fine.
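The latest version of tm (0.60) no longer lets you use tm_map with functions that operate on plain character values. So the problem is your tolower step, because tolower is not a "canonical" transformation (see getTransformations()): it returns a bare character vector, and the later TermDocumentMatrix call then fails when it looks for document metadata. Just replace it with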
corpus <- tm_map(corpus, content_transformer(tolower))
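The content_transformer function wrapper converts everything back to the correct data type within the corpus. You can use content_transformer with any function that is intended to manipulate character vectors, so that it works in a tm_map pipeline.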
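As a minimal sketch of the point above, the following wraps both tolower and a custom character function in content_transformer. The sample strings and the helper name strip_digits are made up for illustration, and VCorpus is used here on the assumption that you want documents stored as PlainTextDocument objects regardless of tm version.

```r
library(tm)

# Two made-up documents standing in for the Enron snippets
corpus <- VCorpus(VectorSource(c("Enron BOOKED a Gain", "Raptor 3 Problem")))

# List the built-in ("canonical") transformations; tolower is not among them
getTransformations()

# Wrap plain character functions so tm_map keeps the document class intact
corpus <- tm_map(corpus, content_transformer(tolower))

# strip_digits is a hypothetical helper, not part of tm
strip_digits <- function(x) gsub("[0-9]+", "", x)
corpus <- tm_map(corpus, content_transformer(strip_digits))

# The documents are still PlainTextDocument objects, so this call succeeds
tdm <- TermDocumentMatrix(corpus)
```

Built-in transformations such as removePunctuation can still be passed to tm_map directly; only functions that take and return bare character vectors need the wrapper.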
This is a bit old, but for the benefit of later Google searches: there is an alternative solution. After corpus <- tm_map(corpus, tolower) you can run corpus <- tm_map(corpus, PlainTextDocument), which converts the documents back to the correct data type.
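I had the same problem and finally arrived at a solution:
It seems that the meta information inside the corpus object gets corrupted after transformations are applied to it.
What I did was simply re-create the corpus at the very end of the process, once all the preparation was done. To work around some other problems, I also wrote a loop that copies the text back into my data frame: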
a <- list()
for (i in seq_along(corpus)) {
  a[i] <- gettext(corpus[[i]][[1]])  # Do not use $content here!
}
df$text <- unlist(a)
corpus <- Corpus(VectorSource(df$text))  # This action restores the corpus.