
Error saying vector size cannot be NA when using R with data mining  


Ganesh A
(@ganesh)
Noble Member
Joined: 4 months ago
Posts: 1170
31/03/2021 11:06 am  

I'm using R for data analytics. I connected it to Elasticsearch and retrieved a dataset of Shakespeare's complete works.

library("elastic")
connect()
maxi <- count(index = 'shakespeare')
s <- Search(index = 'shakespeare',size=maxi)

# Collect the text of every hit in one pass instead of growing `dat` inside a loop
dat <- unlist(lapply(s$hits$hits, function(hit) hit$`_source`$text_entry))
rm(s)

After that I want to build a TF-IDF matrix, but apparently I can't because it uses too much memory (I have 4 GB of RAM). Here is my code:

library("tm")
myCorpus <- Corpus(VectorSource(dat))
myCorpus <- tm_map(myCorpus, content_transformer(tolower),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeNumbers),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removePunctuation),lazy = TRUE)
myCorpus <- tm_map(myCorpus, content_transformer(removeWords), stopwords("en"),lazy = TRUE)
myTdm <- TermDocumentMatrix(myCorpus,control = list(weighting = function(x) weightTfIdf(x, normalize = FALSE)))

myCorpus is around 400 MB.

But when I then try to convert the result to a regular matrix, I get an error:

> m <- as.matrix(myTdm)
Error in vector(typeof(x$v), nr * nc) : vector size cannot be NA
In addition: Warning message:
In nr * nc : NAs produced by integer overflow

Anamika
(@anamika)
Noble Member
Joined: 4 months ago
Posts: 1179
31/03/2021 11:07 am  

You can use the removeSparseTerms function from the tm package.

It removes sparse terms from a document-term or term-document matrix, so the matrix is much smaller by the time you call as.matrix on it.

Something like this:

library("tm")
data("crude")
tdm <- TermDocumentMatrix(crude)
# Keep only terms whose sparsity is below 0.2, i.e. terms present in more than 80% of the documents
removeSparseTerms(tdm, 0.2)
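
To spell out why this helps: the error in the question appears to come from as.matrix trying to allocate a dense vector with nr * nc cells, where both dimensions are stored as integers, so the product overflows to NA. Here is a rough sketch of that arithmetic, followed by the shrink-then-convert pattern on the small crude dataset. The dimensions nr and nc below are made-up illustrative numbers, not the real size of your matrix, and the 0.2 threshold is just the value from the example above; on your own data you would probably need to experiment with it.

```r
library("tm")

# as.matrix() needs nr * nc cells. With integer dimensions the product
# overflows to NA once it exceeds .Machine$integer.max (2147483647).
nr <- 50000L   # hypothetical number of terms (assumption, for illustration)
nc <- 100000L  # hypothetical number of documents (assumption, for illustration)
cells <- suppressWarnings(nr * nc)
is.na(cells)   # the integer product has overflowed

# Rough size in GiB of a dense double matrix with those dimensions
as.numeric(nr) * nc * 8 / 2^30

# Shrink the term-document matrix first, then convert it to a dense matrix
data("crude")
tdm <- TermDocumentMatrix(crude)
small <- removeSparseTerms(tdm, 0.2)
dim(tdm)    # terms x documents before
dim(small)  # fewer terms, same documents
m <- as.matrix(small)
```

On your data the same idea would be myTdm <- removeSparseTerms(myTdm, some_threshold) before calling as.matrix(myTdm), where some_threshold is a placeholder you have to choose. Note that this drops rare terms, so check that the terms you care about survive.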
