I have a character vector, called istrain, holding word names:
istrain = c("carri", "challeng", ...)
I want to turn these names into columns of a data frame, testSparse, which holds the frequency of occurrence of words in comments, something like:
testSparse$cool = c(0,0,0,0,13,252,...)
testSparse$court= c(0,0,12,143,53,...)
After the operation, the testSparse data frame should also contain the new columns, filled with zeros:
testSparse$carri = c(0,0,0,0,0,...)
testSparse$challeng = c(0,0,0,0,0,...)
Doing this by hand is very time-consuming, since the vector of new columns has more than 100 entries. Has anyone done this, or does anyone know a package that does something similar, but faster?
Note: the language is R. The data frames are pre-processing for decision trees in text mining, and the vector of new columns is the difference between the final training corpus and the test corpus. The function built around this step should be generic, so that it can be applied to new text bases and reuse the same decision tree to check whether a comment is offensive or not. A new text base contains words that were not treated previously and lacks some words that already exist in training. The frequency of the words it lacks must therefore be 0; once those columns are added, the new base can be passed to the tree to predict whether a comment is offensive or not, among several classes of "offenses".
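To make the goal concrete, here is a minimal base R sketch of what I am after, not a definitive solution. It assumes testSparse is an ordinary data.frame and istrain is a character vector. The helper name add_zero_columns, the toy data, and the train_cols vector are all hypothetical and only for illustration.

```r
# Hypothetical helper: add every name in `new_cols` that is not yet a column
# of `df` as a new column filled with zeros.
add_zero_columns <- function(df, new_cols) {
  missing_cols <- setdiff(new_cols, names(df))
  if (length(missing_cols) > 0) {
    df[missing_cols] <- 0  # the single 0 is recycled down every new column
  }
  df
}

# Toy data standing in for the real word-frequency data frame (assumption).
testSparse <- data.frame(cool = c(0, 0, 13), court = c(0, 12, 143))
istrain <- c("carri", "challeng")

testSparse <- add_zero_columns(testSparse, istrain)

# Optional second step: keep only the columns the tree was trained on, in the
# same order. `train_cols` is a hypothetical stand-in for the training columns.
train_cols <- c("cool", "carri", "challeng")
aligned <- testSparse[, train_cols, drop = FALSE]
```

The second step drops words that appear only in the new text base (here, court), so the data passed to the tree has the columns it was trained on.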