I have a large dataset: 1800 rows and 50 columns.
Of these 50 columns, 3 (Density, Biomass and BMI) are response variables. I must compare each of them, one at a time, against the other variables. I am running normality tests, ANOVA, Tukey, etc.: an exploratory analysis to describe how they behave relative to each other. Someone is helping me, but that person is sometimes very busy, and I would like to be able to solve the problems that keep me from moving forward on my own. Another detail: I'm a beginner, but now enthusiastic about R.
LET’S GET TO THE PROBLEM....
I run all the tests with Density (at least 10), and when I change the response variable to Biomass, R still gives me back the Density observations.
We created a subset for the other response variables (Biomass and BMI).
But here comes the problem: in some cases I have variables that account for less than 1% of the data and, in addition, have an "N" smaller than 5.
I need to take these out of the analysis, and I don't know how to do it on the subset we created.
In the case of the first response variable (Density), we use the command: %in% c("variable to be removed from the analysis")
However, in the case of the subset the script changes, and I don't know how to insert this command or another one.
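One way to apply the same %in% idea to a subset, as a minimal sketch rather than the script actually in use: the data frame dat, the grouping column Group and the simulated values are all hypothetical stand-ins, and the sketch assumes the Biomass subset consists of the rows where Biomass is not missing. It counts each category within the subset, flags those that are both below 1% and below N = 5, and removes them with a negated %in%.

```r
# Hypothetical data: 'dat', 'Group' and the simulated values are illustrative
# stand-ins, not the real dataset.
set.seed(1)
dat <- data.frame(
  Group   = factor(c(rep("A", 300), rep("B", 296), rep("C", 4))),
  Density = runif(600),
  Biomass = c(runif(400), rep(NA, 196), runif(4))
)

# Subset for the second response variable (assumption: keep the rows
# where Biomass was actually measured)
bio <- subset(dat, !is.na(Biomass))

# Count each category and its share *within the subset*, not the full data
n_by_level <- table(bio$Group)
share      <- prop.table(n_by_level)

# Categories that are both rare (< 1% of the subset) and small (N < 5)
is_rare <- as.vector(share < 0.01 & n_by_level < 5)
rare    <- names(n_by_level)[is_rare]

# Remove those rows with a negated %in%, then discard the empty factor levels
bio_clean <- droplevels(subset(bio, !(Group %in% rare)))

table(bio_clean$Group)
```

The same steps can be repeated for BMI by changing the column used in the first subset() call. The droplevels() call is there so that the removed categories no longer show up as empty entries in tables and plots built from the subset.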
Hi Herlon, please post the code you are using to do this. As written, your question is difficult to answer objectively. You should read this page and understand how subsetting works in R: http://www.statmethods.net/management/subset.html
– Daniel Falbel
Dear Daniel, thank you for your willingness to help, and I apologize for the "mishap" with the information; I really did ask the question while I was in the middle of trying to solve the problem. Let me see if I can be clearer now; I'll number the points. 1 - my dataset has 1700 rows and 50 columns (variables); 2 - of those 50 variables, 3 are to be used as response variables; 4 - so I run the first tests with response variable 1 and everything I want to test (at least 10 other variables); 5 - when I move on to variable 2 and then to 3, I create the subsets, but R then uses everything from variable 1
– Herlon Nadolny
That is, variable 1 has 1700 observations, while variables 2 and 3 have fewer than 1000. I hope this is understandable now. Thank you
– Herlon Nadolny