How to put the values and their frequencies into a data.frame, from a set obtained with sample()?

First, I generate a sequence of random values:

set.seed(100)
estat <- sample(1:20, replace=TRUE)
estat
 [1]  7  6 12  2 10 10 17  8 11  4 13 18  6  8 16 14  5  8  8 14

The idea would be:

1. Is it possible to force sample so that the values obtained sum to 200?
2. Sort the values and show them with their frequencies in table format.

The intention is to set up a statistical table for simple calculation of the mean, variance, standard deviation, mean deviation, coefficient of variation, skewness and kurtosis.

That way, all results would be computed and stored in the table.

1 answer

Let X_1, X_2, ..., X_n be a sequence of numbers and let X = X_1 + X_2 + ... + X_n. If I divide each X_i by X, the sum X_1/X + X_2/X + ... + X_n/X always equals 1. This is a type of normalization. If I multiply both sides of this equality by 200, the sum becomes 200: 200*X_1/X + 200*X_2/X + ... + 200*X_n/X = 200.

So just apply this idea in R to get the desired result. I created a function called amostra that does this.
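As a quick numeric check of this normalization, here is a minimal sketch using the sample printed in the question. The vector is typed in by hand so that the result does not depend on the R version, and the names scaled and rounded are only illustrative.

```r
# Sample printed in the question
estat <- c(7, 6, 12, 2, 10, 10, 17, 8, 11, 4, 13, 18, 6, 8, 16, 14, 5, 8, 8, 14)

# Divide by the total and multiply by the target sum
scaled <- estat / sum(estat) * 200
sum(scaled)     # 200, up to floating-point error

# The scaled values are generally not integers; rounding them
# can move the total away from the target
rounded <- round(scaled)
sum(rounded)    # 197 for this sample, not 200
```

The second result is why scaling alone is not enough when integer values are wanted: some correction for the rounding is still needed.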

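amostra <- function(x=1:20, size=20, replace=TRUE, limit=200){
  # draw the raw sample
  estat <- sample(x, size, replace=replace)
  # rescale so the values sum to limit, then round to integers
  estat <- round(estat/sum(estat)*limit)
  if (sum(estat) == limit){
    return(estat)
  } else {
    # rounding shifted the total: replace the last element so the sum is exactly limit
    return(c(estat[1:(size-1)], limit-sum(estat[1:(size-1)])))
  }
}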

x <- amostra(1:20, 20, limit=200)
x
[1]  4 12 12 13 12 13  2 12 11  2 14  7 12 17 12  3 11  5 11 15
sum(x)
[1] 200

This function has 4 arguments:

x: the possible values the sample can take (the default is the integers 1 to 20)

size: the sample size to be created (the default is 20)

replace: indicates whether the sampling is done with replacement (the default is TRUE)

limit: the target sum of the sample (the default is 200)

Because of rounding, I used a small trick in the algorithm. It draws size elements from x, rescales and rounds them, and tests whether their sum is equal to limit. If it is, it returns that sample.

If not, the last element is replaced by limit-sum(estat[1:(size-1)]), which is the difference between the target sum and the sum of the first n-1 elements of the sample.

If this were not done, there would be no guarantee that the final sum of the elements would be equal to limit.
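One side effect of rescaling is that the returned values are no longer guaranteed to belong to x: when the raw sum is below limit, a drawn 20 is scaled to something larger than 20, and the corrected last element may also fall outside x. If that matters, a different technique, rejection sampling, is one option: keep redrawing until the sum happens to equal the target. The sketch below is only an alternative under that assumption, not the method of the function above; amostra_rejeicao and max_tries are hypothetical names.

```r
# Hypothetical alternative: redraw until the sum equals `limit`.
# Every returned value is an element of `x`, at the cost of repeated draws.
amostra_rejeicao <- function(x = 1:20, size = 20, limit = 200, max_tries = 1e5) {
  for (i in seq_len(max_tries)) {
    estat <- sample(x, size, replace = TRUE)
    if (sum(estat) == limit) {
      return(estat)
    }
  }
  # Give up explicitly instead of looping forever on an unreachable target
  stop("no sample with the requested sum found in max_tries attempts")
}

set.seed(100)
y <- amostra_rejeicao()
sum(y)              # 200
all(y %in% 1:20)    # TRUE
```

With the defaults, only a small fraction of draws sums to exactly 200, so several dozen attempts are typically needed. The cap max_tries is there for targets that are rare or impossible.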

The table command sorts the values and gives their respective frequencies:

table(x)
x
 2  3  4  5  7 11 12 13 14 15 17 
 2  1  1  1  1  3  6  2  1  1  1 

From this, finally, it is possible to calculate the desired statistics, creating a data frame with the answers:

as.data.frame(table(x))
    x Freq
1   2    2
2   3    1
3   4    1
4   5    1
5   7    1
6  11    3
7  12    6
8  13    2
9  14    1
10 15    1
11 17    1
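To go from this data frame to the statistics mentioned in the question, something like the following sketch could be used. It assumes the sample variance (divisor n - 1) and the plain moment-based definitions of skewness and kurtosis; other conventions exist. The column names xi and fi follow the comment below, and the remaining names (tab, freq_table, mean_x, and so on) are only illustrative.

```r
# Sample printed above
x <- c(4, 12, 12, 13, 12, 13, 2, 12, 11, 2, 14, 7, 12, 17, 12, 3, 11, 5, 11, 15)

tab <- as.data.frame(table(x))
# table() stores the values as factor levels; convert them back to numbers
xi <- as.numeric(as.character(tab$x))
fi <- tab$Freq
n  <- sum(fi)

mean_x <- sum(xi * fi) / n                      # mean from the frequency table
var_x  <- sum(fi * (xi - mean_x)^2) / (n - 1)   # sample variance (assumed convention)
sd_x   <- sqrt(var_x)                           # standard deviation
md_x   <- sum(fi * abs(xi - mean_x)) / n        # mean absolute deviation
cv_x   <- 100 * sd_x / mean_x                   # coefficient of variation, in %

# Central moments (divisor n) for skewness and kurtosis
m2 <- sum(fi * (xi - mean_x)^2) / n
m3 <- sum(fi * (xi - mean_x)^3) / n
m4 <- sum(fi * (xi - mean_x)^4) / n
skew_x <- m3 / m2^1.5
kurt_x <- m4 / m2^2

# Step-by-step table: one column per intermediate calculation
freq_table <- data.frame(
  xi    = xi,
  fi    = fi,
  xifi  = xi * fi,
  dev2f = fi * (xi - mean_x)^2
)
freq_table
```

Because the sample sums to 200 and has 20 elements, mean_x is 10 here, and var_x and sd_x agree with var(x) and sd(x) computed on the raw sample.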
  • Marcus, thank you very much! Very well structured and explained. I will keep the commands for later. My idea was to use the 'limit' only to make the calculation of the mean and the other statistics easier. However, I needed to get the frequency data (as shown in the 2nd line of the output of table(x)) as a new column. How can I capture these results and put them into a data.frame as a column? For example, columns xi and fi with the rows (2, 2) and (3, 1).

  • See the edit I made.

  • Marcus, perfect! I want to set up a complete statistical table for students, step by step, that keeps the history of the calculations performed, building it column by column from the given values of x and their frequencies.
