How to insert the values and their frequency in a data.frame, from a set obtained by SAMPLE?

Question

Asked 8 years, 6 months ago

First get a sequence of random values

set.seed(100)
estat <- sample(1:20, replace=TRUE)
estat
 [1]  7  6 12  2 10 10 17  8 11  4 13 18  6  8 16 14  5  8  8 14

The idea would be:

1. Would it be possible to make sample return values whose sum is 200?

2. Sort the values and show their frequencies in table format.

The intention is to set up a statistical table for simple calculation of the mean, variance, SD, mean deviation, CV, asymmetry (skewness) and kurtosis.

All results would then be computed and stored in the table.

1 answer

by Marcus Nunes • **17,915** points · Answer 1 · 2017-01-10T02:41:30+00:00

Let X_1, X_2, ..., X_n be a sequence of numbers. Let X = X_1 + X_2 + ... + X_n. If I divide each X_i by X, the sum X_1/X + X_2/X + ... + X_n/X will always equal 1. This is a kind of normalization. If I multiply each side of this equality by 200, the rescaled values 200*X_1/X, 200*X_2/X, ..., 200*X_n/X add up to 200.

So just apply this idea in R to get the desired result. I created a function called amostra that does this.
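As a quick illustration of the normalization idea, here is a minimal sketch. The vector v and the name target are made up for the example and are not part of the original answer.

```r
# Hypothetical input values, chosen only for illustration
v <- c(7, 6, 12, 2, 10)
target <- 200

# Divide by the total so the values sum to 1, then scale up to the target
scaled <- v / sum(v) * target

sum(scaled)  # equals the target, up to floating-point error
```

The rescaled values are generally not integers, which is why the function in the answer rounds them afterwards.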

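amostra <- function(x=1:20, size=20, replace=TRUE, limit=200){
  # draw the raw sample
  estat <- sample(x, size, replace=replace)
  # rescale so the values sum to limit, then round to integers
  estat <- round(estat/sum(estat)*limit)
  if (sum(estat) == limit){
    return(estat)
  } else {
    # rounding shifted the total: recompute the last element so the sum is exactly limit
    return(c(estat[1:(size-1)], limit-sum(estat[1:(size-1)])))
  }
}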

x <- amostra(1:20, 20, limit=200)
x
[1]  4 12 12 13 12 13  2 12 11  2 14  7 12 17 12  3 11  5 11 15
sum(x)
[1] 200

This function has 4 arguments:

x: the possible values the sample can take (the default is the integers 1 to 20)

size: the sample size to be created (the default is 20)

replace: whether sampling is done with replacement (the default is TRUE)

limit: the target sum of the sample (the default is 200)

Because of rounding, I used a small trick in the algorithm. It draws n elements, rescales and rounds them, and tests whether their sum equals limit. If it does, it returns that sample.

If not, the last element is replaced by limit-sum(estat[1:(size-1)]), which is the difference between the target sum and the sum of the first n-1 elements of the sample.

If this were not done, there would be no guarantee that the final sum of the elements would be equal to limit.
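To see why the correction is needed, here is a small sketch with hypothetical values picked so that rounding overshoots the target. It applies the same last-element fix that amostra uses.

```r
# Hypothetical values, chosen so that rounding pushes the sum past the target
v <- c(1, 1, 1)
limit <- 200

scaled <- round(v / sum(v) * limit)  # 200/3 rounds up to 67 for each element
sum(scaled)                          # 201, not 200

# Same correction as in amostra: recompute the last element
n <- length(scaled)
fixed <- c(scaled[1:(n - 1)], limit - sum(scaled[1:(n - 1)]))
fixed       # 67 67 66
sum(fixed)  # 200
```

Note that the corrected last element is no longer a rescaled draw; it is whatever value is needed to close the gap.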

The table command sorts the distinct values and counts their respective frequencies:

table(x)
x
 2  3  4  5  7 11 12 13 14 15 17 
 2  1  1  1  1  3  6  2  1  1  1

From this, finally, it is possible to calculate the desired statistics, starting from a data frame of the values and their frequencies:

as.data.frame(table(x))
    x Freq
1   2    2
2   3    1
3   4    1
4   5    1
5   7    1
6  11    3
7  12    6
8  13    2
9  14    1
10 15    1
11 17    1
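The statistics mentioned in the question can then be computed from the frequency table. The following is only a sketch under a few assumptions: the data frame is rebuilt by hand from the output above (so it runs without the random sample), the names freq, v, f and stats are made up for the example, the variance uses the n-1 denominator, and skewness and kurtosis use the simple moment-based definitions. One thing to watch: as.data.frame(table(x)) stores x as a factor, so it has to be converted back to numbers first.

```r
# Values and frequencies copied from the table above;
# x is a factor, as as.data.frame(table(x)) would produce
freq <- data.frame(
  x    = factor(c(2, 3, 4, 5, 7, 11, 12, 13, 14, 15, 17)),
  Freq = c(2, 1, 1, 1, 1, 3, 6, 2, 1, 1, 1)
)

v <- as.numeric(as.character(freq$x))  # factor labels back to numbers
f <- freq$Freq
n <- sum(f)                            # total number of observations

m  <- sum(v * f) / n                   # mean
s2 <- sum(f * (v - m)^2) / (n - 1)     # sample variance (n - 1 denominator)
s  <- sqrt(s2)                         # standard deviation
md <- sum(f * abs(v - m)) / n          # mean absolute deviation
cv <- s / m                            # coefficient of variation

# Moment-based skewness and kurtosis (assumed definitions, n denominator)
m2 <- sum(f * (v - m)^2) / n
m3 <- sum(f * (v - m)^3) / n
m4 <- sum(f * (v - m)^4) / n
skew <- m3 / m2^1.5
kurt <- m4 / m2^2

# Collect everything in a one-row data frame
stats <- data.frame(n = n, mean = m, variance = s2, sd = s,
                    mean_dev = md, cv = cv, skewness = skew, kurtosis = kurt)
stats
```

Other conventions for skewness and kurtosis exist (bias-corrected versions, or excess kurtosis with 3 subtracted), so the last two columns should be adapted to whichever definition is required.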