Calculating Student's t Probabilities in R

I have the mean and standard deviation of my distribution:

mean = -0.49 ; sd=3.029041

How do I calculate the probability of y being one standard deviation below the mean, using Student's t distribution with 85 degrees of freedom? That is, P(y < mean(y) - sd(y))?

1-pt(mean-sd, df = 85, lower.tail = FALSE)

Is that right or should I do it like this:

1-pt(((mean-sd-mean)/sd), df = 85, lower.tail = FALSE)

Edited:

I want to calculate the probability P(x < mean(x) - sd(x)) using the Student's t table. Since the mean and the standard deviation are sample estimates, I should use Student's t distribution, and for that I must standardize: t = (X - mean(x))/sd(x) ~ Student's t. Correct?

Since this is not about the sample mean, I don't need to use the square root of n. So in my case X would be: X = mean(x) - sd(x)

Standardizing for Student's t:

t = (X - mean(x))/sd(x) = (mean(x) - sd(x) - mean(x))/sd(x)

So what I want to calculate is:

P(x < mean(x) - sd(x)) = P((x - mean(x))/sd(x) < (mean(x) - sd(x) - mean(x))/sd(x)) = P(t < (mean(x) - sd(x) - mean(x))/sd(x))

Is this correct? How do I do it in R?

  • Are this mean and this standard deviation those of the distribution, or of a sample of size 86?

  • Sample size equal to 86.

1 answer


I would not do this calculation in either of these ways. As formulated, the question does not make much sense to me. The Student t distribution is always centered at zero (unless it is a noncentral Student t distribution, which does not seem to be the case). So, for your problem, you will always be calculating a probability that is not tied to the estimate of your sample mean. This may not be apparent with a small mean like the one in this example, but increase the mean to 100, for example, and you will see what I'm talking about.
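To make the point about centering concrete, here is a minimal sketch using the numbers from the question. The mean of 100 is the hypothetical value suggested above, and the variable names are only illustrative. Note that `1 - pt(q, df, lower.tail = FALSE)` is the same quantity as `pt(q, df)`, so the simpler form is used.

```r
# Values taken from the question; df = 85 as stated there.
m  <- -0.49
s  <- 3.029041
df <- 85

# Passing the raw value m - s to pt() treats it as a quantile of a
# t distribution centered at zero, so the result depends on where m sits.
p_raw <- pt(m - s, df = df)

# Same standard deviation, but a hypothetical mean of 100.
p_raw_100 <- pt(100 - s, df = df)

# The standardized expression (m - s - m) / s is -1 whatever m is.
p_std <- pt((m - s - m) / s, df = df)

print(c(raw = p_raw, raw_mean_100 = p_raw_100, standardized = p_std))
```

The first two values differ enormously even though both describe "one standard deviation below the mean", while the third never changes with the mean.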

The sample mean has an asymptotically normal distribution, with mean µ and variance σ²/n, where µ is the population mean, σ² is the population variance, and n is the sample size. So we can use the normal distribution to calculate the probability of the sample mean falling more than one of its standard deviations (that is, one standard error) below the mean, or more than one above it.

set.seed(1234)
x <- rnorm(86, mean=.5, sd=.3) # random sample

media <- mean(x) # point estimate of the mean
erro_padrao <- sd(x)/sqrt(length(x)) # estimate of the standard error

media-erro_padrao # mean - standard error
[1] 0.4683514
media+erro_padrao # mean + standard error
[1] 0.5321069

pnorm(media-erro_padrao, mean=media, sd=erro_padrao, lower.tail=TRUE)
[1] 0.1586553
pnorm(media+erro_padrao, mean=media, sd=erro_padrao, lower.tail=FALSE)
[1] 0.1586553

The question isn’t very detailed, so I can’t be sure what your real reason for calculating these probabilities is. If you give more details about your real goal, people here will be able to help you a little more.


Addendum after the question was edited: to me, this problem still does not make sense. I may simply be having trouble understanding it, but I will try to explain, point by point, why I believe it cannot be solved this way.

  1. Where did the analyzed data come from? Saying that a distribution has mean -0.49 and standard deviation 3.029041 does not say much. Is it symmetric about the mean, for example? Does it have many outliers? Is it bell-shaped? U-shaped?

  2. Why use the t distribution? Even if your data came from a sample, I would only use t if I suspected heavy tails in your distribution. In addition, this standardization is only defined for variables with an approximately normal distribution. Even if your data has a t distribution, its heavy tails will affect this calculation because, well, your variable has a Student t distribution and the standardization is not defined in that case.

  3. The formula (x-Mean(x))/sd(x) only works if x has an approximately normal distribution, by the Central Limit Theorem. This theorem applies only to random variables with an asymptotically normal distribution. That is why I solved the problem the way I presented earlier: the sample mean has an asymptotically normal distribution, regardless of the distribution of the random variables.
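The claim in item 3 about the sample mean can be checked with a small simulation. This is only a sketch under assumptions that are not in the question: the data are drawn from a skewed Exponential(1) distribution (population mean 1, standard deviation 1), and the seed and number of replications are arbitrary.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n    <- 86     # sample size from the question
reps <- 10000  # number of simulated samples (assumption)

# Exponential(1) data: strongly skewed, population mean 1 and sd 1.
means <- replicate(reps, mean(rexp(n, rate = 1)))

# Theoretical standard error of the mean: sigma / sqrt(n).
se <- 1 / sqrt(n)

# Proportion of sample means more than one standard error below mu.
p_sim <- mean(means < 1 - se)

print(c(simulated = p_sim, normal = pnorm(-1)))
```

Even though the individual observations are far from normal, the simulated proportion should land close to the normal value `pnorm(-1)`.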

Is it possible to do it the way you are doing it? Yes, but it won’t be right. The proposed standardization does not exist for a t. You will get something similar to a z value, but with no real meaning. After all, what does (x-Mean(x))/sd(x) mean in t? What is the distribution of this transformation? I don’t know if it’s t. I only know the case where x is normal or the case where we use the sample mean.

If your data is normal, use the normal cumulative distribution function directly. It is not even necessary to standardize the variable, because these probabilities can be calculated directly. Unless, of course, you want to look these values up in a table. In that case you can apply this transformation without any problem.
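As a short illustration of the two routes, the sketch below assumes the data really are normal with the mean and standard deviation given in the question (an assumption, since the question does not say so). The variable names are illustrative.

```r
# Values from the question, treated here as the parameters of a
# normal distribution (assumption).
m <- -0.49
s <- 3.029041

# Direct route: P(X < m - s) for X ~ Normal(m, s), no standardization.
p_direct <- pnorm(m - s, mean = m, sd = s)

# Table route: standardize first, z = (m - s - m) / s = -1,
# then use the standard normal distribution.
p_table <- pnorm(-1)

print(c(direct = p_direct, standardized = p_table))
```

Both routes give the same probability, about 0.1587, which matches the `pnorm` output shown earlier in this answer.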

  • Hi, I edited the question. Can you check? Thank you!

  • Thank you, Marcus. I think I’ll go back to the books. My idea was this: this is my Student t table: tabela_student=qt(1:99/100, df=length(thau)-1, lower.tail = TRUE, log.p = FALSE) . Since I assume that x has a normal distribution and that the standard deviation has a chi-square distribution, I thought that by doing the standardization above I would end up with a t distribution and could calculate any probability using this table.

  • Just one more correction: if the random variable x has a normal distribution and a sample of size n has been taken, then what has a chi-square distribution with (n-1) degrees of freedom is ((n-1)/σ²)*S², where S² is the sample variance estimator.
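A quick simulation can illustrate this last correction. It is only a sketch: the value sigma = 3, the seed, and the number of replications are assumptions chosen for illustration. A chi-square distribution with n-1 degrees of freedom has mean n-1 and variance 2(n-1), so the simulated statistic should be close to those values.

```r
set.seed(1)    # arbitrary seed
n     <- 86    # sample size from the question
sigma <- 3     # hypothetical population standard deviation
reps  <- 10000 # number of simulated samples (assumption)

# (n - 1) * S^2 / sigma^2 computed for each simulated normal sample.
stat <- replicate(reps, (n - 1) * var(rnorm(n, sd = sigma)) / sigma^2)

# Compare with the chi-square(n - 1) moments: mean 85, variance 170.
print(c(sim_mean = mean(stat), sim_var = var(stat)))
```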
