Calculate the standard deviation of a vector

I’m having trouble solving the following equation:

[Equation image]

Here is the code:

#include <stdio.h>
#include <math.h>

int main(){
    float m, media, sigma, p;
    int vetor[10];
    media = 0;
    m = 0;
    sigma = 0;
    p = 0;
    for(int i = 0; i < 10; i++){
        printf("Digite um número: ");
        scanf("%d", &vetor[i]);
    }
    for(int i = 0; i < 10; i++){
        m = m + vetor[i];
    }
    media = m / 10.0;
    for(int i = 0; i < 10; i++){
        p = p + (vetor[i] - media);
    }
    sigma = sqrt((p * 1)/10);
    printf("Resultado d = %.2f\n", sigma);
}
  • Without squaring each element's deviation, your formula always yields zero (apart from floating-point rounding). It is easy to see why: n times the mean of n elements equals the sum of those n elements. Since each element enters with a positive sign and the mean with a negative sign, the sum of the n means cancels out the sum of the n elements
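Written out, the argument in this comment is that the unsquared deviations always cancel (here x̄ is the mean, `media` in the code):

```latex
\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x}
  = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

So `p` in the question's code is zero up to rounding; squaring each term is what prevents positive and negative deviations from cancelling.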

3 answers

The standard deviation formula requires squaring each deviation, which yours does not do. See this formula, taken directly from Wikipedia:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$

Note that each term (x_i − x̄) is squared.

In this formula, i runs from 1 to N, which corresponds to your loop running from 0 to N-1, so the indexing does not affect the result.

Applying this correction to your code:

int main(){
    ...
    media = m / 10.0;
    for(int i = 0; i < 10; i++){
        p = p + pow(vetor[i] - media, 2); // now squared, using pow
    }
    sigma = sqrt(p/(10-1)); // divide by 10-1, which was missing (or 9, to simplify)
    printf("Resultado d = %.2f\n", sigma);

    return 0;
}

See the example in Ideone

  • The denominator of the standard deviation in the formula is n, but in the code it is n-1. I have seen both forms used for the standard deviation and both seem right to me, but there is a preference for n-1 that I can’t explain

  • @Jeffersonquesado As for the summation indices, they make no difference, and starting at 0 is simply more practical in programming. As for the denominator, there are two versions: the uncorrected one (n) and the corrected one (n-1). Both are valid; they just represent different variants of the standard deviation.
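Side by side, the uncorrected (population) and corrected (sample) variants differ only in the denominator. The symbols σ and s below are a common convention, used here for illustration:

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
s = \sigma\sqrt{\frac{n}{n-1}}
```

The n-1 form is known as Bessel's correction. With n = 10, as in the question, s is about 5.4% larger than σ (√(10/9) ≈ 1.054), and the gap shrinks as n grows.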

The formula is wrong: you need to sum the squared deviations from the mean.

Here is your revised and simplified program:

#include <stdio.h>
#include <math.h>

#define QTD_ELEMENTOS 5

int main() {
    int vetor[QTD_ELEMENTOS];

    for (int i = 0; i < QTD_ELEMENTOS; i++) {
        //printf("Digite um número: ");
        scanf("%d", &vetor[i]);
    }

    int somatorio = 0;
    for (int i = 0; i < QTD_ELEMENTOS; i++) {
        somatorio += vetor[i];
    }

    float media = somatorio / (float) QTD_ELEMENTOS;

    float variacoes = 0;
    for (int i = 0; i < QTD_ELEMENTOS; i++) {
        float v = vetor[i] - media;
        variacoes += v * v;
    }

    float sigma = sqrt(variacoes / QTD_ELEMENTOS);
    printf("Resultado d = %.2f\n", sigma);
}

See it working on Ideone.

  • Regarding the standard deviation formula, I’ve seen both n (your case and @Lacobus’s) and n-1 (the OP’s formula and @Isac’s code) used as the denominator. I get the impression that both forms are valid, but it’s worth pointing out. Most references I find use n-1 rather than n

Consider the following set of 10 samples:

{ 2, 3, 3, 4, 5, 6, 7, 8, 9, 10 }

First, we calculate the simple arithmetic mean of the samples in the set:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{57}{10} = 5.7$$

We then calculate the deviation of each of these samples from the mean:

$$(x_i - \bar{x}) = -3.7,\ -2.7,\ -2.7,\ -1.7,\ -0.7,\ 0.3,\ 1.3,\ 2.3,\ 3.3,\ 4.3$$

Next, we square the deviation of each sample from the mean:

$$(x_i - \bar{x})^2 = 13.69,\ 7.29,\ 7.29,\ 2.89,\ 0.49,\ 0.09,\ 1.69,\ 5.29,\ 10.89,\ 18.49$$

With these, we can calculate the variance:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{68.1}{10} = 6.81$$

The standard deviation is the square root of the variance:

$$\sigma = \sqrt{6.81} \approx 2.6096$$

Below is code that separately computes the mean, the variance, and the standard deviation of a set of values:

#include <stdio.h>
#include <math.h>

#define MAXSIZE 10

double media( double s[], int n )
{
    double sum = 0.0;
    int i = 0;

    for( i = 0; i < n; i++ )
        sum += s[i];

    return sum / n;
}

double variancia( double s[], int n )
{
    double sum = 0.0;
    double dev = 0.0;
    double med = media( s, n );
    int i = 0;

    for( i = 0; i < n; i++ )
    {
        dev = s[i] - med;
        sum += (dev * dev);
    }

    return sum / n;
}

double desvio_padrao( double s[], int n  )
{
    double v = variancia( s, n );
    return sqrt( v );
}

int main( void )
{
    double vetor[ MAXSIZE ];
    int  i;

    for( i = 0; i < MAXSIZE; i++ )
    {
        printf("Digite um numero: ");
        scanf( "%lf", &vetor[i] );
    }

    printf("Media = %g\n", media( vetor, MAXSIZE ) );
    printf("Variancia = %g\n", variancia( vetor, MAXSIZE ) );
    printf("Desvio Padrao = %g\n", desvio_padrao( vetor, MAXSIZE ) );

    return 0;
}

Compiling:

$ gcc desvio.c -o desvio -lm

Testing:

Digite um numero: 2
Digite um numero: 3
Digite um numero: 3
Digite um numero: 4
Digite um numero: 5
Digite um numero: 6
Digite um numero: 7
Digite um numero: 8
Digite um numero: 9
Digite um numero: 10
Media = 5.7
Variancia = 6.81
Desvio Padrao = 2.6096
  • Some (not all) use n-1 instead of n as the denominator of the standard deviation (and, I believe, of the variance too), but the reason is unclear to me

  • @Jeffersonquesado: The n - 1 denominator you mentioned is used to "compensate" the standard deviation when the mean used is not the true mean, since your samples could have been obtained from another experiment under identical conditions. This article explains the reason well.

  • Quoting the article you mentioned: "(...) but if you are calculating the Mean value of the data from the data itself (by summing the data & Dividing by n or using the button on the Calculator) use the n-1 version (...)". Wouldn’t it make more sense to use n-1 in this case?

  • @Isac: As I understand it, these are two different equations: 1) the population standard deviation and 2) the sample standard deviation. In my answer I used the population standard deviation, because my set of samples corresponds to a whole population, not just part of one. The sample standard deviation, which uses n - 1, applies when only part of the population’s data is available.

  • This link explains the "rule" even better.

  • @Lacobus This second link explains the practical application of the two better. But isn’t the practical choice between them still unclear in this question? In my view both would be valid, since the 10 values in the question could be the grades of 10 students or the cholesterol tests of 10 individuals from a large population. Or am I inferring too much?

  • @Isac: You’re right on both counts, and I don’t think you’re inferring too much; after all, Scientia vincere! I believe the purpose of a given statistical experiment is what determines the most appropriate method. In the case at hand, what is unclear is precisely the applicability and purpose of the experiment. So I agree with you: both equations would be valid in the answer.
