How to average values in a list using Python?

Hello, I have a list of values in Python that alternate between positive and negative, and I want to work with the average of those values. The count should start when the values become positive and continue through the last negative value before they turn positive again; at that point the counter stops, the average of those values is taken, and a new count starts. Here is an example of the data:

   0;  2.3360; 0.4675
   1;  1.7439; 0.4174
   2;  1.3766; 0.3673
   3;  1.3766; 0.1719
   4;  1.4002; 0.1719
   5;  1.5687; 0.1719
   6;  2.2238; -0.6552
   7;  1.6181; -0.6552
   8;  2.2797; -0.6552
   9;  2.9562; -0.6552
  10;  3.4301; -0.6552
  11;  3.7597; -0.6552
  12;  4.0999; -0.6552
  13;  4.6294; -0.6552
  14;  4.4860; -0.6552
  15;  4.4504; 0.0356
  16;  4.3090; 0.1414
  17;  3.9967; 0.1556
  18;  3.8269; 0.1698
  19;  3.4952; 0.1978
  20;  3.2694; 0.1307
  21;  3.2059; 0.0635
  22;  3.1428; 0.0631
  23;  3.0802; 0.0626
  24;  2.9562; -0.0619
  25;  2.8950; -0.0612
  25;  2.8950; -0.0612
  26;  2.4214; -0.1155
  27;  2.2517; -0.1697
  28;  2.0055; -0.1900
  29;  1.7952; -0.1835
  30;  1.7952; 0.1835

In this case, for example, I would need the average of rows 0 to 14 and of rows 15 to 29, and a new count would start at row 30.

Note that the values being averaged are those of the second column, vazmed[1], while the range of each average is determined by the third column, vazmed[2].

I’m trying to do this with an auxiliary variable, but without success.

Here is the code I have built so far:

arquivo = open('vazdif.out', 'rt')

vazmed1 = []
vazmed2 = []

i = 0

for linha in arquivo:
    campo = linha.split(';')
    vaz1 = float(campo[1])
    vaz2 = float(campo[2])
    vazmed1.append(vaz1)
    vazmed2.append(vaz2)
    i = i+1
n = len(vazmed1)
m = sum(vazmed1)
aux = 0
for atual in vazmed2:
    if atual < 0:
        aux = 1
    if atual >= 0 and aux == 1:
        aux = 0
    if aux == 1:
        media = m/n
        print(media)

Any help? If possible, I would like to solve this problem without packages such as numpy and pandas.

  • You sum all the values in vazmed1 before calculating the average, so you can only ever get the average of the entire list. Calculate the average inside the for loop, each time the sign of the third column changes from negative to positive.
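The comment's suggestion can be sketched as follows, using only built-ins. This is a minimal sketch, not the asker's final code: the two short lists are hypothetical stand-ins for the columns read from the file, and the names medias, soma, cont and anterior_negativo are illustrative.

```python
# Hypothetical in-memory sample standing in for the two columns of the file.
vazmed1 = [1.0, 3.0, 5.0, 7.0, 9.0, 2.0]    # second column: values to average
vazmed2 = [0.5, -0.2, 0.1, 0.3, -0.4, 0.6]  # third column: its sign defines the ranges

medias = []                # one average per range
soma = 0.0                 # running sum of the current range
cont = 0                   # number of values in the current range
anterior_negativo = False  # was the previous third-column value negative?

for valor, sinal in zip(vazmed1, vazmed2):
    # A non-negative value right after a negative one closes the current range.
    if sinal >= 0 and anterior_negativo:
        medias.append(soma / cont)
        soma = 0.0
        cont = 0
    soma += valor
    cont += 1
    anterior_negativo = sinal < 0

# The last range has no sign change after it, so close it explicitly.
if cont > 0:
    medias.append(soma / cont)

print(medias)  # [2.0, 7.0, 2.0]
```

Keeping a running sum and count per range avoids dividing the sum of the whole list by its full length, which is what the original loop does.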

1 answer

I would recommend that you start organizing your solutions into functions, applying the single responsibility principle. When you write code that does everything at once, sooner or later you lose control of it, and it becomes more complex than it should be or difficult to maintain.

First, we can think of a function that reads the CSV file and returns the list of values for each line:

import csv

def read_csv(filename, delimiter=';'):
  with open(filename) as stream:
    reader = csv.reader(stream, delimiter=delimiter)
    yield from reader

Here it is worth noting that yield is what makes this function a generator (see the related question "What is yield for?"). If you consume the generator at this point, you get something like:

for row in read_csv('data.csv'):
  print(row)

['0', '  2.3360', ' 0.4675']
['1', '  1.7439', ' 0.4174']
['2', '  1.3766', ' 0.3673']
['3', '  1.3766', ' 0.1719']
...
['28', '  2.0055', ' -0.1900']
['29', '  1.7952', ' -0.1835']
['30', '  1.7952', ' 0.1835']

Note that the whitespace present in the file remains in our output.

The second function processes this data so that we can use each value as an actual float, and it also separates the values into sets, which I will call chunks here (the name is not accidental). The idea is to consume the generator until a certain condition is satisfied, which ends the chunk. That condition is that the current value is non-negative and the previous one was negative, or that the data has ended. Thus, we can do:

def create_chunks(data):
  values = []
  previous_is_negative = False

  for row in data:
    value = float(row[2].strip())
    if value >= 0 and previous_is_negative:
      yield values
      values = []
      previous_is_negative = False
    elif value < 0:
      previous_is_negative = True
    values.append(value)
  yield values

Note: in row[2].strip() the call to strip is unnecessary in this context, since float itself already ignores whitespace at the beginning and end of the string.

The function consumes the input generator, data, and accumulates the values in values until the condition value >= 0 and previous_is_negative is satisfied. When that happens, the accumulated values are yielded and the variables are reset. If the data ends before the condition is met, all remaining values are yielded and the function terminates.
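A quick check of that behavior, using strings in the same shape as the CSV fields shown earlier (the variable name raw is only illustrative):

```python
# float() ignores leading and trailing whitespace, so strip() is optional here.
raw = '  2.3360'

print(float(raw))            # 2.336
print(float(raw.strip()))    # 2.336
print(float(' -0.1900\n'))   # -0.19
```

Keeping the strip call is harmless; it only makes the intent explicit.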

So we could already do:

data = read_csv('data.csv')
for chunk in create_chunks(data):
  print(chunk)

[0.4675, 0.4174, 0.3673, 0.1719, 0.1719, 0.1719, -0.6552, -0.6552, -0.6552, -0.6552, -0.6552, -0.6552, -0.6552, -0.6552, -0.6552]
[0.0356, 0.1414, 0.1556, 0.1698, 0.1978, 0.1307, 0.0635, 0.0631, 0.0626, -0.0619, -0.0612, -0.0612, -0.1155, -0.1697, -0.19, -0.1835]
[0.1835]

Note that the three desired chunks were generated. From here, it is enough to calculate the average of the values of each chunk:

import statistics

# ...

for chunk in create_chunks(data):
  print(statistics.mean(chunk))

-0.27526
0.011068750000000002
0.1835
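The question asks for the average of the second column, with the third column only delimiting the ranges, whereas create_chunks collects the third column itself. A minimal sketch of that variant, assuming the same row format: the name create_value_chunks is hypothetical, and the four sample rows are copied from the question's data.

```python
import statistics

def create_value_chunks(rows):
    """Yield lists of column-2 values, split where column 3 turns
    non-negative right after a negative value."""
    values = []
    previous_is_negative = False
    for row in rows:
        value = float(row[1])   # second column: the value to average
        sign = float(row[2])    # third column: only used to find the split
        if sign >= 0 and previous_is_negative:
            yield values
            values = []
        previous_is_negative = sign < 0
        values.append(value)
    if values:                  # avoid yielding an empty final chunk
        yield values

# Four rows copied from the question's data (rows 13 to 16).
sample_rows = [
    ['13', '  4.6294', ' -0.6552'],
    ['14', '  4.4860', ' -0.6552'],
    ['15', '  4.4504', ' 0.0356'],
    ['16', '  4.3090', ' 0.1414'],
]

averages = [statistics.mean(chunk) for chunk in create_value_chunks(sample_rows)]
print(averages)
```

Here the first average covers rows 13 and 14 and the second covers rows 15 and 16. statistics is part of the standard library; if even that should be avoided, sum(chunk) / len(chunk) gives the same result for a non-empty chunk.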
