How to find the most common value within each column of an array using python?

Question

Asked 6 years, 5 months ago

Viewed 297 times

0

I have a 3 x 3280 matrix. I need to go through each column, look at the value in each row, find the most frequent value, and build a 1 x 3280 vector from these values. For example:

matriz = [[1, 2, 3, 4, ...], [2, 3, 4, 5, ...], [1, 2, 4, 4, ...]]

For the first column, going through the three rows gives [1 2 1], so the most common value is 1. For the second column, going through the three rows gives [2 3 2], so the most common value is 2.

I tried to write the code in Python, but since I know nothing about Python I get too many errors.

Do you know any other language? Why Python?

– Woss

2018/06/01 at 16:19
1

Post the code you already have so we can understand where you are stuck and help you make progress.

– Isac

2018/06/01 at 16:28
Anderson, because the rest of the program is in Python. The part I am asking about is roughly 1/3 of the code; the rest is done and working :)

– LVoltz

2018/06/02 at 17:39

2 answers

1

As Vitor Hugo showed in his answer, you just need to transpose the matrix and find the most common element of each row of the result. The logic is exactly the one he used, but it can be written more simply:

from collections import Counter

def most_common_of_columns(matrix):
    # zip(*matrix) transposes the matrix, so each item is one column
    for column in zip(*matrix):
        most_commons = Counter(column).most_common(1)
        yield most_commons[0][0]

Here zip(*matrix) yields the columns of the matrix (its transpose); Counter(column).most_common(1) returns a list containing the pair (value, count) for the most common value; and most_commons[0][0] extracts that value.
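To see what each of those three pieces produces, here is a small sketch on a toy 3 x 4 matrix. The name toy and its values are made up for illustration and are not from the question.

```python
from collections import Counter

# Hypothetical toy data, used only to show the intermediate values.
toy = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 2, 4, 4],
]

# zip(*toy) unpacks the rows as separate arguments, so each tuple is one column.
columns = list(zip(*toy))

# most_common(1) returns a one-element list holding a (value, count) pair.
first_pair = Counter(columns[0]).most_common(1)

# Indexing [0][0] picks the value out of that pair.
first_value = first_pair[0][0]

print(columns)      # [(1, 2, 1), (2, 3, 2), (3, 4, 4), (4, 5, 4)]
print(first_pair)   # [(1, 2)]
print(first_value)  # 1
```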

So do something like:

matrix = [
    [1,1,1,1,2,3,4,5,9,8], 
    [1,2,3,3,3,4,5,6,7,8], 
    [1,1,1,1,3,4,4,4,4,4]
]

print(list(most_common_of_columns(matrix)))

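This prints [1, 1, 1, 1, 3, 4, 4, 5, 9, 8], the most common element of each column. Note that when several values tie for most common, the one that appears first in the column (that is, the one from the topmost row) is returned. This relies on Counter.most_common keeping tied elements in first-encountered order, which holds in Python 3.7 and later.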
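If ties matter for your data, one option is to check whether the top count is shared. The sketch below is only an assumption about how you might want to handle that case, not part of the answer above: the helper name most_common_or_none is hypothetical, and it returns None for a column whose highest count is shared by more than one value.

```python
from collections import Counter

def most_common_or_none(column):
    """Return the single most frequent value, or None when the top count is tied."""
    counts = Counter(column).most_common(2)  # at most the two highest counts
    if len(counts) == 2 and counts[0][1] == counts[1][1]:
        return None  # tie: no unique most common value
    return counts[0][0]

matrix = [
    [1, 1, 1, 1, 2, 3, 4, 5, 9, 8],
    [1, 2, 3, 3, 3, 4, 5, 6, 7, 8],
    [1, 1, 1, 1, 3, 4, 4, 4, 4, 4],
]

result = [most_common_or_none(col) for col in zip(*matrix)]
print(result)  # [1, 1, 1, 1, 3, 4, 4, None, None, 8]
```

Columns 8 and 9 hold three different values each, so they come back as None instead of an arbitrary pick.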

Got it... thank you guys!!!! helped!!

– LVoltz

2018/06/02 at 17:41

by Vitor Hugo • 48 points · Answer 1 · 2018-06-01T18:42:29+00:00

To do what you describe, you basically need the Counter type, found in the collections library, and the built-in zip function. I will paste the sample code first and explain it at the end.

from collections import Counter

m = [
    [1,1,1,1,2,3,4,5,9,8], 
    [1,2,3,3,3,4,5,6,7,8], 
    [1,1,1,1,3,4,4,4,4,4]
]

inversa = []
contadores = []
resultado = []

for x in zip(m[0],m[1],m[2]):
    inversa.append(list(x))


for x in inversa:
    a = Counter(x)
    b = a.most_common(1)
    contadores.append(b)


for x in contadores:
    a = x[0]
    b = a[0]
    resultado.append(b)

print(resultado)

We start by importing Counter and defining the working matrix. For didactic purposes I used a 3 by 10 matrix, but don't worry: you can add more columns and the code will still work (the zip call names the three rows explicitly, so any extra rows would have to be added there).

After that I also define three auxiliary lists, which are explained as we go through the code.

The first step is to get the inverse of the input matrix (mathematically, the correct term is the transpose), since this makes the work easier. We do this with a for loop that goes through the tuples returned by zip and saves each one as a row of the list inversa.

If you stopped the code there and printed inversa, you would see something like:

[[1, 1, 1],
 [1, 2, 1],
 [1, 3, 1],
 [1, 3, 1],
 [2, 3, 3],
 [3, 4, 4],
 [4, 5, 4],
 [5, 6, 4],
 [9, 7, 4],
 [8, 8, 4]]
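The call zip(m[0],m[1],m[2]) names the three rows explicitly, which matches the 3-row matrix in the question. As a sketch of a variant that does not depend on the number of rows, the rows can be unpacked with *. The 4-row matrix m4 here is made-up data for illustration only.

```python
# Hypothetical 4-row matrix, only to show that the row count no longer matters.
m4 = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]

# zip(*m4) is equivalent to zip(m4[0], m4[1], m4[2], m4[3]).
inversa = [list(col) for col in zip(*m4)]

print(inversa)  # [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```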

After that we use another for loop to go through all rows of inversa (which are the columns of the original matrix), using Counter to count how many times each term occurs.

Still inside that loop, we call the most_common method with the argument 1 to obtain the most common term of each row, and save the result in the list contadores.

If you stopped the code here and printed contadores, you would see something like:

[[(1, 3)], [(1, 2)], [(1, 2)], [(1, 2)], [(3, 2)], [(4, 2)], [(4, 2)], [(4, 1)], [(9, 1)], [(8, 2)]]

That is, for each column of the matrix we have a [(x, y)], where x is the most repeated term and y is the number of times it occurs.

Finally, we go through all the entries of contadores and save in a the first element of each entry (this step is necessary because each entry, although it holds only one tuple, is still a list).

Still inside the final loop, we store in b the first term of a, since what interests us is the most repeated term, not how many times it was repeated. Finally, we append each value of b to the list resultado and print it.

At the end of the code you will get this result:

[1, 1, 1, 1, 3, 4, 4, 4, 9, 8]

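The list resultado is the vector made up of the most common terms :)

Note that the eighth column, [5, 6, 4], has no repeated value, so the pick there is only a tie-break. On Python 3.7 and later, most_common returns the first value encountered, so this code prints 5 in that position instead of 4 (and (5, 1) in contadores).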