Python Multidimensional Array Problem

Basically, my algorithm imports all the photos inside a directory (dataset_train), storing each photo in the list x and the name of the folder it came from in the list y.

import os
import cv2
import numpy as np

x = []
y = []

for root, dirs, files in os.walk("dataset_train"):
    path = root.split(os.sep)
    for file in files:
        imagem_nome = root + '/' + file
        imagem = cv2.imread(imagem_nome, 0)
        x.append(imagem)
        y.append(path[1])

print('Imagens para treinamento lidas:', len(x))


(x_train, y_train) = (np.asarray(x), y)

print(x_train.shape)
print(x_train[0].shape)

Output:

Imagens para treinamento lidas: 11957

(11957,)

(250, 250)

I’m using the MNIST CNN example as a basis, which has 60000 images of 28x28 pixels. In that example, checking shape on the training array returns (60000, 28, 28). But on mine it returns only (11957,), and I need it to return (11957, 250, 250). My dataset consists of 250x250 pixel photos.

Can someone help me with this, please? I’m new to Python :(

  • Funny thing: I tried this on a smaller dataset and the same code worked. But when I run it on this larger dataset, I get this problem.

1 answer


In my experience, this occurs when the arrays in x do not all have the same shape, so the conversion to a NumPy array can only group them along the first dimension. Among the reasons that may have caused this problem are:

  • Loading of invalid files - there may be a file in your directory that is not an image, and the code still tries to read it with cv2.imread. According to the documentation, cv2.imread does not raise an exception when reading an invalid file, but returns None[*]
  • Images with different dimensions - if the images have different resolutions, the arrays read will also have different shapes
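
To make the symptom concrete, here is a minimal sketch using small synthetic arrays instead of real photos (the 4x4 size and the variable names are only illustrative). It passes dtype=object explicitly as an assumption about your setup: older NumPy versions built such an array silently, as in the question, while recent ones raise an error unless that dtype is requested.

```python
import numpy as np

# Three arrays with identical shape: NumPy can stack them into one 3-D array.
same = [np.zeros((4, 4), dtype=np.uint8) for _ in range(3)]
same_shape = np.asarray(same).shape
print(same_shape)  # (3, 4, 4)

# One entry is None, as cv2.imread would return for a non-image file.
# NumPy can no longer build a regular 3-D array, so it falls back to a
# 1-D array of Python objects.
mixed = same[:2] + [None]
mixed_array = np.array(mixed, dtype=object)
print(mixed_array.shape)     # (3,)
print(mixed_array[0].shape)  # (4, 4)
```

This mirrors the output in the question: the outer shape has a single dimension, while each valid element still reports its own two-dimensional shape.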

Suggestions

To find out whether one of these problems is occurring, it should be enough to:

  • Check whether any image read by cv2.imread comes back as None. This can be done by changing the body of the inner for loop to
imagem_nome = root + '/' + file
imagem = cv2.imread(imagem_nome, 0)
if imagem is None:
    print("Imagem inválida:", file)
else:
    x.append(imagem)
    y.append(path[1])
  • Create a set (type set) with the dimensions of all images. This can be done using
print({img.shape for img in x})

right after the for loop. Note that shape is an attribute, not a method, so it is used without parentheses.
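
If the checks do turn up bad entries, one possible follow-up is to filter them out before building the training array. The sketch below is only an illustration under stated assumptions: stack_valid_images is a hypothetical helper (not part of NumPy or OpenCV), the labels are made up, and synthetic arrays stand in for the images that cv2.imread would return.

```python
import numpy as np

def stack_valid_images(images, labels, expected_shape):
    """Keep entries that are arrays of expected_shape and stack them.

    Returns the stacked array, the matching labels, and the indices of
    the entries that were rejected (None or wrong shape).
    """
    kept_images, kept_labels, rejected = [], [], []
    for index, (img, label) in enumerate(zip(images, labels)):
        if img is None or img.shape != expected_shape:
            rejected.append(index)
            continue
        kept_images.append(img)
        kept_labels.append(label)
    if not kept_images:
        raise ValueError("no image matched the expected shape")
    # np.stack requires identical shapes, which the filter above guarantees.
    return np.stack(kept_images), kept_labels, rejected

# Synthetic stand-ins for the x and y lists built in the question's loop.
images = [
    np.zeros((250, 250), dtype=np.uint8),
    None,                                  # e.g. a text file read by cv2.imread
    np.zeros((250, 250), dtype=np.uint8),
    np.zeros((100, 120), dtype=np.uint8),  # an image with another resolution
]
labels = ["a", "a", "b", "b"]

x_train, y_train, rejected = stack_valid_images(images, labels, (250, 250))
print(x_train.shape)  # (2, 250, 250)
print(rejected)       # [1, 3]
```

Returning the rejected indices, rather than silently dropping entries, makes it easier to track down which files in the dataset need attention.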


[*] - Information on page 21 of opencv-python manual, in English

  • I found the problem: I had a text file in one of the dataset folders, which caused the dimension problem you described. Now it's working. Thank you!
