I have hundreds of digital images of dogs and cats, and I need to build an algorithm that recognizes which images contain a dog and which contain a cat. What steps should I take?
First, it’s worth noting that this is a famous machine-learning problem. It’s available as a Kaggle challenge, from which the dataset can also be downloaded; in fact, that’s where I downloaded the data used in this answer.
I’m going to show you a very simple methodology to train a classifier for this problem. This answer is pretty much the “hello world” of the field, but it can help. This article describes a much more advanced methodology for prediction (it classifies 82% of the images correctly).
Note also that this is a solution to the problem in R.
In R you can read the images using the imager package.
library(imager)
library(dplyr)
library(tidyr)
library(stringr)
img <- imager::load.image("train/cat.0.jpg")
Right at the beginning, I will resize the image to a smaller, standardized dimension: 100 x 100. This keeps things lighter; it is not a mandatory step, though it is recommended. I will also convert the images to grayscale rather than keeping color, to reduce them further.
img <- imager::grayscale(img)
img <- imager::resize(img, 100, 100)
Now we have a 100 x 100 matrix, with each element representing the gray tone of a pixel.
I prefer to represent the image as a data.frame in R, because it is easier to manipulate. For that I use the following code:
img_df <- as.matrix(img) %>%
  data.frame() %>%
  mutate(x = 1:nrow(.)) %>%          # pixel row index
  gather(y, t, -x) %>%               # long format: one row per pixel
  mutate(y = extract_numeric(y))     # turn column names like "X1" into numbers
Here the image is represented in three columns of a data.frame. The first two, x and y, identify the position of the pixel; the last, t, holds the gray tone of the pixel.
For input into a statistical model / machine-learning algorithm, we need a dataset in which each row is an observation (an individual, a sample unit) and each column is a feature observed for that individual.
So, to classify images of cats and dogs, we need a dataset in which each image is represented by a row and each pixel of the image is a column (the pixels are the observed information of the image). In addition, we need a column indicating whether the image is of a cat or a dog, in order to train the algorithm (estimate its parameters).
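To make the target layout concrete, here is a toy sketch in base R. The values and the 2 x 2 "images" are synthetic, not the actual cat/dog data; the point is only the shape of the resulting table (one row per image, one column per pixel, plus a label column).

```r
# Toy sketch: two fake 2 x 2 "images" flattened into rows of a data.frame.
img_a <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2)
img_b <- matrix(c(0.9, 0.8, 0.7, 0.6), nrow = 2)

# as.vector() walks the matrix column by column; t() makes it a single row.
flatten <- function(m) as.data.frame(t(as.vector(m)))

bd_toy <- rbind(flatten(img_a), flatten(img_b))
names(bd_toy) <- sprintf("x%dy%d", rep(1:2, times = 2), rep(1:2, each = 2))
bd_toy$classe <- c("gato", "cachorro")  # label column, as described above

dim(bd_toy)  # 2 rows (images) x 5 columns (4 pixels + 1 label)
```

With real 100 x 100 images, the same layout has 10,000 pixel columns per row instead of 4.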
To transform the image into a row, use the following commands:
img_line <- img_df %>%
  mutate(colname = sprintf("x%03dy%03d", x, y)) %>%  # one column name per pixel
  select(-x, -y) %>%
  spread(colname, t)                                 # wide format: one row per image
If you wanted your model to consider the color of the image, at this stage you would need to create a column for each pixel and each color channel, namely 3 x 100 x 100 = 30,000; you would end up with each image represented by a row of 30,000 columns.
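The arithmetic can be checked with a small base-R sketch (a fake 4 x 4 RGB array filled with random values, not real image data):

```r
# A fake 4 x 4 image with 3 color channels; flattening yields one column
# per pixel per channel: 3 * 4 * 4 = 48.
set.seed(1)
img_rgb <- array(runif(4 * 4 * 3), dim = c(4, 4, 3))
img_row <- as.data.frame(t(as.vector(img_rgb)))

ncol(img_row)  # 48
3 * 100 * 100  # 30000 columns for a 100 x 100 color image
```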
I have explained how to process one image, but training the algorithm requires many. I will encapsulate the previous code in a function and use it to process a series of images.
processar <- function(path){
  img <- imager::load.image(path)
  img <- imager::grayscale(img)
  img <- imager::resize(img, 100, 100)
  img_df <- as.matrix(img) %>%
    data.frame() %>%
    mutate(x = 1:nrow(.)) %>%
    gather(y, t, -x) %>%
    mutate(y = extract_numeric(y))
  img_line <- img_df %>%
    mutate(colname = sprintf("x%03dy%03d", x, y)) %>%
    select(-x, -y) %>%
    spread(colname, t)
  return(img_line)
}
For demonstration purposes, I will take a sample of 100 dog images and 100 cat images for model training. In practice, many more images are needed.
arqs <- list.files("train", full.names = TRUE)
amostra_gato <- arqs[str_detect(arqs, "cat")] %>% sample(100)
amostra_cachorro <- arqs[str_detect(arqs, "dog")] %>% sample(100)
amostra <- c(amostra_gato, amostra_cachorro)
bd <- plyr::ldply(amostra, processar)
Y <- as.factor(rep(c("gato", "cachorro"), each = 100))  # response vector
This step takes a long time and is computationally intensive: there is a lot of processing, and image files are heavy.
From here, any machine-learning algorithm could be used, since the images have now been transformed into a conventional dataset. Be warned that this usually takes a long time: on my computer, training with 200 images of 10,000 columns each took about 30 minutes.
I will use a random forest to do the classification, but you could use virtually any model.
m <- randomForest::randomForest(bd, Y, ntree = 100)
I won’t go into the details of how the modeling should be done. The right thing is to separate a training set and a test set, check that there was no overfitting, tune the parameters using cross-validation, etc. But that would make the answer too long, so I trained a random forest using the R function’s defaults (changing only the number of trees).
I also checked the error only on the training data (which is statistically wrong, but let’s move on).
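For reference, here is a minimal sketch of the holdout split that was skipped, using only base R and index bookkeeping for the 200-image sample. The commented lines show how those indices would be applied to the `bd` and `Y` objects built above; they are not run here because they need the real image data.

```r
# Split the 200 sample indices into 80% training / 20% test, without overlap.
set.seed(42)
n <- 200
idx_train <- sample(n, size = 0.8 * n)        # 160 training rows
idx_test  <- setdiff(seq_len(n), idx_train)   # 40 held-out rows

# With the real data one would then do something like:
#   m  <- randomForest::randomForest(bd[idx_train, ], Y[idx_train], ntree = 100)
#   pr <- predict(m, newdata = bd[idx_test, ])
#   mean(pr == Y[idx_test])   # honest accuracy estimate on unseen images

length(idx_train)  # 160
length(idx_test)   # 40
```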
tabela <- table(predict(m, type = "class"), Y)  # confusion matrix
acerto <- sum(diag(tabela)) / sum(tabela)       # proportion of correct predictions
acerto
With the trained model, process a new image in the same way and use the following command to predict its category:
predict(m, newdata = img_line)
Daniel, congratulations. Awesome...
Won my +1 because the answer is really good (although I find the question too broad and voted to close it). By the way, when you mention that "this step takes a long time and is computationally intense," you are absolutely right; after all, 10,000 columns is simply absurd! Despite serving as a good basis for understanding, one almost never uses pixel values directly as features. There are other features that can be extracted from images with intermediate processing algorithms, which are more useful and greatly reduce the dimensionality of the problem. :)
Agreed, @Luizvieira! I’m not a great expert on this subject, but I believe it is done in some algorithms (even if it is absurd). Check out this TensorFlow tutorial: https://www.tensorflow.org/versions/r0.9/tutorials/deep_cnn/index.html#convolutional-neural-Networks Of course the images are smaller (32x32), but there are also many more of them.
@Danielfalbel Handwritten character recognition, for example, also uses very small sample images in training, so the size of the problem is much smaller; still, 32x32 is quite large. But there are other problems with using raw data such as pixel values: the variance can be very large, and many of the "columns" can simply be useless or contribute to overfitting. This is why it is important to select the best features.
@Danielfalbel The very link you posted has a section on feature selection that mentions: "Much of the work of designing a linear model consists of transforming raw data into suitable input features." :) In fact, I explain a little of this subject in one of my answers around here.
@Luizvieira That part is talking about linear models, but in the deep CNN part they actually feed raw images to the algorithm. Of course, when you look at that algorithm, it took 4 weeks to train on a cluster of I-don’t-know-how-many GPUs. But I think that’s where the state of the art in image recognition is. For us mortals without giant computer clusters, selecting the best features is what remains.
@Danielfalbel True. But regardless of whether the model is linear or not, my point was more about the curse of dimensionality (the system requires an exponentially growing volume of training data as the number of features grows). But this time it’s me who doesn’t know much about deep learning; maybe you’re right. :)
Daniel, that’s right, you really know the business, haha. I just don’t have much experience with neural networks. Thank you for the answer; I will try to implement your code... I wonder if I can contact you directly by email or something like that.
@Danielfalbel Is there a program that you use to process these images so that the algorithm can be trained? How do you do this?
Hello, welcome to SOPT. Your question has several problems. First, it is very broad; this site is not a forum. If you haven’t done so yet, take the [tour] and read [Ask]. Secondly, you don’t make clear what your main difficulty is in this whole process. Finally, you used tags from three different languages (Python, R and Matlab), but you should choose one to get more concrete help.
– Luiz Vieira