Algorithm to detect nudity with good accuracy


87

I was researching some libraries in Python that could detect nudity in photos, so that somehow I could avoid inappropriate content on my site.

Then I found the library nudity.

The problem is that it returned false positives for some images that did not contain nudity.

For example, the image below (which contains no nudity) returned True when tested.

See:

Minka Kelly

import nude

nude.is_nude('minka-kelly.jpg')  # True

The user @Bacchus recommended that I test with a photo containing only skin colors. For this image, the result was False.

See:

Skin color palette

Then I wondered what criterion is used to detect nudity.

Even knowing this may be imperfect, I would like clarification on the following points:

  • Is there an algorithm that detects nudity with a high degree of accuracy?

  • What technique is used to detect nudity in an image?

Update:

I was asked to also test black-and-white images, to evaluate how nudepy analyzes the images.

Here’s the test. There are three images in a folder (I took a screenshot):

Image analysis with nudepy

When running nudepy *, the following results are returned:

flor.jpg          False
minka-kelly.jpg   True
tom_cinza.jpg     False
  • 18

Post the images it got right :D

  • 56

Simple: if an image is accessed much more than the others, it’s a sign of something :P

  • 5

The algorithm may be based on machine learning techniques and/or color histogram analysis. But this is just my guess.

  • 14

Machine learning aims to build algorithms that have the ability to learn and make certain predictions without being explicitly programmed to do so. Basically, such algorithms build a model from examples (in this case, nudity) that are fed to them, and then make decisions based on that model. Pattern recognition algorithms, for example, or Google search, are based on machine learning techniques. A histogram, on the other hand, is a graphical representation of a statistical distribution; in this case, of color.

  • 2

It is possible that there is something common about the color distribution in this type of image, and the algorithm may be based on that.

  • 4

The answer is here: https://sites.google.com/a/dcs.upd.edu.ph/csp-proceedings/Home/pcsc-2005/AI4.pdf?attredirects=0. Apparently it is done by statistical analysis of the color distribution. It’s nothing more sophisticated than that.

  • 4

Have you tested with black-and-white photos?

  • 2

    There’s a really cool paper on it here: Nude Detection in Video using Bag-of-Visual-Features

  • 3

@Wallacemaxters, have you tested with people of different skin tones (black to albino)? Have you tested both sexes (a shirtless man is not considered nudity, while a woman is)? Have you tested photos taken from different angles?

  • 1

@Ricardo, black skin is also recognized as nudity. I believe I’ve done this test; it works too.


2 answers

115


TL;DR

More than 90% accuracy, which the library you already use apparently achieves (according to the article it cites), is very good. Much more than this is unlikely to be achieved with current technology.

And as you noticed in your tests, the same photo that produces a false positive in color does not produce the "error" in its grayscale version (not to be confused with black-and-white! A B&W image is binary, containing literally only black and white). But that’s because the library uses an RGB model, so it only works for color images.
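The distinction can be illustrated with a small numpy sketch (the tiny image below is made up): a grayscale conversion keeps a full range of intensities, while a truly black-and-white (binary) image keeps only two values.

```python
import numpy as np

# A tiny made-up 2x2 RGB image (values 0-255)
rgb = np.array([[[200, 150, 120], [30, 40, 50]],
                [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

# Grayscale: weighted sum of the R, G and B bands (ITU-R BT.601 weights)
gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

# Binary ("black and white" in the strict sense): only two values remain
binary = (gray > 127).astype(np.uint8) * 255

print(gray.round(1))   # many intermediate shades
print(binary)          # literally only 0 and 255
```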


Relevant Links


  1. On 19/04/2016, this fantastic article was posted on more modern approaches to nudity detection using deep learning. Worth reading. But a warning to the more sensitive: the article contains images with nudity.

Curiously, many people don’t know that the famous Lenna image (I have already mentioned it here in other answers), widely used in computer vision, contains nudity. lol

  2. The library deepgaze (available on GitHub) has, among other cool things, skin detection by backprojection.

This type of detection is performed on the data of a digital image, that is, on the discrete illumination values contained in the pixels. And the difficulty of "interpreting" lies exactly there: what for you, a human, is a person wearing a dress is, for the computer, only an array of integer values representing light samples (in a single band, in the case of grayscale images, or in multiple bands, as in the case of RGB images: one band for each band of the visible light spectrum).

There are different approaches to processing this data computationally to try to detect something of interest (which can be a human face, a weapon, a spoiled fruit on a conveyor belt, or even, as in your problem domain, a naked person). And these approaches rely on statistical concepts.

Using a Color Histogram

The comments mentioned the use of histograms. A histogram is basically a count of occurrences of something, presented graphically as a way to illustrate a frequency distribution. For example, reputation on SOpt is also shown as a graph, where the vertical axis shows the reputation counts (frequencies) earned and the horizontal axis the discrete accounting intervals (each day of the month):

(image: reputation histogram)

Note: I am using your reputation for illustration purposes only. This information is publicly available in your profile, as well as in the profile of any other SOpt user.

A color histogram counts, for an image, the number of pixels that occur (the frequency, on the vertical axis) for each discrete color value (on the horizontal axis, usually on a scale of real values between 0.0 and 1.0, or integers between 0 and 255). However, what is treated as "color" may vary depending on some choices: you can count each RGB value (red, green and blue) separately, or count the brightness value (gray tone) as a whole (to understand what I mean by "as a whole", read this other answer of mine here at SOpt), or even count only the hue, if the color system used is HSV instead of RGB (the most common choice in problems like yours, because you want to treat color as a single value).
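As a minimal sketch of this counting (numpy only, on a made-up random image), one histogram per RGB band:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 100x100 RGB image with integer values in 0..255
img = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)

# One 256-bin histogram per band: hist[v] = how many pixels have value v
hist_r, _ = np.histogram(img[..., 0], bins=256, range=(0, 256))
hist_g, _ = np.histogram(img[..., 1], bins=256, range=(0, 256))
hist_b, _ = np.histogram(img[..., 2], bins=256, range=(0, 256))

# Every pixel is counted exactly once in each band's histogram
print(hist_r.sum())  # 10000
```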

The following figure, reproduced from this website, illustrates the use of color histograms to analyze an image of a car. Note that there are four histograms: one for the pixel counts in red (R), one for green (G) and one for blue (B), in the upper right corner; and one in the lower right corner for the pixel counts on the brightness scale (shades of gray). Understanding how a histogram works, you can look at the bottom graph (the counts on the brightness scale) and see that the photo is considerably more light than dark, since there is a higher concentration (a higher occurrence) of light pixels than dark ones (the vertical bars are higher on the right side, with values closer to 255, than on the left side, with values closer to 0). It is no wonder that this type of tool is widely used by photographers, since it shows at a glance how a photo is lit (more details on this aspect in this great article).

(image: car photo with RGB and brightness histograms)

And how can this graph be used to detect something? Well, such a count is a frequency distribution, and as such it can be used to calculate the probability that a pixel (taken at random) has a certain color/value in the particular image for which the histogram was calculated. Although the car is dark in the image illustrated earlier, most of the pixels are considerably lighter because of the wall, the floor, and even the light reflections on the car’s roof (which make it look lighter than it really is). Thus, when randomly drawing a pixel from this image, the probability that it does not belong to the car is quite large, consistent with what the color histogram of the image illustrates.

The probability of each color/value is calculated by normalizing this count by the total number of pixels in the image (illustratively, this is equivalent to "stacking" the bars, as shown in the following figure).

(image: histogram bars stacked into probabilities)

This image is from a SERVO Magazine article (2007 edition, page 37) that explains how OpenCV’s Camshift face-tracking algorithm works. The essential idea is that, given an example image of a face (the initial region of the face to be tracked in Camshift), one can calculate its color histogram and then use these probability values to compute, in a new image, the probability of each pixel belonging or not to a face. When a region has more neighboring pixels that are individually likely to belong to a face, the probability of the entire region increases, indicating that there probably is a face there.
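The backprojection idea behind this can be sketched in a few lines of numpy (everything here is a hypothetical toy: the color values are assumed to be already quantized into 8 bins, which in practice would come from, say, the hue channel):

```python
import numpy as np

BINS = 8  # coarse quantization of the color value (e.g. hue)

def histogram_model(sample):
    """Normalized histogram of a sample region: P(value | object)."""
    counts = np.bincount(sample.ravel(), minlength=BINS)
    return counts / counts.sum()

def backproject(model, image):
    """Replace each pixel value by its probability under the model."""
    return model[image]

# Made-up quantized 'skin sample' and test image (values in 0..BINS-1)
skin_sample = np.array([[3, 3, 4], [3, 4, 4]])
model = histogram_model(skin_sample)

image = np.array([[0, 3, 4], [7, 3, 1]])
prob = backproject(model, image)
# Pixels with values common in the sample (3 and 4) get high probability;
# values never seen in the sample get probability 0.
```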

This same reasoning can be used to detect any object of interest, provided you have calculated the color histogram for an image representative of what you want to detect (that is, even if someone doesn’t like pictures of naked people, you still need to use them! hehehe). Note, however, that what this very simple algorithm returns is the probability of a region being or not the object of interest, based on the probabilities of each individual pixel. The algorithm is expected to be right most of the time, but it can miss, because that probability is never 100%.

Machine learning

The comments also mentioned the idea of using machine learning. This area of Artificial Intelligence also uses many statistical methods. To understand the main idea behind data classification, read this other answer of mine (everything is already there, and I see no point in repeating it here). It ends up focusing on a specific algorithm (SVM), but the beginning gives a general idea of the classification process.

Anyway, the principle of the thing is:

  1. From sample images of what you want to detect/classify, important features are extracted. In your case this probably involves color, but other problems may involve edges, gradients, etc.
  2. These features are used to train a classifier, which then "learns" how to separate two groups (for example, nude vs. not nude) from the data.
  3. This classifier is then used on real-world data (new data, not included in the training set) to actually classify a new image.
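These three steps could look like the toy pipeline below, a nearest-centroid classifier over a single hand-crafted feature (the feature, thresholds and training values are all made-up illustrations, not what any real nudity detector uses):

```python
import numpy as np

def extract_feature(img):
    """Step 1: one toy feature - the fraction of 'skin-like' pixels
    (here, arbitrarily, pixels whose red channel dominates)."""
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    return ((r > g) & (r > b) & (r > 95)).mean()

def train(features, labels):
    """Step 2: 'learn' one centroid (mean feature value) per class."""
    features, labels = np.asarray(features), np.asarray(labels)
    return {c: features[labels == c].mean() for c in np.unique(labels)}

def classify(centroids, feature):
    """Step 3: assign new data to the class with the nearest centroid."""
    return min(centroids, key=lambda c: abs(centroids[c] - feature))

# Made-up training set: feature values already extracted, labeled by a human
centroids = train([0.70, 0.65, 0.10, 0.05],
                  ['nude', 'nude', 'not_nude', 'not_nude'])

# Classifying a new, strongly reddish made-up image
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 180
print(classify(centroids, extract_feature(img)))  # nude
```

A real pipeline would of course use far richer features and a proper classifier, but the train/classify split is the same.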

An algorithm for detecting objects in images that uses machine learning and is quite popular and robust is the Haar cascade. I have already briefly explained how it works in this other answer of mine, but an important distinction is that you need to provide positive examples (which contain the object of interest) and negative ones (which do not). It has ready-made implementations in OpenCV to detect faces, eyes, nose and mouth, but you can use the basic functions to detect anything (even a banana). It uses as training features the variation of lighting under different "filter windows" (the so-called Haar features), so it is robust for detecting objects because it uses not only the lighting colors/values but also the variation directions that essentially derive from edges. I don’t know whether it is or has been used to detect nudity, as it may add unnecessary complexity given the numerous possible variations in human posture. But here’s the tip.

Concluding

As you will notice from the examples illustrated in the answers I mentioned earlier, there is always the possibility of error. Maybe because there really are real-world examples that are outliers, or because natural measurement errors occur (in the case of digital image processing, illumination variations, partial occlusion by other objects, and variations in rotation or scale are very significant difficulties).

I don’t really know this particular problem domain well enough to say what is most used. However, the library you use is based on another JavaScript library, which is in turn based on this article (also mentioned in the comments). And this work proposes essentially a skin color model (with example images under different illuminations), considering normalized RGB values and the proximity of pixels (regions), arguing that this is the most common and feasible approach. Essentially, it works in a way similar to the one I explained earlier, used by OpenCV’s Camshift. From what the article mentions, there are sensitivity settings that can be changed to allow fine-tuning by the user (I don’t know whether the library provides access to them, but it’s worth investigating, since it may be useful to you).
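A rough sketch of that normalized-RGB idea (numpy only; the numeric bounds below are illustrative placeholders, not the fitted model from the article):

```python
import numpy as np

def skin_mask(img):
    """Mark pixels as skin-like using normalized RGB.

    Normalizing r = R/(R+G+B) and g = G/(R+G+B) discards most of the
    illumination intensity, leaving mostly chromatic information, so
    the same skin reads similarly under different lighting.
    """
    rgb = img.astype(float)
    total = rgb.sum(axis=-1) + 1e-9      # avoid division by zero
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    # Illustrative bounds for 'skin-like' chromaticity (placeholders)
    return (0.36 < r) & (r < 0.47) & (0.28 < g) & (g < 0.35)

# Made-up pixels: a skin-like tone and a blue tone
img = np.array([[[210, 160, 140], [30, 60, 200]]], dtype=np.uint8)
mask = skin_mask(img)  # first pixel True, second False
```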

According to the tests reported in the article, they obtained an accuracy rate of around 96.29%. That is quite high, but note that it is not always right. There is still a 6.76% rate of false positives (as in the case of your image).

In general, any algorithm of this kind will have a margin of error, and you need to know how to work with it. Over 90% is a really good accuracy, and we should not belittle the result just because it misses an image like the one you used, in which the color of the dress really can be confused with human skin.

Alternatives

In a domain like yours, where I presume the intent is to prevent children or easily offended people from seeing nude images, it is much better to have false positives (images that are not nude but were classified as such) than false negatives (images that are nude but were not classified as nude). If your algorithm errs in favor of the intended use, you do not necessarily have a problem, and such images can be handled as exceptional cases. You can, for example, send images classified as nudes to a later human evaluation (a kind of moderation) that then manually releases such images or not. Or you can simply pass these images through a new classifier that uses other variables (such as the file name, its size, or the number of accesses, as someone suggested in a comment).

Artificial intelligence is still very limited compared to what a human being is capable of. So an alternative is simply to use humans to make this classification. Imagine that you build a system that sends photos to registered reviewers (who are not the main users of your system), who are paid per photo just to classify it as pornographic or not. The amount paid per photo is small, so you can send the same photo to, say, three of these evaluators to get a "best of three" answer at low cost. Such evaluators would be interested in doing this work because it is trivial, and even though each photo pays little, they might earn a good income by doing many of these ratings in a day.

Does this solution seem far-fetched? Well, it already exists: see Amazon Mechanical Turk, which provides a platform to contract this type of service, and the Descriptive Camera, which uses crowdsourcing in the same way to print a description of the scene instead of an image.

Does this solution look like something with the potential for misuse? Yes, so much so that it is being widely discussed from an ethical point of view (imagine a country hiring people around the world to identify citizens in protests, to cite just one example).

  • 19

I am not the author of the question, but your answer is practically an article on the subject! +1

  • Thank you @Diegofelipe. :)

  • 3

Just for fun: Pinterest needs to use this algorithm urgently.

  • 2

    @Luizvieira I just came by late to congratulate you on the answer. It was a very impressive read!

I’m glad you liked it, @Arturotemplário. :)

  • Excellent response. + 1

  • The first link of your reply is not working.

  • 1

    @Andréfilipe Thanks for warning. I edited and put a new link to the same article.

  • 1

Awesome, your answer is magnificent!


34

To complement @Luizvieira’s answer on the use of machine learning to classify images.

In machine learning, first of all, you will need a large number of images previously classified by a human as "nude" and "not nude".

Neural network algorithms, and especially deep learning, are currently the most used for image classification. Google itself recently open-sourced software for running algorithms of this type, TensorFlow. Their research blog has a really cool tutorial on how to classify images.

The advantage of deep learning, and what has made it so popular today, is that, unlike what Luiz mentioned:

From sample images of what you want to detect/classify, important features are extracted. In your case this probably involves color, but other problems may involve edges, gradients, etc.

it is not necessary for a human to extract these important features. The only input to these algorithms is the colors (r, g, b) of each pixel of the image (there is, though, a standardization step so that all images have the same size). The algorithm itself creates "features" (sets of pixel combinations) that can then be used to classify new images without human interference.
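That input step can be sketched as follows (numpy only; the fixed size and the hand-rolled nearest-neighbor resize are made-up stand-ins for whatever preprocessing a real pipeline uses): every image, whatever its dimensions, becomes one flat vector of raw (r, g, b) values of the same length.

```python
import numpy as np

TARGET = 8  # made-up fixed side length all images are resized to

def to_input_vector(img):
    """Nearest-neighbor resize to TARGET x TARGET, then flatten to
    one vector of raw (r, g, b) values scaled to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(TARGET) * h // TARGET
    cols = np.arange(TARGET) * w // TARGET
    resized = img[rows][:, cols]
    return resized.reshape(-1).astype(float) / 255.0

# Two made-up images of different sizes map to same-length vectors
a = np.zeros((100, 60, 3), dtype=np.uint8)
b = np.zeros((33, 47, 3), dtype=np.uint8)
assert to_input_vector(a).shape == to_input_vector(b).shape == (TARGET * TARGET * 3,)
```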

Update: Google recently launched a Vision API that looks very interesting. The price is at most $0.0025 per image to detect explicit content.

  • 2

    +1 Great and succinct supplement. :)

Great complement, nice references...
