TL;DR
More than 90% accuracy, as the library you already use apparently gets (according to the article it quotes), is a great accuracy. More than this is unlikely to be achieved with current technology.
And as you noticed in your tests, the same photo that produces fake
  positive color does not produce "error" in grayscale version (no
  confuse with black and white! PB is a binary image, only with black and
  white literally). But that’s because the library uses
  an RGB template. So it only works for color images.
Relevant Links
- On 19/04/2016 it was posted this fantastic article on approaches
  more modern in the detection of nudity, using deep Learning. Worth to 
  reading. But, warning to the most sensitive: the article contains images with nudity.
Curiously, many people didn’t know that the famous Lenna (me already
    I mentioned her here in other answers), widely used in Vision
    Computational, is an image with nudity. lol
- The library deepgaze (available on Github) has, among others
  cool stuff, skin detection by retro-projection (backprojection).
This type of detection is performed on the data of a digital image, that is, on the discrete illumination values contained in the pixels. And the difficulty of "interpreting" is just there: what for you, human, is a person wearing a dress is, for the computer, only an array of integer values representing light sampling (in a single band, in the case of grayscale images, or more bands, as in the case of RGB images - a band for each band of the visible luminous spectrum).
There are different approaches to computationally working this data to try to detect something of interest (which can be a human face, a weapon, a spoiled fruit on a conveyor belt, or even - as in your domain of problem - a naked person). And these forms use statistical concepts.
Using a Color Histogram
They mentioned in their comments the use of histograms. A histogram is basically a count of occurrences of something, presented graphically as a way to illustrate a frequency distribution. For example, the reputation in Sopt is also shown in the form of a graph, where the vertical axis shows the reputation counts (frequencies) earned and the horizontal axis the discrete accounting intervals (each day of the month):

Note: I am using your reputation for illustration purposes only.
  Such information shall be publicly available in your profile and also in the profile of any other Sopt user.
A color histogram counts from an image the number of pixels that occur (the frequency, on the vertical axis) for each color discrete value (on the horizontal axis, and usually on a scale of real values between 0.0 and 1.0 or integers between 0 and 255). However, what is treated as "color" may vary depending on some choices: you can count each RGB value (red, green and blue) separately, or count the brightness value (gray tone) as a whole (to understand what I mean as "as a whole", read this my other answer here at Sopt), or even account for only tint (Hue, in English) if the system used is HSV instead of RGB (most common form in problems like yours, because you want to treat color as a single value).
The following figure, reproduced of this website, illustrates the use of color histograms to analyze an image of a car. Note that it has four histograms: one for the pixel count in red (R), one for the pixel count in green (G) and one for the pixel count in blue (B), in the upper right corner; and one in the lower right corner for the pixel count in the brightness range (shades of gray). Understanding how a histogram works, you can look at the graph below (counting on the brightness scale) to realize that the photo is reasonably lighter than dark, since there is a higher concentration (a higher occurrence) of pixels lighter than dark (there are higher vertical bars on the right side, with values closer to 255, than on the left side, with values closer to 0). It is no wonder that this type of tool is widely used by photographers, since it allows you to notice how is the lighting of a photo (more details on this aspect in this great article).

And how can this graph be used to detect something? Well, such accounting is a frequency distribution and as such can be used to calculate the probability of occurrence of a pixel (taken at random) with a certain color/value in that particular image for which the histogram has been calculated. Although the car is dark in the image illustrated earlier, most of the pixels are considerably lighter due to the wall, the floor, and even the light reflections on the car ceiling (which is lighter than it really is). Thus, by randomly drawing a pixel in this image, the probability that it is not from the car is quite large, consistent with what illustrates the color histogram of the image.
The probabilities of each color/value are calculated by normalizing this count by the total pixels in the image (illustratively equivalent to "stack" the bars, as shown in the following figure).

This image is from a magazine article SERVO Magazine (2007 edition, page 37) which explains how Opencv’s Camshift face tracking algorithm works. The essential idea is that, having an example image of a face (the initial region of the face to be traced in Camshift), one can calculate the color histogram for it and then use these probability values to calculate in a new image the probability of each pixel being or not of a face. When a region has more neighboring pixels with greater probability of individually being of a face, the probability of the entire region increases, indicating that there is probably a face.
This same reasoning can be used to detect any object of interest, provided you have calculated the color histogram for an image representative of what you want to detect (that is, even if someone doesn’t like pictures of naked people, you still need to use them! hehehe). Note, however, that what this very simple algorithm returns is the probability of a region, from the probabilities of each individual pixel, being or not the object of interest. The algorithm is expected to hit hard, but it can miss, because that probability is never 100%.
Machine learning
They also mentioned in comments the idea of using computational learning (or learning). This area of Artificial Intelligence also uses many static methods. To understand the main idea behind data classification, read this my other answer (because everything is already there and I see no point in repeating here). It ends up focusing on a specific algorithm (SVM), but the beginning gives a general idea of the classification process.
Anyway, the principle of the thing is:
- From sample images of what is desired to detect/classify, important features are extracted. In your case, it probably involves color, but in other problems may involve edges, gradients, etc.
- These characteristics are used to train a classifier, who then "learns" how to separate two groups (for example, it’s nude, not nude) from the data. 
- This classifier is then used with real-world data (new data, not included in the training database) to actually classify the new image.
An algorithm for detecting objects in images that uses machine learning and is quite popular and robust is the Haar Cascade. I’ve already explained briefly how it works in this my other answer, but an important distinction is that you need to provide examples of positive (which contains the object of interest) and negative (which does not contain the object of interest). It has implementation ready in Opencv to detect faces, eyes, nose and mouth, but you can use the basic functions to detect anything (even a banana). It uses as training features the variation of lighting according to different "filter windows" (such Aar Features), then it is robust for detecting objects because it uses not only the lighting colors/values but also the variation directions that derive essentially from the edges. I don’t know if it is or has been used to detect nudity, as it may add unnecessary complexity to the numerous possible variations in human posture. But here’s the hint.
Concluding
As you will notice in the examples illustrated in the answers I mentioned earlier, there is always the possibility of mistakes. Maybe because there really are real world examples that are outliers (something like "out of the curve"), or because natural measurement errors occur (in the case of digital image processing, illumination variations, partial occlusion by other objects, and variations in rotation or scale are very significant difficulties).
I don’t really know much about this particular problem domain to know what is most used. However, the library you use is based on another Javascript-based library in this article (also mentioned in comments). And this work proposes to use essentially a skin color model (with example images under different illuminations), considering the normalized RGB values and the proximity of pixels (regions), because it argues that this is the most common and feasible form. Essentially, it does so in a similar way to the one I explained earlier and used by Opencv’s Camshift. From what the article mentions, there are sensitivity settings that can be changed to allow fine-tuning by the user (I don’t know if the library provides access to this, but it’s worth your investigation since it may be useful to you).
According to the tests reported in the article, they obtained an accuracy rate around 96.29%. It is quite high, but see how it is not always accurate. There is still a rate of 6.76% false positives (as the case of your image).
In general, any algorithm of this type will have a margin of accuracy and you need to know how to work with it. Over 90% is a really good margin, and we should not belittle the result because it misses an image like the one you used, in which the color of the dress really confuses with human skin.
Alternatives
In a domain like yours, where I presume it is intended to prevent children or people who are easily offended from accessing nude images, it is much better to have false positives (images that are not nude but have been classified as such) than to have false negatives (images that are nude, but not classified as nude). If your algorithm misses favorable to intended use, you do not necessarily have a problem there, and such images can be handled in exceptional cases. You can, for example, send images classified as nudes to a later human evaluation (a kind of moderator) that will then release or not such images manually. Or you can just pass these images through a new classifier, which then uses other variables (such as the file name, its size, or the number of hits - as someone has even suggested in comment).
Artificial intelligence is still very limited compared to what a human being is capable of. So an alternative is just to use humans to make such a classification. Imagine that you could build a system that sends photos to other registered reviewers (who are not the main users of your system), who are paid by photo to only classify as pornographic or not. The amounts paid per photo are small, so you can send the same photo to three of these evaluators, for example, to have a "best of three" response at a low cost. Such evaluators would be interested in doing this work because it is simply trivial, and even paying little for a photo maybe they can get a good salary by doing many of these ratings in one day.
Does this solution seem inconceivable? Well, it already exists: see the Amazon Mechanical Turk, which provides a platform to contract this type of service, and Descriptive Camera, who uses crowd sourcing likewise to print a description of the scene instead of an image.
Does this solution look like something with the potential to do evil? Yes, so much so that it is something being widely discussed from an ethical point of view (imagine a country hiring people across the world to identify citizens in protests, just to quote an example).
							
							
						 
Post the images he hit :D
– Maniero
Simple, if the image is being much more accessed than the others, it is a sign of something :P
– Bacco
The algorithm can be based on Learning machine techniques and/or color histogram analysis. But this is just my guess.
– Alexandre Cartaxo
Machine Learning aims to build algorithms that have the ability to learn and make certain predictions, without being explicitly programmed to do so. Basically such algorithms aim to build a model from examples (in this case nudity) that are introduced, to then make certain decisions depending on that model. Standard recognition algorithms, for example, or google search are based on machine Learning techniques. A histogram, on the other hand, is a graphical representation of a statistical distribution; in this case, it would be of color.
– Alexandre Cartaxo
It is possible that there may be something common regarding the color distribution in this type of images and the algorithm may be based on this.
– Alexandre Cartaxo
The answer is here: https://sites.google.com/a/dcs.upd.edu.ph/csp-proceedings/Home/pcsc-2005/AI4.pdf?attredirects=0. Apparently it is by statistical analysis of color distribution. It’s nothing more sophisticated.
– Alexandre Cartaxo
tested with P&B photos?
– Maniero
There’s a really cool paper on it here: Nude Detection in Video using Bag-of-Visual-Features
– Laerte
Wallacemaxters have you tested with people of different skin tones (Black to Albino)? have you tested both sexes? (A shirtless man is not considered nudity already a woman yes)? has tested photos with some laterality?
– Ricardo
@Ricardo the black is also recognized as nudity. I believe I’ve done this test, it works also.
– Wallace Maxters