MORON/PROTECTOR - Guarding the purity of your soul on the Internet (Experimental Branch)
by R.R.R. Sorority Labs
v0.6.2 / 02.Sep.2004
See file "COPYING". Should you use or adapt Moron for research purposes, a proper citation and a public release of your adapted code are suggested.
While surfing perfectly honest and True to GOD sites on the Internet, the innocent are often assaulted by vile forces of evil and their pop-up pages of degrading and politically incorrect images of offensive filth.
Moron/Protector attempts to analyze the given images and predicts, in percentages, how much each image resembles each of the following classes:
Levels:
 1 : fbw           pronlike (grayscale)
 2 : fcg           drawn-looking content (cg)
 3 : fidx          pronlike (index image)
 4 : fmanga        drawn-looking content (manga)
 5 : fmcover       drawn-looking content (cover)
 6 : fporn         pronlike (general)
 7 : smut_close    pronlike (closeup)
 8 : smut_nonpron  good clean pixels
Caveat: many features available in version 0.5.2 are missing from 0.6.x due to the switch to R. Unimplemented features include recognizing hair, net-stockings and latex. Also, image regions are not colorized by content.
At the moment Moron/Protector is at the R&D stage and not accurate enough for production use. Most of the classes are unreliable (and difficult to capture with the current methods).
In the future, Moron/Protector might be usable for ranking and sorting content based on particular offensiveness criteria, e.g. the percentage of latex found. See TODO for a list of some whimsical categories.
- Reasonably new version of R (e.g. >= 1.9.0) from http://www.r-project.org/
- R libraries "randomForest", "pixmap" and "rimage" from http://cran.r-project.org/
- Recent version of "imagemagick" package [ or any command that converts arbitrary images to .pnm files, in that case you'll need to edit moron.R/imread() a little ]
- Fast CPU and lots of memory. You can probably get by with 1G and/or a lot of swap.
Moron/Protector was developed on Debian/Unstable.
Installing Moron requires training a classifier to make predictions on images. You only have to do the training phase once.
1) Install the R libraries randomForest, pixmap, rimage and cluster. From the R command line you can do this with

       install.packages(c("randomForest", "pixmap", "rimage", "cluster"))

   or by fetching the packages manually from CRAN, etc.
2) Unpack the Moron archive (you've done that, right?)
3) Start R in the moron-protector/ directory
4) Append Moron to the R workspace, load the training data, and build and save the classifier.
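As a rough sketch of that last step (only trainrf() is named elsewhere in this README; the dataset object name, the file names and the exact arguments here are assumptions):

```r
source('moron.R')           # load the Moron functions into the workspace
load('dset.RData')          # hypothetical: the provided training dataset
rf <- trainrf(dset)         # train the random forest classifier
save(rf, file='rf.RData')   # keep the built classifier for later sessions
```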
That's all. The training shouldn't take too long on a decent machine. It runs for 150 iterations by default. [ The Out-Of-Bag error printed during training should be an acceptable unbiased estimate of the quality of the built classifier; expect it to be around 10%. Usually the failing cases are deviations from the typical content of that class. ]
A typical amount of confusion to be expected from the built forest could be something like
          fbw  fcg fidx fmanga fmcover fporn close nonpron class_error
fbw       401    0    0     13       0     2     0       3  0.04295943
fcg         0  363    3      0      30    18     3       2  0.13365155
fidx        1    4  388      7      10     9     0       0  0.07398568
fmanga     12    5    3    362      30     3     2       2  0.13603819
fmcover     0    8    5      6     369    26     2       3  0.11933174
fporn       0   13    0      0      23   302    62      19  0.27923628
close       0    2    0      0       3    48   361       5  0.13842482
nonpron     3    0    0      2       3    12     8     391  0.06682578
(Entries on the diagonal are correct predictions, not confusions. Confusions among the drawn classes are not serious if only drawn/photo discrimination is wanted.)
- Start R in the protector/ directory
- The workspace should be automatically loaded. If not, load it manually with load('.RData').
- Evaluate 5 randomly chosen images from a given directory,
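For example (the argument names below are guesses; see evaldir() in the source for the real signature):

```r
# hypothetical call: evaluate 5 random images under /some/dir
# using the previously built forest rf
preds <- evaldir(rf, '/some/dir', n=5)
```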
The printed output will contain a distribution of the predictions. The image class with the maximum probability will be singled out. The predictions are sorted by weight.
0001/0001 /pron/latex//abc666_01b.jpg
  soft : fporn 0.76  fmcover 0.1266667  fcg 0.05333333  smut_nonpron 0.03333333
  pred : fporn
After evaluating a directory of images, you can ask Moron to display them ordered by e.g. image category 4 presence;
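Something along these lines should do it (the argument form is an assumption; showsorted() itself is documented under Internals):

```r
# hypothetical: display evaluated images sorted by the weight
# of class 4 (fmanga) in their predictions
showsorted(preds, class=4)
```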
Some further functionality may be added in the future.
The current version was trained with an equal number of training images from each class. Don't expect high accuracy for any predicted class. The way the images are processed and modelled as training vectors for the randomForest simply does not suffice to handle these problems perfectly.
Moron/Protector will like high-res high-quality images. It will most likely fail on grainy and lo-res images.
Q: Can it distinguish a brown sofa from well-tanned human? A: Probably not.
Q: Can it distinguish a wooden cabinet from a piece of ...? A: Probably not.
Q: Can it distinguish a 'grid patterned' tablecloth from some questionable sort of underwear? A: Probably not.
One reason is that Moron most likely cannot build a model of human anatomy as such. For example, with the currently used approach, sofas and skin will probably look much the same to the classifier. We are uncertain what kind of data or data representation would bias the process enough to make it estimate a statistical model that properly handles different parts of humans and "understands" their relationships. Hand-picking hundreds of limbs and noses and tanlines from images is not an option.
Q: This is a sick piece of software and you're wasting your time. A: Errrr, was that a question...
Moron/Protector works by a combination of feature extraction and machine learning. First, each training image is converted to the TSL color space. Then it is divided into 3x3 (= 9) rectangular regions per image plane, from which features are calculated. The region statistics are then also pooled to obtain some more global attributes.
The features calculated for each region and three planes are
1) ad-hoc FFT statistics
2) rotation invariant local binary patterns for d=1 and d=2
3) color histogram
4) histograms of local first and second order derivatives
These are also pooled with rowstats() to calculate some global statistics. The aspect ratio of the image is included in the final vector.
All this happens in the function samplesingle() in moron.R
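As a conceptual sketch of that pipeline (illustrative only, not the real implementation; rgb2tsl(), fftstats() and difffeats() are listed in featext.R below, but the structure, region math and function signatures here are assumptions):

```r
# Illustrative sketch -- see samplesingle() in moron.R for the real code.
samplesketch <- function(img) {             # img: h x w x 3 RGB array
  tsl <- rgb2tsl(img)                       # RGB -> TSL color space
  h <- dim(tsl)[1]; w <- dim(tsl)[2]
  v <- c()
  for (p in 1:3)                            # for each color plane...
    for (i in 0:2) for (j in 0:2) {         # ...and each of the 3x3 regions
      reg <- tsl[round(i*h/3 + 1):round((i+1)*h/3),
                 round(j*w/3 + 1):round((j+1)*w/3), p]
      v <- c(v, fftstats(reg), difffeats(reg))   # per-region features
    }
  c(v, w / h)                               # append the aspect ratio
}
```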
Some features supported by featext.R are not extracted currently. These include spatial color layout and dominant color features.
After the features are extracted, a data matrix is built, and a Random Forest is trained on the data to make predictions. However, in the default distribution not all of the over 4000 features are used or given in the data. Instead, the 250 most useful were selected with getmostimportant() from a run of a 500-tree randomForest.
Unfortunately we can't distribute the training images. However, you can build your own dataset very easily, just put pictures of different classes to separate directories and give the dir paths to makedset() function (see source). The resulting dataset can then be used to train a new classifier with trainrf(), per default using all the extracted attributes.
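A sketch of that workflow (the directory paths and the exact argument form of makedset() are assumptions; check the source for the real signatures):

```r
# hypothetical class directories, one directory per label
dirs <- c('/data/fcg', '/data/fmanga', '/data/smut_nonpron')
dset <- makedset(dirs)    # extract features, build a labelled dataset
rf   <- trainrf(dset)     # train a fresh classifier on all attributes
```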
DomCol     : Dominant Color features
LBP_riu81  : Local Binary Pattern feature extractor, pred=1
LBP_riu162 : Local Binary Pattern feature extractor, pred=2
rgb2hsv    : Convert RGB image to HSV image
rgb2tsl    : Convert RGB image to TSL image
hsv2rgb    : Convert HSV image to RGB image
GENhist    : Generic histogram extractor for 3 dimensional arrays
RANDhist   : A color histogram based on sampled pixels
fftstats   : Ad-hoc fourier statistic features I
fftstats2  : Ad-hoc fourier statistic features (larger set)
dct        : Generic 1D Discrete Cosine Transform
dct2       : Generic 2D Discrete Cosine Transform for a matrix
dctfeats   : Spatial Color Layout by DCT2
difffeats  : Statistics of local 1st and 2nd order derivatives
wavefeat   : Gives k most sign. wavelet coeffs & their locs
evaldir          : evaluate a directory of images using a built classifier;
                   can also be used to copy wrongly predicted images
                   elsewhere, and to write the prediction images to a
                   specified directory
showsorted       : given preds from evaldir(), display images sorted by class
showdifficult    : given a forest and a dataset, show the images that were
                   most difficult to predict
makedset         : sample patches from directories, output dataset (dset)
joindsets        : joins two datasets to a new dataset
balancelabels    : returns a balanced dataset by subsampling (with the same
                   number of instances per class)
unifyclasses     : relabel instances of some classes to another target class
doskip           : should we skip this string? (in "skiplist.txt")
getmostimportant : after a randomForest has been built, this function can be
                   used to generate a new, smaller dataset where only the
                   most important attributes are retained. The dataset
                   provided with Moron has been pruned by this function.
randpermdset     : order dset randomly
orderlabels      : order dset by class labels
trainrf          : train a random forest classifier using a given dataset
sampledir        : sample image patches from a given directory (internal)
learnratetest    : incrementally samples larger sets of data from the given
                   dset and returns data showing how the out-of-bag error
                   developed
sampledset       : get a smaller subsample from a dset
Proper usage of the functions can be seen from the source code.
If it complains about not finding some function on startup, try source('moron.R').
Sometimes loading an image fails with an error related to the system() command. We suspect this is a bug in R or one of the libraries. If this happens, the only option is to restart R.
See file TODO.
Random Forests are due to Leo Breiman and described in his paper in Machine Learning (2001). Thanks to Andy Liaw and co. for the R implementation. Local Binary Pattern (LBP) features were proposed by Pietikainen et al., across several papers. All of the mentioned authors deserve thanks for distributing their code. HSV histograms can be considered part of the folklore. Dominant Color and Spatial Color Layout (DCT features) are similar to those proposed in the MPEG-4 draft. To the defense of all the aforementioned authors, it must be stated that none of them is involved with the Moron project in any way.
To be removed from this hall of fame, send me email. :)
Method for Object Recognition of Obscure Nature. The Moron project focuses on trying to home in on psychologically relevant aspects of the given image content.
Bug reports, ideas, suggestions, patches, code, flames, etc are appreciated. Send mail to
Don't expect help on your homework, though - we are strictly limited to no-life projects.