SourceFiles.org - Use the Source, Luke

MORON/PROTECTOR - Guarding the purity of your soul on the Internet (Experimental Branch)

by R.R.R. Sorority Labs

v0.6.2 / 02.Sep.2004

License & note

See file "COPYING". Should you use or adapt Moron for research purposes, a proper citation and a public release of your adapted code are suggested.

Intro

While surfing perfectly honest and True to GOD sites on the Internet, the innocent are often assaulted by vile forces of evil and their pop-up pages of degrading and politically incorrect images of offensive filth.

Moron/Protector attempts to analyze the given images and predicts, in percentages, how much each image resembles each of the following classes.

Levels: fbw fcg fidx fmanga fmcover fporn smut_close smut_nonpron

1 : fbw          : pronlike (grayscale)
2 : fcg          : drawn-looking content (cg)
3 : fidx         : pronlike (index image)
4 : fmanga       : drawn-looking content (manga)
5 : fmcover      : drawn-looking content (cover)
6 : fporn        : pronlike (general)
7 : smut_close   : pronlike (closeup)
8 : smut_nonpron : good clean pixels

Caveat: many features available in version 0.5.2 are missing from 0.6.x due to the switch to R. Unimplemented features include recognizing hair, net-stockings and latex. Also, image regions are not colorized by content.

At the moment Moron/Protector is at the R&D stage and not accurate enough for production use. Most of the classes are unreliable (and difficult to capture with the current methods).

In the future, Moron/Protector might be usable to rank and sort content based on particular offensiveness criteria, e.g. percentage of latex found. See TODO for a list of some whimsical categories.

Requirements

  • Reasonably new version of R (e.g. >= 1.9.0) from http://www.r-project.org/
  • R libraries "randomForest", "pixmap", "rimage" and "cluster" from http://cran.r-project.org/
  • Recent version of the "imagemagick" package [ or any command that converts arbitrary images to .pnm files; in that case you'll need to edit imread() in moron.R a little ]
  • Fast CPU and lots of memory. You can probably use this with 1G of RAM and/or a lot of swap.

Moron/Protector was developed on Debian/Unstable.

Install

Installing Moron requires training a classifier to predict on the images. You'll have to do the training phase only once.

  1. Install the R libraries randomForest, pixmap, rimage and cluster. From the R command line, you can do this by

> install.packages("randomForest");
> install.packages("pixmap");
> install.packages("rimage");
> install.packages("cluster");

or by fetching the packages manually from CRAN, etc.

  2. Unpack the Moron archive (you've done that, right?)
  3. Start R in the moron-protector/ directory
  4. Append Moron to the R workspace, load the training data, build and save the classifier by

> source("buildmoron.R");

That's all. The training shouldn't take too long on a decent machine. It runs for 150 iterations by default. [ The Out-Of-Bag error printed during training should be an acceptable unbiased estimate of the quality of the built classifier. It should be around 10%. Usually the failing cases are deviations from the typical content of that class. ]

A typical amount of confusion to be expected from the built forest could be something like

             fbw fcg fidx fmanga fmcover fporn close nonpron  class_error
fbw          401   0    0     13       0     2     0       3  0.04295943
fcg            0 363    3      0      30    18     3       2  0.13365155
fidx           1   4  388      7      10     9     0       0  0.07398568
fmanga        12   5    3    362      30     3     2       2  0.13603819
fmcover        0   8    5      6     369    26     2       3  0.11933174
fporn          0  13    0      0      23   302    62      19  0.27923628
close          0   2    0      0       3    48   361       5  0.13842482
nonpron        3   0    0      2       3    12     8     391  0.06682578

(Rows are true classes, columns are predicted classes. Entries on the diagonal are correct predictions, not confusions. Confusions among the drawn classes are not serious if only drawn/photo discrimination is wanted.)
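
The class_error column and an overall error rate can be recomputed from the counts in the table. The following is a minimal R sketch that uses only the numbers printed above. The split of the classes into "drawn" (fcg, fmanga, fmcover) and "photo" (everything else) is our assumption for illustration, not something Moron defines.

```r
# Confusion counts copied from the table above
# (rows: true class, columns: predicted class).
classes <- c("fbw", "fcg", "fidx", "fmanga", "fmcover", "fporn", "close", "nonpron")
conf <- matrix(c(
  401,   0,   0,  13,   0,   2,   0,   3,
    0, 363,   3,   0,  30,  18,   3,   2,
    1,   4, 388,   7,  10,   9,   0,   0,
   12,   5,   3, 362,  30,   3,   2,   2,
    0,   8,   5,   6, 369,  26,   2,   3,
    0,  13,   0,   0,  23, 302,  62,  19,
    0,   2,   0,   0,   3,  48, 361,   5,
    3,   0,   0,   2,   3,  12,   8, 391
), nrow = 8, byrow = TRUE, dimnames = list(true = classes, predicted = classes))

# Per-class error: share of each row that falls off the diagonal.
class_error <- 1 - diag(conf) / rowSums(conf)

# Overall error: share of all images that fall off the diagonal.
overall_error <- 1 - sum(diag(conf)) / sum(conf)

# Assumed grouping (hypothetical): three drawn classes, the rest photo.
group <- ifelse(classes %in% c("fcg", "fmanga", "fmcover"), "drawn", "photo")
true_group <- group[row(conf)]   # group of the true class, per matrix cell
pred_group <- group[col(conf)]   # group of the predicted class, per matrix cell

# Error when only the drawn/photo distinction matters.
drawn_photo_error <- sum(conf[true_group != pred_group]) / sum(conf)

print(round(class_error, 4))
print(c(overall = overall_error, drawn_photo = drawn_photo_error))
```

Under this grouping the drawn/photo error comes out well below the overall error, which is the point of the remark about confusions between the drawn classes.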

Usage after training

  1. Start R in the protector/ directory
  2. The workspace should be automatically loaded. If not,

    > sys.load.image(".RData",quiet=TRUE);

  3. Evaluate 5 randomly chosen images from a given directory,

    > preds<-evaldir('/some/path/',rf,pics=5);

The printed output will contain a distribution of the predictions. The image class with the maximum probability will be singled out. The predictions are sorted by weight.

Example output:

0001/0001 /pron/latex//abc666_01b.jpg
soft : fporn 0.76 fmcover 0.1266667 fcg 0.05333333 smut_nonpron 0.03333333
pred : fporn
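
The "soft" values read as vote fractions and "pred" as the class with the largest fraction. A small R sketch of that reading, using the numbers from the example output. Treating the fractions as votes out of 150 trees is an assumption based on the default of 150 iterations mentioned under Install.

```r
# Soft predictions copied from the example output above.
soft <- c(fporn = 0.76, fmcover = 0.1266667,
          fcg = 0.05333333, smut_nonpron = 0.03333333)

# Sort by weight, largest first, as in the printed output.
soft <- sort(soft, decreasing = TRUE)

# The predicted class is the one with the largest fraction.
pred <- names(soft)[1]

# Assumption: 150 trees, so each fraction is a whole number of votes.
votes <- round(soft * 150)

# Share of the votes that went to classes not shown on the "soft" line.
not_shown <- 1 - sum(soft)

print(pred)
print(votes)
print(not_shown)
```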

After evaluating a directory of images, you can ask Moron to display them ordered by the presence of a given image category, e.g. category 4 (drawn-looking content, manga):

> showsorted(preds,class=4);

Some further functionality may be added in the future.

Design goals & caveats

The current version was trained with an equal number of training images from each class. Don't expect high accuracy for any predicted class. The way the images are processed and modelled as training vectors for the randomForest simply does not suffice to handle these problems perfectly.

Moron/Protector likes high-res, high-quality images. It will most likely fail on grainy and lo-res images.

Frequently Asked Questions

Q: Can it distinguish a brown sofa from well-tanned human? A: Probably not.

Q: Can it distinguish a wooden cabinet from a piece of ...? A: Probably not.

Q: Can it distinguish a 'grid patterned' tablecloth from some questionable sort of underwear? A: Probably not.

One reason is that Moron, as it is, most likely cannot build a model of human anatomy. For example, with the currently used approach, sofas and skin will probably look quite the same to the classifier. We are uncertain what kind of data or data representation would bias the process enough to make it estimate a statistical model for proper handling of different parts of humans and "understanding" their relationships. Hand-picking hundreds of limbs and noses and tanlines from images is not an option.

Q: This is a sick piece of software and you're wasting your time A: Errrr, was that a question...

Internals

Moron/Protector works by a combination of feature extraction and machine learning. First, each training image is converted to the TSL color space. Then each image plane is divided into 3x3 (==9) rectangular regions, from which features are calculated. The region statistics are then also pooled to get some more global attributes.

The features calculated for each region and each of the three planes are

  1) ad-hoc FFT statistics
  2) rotation invariant local binary patterns for d=1 and d=2
  3) color histogram
  4) histograms of local first and second order derivatives
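
To give an idea of what item 2 computes, here is a rough R sketch of a rotation invariant "uniform" local binary pattern code for 8 neighbours at distance 1. It is not the code from featext.R: it uses the 8 grid neighbours of a pixel directly instead of interpolated points on a circle, and the names lbp_riu2_8 and lbp_histogram are hypothetical.

```r
# Rotation invariant uniform LBP code for the centre pixel of a 3x3 patch.
# Result is 0..8 for "uniform" patterns (at most two 0/1 transitions
# around the circle) and 9 for everything else.
lbp_riu2_8 <- function(patch) {
  stopifnot(all(dim(patch) == c(3, 3)))
  center <- patch[2, 2]
  # The 8 neighbours, walked clockwise starting from the top-left corner.
  nb <- c(patch[1, 1], patch[1, 2], patch[1, 3], patch[2, 3],
          patch[3, 3], patch[3, 2], patch[3, 1], patch[2, 1])
  bits <- as.integer(nb >= center)
  # Count 0/1 changes around the circle, including last -> first.
  transitions <- sum(bits != c(bits[-1], bits[1]))
  if (transitions <= 2) sum(bits) else 9L
}

# Normalized 10-bin histogram of the codes over all interior pixels
# of a grayscale image given as a numeric matrix (at least 3x3).
lbp_histogram <- function(img) {
  codes <- integer((nrow(img) - 2) * (ncol(img) - 2))
  k <- 1
  for (r in 2:(nrow(img) - 1)) {
    for (cl in 2:(ncol(img) - 1)) {
      codes[k] <- lbp_riu2_8(img[(r - 1):(r + 1), (cl - 1):(cl + 1)])
      k <- k + 1
    }
  }
  tabulate(codes + 1L, nbins = 10) / length(codes)
}

# Usage on a random stand-in image.
set.seed(3)
img <- matrix(sample(0:255, 20 * 20, replace = TRUE), nrow = 20, ncol = 20)
h <- lbp_histogram(img)
print(round(h, 3))
```

The histogram, not the individual codes, is what would go into a feature vector. Because the code depends only on the number of set bits in a uniform pattern, rotating the patch does not change it.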

These are also pooled to calculate some global functions with rowstats(). Aspect ratio of the image is included in the final vector.

All this happens in the function samplesingle() in moron.R
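
The region-and-pooling scheme described above can be sketched in a few lines of R. This is not samplesingle() itself: the per-region mean and standard deviation are stand-ins for the real features, colMeans() is a stand-in for whatever rowstats() computes, and the function name region_features is hypothetical.

```r
# Split one image plane (a numeric matrix) into an n x n grid and
# compute a small feature vector (mean, sd) for every region.
region_features <- function(plane, n = 3) {
  # Map every row/column index to a grid cell number in 1..n.
  row_id <- ceiling(seq_len(nrow(plane)) * n / nrow(plane))
  col_id <- ceiling(seq_len(ncol(plane)) * n / ncol(plane))
  feats <- matrix(NA_real_, nrow = n * n, ncol = 2,
                  dimnames = list(NULL, c("mean", "sd")))
  k <- 1
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      block <- plane[row_id == i, col_id == j, drop = FALSE]
      feats[k, ] <- c(mean(block), sd(as.vector(block)))
      k <- k + 1
    }
  }
  feats
}

# Random stand-in for one plane of a 60x90 image.
set.seed(1)
plane <- matrix(runif(60 * 90), nrow = 60, ncol = 90)

regional <- region_features(plane)      # 9 regions x 2 features
pooled   <- colMeans(regional)          # global attributes pooled over regions
aspect   <- ncol(plane) / nrow(plane)   # aspect ratio of the image

# Final vector: regional features, pooled features, aspect ratio.
feature_vector <- c(as.vector(t(regional)), pooled, aspect = aspect)
print(length(feature_vector))
```

With three planes and the real feature extractors in place of mean and sd, the same layout would yield a much longer vector per image.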

Some features supported by featext.R are not extracted currently. These include spatial color layout and dominant color features.

After the features are extracted, a data matrix is built, and a Random Forest is trained on the data to make predictions. However, in the default distribution not all of the more than 4000 features are used or included in the data. Instead, the 250 most useful ones were selected with getmostimportant() after a run of a 500-tree randomForest.
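
The general idea of pruning by importance can be sketched with the randomForest package alone. This is not getmostimportant(), whose internals are only visible in the source. The sketch assumes the built-in iris data as a stand-in dataset, keeps 2 of 4 attributes instead of 250 of 4000, and ranks by mean decrease in accuracy, which may differ from the criterion Moron uses.

```r
library(randomForest)

set.seed(42)

# Step 1: train a 500-tree forest on all attributes, recording importance.
rf_full <- randomForest(x = iris[, 1:4], y = iris$Species,
                        ntree = 500, importance = TRUE)

# Step 2: rank attributes by mean decrease in accuracy (type = 1).
imp <- importance(rf_full, type = 1)
ranked <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]

# Step 3: keep only the top attributes and retrain a smaller forest.
top <- ranked[1:2]
rf_small <- randomForest(x = iris[, top], y = iris$Species, ntree = 150)

print(top)
# Out-Of-Bag confusion matrix of the pruned forest, with class.error column.
print(rf_small$confusion)
```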

Unfortunately we can't distribute the training images. However, you can build your own dataset very easily: just put pictures of different classes into separate directories and pass the directory paths to the makedset() function (see source). The resulting dataset can then be used to train a new classifier with trainrf(), by default using all the extracted attributes.
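
The one-directory-per-class layout, and the kind of balancing that balancelabels() is described as doing, can be illustrated without calling Moron at all. The sketch below does not use makedset(), since its arguments are documented only in the source. The directory names, file names and counts are made up, and the files are empty placeholders created in a temporary directory.

```r
# Create a throwaway directory tree: one subdirectory per class.
root <- tempfile("dset_")
classes <- c("fcg", "fmanga", "smut_nonpron")
counts <- c(5, 3, 4)
for (i in seq_along(classes)) {
  dir.create(file.path(root, classes[i]), recursive = TRUE)
  file.create(file.path(root, classes[i],
                        sprintf("img%02d.jpg", seq_len(counts[i]))))
}

# Index the images; the class label is the name of the parent directory.
files <- list.files(root, pattern = "\\.jpg$", recursive = TRUE,
                    full.names = TRUE)
index <- data.frame(file = files,
                    label = factor(basename(dirname(files))),
                    stringsAsFactors = FALSE)

# Balance by subsampling every class down to the size of the smallest one.
set.seed(7)
n_min <- min(table(index$label))
keep <- unlist(lapply(split(seq_len(nrow(index)), index$label),
                      function(idx) idx[sample.int(length(idx), n_min)]),
               use.names = FALSE)
balanced <- index[keep, ]

print(table(index$label))
print(table(balanced$label))
```

The paths of the class directories under root are what you would hand to makedset() in a real run.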

Functions in Moron

In featext.R

DomCol             : Dominant Color features
LBP_riu81          : Local Binary Pattern feature extractor, pred=1
LBP_riu162         : Local Binary Pattern feature extractor, pred=2
rgb2hsv            : Convert RGB image to HSV image
rgb2tsl            : Convert RGB image to TSL image
hsv2rgb            : Convert HSV image to RGB image
GENhist            : Generic histogram extractor for 3 dimensional arrays
RANDhist           : A color histogram based on sampled pixels
fftstats           : Ad-hoc fourier statistic features I
fftstats2          : Ad-hoc fourier statistic features (larger set)
dct                : Generic 1D Discrete Cosine Transform
dct2               : Generic 2D Discrete Cosine Transform for a matrix
dctfeats           : Spatial Color Layout by DCT2
difffeats          : Statistics of local 1st and 2nd order derivatives
wavefeat           : Gives k most sign. wavelet coeffs & their locs

In moron.R

evaldir            : evaluate a directory of images using a built classifier
                     can also be used to copy images elsewhere that were
                     predicted wrong, and write the prediction images to
                     a specified directory
showsorted         : given preds from evaldir(), display images sorted by class
showdifficult      : given a forest and a dataset, show the images that
                     were most difficult to predict
makedset           : sample patches from directories, output dataset (dset)
joindsets          : joins two datasets to a new dataset
balancelabels      : Returns a balanced dataset by subsampling (w/ same
                     amount of instances / class)
unifyclasses       : relabel instances of some classes to another target class
doskip             : should we skip this string? (in "skiplist.txt")
getmostimportant   : After a randomForest has been built, this function
                     can be used to generate a new, smaller dataset where 
                     only the most important attributes are retained. The 
                     dataset provided with Moron has been pruned by this
                     function.
randpermdset       : order dset randomly
orderlabels        : order dset by class labels
trainrf            : train a random forest classifier using a given dataset
sampledir          : sample image patches from a given directory (internal)
learnratetest      : incrementally samples larger sets of data from the
                     given dset and returns data showing how the 
                     out of bag error developed.
sampledset         : get a smaller subsample from a dset

Proper usage of the functions can be seen from the source code.

Known bugs

If it complains about not finding some function on startup, try source('moron.R').

Sometimes loading an image fails with an error related to the system() command. I suppose this is a bug in R or in one of the libraries. If this happens, the only option is to restart R.

Future work

See file TODO.

Acknowledgements and further info

Random Forests are due to Leo Breiman and are described in his paper in Machine Learning (2001). Thanks to Andy Liaw and co. for the R implementation. Local Binary Pattern (LBP) features were proposed by Pietikainen et al. across several papers. All of the mentioned authors deserve thanks for distributing their code. HSV histograms can be considered part of the folklore. Dominant Color and Spatial Color Layout (DCT features) are similar to the descriptors proposed in the MPEG-7 drafts. In defense of all the aforementioned authors, it must be stated that none of them is involved with the Moron project in any way.

To be removed from this hall of fame, send me email. :)

MORON?

Method for Object Recognition of Obscure Nature. The Moron project focuses on trying to home in on psychologically relevant aspects of the given image content.

Contact

Bug reports, ideas, suggestions, patches, code, flames, etc. are appreciated. Send mail to

<iwronsky(at)users.sourceforge.net>

Don't expect help on your homework, though - we are strictly limited to no-life projects.

-EOF-


Copyright 2006 Sourcefiles.org All rights reserved.