Summary
PCP (Pattern Classification Program) is an open-source program for supervised classification of patterns.
PCP has been developed and tested on the Linux/i386 platform (Fedora Core 4). A Linux binary is provided with the PCP distribution and should run out of the box on most Linux distributions.
The PCP distribution also includes a Windows binary, which requires the Cygwin library cygwin1.dll to run. Cygwin is a free UNIX-like environment for Windows that can be downloaded from http://www.cygwin.com. Note that PCP needs only the library cygwin1.dll, not the complete Cygwin environment.
PCP is released under the MIT license (also known as the X11 license). This license permits free use and distribution for any purpose, including commercial, in binary and source formats. See the file LICENSING for details.
Unpacking the Distribution
GNU/Linux
% gunzip -c pcp-2.2.tar.gz | tar xvf -
This will create a subdirectory `pcp-2.2' in your current directory.
Windows/Cygwin
Double-click the pcp-2.2.zip file to unpack the software. Before you can run it, however, you need a copy of the library file cygwin1.dll. The Cygwin environment, which contains the library, can be downloaded from http://www.cygwin.com (it may be possible to download just the library file; I haven't tried that). After installation, the library is typically found in the directory c:\cygwin\bin. You then have two options:
- add the Cygwin directory to the PATH environment variable
- copy the library file into the directory where you install the pcp.exe executable
Once you have the library, you can run the PCP executable from a DOS window or a Cygwin terminal window.
Quick Start
For a quick test of PCP, try this:
% cd pcp-2.2
% Linux/pcp -b srbct_test.bat
This command runs PCP in batch mode using the command file srbct_test.bat. It builds a Support Vector Machine (SVM) classifier for the well-known SRBCT (small round blue cell tumors of childhood) data set [3]. The problem is to predict the tumor subtype for a patient, and hence help choose appropriate treatment, using a vector of microarray (gene expression) measurements. When processing completes, the program returns to the command-line prompt. The resulting SVM is stored in the file pcp.svm.
To run prediction on an independent test data set using the SVM model just built, type:
% Linux/pcp
You should see the `Main Menu'. Press `b' to enter `Pattern Classification', then `f' to enter the `Support Vector Machines' menu. In that menu, press `c' for `Prediction', then press `Enter' twice to accept the defaults. The results should look something like this:
Enter SVM model file name [pcp.svm]:
Short (0) or long (1) output [0]:
+----------------------------------------------------------------------------+
| Class                  | Actual/predicted card.  | Error rate              |
+----------------------------------------------------------------------------+
|                        | 25/25                   | 12.00% ( 3/ 25)         |
| 1/ews_test             | 7/10                    |  0.00% ( 0/  7)         |
| 2/rms_test             | 8/6                     | 25.00% ( 2/  8)         |
| 3/nb_test              | 7/6                     | 14.29% ( 1/  7)         |
| 4/bl_test              | 3/3                     |  0.00% ( 0/  3)         |
+----------------------------------------------------------------------------+
| Vector                 | Actual class            | SVM prediction          |
+----------------------------------------------------------------------------+
| 13                     | rms_test                | ews_test                |
| 14                     | rms_test                | ews_test                |
| 22                     | nb_test                 | ews_test                |
+----------------------------------------------------------------------------+
The above table shows classification results for the chosen data set. The class names are the file names without the extension, and the cumulative error rate is 12%.
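The figures in the table can be cross-checked by hand. The short Python sketch below is not part of PCP; it only redoes the arithmetic from the numbers printed above. It assumes that `predicted card.' means the number of test vectors the SVM assigned to each class, which is an interpretation of the table rather than something stated in this file. The names `summarize' and `cumulative_error' are made up for this illustration.

```python
from collections import Counter

# Actual class sizes, copied from the first part of the table above.
actual = {"ews_test": 7, "rms_test": 8, "nb_test": 7, "bl_test": 3}

# Misclassified vectors from the second part of the table:
# (vector number, actual class, SVM prediction).
misclassified = [
    (13, "rms_test", "ews_test"),
    (14, "rms_test", "ews_test"),
    (22, "nb_test", "ews_test"),
]


def summarize(actual, misclassified):
    """Return {class: (actual count, predicted count, error rate in percent)}.

    Assumption: the predicted count of a class is its actual count, minus
    the vectors wrongly assigned elsewhere, plus the vectors wrongly
    assigned to it.
    """
    lost = Counter(true for _, true, _ in misclassified)
    gained = Counter(pred for _, _, pred in misclassified)
    return {
        name: (n, n - lost[name] + gained[name], 100.0 * lost[name] / n)
        for name, n in actual.items()
    }


def cumulative_error(actual, misclassified):
    """Return the overall error rate in percent."""
    return 100.0 * len(misclassified) / sum(actual.values())


summary = summarize(actual, misclassified)
for name, (n, predicted, rate) in summary.items():
    print(f"{name:10s} {n}/{predicted:<3d} {rate:6.2f}%")
print(f"cumulative {cumulative_error(actual, misclassified):.2f}%")
```

Under that assumption the sketch reproduces every row of the table: the three vectors wrongly assigned to ews_test account for its predicted cardinality of 10, and 3 errors out of 25 vectors give the cumulative rate of 12%.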
See the User's Guide for more information, including instructions on how to prepare and run your own data sets.
Documentation
For usage, see the accompanying User's Guide, pcp.pdf. To compile PCP for other platforms, see the file COMPILING. For licensing information, see the file LICENSING.
Distribution Contents
Linux/pcp           PCP executable for Linux
Cygwin/pcp.exe      PCP executable for Windows/Cygwin
README              this file
LICENSING           plain English description of licensing terms
PCP_LICENSE         PCP license (MIT license)
HASH_LICENSE        license for Kaz Kylheku's hash code
LIBSVM_LICENSE      license for LIBSVM library by Chih-Chung Chang and Chih-Jen Lin
LAPACK_LICENSE      LAPACK license
COMPILING           instructions for porting PCP to other platforms
ChangeLog           history of differences between releases
pcp.pdf             User's Guide
iris_setosa.dat     IRIS dataset
iris_versicolor.dat
iris_virginica.dat
landsat*.dat        Landsat dataset [1]
landtst*.dat
all_train.dat       Leukemia dataset [2]
aml_train.dat
all_test.dat
aml_test.dat
ews.dat             SRBCT (small round blue cell tumors) dataset [3]
rms.dat
nb.dat
bl.dat
ews_test.dat
rms_test.dat
nb_test.dat
bl_test.dat
iris.bat            batch file for loading the IRIS dataset
al.bat              batch file for loading the Leukemia dataset
landsat.bat         batch file for loading the Landsat dataset
srbct.bat           batch file for loading the SRBCT dataset
srbct_test.bat      batch files for the SRBCT dataset SVM learning
srbct_svm.bat
src                 source code directory
lapack              LAPACK library source code directory
configure.ac        GNU Autoconf build files
configure
Makefile.am
Makefile.in
install-sh
aclocal.m4
missing
depcomp
Author
PCP was designed and written by Ljubomir J. Buturovic of San Francisco State University. Sasha Jaksic of San Francisco State University contributed code for the feature selection functionality.
Please send feedback and comments to ljubomir@sfsu.edu.
Bibliography
[1] C. L. Blake and C. J. Merz, UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.
[2] T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov, H. Coller, M. Loh, J. Downing, M. Caligiuri, C. Bloomfield, and E. S. Lander, ``Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring,'' Science, 286(5439):531-537, October 1999.
[3] J. Khan, J. S. Wei, M. Ringner, L. H. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescu, C. Peterson, and P. S. Meltzer, ``Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks,'' Nat. Med., 7(6):673-679, June 2001.
