Clustering
Problem Description
Test Data
 3gauss.mat  200 points from a mixture of 3 Gaussian distributions.
 banana.mat  200 points from a banana data set.
 spiral.mat  200 points from a spiral data set.
 datasets.mat  Larger data sets of the above 3 types.
Program Descriptions
 NCE.m
This program finds a set of cluster means via the CE method with (independent) Normal updating.Usage:
Call the program from MATLAB, with the following syntax:
[ mu , count , score ] = NCE( N , g , alpha , k, data , modif , drplot , c , sigma_0 )
Example: [ mu , count , score ] = NCE( 1000 , 10 , 0.7 , 5 , DATA , 0 , 1 )Inputs: N  Number of samples to generate each iteration g  Number of these samples to use to update parameters alpha  Smoothing parameter k  Number of cluster means to find data  The data we are trying to fit means to (Should be n x d, where there are n points, and d dimensions) modif  If 1, use modified smoothing, otherwise use standard smoothing drplot  If 1, draws the cluster means and the data (for 2dimensions) c  Optional starting centroids sigma_0  Optional starting standard deviation Outputs: mu  The centroids found via the CE method, using Normal updating with the parameter set count  The number of iterations taken score  The final score of these centroids  genNCE.m
This program is used internally to generate the cluster means.  scoreNCE.m
This program is used internally to evaluate the performance of a particular set of means against the data.  MCE.m
This program finds a set of cluster means via the CE, looking at clustering as a ``mincut'' type problem.Usage:
Call the program from MATLAB, with the following syntax:
x = MCE( N , rho , alpha , k , data )
Example: x = MCE( 2000 , 0.01 , 0.6 , 3 , DATA )Inputs: N  Number of samples to generate each iteration rho  The fraction of samples used to update the probabilities alpha  Smoothing parameter k  Number of clusters to assign points to data  The data we are trying to assign to clusters (Should be n x d, where there are n points, and d dimensions) Outputs: x  The best found assignment of the data points  genMCE.m
This program is used internally to assign data points to clusters.  MCEJ.m
This is slight modification of the above program, using the ``injection'' idea (due to Zdravko Botev). It generally produces superior results to the unmodified MCE method.Usage:
Call the program from MATLAB, with the following syntax:
x = MCEJ( N , rho , alpha , k , data )
Example: x = MCEJ( 1400 , 0.03 , 0.75 , 4 , DATA )Inputs: N  Number of samples to generate each iteration rho  The fraction of samples used to update the probabilities alpha  Smoothing parameter k  Number of clusters to assign points to data  The data we are trying to assign to clusters (Should be n x d, where there are n points, and d dimensions) Outputs: x  The best found assignment of the data points  scoreMCE.m
This program is used internally to evaluate the performance of a particular assignment against the data.  clNCE.m
This is a small program which labels the points in a dataset according to a set of cluster means.Usage:
Call the program from MATLAB, with the following syntax:
x = clNCE( c , data , k )
Example: x = clNCE( mu , DATA , 5 )Inputs: c  A set of cluster means data  The data we are trying to assign to clusters (Should be n x d, where there are n points, and d dimensions) k  Number of clusters to assign points to Outputs: x  The assignment of the data points to clusters  cMCE.m
This is a small program which calculates cluster means from a given cluster labelling.Usage:
Call the program from MATLAB, with the following syntax:
c = cMCE( x , y , k )
Example: mu = cMCE( x , data , 6 )Inputs: x  A labelling of data points data  The data we are trying to fit means to (Should be n x d, where there are n points, and d dimensions) k  Number of cluster means Outputs: x  The cluster means calculated for this labelling of data points
Bibliography

D. P. Kroese and R. Y. Rubinstein.
The CrossEntropy Method: A Unified Approach to Combinatorial Optimization, MonteCarlo Simulation and Machine Learning.
Springer, 2004.