(C) Copyright Sarah George 2000.
See Snob.README for copyright & licensing information for the Snob software and official documentation.
This is not a replacement for the official documentation (snob.doc), and does not cover everything that Snob can do. It's a way to try out the software & get a feel for it in a few minutes. If you're planning to do anything serious with Snob, read snob.doc as well.
The official version:
"Snob does mixture modelling by Minimum Message Length (MML)."
My version:
"Snob categorises datasets based on their underlying numerical
distributions. It does this using the assumption that if it
can correctly categorise the data, then the data can be described
most efficiently (ie using the minimum message length)."
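To make the "minimum message length" idea a bit more concrete, here is a toy sketch in Python. It is not Snob's actual calculation: it assumes a single numeric attribute, fits one Gaussian per class, and uses a crude BIC-style cost for stating each class's mean and spread as a stand-in for the real MML parameter cost. The names gaussian_bits and message_bits are made up for this illustration.

```python
import math


def gaussian_bits(values, precision=0.1):
    """Approximate bits to state `values`, each rounded to `precision`,
    using a Gaussian fitted to those same values (values must be non-empty)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Keep the spread from collapsing below the measurement precision.
    sd = max(math.sqrt(variance), precision)
    nats = 0.0
    for v in values:
        log_density = (-0.5 * math.log(2 * math.pi * sd * sd)
                       - (v - mean) ** 2 / (2 * sd * sd))
        # Probability of a cell of width `precision` is roughly density * precision.
        nats -= log_density + math.log(precision)
    # Assumption: crude stand-in for the cost of stating the mean and spread.
    parameter_bits = 2 * 0.5 * math.log2(n)
    return nats / math.log(2) + parameter_bits


def message_bits(classes, precision=0.1):
    """Total bits for a list of classes, each a non-empty list of values."""
    total = sum(len(members) for members in classes)
    bits = 0.0
    for members in classes:
        # Bits to say which class each member belongs to.
        bits += -len(members) * math.log2(len(members) / total)
        # Bits to describe the class and its members.
        bits += gaussian_bits(members, precision)
    return bits


if __name__ == "__main__":
    # Made-up measurements forming two obvious groups.
    low = [0.8, 0.9, 1.0, 1.1, 1.2]
    high = [4.8, 4.9, 5.0, 5.1, 5.2]
    print("one class  :", round(message_bits([low + high]), 1), "bits")
    print("two classes:", round(message_bits([low, high]), 1), "bits")
```

With these made-up numbers the two-class description comes out shorter, even after paying the extra cost of labelling each item and describing a second class. That is the trade-off Snob is weighing, although Snob's own cost calculation is more careful than this sketch.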
Example:
The sample data file, iris.raw, describes the sepal length & width, and the petal length & width, of three species of iris.
When this data is fed into Snob, it accurately identifies the three species in its analysis, and also highlights some "outliers" from the three main categories.
It's worth testing Snob before trying it on your own data, to get a feel for how it runs.
1. Get an input file.
This is a .raw file with a bunch of plain-text numbers in it to describe your data and its attributes.
Take a look at iris.in for an example. It was used to generate iris.raw: the raw, uncommented data format that Snob requires. (You can just write up a .raw file yourself, but I prefer to be able to comment my files.)
Making iris.raw from iris.in: ./preproc.pl < iris.in > iris.raw
Run Snob (type ./snob) and type 'iris' when prompted for an input filename.
Snob will notice that it hasn't created a binary version of this file, and make a new one from iris.raw. If you switch to a different kind of computer that doesn't like the old .bin files, simply delete them & Snob will re-create them for you.
Now you are given a prompt to enter Snob commands.
Type 'adjust n' to iterate through n guesses of what classes might be the best fit. (eg. adjust 100 to iterate 100 times)
Type 'sum' for a summary of what classes are currently considered best.
Type 'prmemb n' to print a list of the members of class number n, along with the class number. (eg. prmemb 3 to print class 3, prmemb 0 to print all classes)
Type 'prclas n' to print information about class number n. (eg. prclas 3 for information about class 3)
Type 'stop' to quit Snob.
Now if you were paying attention to the results, you'll see that four classes were found. This includes one for each species of plant, and one for some "unusual" Iris-versicolor plants (possibly suggestive of a sub-species). Print out the members from each class, noting the following:
* Items 1-50 are Iris-setosa
* Items 51-100 are Iris-versicolor
* Items 101-150 are Iris-virginica
(See irisplot.png for an XY-plot of the data with X = petal length, Y = petal width. Note that I'm only investigating petal dimensions in this example to keep it simple.)
Also, note that the division is based on the observed data, so unusual plants can end up listed in a different class from the rest of their species.
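If you want to check a class listing against the known species, the item ranges above can be turned into a small lookup. This is only a sketch: species_of and tally_species are names made up for this example, and it assumes you have already copied the item numbers for one class out of the prmemb listing by hand into a plain list (it does not try to parse Snob's output).

```python
from collections import Counter

# Item ranges as given in this document (1-based, inclusive).
SPECIES_RANGES = [
    (1, 50, "Iris-setosa"),
    (51, 100, "Iris-versicolor"),
    (101, 150, "Iris-virginica"),
]


def species_of(item):
    """Return the species name for an item number in iris.raw."""
    for first, last, name in SPECIES_RANGES:
        if first <= item <= last:
            return name
    raise ValueError(f"item {item} is outside 1-150")


def tally_species(items):
    """Count how many of the given item numbers fall in each species."""
    return Counter(species_of(item) for item in items)


if __name__ == "__main__":
    # Made-up membership list for illustration only; not real Snob output.
    example_members = [51, 52, 60, 99, 101, 120]
    for name, count in tally_species(example_members).most_common():
        print(f"{name}: {count}")
```

A class whose tally is almost entirely one species matches that species well. A class with a mixed tally, or a small class drawn from a single species, is worth a closer look on the plot.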
Full documentation for snob is in the file "snob.doc".