MDMC Software - Snob
Cluster analysis / automatic classification using MML methods. Snob categorises datasets based on their underlying numerical distributions. It works on the assumption that the correct categorisation of the data is the one that lets the data be described most efficiently (i.e., using the minimum message length).
Like AutoClass, it aims to discover the natural classes in the data. Unlike AutoClass (at least in theory), Snob uses Minimum Message Length induction, a scale-invariant Bayesian technique based on information theory. In practice, AutoClass has used an approximation that is a kind of message length. In a 1996 comparison of unsupervised classifiers, Upal and Neufeld found that Snob did best, followed by AutoClass, with ART2 coming in last. Since then AutoClass has incorporated some of Snob's heuristics, so may be closer to Snob in performance.
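The message-length idea can be illustrated with a toy two-part code. The sketch below is not Snob's actual MML formula: it assumes a fixed cost of 16 bits per stated parameter, a measurement precision of 0.01, and a hard split of the data at a given threshold. All function and variable names are hypothetical. It only shows the trade-off Snob exploits: adding a class costs extra bits for its parameters and for the class labels, and pays off only if the data then become cheaper to encode.

```python
import math

def data_bits(xs, mu, sigma, eps=0.01):
    """Bits needed to state each x to precision eps under N(mu, sigma^2)."""
    total = 0.0
    for x in xs:
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += -math.log2(density * eps)
    return total

def fit(xs):
    """Sample mean and standard deviation (floored to avoid zero width)."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, max(math.sqrt(var), 1e-3)

def one_class_bits(xs, param_bits=16.0):
    """Message length for a single Gaussian class: 2 parameters plus data."""
    mu, sigma = fit(xs)
    return 2 * param_bits + data_bits(xs, mu, sigma)

def two_class_bits(xs, split, param_bits=16.0):
    """Message length for two Gaussian classes separated at `split`.

    Both sides of the split must be non-empty.
    """
    groups = ([x for x in xs if x < split], [x for x in xs if x >= split])
    n = len(xs)
    bits = 5 * param_bits  # two means, two sds, one mixing proportion
    for group in groups:
        mu, sigma = fit(group)
        bits += len(group) * -math.log2(len(group) / n)  # class labels
        bits += data_bits(group, mu, sigma)
    return bits

# Two well-separated clumps versus one evenly spread clump (made-up data).
bimodal = [i * 0.1 for i in range(-10, 11)] + [10 + i * 0.1 for i in range(-10, 11)]
unimodal = [i * 0.1 for i in range(-20, 21)]

for name, xs, split in (("bimodal", bimodal, 5.0), ("unimodal", unimodal, 0.0)):
    print(name, round(one_class_bits(xs), 1), round(two_class_bits(xs, split), 1))
```

For the bimodal data the two-class message is shorter, so the extra class is justified; for the evenly spread data the one-class message is shorter, so splitting is rejected.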
For more information on Snob and MML clustering, see:
Snob has featured in many theoretical and applied papers. The classic citations are:
The Vanilla version of Snob can handle both continuous and discrete (multistate) variables, but restricts continuous variables to Gaussian distributions. It assumes all variables are uncorrelated. It does not include all the features of standard Snob, and uses slightly different file formats.
However, versions 1.1 and higher compute a post-hoc hierarchical tree of class relations using a pseudo-Bhattacharyya coefficient. See the tree command for details.
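As a rough illustration of how class similarity can be scored, the sketch below computes the standard textbook Bhattacharyya coefficient between Gaussian classes with uncorrelated attributes. This is an assumption for illustration only: Snob's "pseudo" variant may differ, and the function names and class parameters here are hypothetical, not taken from the Snob code.

```python
import math

def bc_gaussian(mu1, sd1, mu2, sd2):
    """Bhattacharyya coefficient of two univariate Gaussians (1.0 = identical)."""
    var_sum = sd1 ** 2 + sd2 ** 2
    overlap = math.sqrt(2 * sd1 * sd2 / var_sum)
    return overlap * math.exp(-((mu1 - mu2) ** 2) / (4 * var_sum))

def bc_classes(a, b):
    """Coefficient for two classes given as lists of (mean, sd) per attribute.

    With uncorrelated attributes the coefficient is the product over attributes.
    """
    coeff = 1.0
    for (m1, s1), (m2, s2) in zip(a, b):
        coeff *= bc_gaussian(m1, s1, m2, s2)
    return coeff

def closest_pair(classes):
    """Return (coefficient, name1, name2) for the most similar pair of classes."""
    names = sorted(classes)
    best = None
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            c = bc_classes(classes[p], classes[q])
            if best is None or c > best[0]:
                best = (c, p, q)
    return best

# Made-up class summaries: (mean, sd) for each of two attributes.
classes = {
    "A": [(0.0, 1.0), (0.0, 1.0)],
    "B": [(0.5, 1.0), (0.0, 1.0)],
    "C": [(10.0, 1.0), (5.0, 2.0)],
}
print(closest_pair(classes))
```

A post-hoc tree could then be built by repeatedly joining the most similar pair, although how Snob actually assembles its tree is described by the tree command, not by this sketch.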
The CVSTrac site has anonymous CVS, bug-tracking, and a Wiki. Anonymous users may add to the Wiki (not the main page) and post new tickets (bug reports or feature requests). The Wiki describes how to get CVS access.
Archives (latest version unless noted)
Factor Analytic, Hierarchical Snob, aka "cnob"
Written in C by Chris Wallace, this version can handle correlated variables by positing single-factor factor models to account for the correlation. It also explicitly searches for a hierarchical structure, not just the flat class structure of the other Snobs. It incorporates most of the distributions supported in standard Snob (and aims to incorporate all of them), and all who have used it report that it is very cool.
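To show what a single-factor model implies about correlation, here is a minimal sketch assuming the usual form x_i = mu_i + a_i * v + sigma_i * e_i, where v is one shared factor and the e_i are independent noise terms. The names (loadings, specific_sds) are illustrative and not taken from the Snob source. Under this model the covariance of two different variables is a_i * a_j, and each variance is a_i^2 + sigma_i^2.

```python
import math

def implied_covariance(loadings, specific_sds):
    """Covariance matrix a a^T + diag(sigma^2) implied by one shared factor."""
    k = len(loadings)
    return [
        [
            loadings[i] * loadings[j] + (specific_sds[i] ** 2 if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]

def implied_correlation(loadings, specific_sds):
    """Correlation matrix derived from the implied covariance."""
    cov = implied_covariance(loadings, specific_sds)
    sds = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [
        [cov[i][j] / (sds[i] * sds[j]) for j in range(len(cov))]
        for i in range(len(cov))
    ]

# Made-up example: variables 0 and 1 load on the factor, variable 2 does not.
corr = implied_correlation([1.0, 1.0, 0.0], [1.0, 1.0, 1.0])
for row in corr:
    print([round(c, 3) for c in row])
```

Variables that load on the factor come out correlated with each other, while a variable with zero loading stays uncorrelated with the rest, which is how one factor per class can account for correlation without a full covariance matrix.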
This version is now publicly available. Chris wanted to finish a few things before releasing it, but unfortunately he died in mid-2004. The department decided to release the code as it stands, and as we have used it internally. Contact MDMC if you would like help using Factor Analytic Snob, or have us run it on your data.
Anonymous users may read the Wiki on the CVSTrac site and post new tickets (bug reports or feature requests), but have no other rights, including no access to the code. The code is available here, where you must first agree to the Academic License.
The only documentation for Factor-Analytic Snob is on the Wiki. Users wishing to contribute to the documentation should ask MDMC for a Wiki account. However, this version of Snob bears a strong resemblance to other versions, so users should be able to learn most of what they need from the documentation for the Vanilla and Standard versions of Snob.
This is the standard Fortran version written by Chris Wallace and then extended by David Dowe. It can handle Poisson, von Mises, and other distributions, in addition to those handled in the Vanilla version. It also assumes all variables are uncorrelated.
Originally, this version required f77 to compile. Sarah George fixed the code just enough to compile under g77 (and to add tags for JavaDoc-style auto-generated code documentation), and verified that it gave the same results (on the same machine) as the f77 version.
Standard Snob was converted from Fortran to C by Sarah George using f2c, and is available here for completeness, as some people have used it. However, Sarah reports that f2c introduced a great many bugs. She fixed all the ones that caused crashes, but remains uneasy about what other unfound bugs were created by the conversion. It requires some f2c (Fortran to C) libraries to run. Because of this, it is usually easier to just download a Fortran compiler (like g77) and get the Fortran version, or if you do not need the extra distributions, use the vanilla version.
The C version of Snob (converted from Fortran)
Parallel Snob (currently unavailable here)
Snob can now run on parallel clusters. This is a good thing. Stay tuned.