Monash Data Mining Center

MDMC Software - Snob

Cluster analysis / automatic classification using MML methods. Snob categorises datasets based on their underlying numerical distributions. It works on the assumption that a correct categorisation of the data is the one that lets the data be described most efficiently (i.e. using the minimum message length).
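
The idea can be illustrated with a toy two-part message length calculation. The sketch below is a rough illustration only: the cost of 0.5 * log(n) nits per parameter, the log(K) nits per class label, and the fixed measurement precision are simplifying assumptions made here, not the coding scheme Snob actually uses, and the function names are hypothetical.

```python
import math

def gaussian_data_length(xs, precision=0.01):
    """Nits needed to state xs to within `precision` under a Gaussian fitted to xs."""
    n = len(xs)
    mean = sum(xs) / n
    # Floor the variance so a class of identical values does not give log(0).
    var = max(sum((x - mean) ** 2 for x in xs) / n, precision ** 2)
    neg_log_density = 0.5 * n * math.log(2 * math.pi * var) + sum(
        (x - mean) ** 2 for x in xs
    ) / (2 * var)
    # Turn densities into probabilities of cells of width `precision`.
    return neg_log_density - n * math.log(precision)

def message_length(classes, precision=0.01):
    """Crude two-part length (in nits) of a hard classification.

    Assumption: each Gaussian parameter costs 0.5 * log(n_c) nits and each
    class label costs log(K) nits. Snob's real coding scheme is more refined.
    """
    n_total = sum(len(xs) for xs in classes)
    total = n_total * math.log(len(classes))  # cost of the class labels
    for xs in classes:
        param_cost = 2 * 0.5 * math.log(len(xs))  # mean and variance
        total += param_cost + gaussian_data_length(xs, precision)
    return total

low = [-0.3, -0.1, 0.0, 0.2, 0.1, -0.2]
high = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3]

one_class = message_length([low + high])
two_classes = message_length([low, high])
print(f"one class:   {one_class:.1f} nits")
print(f"two classes: {two_classes:.1f} nits")
```

For these two well-separated groups, the two-class description is shorter even after paying for the extra parameters and the class labels, which is the sense in which a good classification compresses the data.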

Like AutoClass, it aims to discover the natural classes in the data. Unlike AutoClass (at least in theory), Snob uses Minimum Message Length induction, a scale-invariant Bayesian technique based on information theory. In practice, AutoClass has used an approximation that is a kind of message length. In a 1996 comparison of unsupervised classifiers, Upal and Neufeld found that Snob did best, followed by AutoClass, with ART2 coming in last. Since then AutoClass has incorporated some of Snob's heuristics, so it may now be closer to Snob in performance.

For more information on Snob and MML clustering, see the references below.

References

Snob has featured in many theoretical and applied papers. The classic citations are:

  • Wallace, C.S. and Boulton, D.M. (1968). An Information Measure for Classification, Computer Journal, Vol. 11, No. 2, pp. 185-194.
  • Wallace, C.S. and Dowe, D.L. (2000). MML clustering of multi-state, Poisson, von Mises circular and Gaussian distributions, Statistics and Computing, Vol. 10, No. 1, pp. 73-83.

Please cite these in any published work using Snob. (Such citation is required by the Academic License, and requested for Vanilla Snob, which is released under the GPL.)

Versions


Vanilla Snob

The Vanilla version of Snob can handle both continuous and discrete (multistate) variables, but restricts continuous variables to Gaussian distributions. It assumes all variables are uncorrelated. It does not include all the features of standard Snob, and uses slightly different file formats.

However, versions 1.1 and higher compute a post-hoc hierarchical tree of class relations using a pseudo-Bhattacharyya coefficient. See the tree command for details.
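
As a rough illustration of how class similarity can be scored, the sketch below computes the standard Bhattacharyya coefficient between Gaussian classes. This is the textbook coefficient, not necessarily the "pseudo" variant Vanilla Snob uses, and the function names and class parameters are made up for the example.

```python
import math

def bhattacharyya_gaussian(mean1, sd1, mean2, sd2):
    """Standard Bhattacharyya coefficient of two univariate Gaussians (1 = identical)."""
    var_sum = sd1 ** 2 + sd2 ** 2
    scale = math.sqrt(2 * sd1 * sd2 / var_sum)
    return scale * math.exp(-((mean1 - mean2) ** 2) / (4 * var_sum))

def class_similarity(class_a, class_b):
    """Product over variables, valid when variables are uncorrelated within a class.

    Each class is a list of (mean, sd) pairs, one per continuous variable.
    """
    result = 1.0
    for (m1, s1), (m2, s2) in zip(class_a, class_b):
        result *= bhattacharyya_gaussian(m1, s1, m2, s2)
    return result

# Hypothetical class descriptions: two variables per class.
classes = {
    "A": [(0.0, 1.0), (5.0, 2.0)],
    "B": [(0.5, 1.0), (5.5, 2.0)],
    "C": [(8.0, 1.5), (0.0, 1.0)],
}
names = sorted(classes)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
closest = max(pairs, key=lambda p: class_similarity(classes[p[0]], classes[p[1]]))
print("most similar pair:", closest)
```

One way to build a post-hoc tree from such scores is to repeatedly join the most similar pair of classes; whether the tree command works this way is an assumption, so consult its documentation for the actual procedure.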

The CVSTrac site has anonymous CVS, bug-tracking, and a Wiki. Anonymous users may add to the Wiki (not the main page) and post new tickets (bug reports or feature requests). The Wiki describes how to get CVS access.

Download

Prepackaged Distributions

  • Debian x86 (version 1.15)
  • Debian ppc (version 1.15)
Archives (latest version unless noted)

  • Binary: linux-x86 (Linux running on Intel). File size: 60k
  • Binary: linux-ppc (Linux running on PowerPC G3). File size: 70k
  • Binary: OS X (Darwin, including Mac OS X, on a G4). File size: 66k
  • Documentation in PS, PDF, text, and HTML: [ .tar.bz2 | .zip ]. File sizes: [ 264k | 303k ]
  • Source code for programs and docs: [ .tar.bz2 | .zip ]. File sizes: [ 71k | 164k ]

Factor Analytic, Hierarchical Snob, aka "cnob"

    Written in C by Chris Wallace, this version can handle correlated variables by positing single-factor factor models to account for the correlation. It also explicitly searches for a hierarchical structure, not just the flat class structure of the other Snobs. It incorporates most of the distributions supported in standard Snob (and aims to incorporate all of them), and all who have used it report that it is very cool.
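
To see why a single factor can account for correlation, consider the usual single-factor model x_i = mu_i + a_i * v + sigma_i * r_i, where v is a common factor score and r_i is independent noise. The sketch below computes the covariance this model implies. The notation and function names are assumptions made for illustration and are not taken from the Factor Analytic Snob source.

```python
import math

def single_factor_covariance(loadings, specific_sds):
    """Covariance implied by x_i = mu_i + a_i * v + sigma_i * r_i,
    where v and r_i are independent standard normal variates."""
    k = len(loadings)
    return [
        [
            # Shared factor contributes a_i * a_j; specific noise only on the diagonal.
            loadings[i] * loadings[j] + (specific_sds[i] ** 2 if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]

def correlation(cov, i, j):
    """Correlation between variables i and j from a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

cov = single_factor_covariance(loadings=[1.0, 2.0], specific_sds=[0.5, 1.0])
print(cov)
print(correlation(cov, 0, 1))
```

The off-diagonal entries come entirely from the shared factor, so each class needs only one loading per variable rather than a full covariance matrix.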

    This version is now publicly available. Chris wanted to finish a few things before releasing it, but unfortunately he died in mid-2004. The department decided to release the code as it stands, and as we have used it internally. Contact MDMC if you would like help using Factor Analytic Snob, or have us run it on your data.

    Anonymous users may read the Wiki on the CVSTrac site and post new tickets (bug reports or feature requests), but have no other rights, including no access to the code. The code is available here, provided you first agree to the Academic License.

    The only documentation for Factor-Analytic Snob is on the Wiki. Users wishing to contribute to the documentation should ask MDMC for a Wiki account. However, this version of Snob bears a strong resemblance to other versions, so users should be able to learn most of what they need from the documentation for the Vanilla and Standard versions of Snob.

    Download: factorsnob.tgz
    File Size: (1139940 bytes)
    (Last updated: Wednesday, 23-Dec-2004, addition of README and LICENSE)

     


    Standard Snob

    This is the standard Fortran version written by Chris Wallace and then extended by David Dowe. It can handle Poisson, von Mises, and other distributions, in addition to those handled in the Vanilla version. It also assumes all variables are uncorrelated.

    Originally, this version required f77 to compile. Sarah George fixed the code just enough to compile under g77 (and to add tags for JavaDoc-style auto-generated code documentation), and verified that it gave the same results (on the same machine) as the f77 version.

    Download: snob.tgz
    File Size: (217299 bytes)
    (Last updated: Thursday, 29-03-01 00:34 EST)

     

    Converted Snob

    • Authors: Chris Wallace, David Dowe
    • Other contributors: Sarah George
    • License: Academic License

    Standard Snob was converted from Fortran to C by Sarah George using f2c, and is available here for completeness, as some people have used it. However, Sarah reports that f2c introduced a great many bugs. She fixed all the ones that caused crashes, but remains uneasy about what other undiscovered bugs the conversion may have created. This version also requires some f2c (Fortran to C) libraries to run. For these reasons, it is usually easier to download a Fortran compiler (such as g77) and use the Fortran version or, if you do not need the extra distributions, to use the Vanilla version.

    The C version of Snob (converted from Fortran)

    Download: csnob.tgz
    File Size: (162666 bytes)
    (Last updated: Thursday, 29-03-01 00:34 EST)

     


    Parallel Snob (Currently unavailable here)

    Snob can now run on parallel clusters. This is a good thing. Stay tuned.
     
