|
|||
|
|
MDMC | ||
| Home | |||
| Consultancies | |||
|
|
Education | ||
|
|
Research | ||
| Facilities | |||
| Software | |||
| CaMML | |||
| DTree | |||
| Snob | |||
| Random Number Generators | |||
| People | |||
| Contacts | |||
|
|
Bibliography | ||
| Seminars | |||
|
|
Member Login | ||
MDMC Software - CaMMLCaMML attempts to learn the best causal structure to account for an observational sample, using an MML metric and an MCMC search (using the Metropolis algorithm). It does this by sampling from a posterior distribution over the causal model space, using a minimum message length (MML) probability [1]. MML provides an information-theoretic metric which combines a model complexity penalty with a penalty for unexplained data values. Other metric causal discovery algorithms include Cooper and Herskovits's K2 [2], the BGe and BDe metrics of Heckerman and Geiger [3], and MDL (minimum description length) in Lam & Bacchus [4]. These metrics are implemented in algorithms which apparently require more prior information about the domain than CaMML. K2 requires a prior total ordering of variables; the MDL algorithm requires a useful partial order to perform well; BGe/BDe requires a prior preferred causal model. When implemented in our own search algorithm, however, BGe/BDe performed similarly to CaMML [5]. A distinguishing feature of the MML metric from all others is that CaMML samples the space of totally-ordered models, rather than directed acycle graphs (DAGs) or patterns (statistically equivalent DAGs) [6]. Because of this, CaMML considers, for example, a common-cause structure (A <- B -> C) to be twice as likely a priori as a directed chain (A -> B -> C), assuming there is no specific prior information to the contrary. The other metrics assume a priori a uniform prior over either DAGs or patterns. A quite different class of causal discovery algorithm aims to first discover the probabilistic dependency structure, in individual tests, and use that to infer a causal structure which would best explain the given dependency structure (this is called constraint learning in the literature). The TETRAD algorithms [7] adopt this approach, using orthodox significance tests (Chi-square on partial correlations) to decide iteratively whether a particular probabilistic dependency exists. These techniques are not very robust to noise or small samples (see [8]), since the tests are made independently and since errors early in the process tend to cascade. CaMML was originally developed by Chris Wallace and Kevin Korb; other versions have been developed by Julian Neil, Rodney O'Donnell, Lucas Hope and Charles Twardy. Either linear path models or discrete Bayesian networks can be learned. Some versions are able to learn local structure (versus full conditional probability tables) in the form of logit models or classification trees. Alternative search algorithms include genetic algorithm search. References
We also have CaMML references in BibTeX. DocumentationOn-line documentation is available by clicking on the documentation links below. We have written a fairly extensive User's Guide which also covers the basic theory. Old CaMML
The original version of CaMML was written in C by Chris Wallace. There is a version for continuous path models and another for discrete Bayes nets. Other people built some variants which are described on the Wiki, above. Users may contribute to the Wiki, report bugs, and browse the revision history. Source CodeCaMML is not GPL, and users may not view the source without prior permission. However, the various versions are available via CVS for experimentation and regression testing. For those with permission, the source is available from the CVSTrac server listed above, with CVS instructions on the Wiki. DownloadAll versions are distributed as .zip files. Versions marked "Netica" have support for writing models as Netica .dne files. Versions marked "Plain" do not. Netica versions are restricted to small networks unless you have a Netica license. CaMML in WekaThere is a WEKA wrapper for the rodo version of OldCaMML. It is part of the mltools package of Machine Learning Tools -- algorithms and evaluation tools for the Weka machine-learning suite. Mltools was written at MDMC, primarily by Lucas Hope. Generally you would install mltools somewhere in your CLASSPATH and then put the appropriate version of CaMML into the directory mltools/learners/camml. Here is a sample doing that on Windows. Untested, so it may require some tweaking. DownloadCaMML IICVSTracBugs Wiki Starting in 2002, CaMML was replanned from the original papers, in Java using the modular CDMS framework. More later, but it has been extensively regression tested, has a nice GUI, and does cool things. It also supports Weka. Parallel CaMMLCaMML has been run on our parallel cluster. After all, much of the run time is stochastic search through a large model space. |
| ||