CaMML is a program which learns Bayesian networks from raw data. CaMML stands for Causal MML, because it searches for Bayesian networks according to causal structure, not just statistical equivalence classes, and uses the MML (Minumum Message Length) principle as its metric. It uses the Metropolis algorithm for stochastic sampling.
MML is the theory of Minimum Message Length inference developed by Chris Wallace. CaMML is a fully Bayesian way to infer Bayes nets. MML provides the correct information-theoretic penalty for model complexity, and thus the MML prior rewards simpler causal models. For example, among statistically-equivalent models, CaMML prefers common-cause structures to chain structures. Intuitively, the common-cause structure is more likely because it imposes far fewer time constraints than the chain model. For a more detailed explanation of the theory behind CaMML, see [9,1]. For elaborations including alternative search methods and restricted (logistic) interactions, see [3,8,6,7,5].
CaMML has consistently outperformed Tetrad, K2, and other standard inference methods. In addition to previous papers, see [2,3]. We should compare it against some newer alternatives like expand-and-prune learners that are supposed also to be good at finding minimal structure.
There are several versions of CaMML around.
This manual covers oldCaMML, specifically rodo.1.1 As of early 2004, rodo is the only version of oldCaMML still being actively maintained. The features of other versions will hopefully be incorporated into CaMML II. The rodo version has the following advantages over original CaMML:
CaMML is available under an Academic license, which means it cannot be used for profit or proprietary development. The source code is only available by special request. However, the binaries and documentation are available from:
Bug-tracking, Wiki pages, and further information is available from our CVSTrac server: HTML documentation for the original code was once maintained by Sarah George at: Maybe it's still there. Don't read it unless you're going to abide by the terms of the Academic License (available from the website).We won't cover compiling and installation here. Read the README and INSTALL files, or obtain a precompiled binary from the website.
After starting CaMML, you will see a brief introductory message with version number and limits (see 2.3.2), and then the menu shown in Table 2.1.
The main menu is designed for very basic interaction. A typical session is:
For more control you will want to drop into ``old CaMML'' before running the Gibbs sampling. More on that later. First let's walk through a sample session.
Camml> : i Enter File Name : AsiaCases.100.cas New Instance 1
You can add priors on the existence of links between any two nodes, and the ordering between nodes, given that there is a link. You can also set up causal tiers, such that all variables on tier 1 occur before all variables on tier 2, etc. (Causal tiers is an idea borrowed from Tetrad II.)
Camml> : 1
Enter Link Numbers 1 : 1
Enter Link Numbers 2 : 2
Enter Probability : .4
BiDirectional link(y/n) : n
Added Prior P( Connection:1,2) = 0.4
Any time a model finds a link between nodes
and
, it applies
the ordering prior to help determine the direction of the arc between
them.
So another way to specify that VisitAsia should precede TB is to put a high prior on that ordering:
Camml> : 2
Adding prior on Ordering
Enter Link Numbers 1 : 1
Enter Link Numbers 2 : 2
Enter Probability : .9
Added Prior P( 1 ->2 | Connection ) = 0.9
I'm not at all sure how this interacts with
arc priors, esp. bidirectional ones, or whether it gets applied when
there is a directed path between
and
but no direct
connection.
Causal tiers are an idea borrowed from Tetrad II. They involve setting up a list of tiers such that all values on tier 1 occur before all on tier 2, and all on tier 3 occur after all on tier 1 and 2. They're really just a ``script'' for batch-setting link orderings. For example, we might believe that VisitAsia, Smoking, and TB all precede XRay and Dysphenia, but have no other preferences. We would specify two tiers. Each tier gets one line, and we can use either names or numbers.2.1 Here, Smoking is Variable 3, and Dysphenia is Variable 8. (Use human numbers. CaMML will translate for you.) For example:
Camml> : 3 Number of Tiers : 2 Enter items on tier 0 (separate by space) : VisitAsia 3 TB Enter items on tier 1 (separate by space) : XRay 8 0 2 1 5 7 Adding prior on Ordering Added Prior P( 1 ->6 | Connection ) = 0.999 Adding prior on Ordering Added Prior P( 1 ->8 | Connection ) = 0.999 Adding prior on Ordering Added Prior P( 3 ->6 | Connection ) = 0.999 Adding prior on Ordering Added Prior P( 3 ->8 | Connection ) = 0.999 Adding prior on Ordering Added Prior P( 2 ->6 | Connection ) = 0.999 Adding prior on Ordering Added Prior P( 2 ->8 | Connection ) = 0.999
Performs Gibbs sampling (a Metropolis search) to find the best model. This procedure starts up the old (Chris Wallace) CaMML, and runns Gibbs sampling from there. You can tell this by noting the Fun: prompts and the last line ``Quelling the inner CaMML'', which reports that it is ending that run of the old CaMML. For more details on Gibbs sampling, see Chapter 4.
Camml> : g
For menu, type ?
Fun: ---- Starting Annealing ----
Anneal then sample
Cooking
<T=1.00 P=0.250> (BEST= 281.46) <T=1.00 P=0.164> (BEST= 281.46)
.
.
.
Prob(arc) = 0.147 weight = 3243.7
Goto getkept
At getkept
printit(fullf)
:::::::::::::::::: 1-th final model.
7 arcs, -LogLike 229.57, RawCost 286.04
Same M 30 Becomes N 1
Rel Prior 15.400 Posteriors: SEQ: 0.66 MML: 6.70
Perms 496
Fun:
Quelling the inner CaMML
Camml> :
Two print commands show different details about the best model. (Note, you may want to look at alternative models, for which you will need to run the Inner CaMML.)
Shows the number of arcs and some MML cost information, then the prior and posterior probabilities, the number of permutations represented by this model, and finally, a list of arcs using variable names (if provided). The score of interest is the MML posterior, given below as MML: 6.70, indicating that this model did not have a very high posterior probability. (Incidentally, the highest posterior I've found on AsiaCases.100.cas is about 11%.)
Camml> : p NumInstances = 0 For menu, type ? Fun: :::::::::::::::::: Current model. 7 arcs, -LogLike 229.57, RawCost 286.04 Same M 30 Becomes N 1 Rel Prior 15.400 Posteriors: SEQ: 0.66 MML: 6.70 Perms 496 Child <- Parent(s) V VisitAsia <- V TB <- Cancer ~TBorCancer V Smoking <- V Cancer <- V TBorCancer <- Cancer V XRay <- TBorCancer V Bronchitis <- Smoking ~ V Dysphenia <- TBorCancer Bronchitis~ Fun: Quelling the inner CaMML Camml> :
Prints out the CPTs for the best model. This was more useful when CaMML did not have the ability to save directly to Netica files. (Then one would log the output and run camml2dne.pl on it.) However, it is still good for a quick look at the numbers; Netica's GUI is not very good for that.
The parents of the node appear on the left, and the node's states
appear on the right. Probabilities are given to 5 decimal places. If
(as often happens), probabilities are still zero at 5 decimal places,
they are replaced with .00001.
Camml> : c
CPT of node 1 with 0 dads, Rawcost = 4.79
| 0 1
-+--------------
| 0.00495 0.99505
CPT of node 2 with 2 dads, Rawcost = 10.17
4 5 | 0 1
---------+--------------
0 0 | 0.06250 0.93750
0 1 | 0.50000 0.50000
1 0 | 0.83333 0.16667
1 1 | 0.00543 0.99457
.
.
.
Saves the model as a Netica (.dne) file. Will not do so if you have no active instances. Also, make sure your variable names agree with Netica's rules (no dots, etc) or it will give ``Error in NewNode_bn: Name string passed is bad....''
Camml> : x Enter FileName : nv-null.dne Netica (AB) 2.15 Linux, (C) 1990-2002 Norsys Software Corp. The license being used is +Site/MonashU/310-1 (security part removed). creating new casefile Writing cases reading cases Saving Network removing cases
That completes the walk-through. At the top level, there are a few other concepts and options you may wish to explore.
Rodo introduced the concept of multiple instances. CaMML can run up to 10 instances, each of which can have its own data set. Alternatively, you could give them all the same data set, and different random seeds or priors, to more completely explore the search space. If you use CAPITAL letters for the commands, they apply to all instances. For example, you could set up several instances and then type ``G'' to start them all sampling at once.
Camml> : s Enter Instance Number : 2When you switch instances, CaMML prints the menu and status line telling you whether that instance has been initialized or not.
Clear the current data, including models and data file.
Testing only. Should not be used.
This will print a table showing any specified arc priors. It does not
show ordering priors. It uses a dot (.) for unspecified priors. By
default, unspecified priors between
and
are 0.25 in each
direction, for an overall arc prior of 0.5. Usually, Bayesian networks
are sparser than this, so you may want to specify arc priors!
Mostly for testing, but if you want to see the defaults as well as the specified priors, use this one.
After doing Gibbs sampling, CaMML recalculates the overall arc
frequencies, changing the default from 0.25 to the overall arc
frequency, which is usually much lower.
Shows the MML cost to add a given link. This is equivalent to the arc priors, but using message length rather than probability. Only calculated after a sampling run.
Allows the user to run a system command. For example:
Camml> : \ls
AsiaCases.1000.cas CHANGELOG Netica.h
.
.
.
On *nix systems, you can always suspend CaMML via Ctrl-Z, and
run the command directly.
Set the random seeds based on a small integer you supply. Without this, every search with the same data (for the same number of steps) will return the same model. Also useful for reproducing a search.
Typing ? shows the menu, and h shows additional, but largely inadequate, help.
Rodo uses ``inner'' CaMML for its own work. This option gives you direct access to that inner CaMML, which is largely CPT CaMML. You have much more control here, but the menu options are different. They are described in detail in Chapter 3.
Note: the inner CaMML does not know about any sampling you
have done, nor of any models you have already found. Sadly, it also
seems to forget any constraints you have added.
CaMML is more forgiving than it used to be. It now allows comments (using the `%' character). CaMML takes data in case files, not total-count matrices. Two sample data files are included from the famous Asia network.
The first line lists the number of variables, followed by the number of cases (in fact, these can be separate lines). The next line is optional: it gives the variable names. The one after that lists the number of states for each variable. Following that are the cases, usually one per line although CaMML allows free format.
Older versions of CaMML did not allow for variable names (or comments).
CaMML has some unfortunate hard-coded limits. In the original version,
it had a limit of 7 parents per variable, with a max of 5 states per
variable, and a maximum number of cells in any variable's CPT of
40,000. The version described here reports the compiled-in limits on
startup. For example, my startup screen shows:
CaMML: Causal MML for discrete Bayes nets. Version : 2.20
-----------------------------------------------------------------------------
Original CaMML Code by Chris Wallace;
Modification made by Rodney O'Donnell rodo@csse.monash.edu.au
Limits: 40 states/var, 7 parents/var, 50000 entries/CPT
If CaMML hangs, segfaults, or dies, check these. You may increase the hard
limits by changing MAX_NUM_STATES, MAX_PARENTS, or Maxcell and recompiling.
-----------------------------------------------------------------------------
Mind you, CaMML doesn't necessarily check to see if you have
respected those limits. It might just hang or crash if you
disobey.
You should also note that the real constraint is the number of entries per CPT. A 31-state node can have two 40-state parents, and that's it. A 3-state node can have seven 4-state parents. In general you should try to keep the average number of states as low as possible.
Furthermore, CaMML does not currently handle missing values. However, you can circumvent this limitation manually by coding your missing values as a state (I suggest the first state (0) to be consistent). Then you can model missing data explicitly in your final network, and discover whether missing data is itself predictive.
Note: you can significantly speed up your search by reducing the maximum number of parents per node.
By typing r at the main menu, you drop down into the ``inner'' CaMML that rodo uses for its own work. Unfortunately, you lose all of your settings and findings except for the data file itself.
Table 1 in the Appendix presents the extended menu of commands available when you run old CaMML (by typing `r' from the main menu). You can see these commands by typing `?' for the common commands or `??' for the extended commands, each in a 25-line format. Each group of commands is explained in more detail in its own section later.
A typical session with inner CaMML goes like this:
We will cover the commands for such a session, roughly in that order. Type ? to get a menu, or refer to Table 1 in the Appendix.
| P: | Print all constraints |
| C: | Clear all constraints |
| A |
Add constraint |
| D |
Delete constraint |
You may provide CaMML with temporal constraints that variable
is
earlier than variable
. Then CaMML will ignore all models
inconsistent with those constraints, and search only the state space
of models consistent with that (partial) temporal ordering.
This is not nearly as powerful as the constraint-specifications available in the outer level. Ideally we should allow those constraints to cross over.
This setting is useful or speeding up a search. For example, if I am running a dataset with 200 variables (!), I might just want to verify that I'm getting sensible models at all, before waiting 2 days for a run. In that case, I can set w to .0001.
Specific models may be read in from files. A file may be read in to the current model using i, or to the true model using t. Either way, CaMML prompts you for a filename which must have a model in CaMML's format.
If you want to start your Gibbs sampling from a particular model, you should read that model in using i. You may then examine or modify it before starting the sampling run.
If you know the true model, read it in using t. If a true model is specified in this way, CaMML will provide an additional metric for comparing other models, because they can now be measured against the true model.
CaMML's model format is similar to its data format in that you must first specify the number of varaibles and the number of states for each variable. CaMML will check that your model has the same number of variables as the data file, and that each variable has the same number of states. If the check fails you are returned to the main menu. If your model passes these initial checks but is wrong in other ways, it will likely crash CaMML.
After those first two lines, each variable in order lists its parents and, optionally, its conditional probability table (CPT). Table 3.1 explains a small sample model file.
|
You must signal the end of the parents' list with either a -1 or a 0. A -1 says, ``End of the parents list for this variable, and no CPT info follows.'' CaMML then moves on to the parents list for the next variable, if any.3.1
A 0 signals the end of the parents list and the beginning of
the CPT for the current variable,
. The CPT for
will be read as
groups of
states, where
is the number of states in
, and
is the number of combinations of states of the parents of
. You can think of this as
rows of
entries per row, even
though you need not format the file in that way (though you can, and
for large hand-generated CPTs, it would be a good idea). Because all
probabilities in a row of a CPT must add up to 1, you need (and may)
only specify
probabilities. For example, if
is a binary
variable, you need only specify 1 probability per row.
Finally, the line breaks are not strictly necessary. But it's not like you need to conserve them, and they are necessary for human readability.
The list of kept models is that list of all the high-probability models found by the Gibbs sampling. The models counted for Gibbs sampling are already simplified representative models rather than actual individual models visited. For more details, see 4 and the papers.
After CaMML is done with the Gibbs sampling, it has a list of kept models. It then performs a more precise analysis of this set of models, to determine if some are indistinguishable given the data. If so, those models are joined into the most representative (simplest) of the bunch. The resulting set is the set of final models. Usually this is the set you want to work with. All the final models will be significantly different from each other. There message lengths may be close, but they will imply very different causal stories.
By default, the current model is the best of the final models. The model selection commands allow you to change this.
| model |
|
| lowest datacost model | |
| arc-probability based model | |
| true model |
Makes the kept model
the current model. You may then
analyze or modify this model.
Makes the final model
the current model. You may then
analyze or modify this model. It works just like m, but
operates on the final cleaned and joined models.
I'll talk first in Section 3.5.2 about printing general info about the model, including model structure. Then in Section 3.5.2 I will talk about printing the node details like the CPT (conditional probability table).
The three print commands p, pm, pn all use the same format.
All of them can be followed with the optional f which adds to
the standard report a representation of the arcs in the model. The
difference between them is that p prints only the current
model, while pm prints the standard information for all
kept models, and pn prints that info for all
final models. For example, the output for pf might be
as shown in Table 3.2.
If we ask for a full report with f option (pf, pmf, pnf), we get the final lines beginning with V (for `variable'), which depict the structure of the model. If we do not specify f, the report omits the graph structure.
Each structure line begins with a variable such as V 1, and gives the parents of that variable. An actual picture can be made from various utilities like dot. The graph for the current example is shown in Fig. 3.1.
There are a few flags which can appear on the V lines. They are explained in Table 3.3.
The command pd
will print details about node
,
including the number of dads, the RawCost, and the count table of the
number of times the node took a particular value, for all combinations
of its parent values. An example of the output of the pd
command is in Table 3.4.
If you select
, so pc -1, you will get the CPTs for all nodes. You
probably don't want to do this unless you are logging to a file! An
example of the pc command is in Table 3.5:
| fu: | Fill in all links |
| fd: | Fill in all links |
| c: | Clear all links, giving null model |
| a |
Add arc from |
| d |
Delete arc from |
| r |
Reverse arc from |
| s: | Simplify current model by deleting insignif arcs |
| j |
Reduce (join) kept models using bitmask |
All of these commands are self-explanatory except s and j. The s: (simplify) command removes dubious (insignificant) arcs from the current model. These arcs are marked with a `?' on the pf printout.3.2
The j: (join) command allows you to forcibly join the
kept models according to your desired method. For those of us who do
not think in bitmasks, here is what you enter:
| 0: | Join by absolute gain. |
| 1: | Join by relative gain. |
| 2: | Join by absolute gain, in any order. |
| 3: | Join by relative gain, in any order. |
Table of percent frequency of arcs from Vrow to Vcol
based on 200000 sample steps, total weight 1934
1 * 51.1 18.1 11.2 11.0 17.8 15.2 14.5
2 38.5 * 17.1 18.1 79.2 23.1 34.5 26.9
3 0.3 0.2 * 2.2 0.9 0.8 16.3 3.4
4 2.3 10.1 51.1 * 74.0 22.6 30.2 16.9
5 1.6 16.4 25.4 25.9 * 80.3 17.6 56.8
6 0.4 1.9 12.0 2.0 4.3 * 17.8 15.3
7 0.3 0.3 26.5 2.9 0.3 4.5 * 65.0
8 0.1 0.2 8.6 0.7 0.9 1.8 34.9 *
The table tells us that an arc from V2 to V1
::::: Distribution of model posteriors based on 21000 thousand steps
Numbers of skeletons of various posterior probabilities
>50.0 pc >25.0 pc >10.0 pc > 5.0 pc > 2.5 pc > 1.0 pc > 0.5 pc
0 0 0 0 7 5 28
Numbers of skeletons needed to reach post-prob:
10 pc 20 pc 30 pc 40 pc 50 pc 60 pc 70 pc 80 pc 90 pc
3 6 11 22 38 62 106 197 408
| Save results of sampling to file | |
| Restore sampling results from file | |
| ?: | Print this menu |
| q: | QUIT |
The Gibbs sampling step is where CaMML searches over many models and scores them. In theory, it does this by computing the prior probabilities and likelihoods for all possible models, collapsing them into indistinguishable subsets, and then choosing the most likely subset. In practice, that would take forever.
CaMML uses a Metropolis algorithm to sample the state space of all possible models (limited by any specified constraints). However, rather than just accumulate a simple count of the number of visits to each model, CaMML acts intelligently. Although CaMML walks in real model space, it does not count models. Rather, for every model CaMML visits on its Metropolitan walk, it computes a representative simplification, and counts only those representative structures. The reason is that there are many trivial variations which would otherwise be counted as separate models, even though they are statistically indistinguishable. All of the final models should be significantly different from each other.
Before the Metropolis search begins counting, CaMML goes through a simulated annealing phase where it does basically the same thing, but with a greater (and steadily decreasing) chance of moving ``downhill''. The goal is to escape local optimums that would otherwise trap it, before the real counting begins. Towards the end of the simulated annealing phase, CaMML re-estimates a few useful parameters (like the arc prior) which will figure in the Metropolis search proper.
Knowing how the search works, we can talk about one of the ``research only'' settings available: crippling.
Although the normal user would never want to use this feature, researchers are not normal users. In order to compare CaMML with other algorithms, you can cripple it in various ways. For those of us not used to thinking in terms of masking bits, here's what you actually enter:
| 0: | Cripple nothing (reset to default). |
| 1: | Cripple simplify only. |
| 2: | Cripple P(arc) encoding only. |
| 3: | Cripple both. |
If you cripple the simplification step, then CaMML will count each model independently. This will drastically reduce the prior probability for each model, and thus increase each model's message length. Worse than that, it makes it much less likely that you could find the best structure, for the simple reason that now in order to find it, you must visit that precise structure, not just any trivial extension of it. For example, in the classic Alarm network, Chris Wallace reports that a crippled CaMML will essentially never find the best structure, though an uncrippled CaMML will do so very quickly.
CaMML also normally uses an adaptive code to represent the presence or absence of arcs. In a sparse network, that means it takes just a fraction of a bit to say that an arc is missing, since you expect arcs to be missing unless told otherwise. Remember, the MML principle is based on having the most efficient code for each hypothesis. If you cripple the P(arc) feature, CaMML is forced to spend 1 bit to represent the presence or absence of each arc. That makes the true model more expensive to code, and therefore biases CaMML towards more complicated models, and impairs CaMML's performance.
When you print a model, you get some cryptic summary information like:
:::::::::::::::::: Current model.
6 arcs, -LogLike 2492.78, RawCost 2726.34
Same M 1 Becomes N 1
Rel Prior 1.000 Posteriors: SEQ: 97.78 MML: 97.78
Perms 4
Obviously it starts by telling us the number of arcs in the current model (6 in this case). The next two numbers on the first line represent the costs for this model.
-LogLike is the negative log of the likelihood of the data given the model. It is a measure of how well the model fits the data. In particular, it is the message length of the data, given the model, which is the second part of an MML message, and shorter is better.
RawCost is the total cost (message length) of specifying the model and the data given the model, but without regard to the fact that this model may be equivalent to other models. It is the RawCost which governs the Gibbs sampling process. The user will be more interested in the posterior probabilities, which take heed of which models are equivalent. (For example, two models having almost identical RawCost may nevertheless have vastly different posteriors.)
The RawCost displayed (at least in Discrete classic and rodo) is not
the actual raw cost:
The second line tells us something about the place of this model in
the kept and final hierarchy. Same M
means
that in the final cleaning and joining, the current model was judged
to be same as model
. Becomes N
means that after the
final filtering (clean and join), this model became the
final
model. If three kept models 2, 4 and 7 both reported
Same M 4, then they would all be collapsed to the same final
model N, and so would all report something like Same
N 1. When several models collapse in the final filtering, you may
sometimes see Part of N
as well as Same N
.
The third line reports the relative prior of the model and two posterior probabilities. The relative prior is the prior probability of this model compared to the prior probability of the best MML TOM. In MML, the simplest models have the shortest encoding and therefore the highest priors.
The SEQ posterior is the estimated probability that the true model is in the statistical-equivalence class of this model. The MML posterior is the estimated probability that the true model is in the MML equivalence class of this model. The MML equivalence class will always be larger than (or equal to) the SEQ class, so the MML posterior will always be greater than (or equal to) the SEQ posterior.
Remember that the posterior probabilities are really estimates, because they are derived from a Gibbs sampling process over all the possible models (and all possible parameters of those models). If you have doubts about the reliability of the estimates, vary the length of the sampling process or the temperature.
Perms is the number of permutations of variable orderings
consistent with generating the current model. For instance, if the
model is
, then there is only one permutation,
. However, if it has a common-cause structure
, it
has two permutations, namely
and
.
If you have specified a true model, you also get an Expected -LogLikelihood measure.
Finally, remember that the model shown is really the representative model for that MML equivalence class, in the same way that ``0.5'' may represent the range [0.25, 0.75).
I keep having to go back to the paper to figure out just how CaMML works, so I thought I would try to write down a step-by-step procedure here. However, Rodney noticed that while the paper describes Linear CaMML, Discrete CaMML has evolved quite a bit since then. Rodney kindly revised this section. So we will first present the algorithm in the paper (and used by Linear CaMML), and then introduce the more complex schemes used in Discrete CaMML.
Here is a simplified version of sections 6-8 of [9]. An even simpler schema for this algorithm is now given in chapter (7?) of [4].
main () {
Set hot = 3.0
Set temperature = hot
k3 = max(k^3,10^3)
// Standard Simulated Annealing search
cook( 20*k3 ) // cooking
for ( i = 1; i < 20; 1++) { // cooling
temperature = hot - (i/20)
cook( 2 * k3 )
}
// Long cook session to get good idea of arcProb
temperature = 1.0
cook( 40 * k3 )
}
// Sample N models keeping track of the average
cook( N ) {
totalArcs = 0
Repeat N times:
Pick one of the three mutators: do it
Calculate the change in LJP (dLJP) for this new TOM
Accept the mutation iff either:
1) dLJP is positive, or
2) ( exp (dLJP) / Temp ) > uniform(0,1)
If accepted, update LJP, C, and T, link matrix.
totalArcs = totalArcs + tom.numSignificantArcs
averageArcs = totalArcs / N
arcProb = (averageArcs + 0.5) / (maxArcs + 1.0)
}
The discrete version of classic CaMML algorithm has numerous changes. Although they do may not change the spirit of the algorithm, there is a good chance they could have a large impact on performance.
I suspect there are several reasons for the code changes:
anneal () {
temperature = 1.0
k3 = max(k^3,10^3)
cook( 7 * k3 )
// fill
Repeat 10 times
Clear all arcs
Randomize Total Ordering
Add all possible arcs to network(limited to 7 parents per node)
cook( 7 * k3 )
// clear
Repeat 10 times
Clear all arcs
cook( 7 * k3 )
// Anneal
hot = 2.0
temperature = hot
for ( i = 1; i < 12; i++ ) {
temperature = hot - (i / 12);
cook( 10 * k^3 )
}
temperature = 1.0
cook( 10 * k^3 )
}
// Perform N mutation steps searching for the best TOM.
// Recalculate arcProb based on best model
cook( int N ) {
Repeat N times:
Pick one of 4 mutations: do it. (fourth involves switching a parent of a node)
Calculate the change in LJP (dLJP) for this new TOM
Accept the mutation iff either:
1) dLJP is positive, or
2) ( exp (dLJP) / Temp ) > uniform(0,1)
If accepted, update LJP, C, and T, link matrix.
clean the best TOM found
arcProb = (cleanBestTOM.numArcs + 0.5) / (maxArcs + 1.0)
}
Begin with the final model from Anneal phase.
Temperature = 1.8
k3 = max(k^3,10^3)
Repeat 200000 k3 times:
Pick one of the mutators (respecting priors): do it
Calculate dLJP
If (dLJP > 0) or (exp (dLJP) / Temp > uniform(0,1)):
* accept change, update LJP, C, T, link matrix
* update arc frequency table
* compute the MML model M' for this TOM:
- clean the TOM (remove insignif. effect arcs)
- compute the skeleton S and likelihood L for the clean TOM
* Increment the visit count for M':
- h1 = hash S and L using symmetric random matrix 1
- h2 = hash S and L using symmetric random matrix 2
- increment visit count at indices h1 in H1 and h2 in H2
* Calculate weight of current TOM relative to best TOM (best TOM has weight 1.0)
- weight = exp((bestMML-currentMML)/temperature)
* Add weight to visit count for M's DAG
- h3 = hash of S using asymmetric randmatrix a
- h4 = hash of S using asymmetric randmatrix b
- Add weight to visit count at indices h3 in D1 and h4 in D2
- Add weight to totalWeight
* If the SMALLER visit count for M' is now in the top 50,
replace one of the top 50 MML models with M' [??]
Further clean the (up to 50) retained models
* Use a more accurate Fisher Info, and merge any
which become the same
* Then test for MML equivalence using KL-based
measure of expected change in msglen: if < 0,
merge. (Requires reestimation of priors by
inverted Bayes formula.)
Posterior prob = sum(weights of represented TOMs) / totalWeight
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -dir html -split +0 -local_icons -up_url http://www.datamining.monash.edu.au/software/camml -up_title 'CaMML home' userguide.tex
The translation was initiated by Charles Twardy on 2004-07-23