Quick Start Guide to DTree
Copyright 2001 Sarah George

PURPOSE OF THIS GUIDE
---------------------

This is not a replacement for the official documentation (dtree.doc).
You can use this guide to find out what dtree does without needing to
become an expert. This will help you decide whether the program is
useful to you, and should also make the official documentation less
scary to read.

THE SHALLOW END
---------------

Here's some data:

cse1301  cse1303  Job satisfaction
marks    marks    (Great, Ok, Poor)
-------  -------  -----------------
50       50       Poor
70       70       Ok
100      100      Great
90       70       Great
55       60       Poor
55       100      Great
100      55       Poor

If we treat job satisfaction as a class for each student, it is easy in
this slightly contrived example to predict their class from their marks
(attributes).

A Decision Tree (dtree) gives a series of simple decisions
(e.g. mark < 70) that define categories for each thing (student).
If we try to rig the tree so that each category consists mostly of one
class, the categories become good predictors for a thing's class.

This means we can predict the class for nearly all our data, reducing
the size of the message needed to describe what class each thing
belongs to. (For example, we could just send class info for the things
that our tree doesn't predict correctly.)

The DTree program takes some data and works out a near-optimal dtree to
classify that data. Once you have the tree, automatic classification is
very simple and very fast.

FEEDING OUR EXAMPLE TO DTREE
----------------------------

Here's an input data file that describes our dataset. I've commented it
here, but dtree itself doesn't accept comments.

3 2          # Number of classes, Number of attributes
1 1          # What sort of attribute (1 = continuous) for each attribute
1 1 50 50    # Thing ID, Thing Class, list of attributes in order
2 2 70 70    # (as above, repeated for each thing)
3 3 100 100
4 3 90 70
5 1 55 60
6 3 55 100
7 1 100 55

The Thing ID field just provides a way to identify particular things.
The Thing Class is numeric, so I've used this mapping:
1 = Poor, 2 = Ok, 3 = Great.

Now run the program and type the name of the input file (eg1dat),
followed by a name for the result file (I suggest eg1dat.out).
Type "help" at the prompt for a list of commands.

TELLING DTREE TO MAKE A TREE
----------------------------

At dtree's prompt, type "f" to print out the full tree. DTree will show
its working as it goes (e.g. the cost of trees it considers) and finally
it will list its tree in "Short Form".

With our data, dtree will decide to split on attribute 2 (the cse1303
marks), at the value 65. So there are two categories: mark2 < 65 and
mark2 > 65.

The first category consists entirely of students with Poor job
satisfaction. The second category mostly consists of students with
Great job satisfaction. There is one exception in the second category:
one of the students only had Ok job satisfaction.

dtree reports this tree as follows:

Short Form
RootCat Spliton 2 at 65.000
  1 Cat 2 Class 1( 3)
  2 Cat 3 Class 3( 3), 2( 1)

Now use the "q" option to quit the program.

FULL DOCUMENTATION
------------------

The full, "official" version of the documentation is available in the
file dtree.doc. Do read this if you plan to use the program for
anything serious; there is a lot that the Quick Start Guide doesn't
cover.
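
CHECKING THE EXAMPLE BY HAND
----------------------------

If you'd like to convince yourself that the split reported above really
does separate the classes the way the Short Form says, here is a small
Python sketch. It is not part of dtree and it does not search for a
tree; it just lays the example data out in the input format shown
earlier (without the comments) and counts the classes on each side of
the cut on attribute 2 at 65. The names THINGS, input_file_text and
class_counts are made up for this illustration, and the file layout is
copied from the commented listing in this guide, so check dtree.doc
before relying on it.

```python
# Illustrative only: not part of dtree, and not how dtree finds its split.
from collections import Counter

# (thing ID, class, cse1301 mark, cse1303 mark)
# Classes use the mapping from this guide: 1 = Poor, 2 = Ok, 3 = Great.
THINGS = [
    (1, 1, 50, 50),
    (2, 2, 70, 70),
    (3, 3, 100, 100),
    (4, 3, 90, 70),
    (5, 1, 55, 60),
    (6, 3, 55, 100),
    (7, 1, 100, 55),
]


def input_file_text(things, n_classes=3, attr_types=(1, 1)):
    """Lay the data out like the commented listing, minus the comments."""
    lines = [
        f"{n_classes} {len(attr_types)}",          # classes, attributes
        " ".join(str(t) for t in attr_types),      # 1 = continuous
    ]
    lines += [" ".join(str(v) for v in thing) for thing in things]
    return "\n".join(lines) + "\n"


def class_counts(things, attribute, cut):
    """Count classes on each side of a cut on one attribute (numbered from 1)."""
    below, above = Counter(), Counter()
    for _thing_id, cls, *attrs in things:
        side = below if attrs[attribute - 1] < cut else above
        side[cls] += 1
    return dict(below), dict(above)


if __name__ == "__main__":
    print(input_file_text(THINGS), end="")
    below, above = class_counts(THINGS, attribute=2, cut=65)
    print("mark2 < 65:", below)
    print("mark2 > 65:", above)
```

The counts should line up with the Short Form listing: three things of
class 1 in the first category, and three of class 3 plus one of class 2
in the second. No mark in the example equals 65 exactly, so it makes no
difference here which side of the cut a value of 65 would fall on.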