ID3 Algorithm

Introduction

This is my implementation of ID3 algorithm from Machine Learning field. As I was unable to find similar solutions for .NET platform I decided to write my own solution that uses ID3 algorithm to calculate decision tree for a given input.
Regarding ID3 algorithm, detailed explanation about the theory behind is available in many places on Internet, but my personal suggestion is
Tom Mitchel "Machine Learning" http://www.amazon.com/Machine-Learning-Tom-M-Mitchell/dp/0070428077/ref=sr_1_1?ie=UTF8&qid=1381348257&sr=8-1&keywords=tom+mitchell+machine+learning

First of all, I have to say that user interface is not the most intuitive one, as I had a little time to work on it, but everything else is OK, at least by my standards :-).

Algorithm

Then, a few words about implementation. I have used Tree structure to hold temporary (as well as final) nodes of a decision tree. Tree structure was implemented using classes shown below (in the picture).
Class Diagram Id3.png
Class Attribute is used as a container for input instances and classes TreeNode and Branch are used for Tree construction purposes inside DecisionTree class.As you can see, most of the work is done inside DecisionTree class. But although it looks like there are many methods inside it, in fact, it has a lot of helper methods that offload work from more important methods.
The most important method from DecisionTree class is FormId3DecisionTree(). It was written using slightly modified pseudocode below:

Pseudocode.png
Although it looks complicated, it is a very simple algorithm. Here is my implementation using recursion as that is the only possible way of forming the tree.
FormId3DecisionTree.png
So, to further explain what I did. First two if check whether all instances are of the same type. If yes we don't have any more job left to do, so return YES/NO depending on instance class and stop this branch of recursion. In case all instances are not of the same class we calculate the best attribute by Information Gain measure (look at the theory for clarification of Information Gain).
After we have chosen the best possible attribute, logic is the following. For each attribute value pick only those rows from input that have that specific value in column that matches the chosen attribute (by IG measure). Then, recursively, call the method only on those instances(if there are any). In case there aren't any instances with specified value remove the node representing that value from tree (this step is not in pseudocode, all others are).

Usage examples

After all fields inside WinForm are inputed:
Slika 1.png

After Algorithm performs calculation:
Slika 2.png

After openint .txt file from the disk where decision tree is stored:
Slika 3.png

Conclusion

To conclude with, this is only a short homepage about my project on Id3 algorithm. I hope you like it. If there would be some further interest I could explain my steps in more detail. Feel free to leav your comments. Thank you

Last edited Oct 9, 2013 at 8:19 PM by Gandalf_The_White, version 12