Math 176 - Homework #1

Set Operations using Threaded AVL Trees

Due date: Friday, October 13, Midnight.

This programming assignment is covered by special Academic Integrity Guidelines.

New: I have posted some suggestions and software to help you debug your programs.
Newer: The turnin procedure is now described in Step 5 below.
Newest: Sam Buss's own solution is now available.  Please note that this code contains extra items that were not required for the Math176 programming assignment.  Also, my implementation used sentinels, in the hope that this would simplify the code (but it actually slightly complicated the code); we recommend that you not use sentinels in your solution.   The code is available here: ThreadedAvlTree.java.

Overview: This homework assignment requires you to implement the basic operations for a threaded AVL tree, including the ability to create an iterator that allows sequential access to the elements stored in the AVL tree.  You must implement the AVL tree so that it extends the Java class AbstractSet.   The associated iterator must implement the Java specification of Iterator.    This will allow you to easily compare your AVL tree implementation of a set with the built-in Java 1.2 classes for data structures.  A special class, CountComparable, which extends Comparable, has been written that will allow you to easily gather statistics on the number of comparisons performed during the data structure operations, allowing you to indirectly compare the performance of your AVL trees against the built-in Java data structures.

These instructions may change somewhat or be augmented: please watch for announcements on this, or check back to this page.

The outline of the homework assignment is as follows:

  1. Write a threaded AVL tree implementation of a set.  This must be a Java class named ThreadedAvlTree and it must extend AbstractSet.  The basic operations it must support are:

    1. a constructor: public ThreadedAvlTree().
    2. an "add" or "insert" method: public boolean add( Object o ).
    3. a size operation:  public int size().
    4. an iterator creator: public Iterator iterator().
    5. a "remove" or "delete" method: public boolean remove( Object o ).

    Your implementation must obey the Java implementation standards for AbstractSet: namely, the return values of add and remove indicate whether the set was changed as a result of the operation.  The add function will not add another copy of an object that is already present.  The iterator must implement hasNext(),  next() and remove() exactly as specified by the Java 1.2 specifications for Iterator.

  2. Use the supplied CountComparable class to wrap Comparable objects so as to keep track of the number of comparisons used when inserting and removing objects from sets.   Run tests with the supplied data file, first with the Java LinkedList class, then with the Java TreeSet class (which is based on Red-Black trees) and then with your threaded AVL tree implementation.   Gather statistics and prepare a table reporting the average number of comparisons used per insertion and deletion operation.  You should see a dramatic difference between the O(n^2) algorithms that a linked list uses, and the O(n log n) algorithms that balanced trees use.  
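
To make the idea of counting comparisons concrete, here is a minimal sketch of a counting wrapper.  It is not the supplied CountComparable class: the names CountingKey, comparisons, CountingDemo and addAll are hypothetical, the real class's interface may differ, and the sketch uses generics from later Java versions.  It only counts compareTo calls, which is what TreeSet uses; a LinkedList searches with equals instead, so a full wrapper would presumably need to count those calls as well.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the supplied CountComparable class; the real
// class's fields and methods may differ.
final class CountingKey implements Comparable<CountingKey> {
    // Total number of compareTo calls made so far, across all instances.
    static long comparisons = 0;

    private final String value;

    CountingKey(String value) {
        this.value = value;
    }

    @Override
    public int compareTo(CountingKey other) {
        comparisons++;
        return value.compareTo(other.value);
    }
}

final class CountingDemo {
    // Adds each word to the set and returns how many adds changed the set.
    static int addAll(Set<CountingKey> set, String[] words) {
        int changed = 0;
        for (String w : words) {
            if (set.add(new CountingKey(w))) {
                changed++;   // add returns true only if the set was changed
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Set<CountingKey> set = new TreeSet<>();
        String[] words = {"pear", "apple", "fig", "apple"};
        int changed = addAll(set, words);
        System.out.println("adds that changed the set: " + changed);
        System.out.println("average comparisons per add: "
                + (double) CountingKey.comparisons / words.length);
    }
}
```

The duplicate "apple" is rejected by add, which returns false; this is the same return-value convention that ThreadedAvlTree must follow.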

To do the homework you should do the following steps:

  1. In the directory ../public/ProgHomework1, there are main programs MainHw1 and MainHw1IO.  The MainHw1 program shows examples of how to use the Java classes LinkedList and TreeSet.  You should learn how to use these classes if you are not already familiar with them:  good ways to learn this are to look at the appendix in the text book and to read the online Sun java documentation at www.javasoft.com.
        You should also examine the use of the supplied class CountComparable and understand how it works and how to use it.
        Documentation for these classes in HTML format is available from the directory ../public/ProgHomework1, or on the web via ftp, at http://math.ucsd.edu/~sbuss/Math176/ProgHomework1/, or go directly to the following HTML files for documentation:  MainHw1.html, MainHw1IO.html, CountComparable.html, and ThreadedAvlTree.html.

        The program MainHw1 should provide you with a good skeleton for a main program for testing your Threaded Avl Tree implementation.

        Later, you will need to read commands from a file to gather statistics on the behavior of your Avl tree, and on the Java classes of red-black trees and linked lists.  The program MainHw1IO shows how to read from files and how to parse an input line into tokens with a StringTokenizer.

  2. Write and debug your threaded Avl tree class and iterator.  At first, do not try to implement any remove methods.  The iterator must be implemented as an inner class named AvlIterator.

  3. Extend your threaded Avl tree class and associated iterator class to support the remove operations.  Don't forget to check the description of how to rebalance after deletion.

  4. Once you have completed step 3 (or step 2, if you are unable to finish step 3), gather statistics.   There is a file named hw1Data in the same directory ../public/ProgHomework1.  This contains a series of lines with the format: "A xxxxx" or "D xxxxx" where "xxxxx" denotes a string of symbols.  These lines are commands to either add or delete the corresponding string from the set.  (If you have not implemented remove methods, then just skip over the delete commands.)  Sometimes, the delete commands will ask to delete a word that is not present (this happens about 10% of the time): in this case the set is not to be changed.  Sometimes an add command will ask you to add a string that is already present in the set: again, in this case the set is not to be changed, since sets do not support the presence of duplicate objects.
        Run these commands on the data structures (1) LinkedList, (2) TreeSet, and (3) your implementation of ThreadedAvlTree.  Do this for the first N add commands (and the delete commands which appear before the N-th add command), letting N equal 100, then 1000, then 10000, then 100000, then 1000000, but stop whenever the algorithms become so slow as to require more than 5-10 minutes of total running time.  You can use larger data sets by increasing the heap size of the Java virtual machine, which is controlled by the -Xmx command line option to java.  (Run java -help and java -X for information on the java machine command line options.)  You should expect the balanced tree structures to work well until main memory is exhausted, at which point the program begins to page virtual memory from the disk and runtimes will become extraordinarily bad.
        Write a short report or table giving for each test: the number of adds attempted, the number of adds which failed due to trying to insert duplicates, the number of deletes attempted, the number of deletes which failed since the element was not present, the total number of comparisons performed, and the average number of comparisons per attempt to add or delete (i.e., per line processed from the file).  You may include additional information in the table if you wish, but you must include at least the items mentioned.  Your table/report should be prepared as a plain text file.
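
        The command loop described in this step might be sketched roughly as follows.  This is only an illustration under some assumptions: the class name CommandStats and its fields are hypothetical, each "xxxxx" string is assumed to be a single whitespace-free token, and plain String elements are used where your real test program would wrap them with CountComparable.  It reads from a BufferedReader, so it works equally well with a FileReader on hw1Data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeSet;

// Hypothetical helper, not part of the supplied code.
final class CommandStats {
    int adds, failedAdds, deletes, failedDeletes;

    // Runs "A word" / "D word" commands against the set, stopping right
    // after the maxAdds-th add command (maxAdds is assumed to be >= 1).
    static CommandStats run(BufferedReader in, Set<String> set, int maxAdds) {
        CommandStats st = new CommandStats();
        try {
            String line;
            while ((line = in.readLine()) != null) {
                StringTokenizer tok = new StringTokenizer(line);
                if (tok.countTokens() < 2) {
                    continue;                     // skip blank or malformed lines
                }
                String cmd = tok.nextToken();
                String word = tok.nextToken();
                if (cmd.equals("A")) {
                    st.adds++;
                    if (!set.add(word)) {
                        st.failedAdds++;          // duplicate: set unchanged
                    }
                    if (st.adds >= maxAdds) {
                        break;                    // the N-th add ends the run
                    }
                } else if (cmd.equals("D")) {
                    st.deletes++;
                    if (!set.remove(word)) {
                        st.failedDeletes++;       // absent: set unchanged
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return st;
    }

    public static void main(String[] args) {
        String data = "A pear\nA fig\nD pear\nD kiwi\nA fig\nA plum\nD fig\n";
        CommandStats st = run(new BufferedReader(new StringReader(data)),
                              new TreeSet<>(), 4);
        System.out.println(st.adds + " adds (" + st.failedAdds + " failed), "
                + st.deletes + " deletes (" + st.failedDeletes + " failed)");
    }
}
```

        The average number of comparisons per line processed is then the total comparison count divided by (adds + deletes).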

  5. You must turn in:

    1. A text file, named README, with the results from step 4.  Your report must also include a description (a short paragraph) describing how much of the homework you completed, and any special circumstances regarding your homework solution.
    2. A file ThreadedAvlTree.java with your source code.   This file will be graded by an automated procedure, so it is important that it can compile on the ieng9 machine, and that you use exactly the correct interface for your classes and methods.
    3. The "turnin" procedure is as follows.  You must create two files: one named ThreadedAvlTree.java and the other named README.   Both files should be text files.  Lines in the README file should be at most 80 columns.  Place both files in a directory, and from that directory give the command bundleP1.   ("bundleP1" stands for bundle up programming assignment #1).  This command will check that the required files are present and then turn them in.
    4. If you later run bundleP1 again, it will overwrite all of your previously submitted homework.  (So: do not turn in the files one at a time!) 
    5. If you get error messages that appear not to be your fault, please email me immediately at sbuss@math.ucsd.edu
    6. Just in case something goes wrong with the turnin procedures:  Keep your files on ieng until you have received your programming assignment grade.  In addition, do not modify them so that we can verify the last modified dates if necessary.

     

  6. Testing suggestions.  We will provide you with a program that checks whether your class definitions are correct.  Also, I will provide you with a program that I used for checking the AVL properties of trees and the correctness of thread pointers; you will need to modify this program, since it is unlikely that you will implement AVL trees in exactly the same way that I did.  Finally, you should be able to test your AVL trees by using a TreeSet (red-black tree) from the Java library and checking whether it gives exactly the same results as your AVL tree implementation. 
        It is OK to do your program development on a machine other than ieng9; however, the final version must run on ieng9 and it would behoove you to allow a day or two extra time to make sure it runs there.  It is also OK to report your results in step 4 as run on another machine, but in this case, please report also on the machine type and especially on how much RAM it has.
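
        One possible way to carry out the TreeSet comparison suggested above is to walk both iterators in lockstep.  The sketch below is only an illustration: SetCrossCheck and sameSequence are hypothetical names, and in main a second TreeSet stands in for your ThreadedAvlTree, which is not available here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical test helper, not part of the supplied code.
final class SetCrossCheck {
    // Returns true if both sets report the same size and their iterators
    // yield equal elements in the same order.
    static <T> boolean sameSequence(Set<T> expected, Set<T> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        Iterator<T> e = expected.iterator();
        Iterator<T> a = actual.iterator();
        while (e.hasNext() && a.hasNext()) {
            if (!e.next().equals(a.next())) {
                return false;
            }
        }
        // Both iterators must be exhausted at the same time.
        return !e.hasNext() && !a.hasNext();
    }

    public static void main(String[] args) {
        Set<String> reference = new TreeSet<>(Arrays.asList("b", "a", "c"));
        // Stand-in for: Set<String> mine = new ThreadedAvlTree();
        Set<String> mine = new TreeSet<>(Arrays.asList("c", "b", "a"));
        System.out.println("same sequence: " + sameSequence(reference, mine));
    }
}
```

        In a fuller test you would also apply every add and remove to both sets and check that the boolean return values agree at each step.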

  7. All programming work must be your own.  You may get help from TA's, from fellow students, etc., but must do your own work, and especially must "internalize" all advice, i.e., be able to understand everything well enough that you could re-implement it on your own.  In particular, you should not use code either verbatim from any source or which is a straightforward translation of someone else's code.  More information on what kinds of assistance are permitted can be found in the Academic Integrity Guidelines.   If you are not sure what kind of outside assistance is allowed, discuss it with me or a TA.

  8. Grading: The grade for your programming assignment will be based on the following (percentages and categories are preliminary and I reserve the right to change them based on the class performance).

    1. Correctness of the class definitions and method specifications.   Programming style (~ 5%).
    2. Table of data in your report is complete and numbers appear to be correct.   (~20%)
    3. Add and iterator are correctly implemented. (~40%).
    4. Size and hasNext are correctly implemented. (~10%)
    5. Remove is correctly implemented. (~25%)

    The percentages do not correspond to letter grades in the traditional way.    For example, if you completely do parts 1-4, but not 5, this would be viewed as good work, i.e., a "B" letter grade.