CSCE 410/810

Final Project Assignment

 

October 30, 2003

 

Problem

 

Define a programming project that implements a system related to the advanced topics of information retrieval, such as query modification, thesaurus construction, ranking algorithms, and clustering.  You may also look into routing, question answering, confusion, interactive search, and other issues in IR.

 

The goal of this project is to motivate you to build an IR-related tool, test it, and analyze it.

 

This assignment counts 25% towards your grade.

 

Requirements

 

(1)        Proposal summary.  Write a 2-page summary of your proposed final project and turn it in to me before November 11, 2003.  Turn it in as early as possible so that I can approve your proposed project and you can start on it early.  Your proposal must also state what data you want to use in your final project.

 

(2)        You are required to build a program that addresses some IR-related issues.  There are three general approaches to this:  (a) build a simple program but perform a rigorous experiment with many document collections, (b) build a complex program and test it on a simple document collection, or (c) build a moderately complex program and test it carefully on some document collections.

 

(3)        Document Collections.  I have many document or data collections: topics from TREC conferences, confusion data, interactive data, routing data, question-and-answer data, Reuters news, etc.  Please discuss with me what kind of data you want for your final project.  The collections are large and are not available online.  Some data are available in our class handin account:

 

/home/grad/Classes/cse410/DATA/

 

That directory contains documents similar to the document collections used in homework assignment #4.  You may also download your own document collections from the web.

 

If you want some other data, please come see me and I will provide them.

 

Hand In

 

(1)        A comprehensive report that includes: (a) Introduction: the description of the problem you are addressing, why you think it is important, and so on; (b) Design: the description of your implementation approach, solution strategy, styles/design, and so on; (c) Results: the experiments, datasets, discussion of results, comparisons with other literature, and so on; (d) Possible extensions and future work; (e) Conclusions.  In your appendix: (a) the instructions on how to run your programs, (b) results/output/graphs, (c) the printout of your programs.

 

(2)        You MUST make sure that your programs run on CSE platforms, and your instructions on how to run your programs must be clear.  We have only a couple of days to grade your final project and turn in the final grades.  If we cannot run your programs, we will not have time to contact you to get them to work.  So, please keep this in mind.  To be ABSOLUTELY sure, you may want to turn in your programs earlier.

 

(3)               Turn in your homework electronically using the handin account and turn in a hardcopy of your report at my office.  

 

The assignment is due at 8:00 a.m. on December 18, 2003.  The following table specifies the penalties for late submissions.

 

Time Turned In                        Penalty

8:00 a.m. (12/18/2003)                None

Later than 8:00 a.m. (12/18/2003)     Not accepted

 

Grading

 

The Final Project will be graded in two parts: programming (50%) and report (50%).  The programming part will be graded as follows:

 

(1)        50% Program Correctness (including the accessibility of the programs)

(2)               10% Software Design

(3)               10% Programming Style

(4)               20% Testing

(5)               10% Documentation (in-program documentation)

 

The report will be graded as follows:

 

(1)        15% Introduction

(2)        20% Design

(3)        30% Results

(4)        10% Possible Extensions and Future Work

(5)        15% Conclusions

(6)        10% Appendices