CSCE 410/810

Information Retrieval Systems

Class Syllabus

Spring 2006

 

Instructor

 

Name: Prof. Leen-Kiat Soh

E-mail: lksoh@cse.unl.edu

Office: 122E Avery Hall

Phone: (402) 472-6738

Office Hours: 12:30 – 2:00 PM TR

Class Time: 2:00 – 3:15 PM TR

Classroom: Avery Hall, Room 109

Website: http://www.cse.unl.edu/~lksoh/Classes/CSCE410_810_Spring06/

 

Catalog Listing

 

Outline of the general information retrieval problem, functional overview of information retrieval.  Deterministic models of information retrieval systems; conventional Boolean, fuzzy set theory, p-norm, and vector space models.  Probabilistic models.  Text analysis and automatic indexing.  Automatic query formulation.  System-user adaptation and learning mechanisms.  Intelligent information retrieval.  Retrieval evaluation.  Review of new theories and future directions.  Hands-on experience with a working experimental information retrieval system. (3 cr)

 

Class Objectives

 

The objective of this class is to introduce students to the fundamentals of information retrieval (IR) systems.  The course is organized into four stages.  In the first stage, the class studies basic concepts in IR models, retrieval evaluation, query languages, query operations, text operations, and indexing and searching.  In the second stage, the class explores more advanced topics such as TREC, parallel and distributed IR, multimedia IR, web searches, and Google.  The third stage focuses on interdisciplinary research issues such as digital libraries and visual information retrieval.  Finally, the fourth stage is seminar-oriented, with presentations on recent TREC tracks such as genomics, terabytes, robust retrieval, spam filtering, and enterprise search.

 

Required Background

 

Prerequisites: CSCE 235, 310, or permission.  Students are expected to have programming experience in a high-level language (C, C++, or Java), knowledge of data structures (e.g., binary search trees, linked lists, hash tables), and experience with the UNIX or Linux operating systems.

 

Textbook and Reading Material

 

Baeza-Yates, R. and B. Ribeiro-Neto (1999).  Modern Information Retrieval, New York: Addison-Wesley.  (Required)

 

Papers from TREC 2003, 2004, and 2005.  (http://trec.nist.gov)

 

Grading

 

Final grades in this class will be assigned based on the following scale:

 


A:         94% - 100%

A-:       90% - 93%

B+:       87% - 89%

B:         83% - 86%

B-:       80% - 82%

C+:      77% - 79%

C:         73% - 76%

C-:       70% - 72%

D+:      67% - 69%

D:         63% - 66%

D-:       60% - 62%

F:         below 60%


 

A+ is awarded to a student whose work and understanding of the class prove to be exceptional.

 

Each student's grade consists of (1) two homework assignments (20% of the grade in total), (2) one midterm exam plus a prerequisite quiz (30%), (3) one presentation (10%), and (4) one final project, which may be done in a group (40%).  800-level students will be required to solve additional problems or complete additional tasks for these assignments.
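
As an illustration of how the component weights combine with the grading scale, the short Python sketch below computes a weighted final percentage and maps it to a letter grade.  It is not an official grade calculator: the function and variable names are hypothetical, the example scores are invented, and rounding to the nearest whole percent is an assumption, since the scale lists whole-number ranges only.  The discretionary A+ is not modeled.

```python
# Component weights as stated in the syllabus (fractions of the final grade).
WEIGHTS = {
    "homework": 0.20,
    "midterm_and_quiz": 0.30,
    "presentation": 0.10,
    "final_project": 0.40,
}

# Lower bound of each letter grade, taken from the grading scale.
SCALE = [
    (94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"), (60, "D-"),
]


def final_percentage(scores):
    """Weighted average of component scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a percentage to a letter grade.

    Rounding to a whole percent is an assumption, not a stated policy.
    """
    rounded = round(percentage)
    for lower_bound, letter in SCALE:
        if rounded >= lower_bound:
            return letter
    return "F"


# Hypothetical component scores, for illustration only.
example = {
    "homework": 90,
    "midterm_and_quiz": 80,
    "presentation": 85,
    "final_project": 92,
}
total = final_percentage(example)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

With these example scores the weighted total is 18 + 24 + 8.5 + 36.8 = 87.3%, which falls in the B+ range.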

 

Homework assignments that involve programming will be graded as follows: 45% Program Correctness, 15% Software Design, 10% Programming Style, 15% Testing, and 15% Documentation.

 

The final project will be graded in two parts: programming (50%) and report (50%).  The programming part will be graded in the same way as programming homework assignments.  The report will be graded as follows: 50% Design Description and Discussion, 25% Organization, 15% Requirements, and 10% Grammar and Errors.

 

The presentation (of a technical paper) will be graded as follows: 50% Summary of Paper, 20% Organization, 20% Conclusions: Comparisons, Insights, etc., and 15% Q&A and Participation.

 

Academic Misconduct

 

Violations of academic integrity will result in automatic failure of the class and referral to the proper university officials.  The work a student submits in a class is expected to be the student’s own work and must be work completed for that particular class and assignment.  Students wishing to build on an old project or work on a similar topic in two classes must discuss this with both professors.  Academic dishonesty includes handing in another’s work or part of another’s work as your own, turning in one of your old papers for a current class, or turning in the same or a similar paper for two different classes.  Using notes or other study aids, or otherwise obtaining another’s answers for an examination, also represents a breach of academic integrity.  Sanctions are applied whether the violation was intentional or not.

 

Those who share their code or writing and those who copy others’ code or writing will be penalized in the same way; both parties will be considered to have plagiarized.