CSCE 410/810

Homework Assignment 1

 

September 2, 2003

 

Problem

 

Use your favorite search engines to download electronic copies of journal/conference/book articles in the area of information retrieval. That is, you must retrieve not only a list of such articles but also the actual article files, in PostScript (.ps) or PDF format.

 

Some useful search engines include www.profusion.com (general), www.google.com (general scientific), dblp.uni-trier.de (database), citeseer.nj.nec.com/cs (computer science), and hpsearch.uni-trier.de (homepage search of researchers).  You are free to use other search engines.

 

The articles of interest should be related to information retrieval topics such as indexing, retrieval, routing, query manipulation, lexical processing, and other areas.

 

Some useful keywords include “information retrieval,” “document indexing,” “retrieval evaluation,” “inverted files,” and so on. The terms and definitions that we have covered in class also make useful keywords.

 

The goal of this exercise is to familiarize you with two kinds of Web search, along with their related issues, preferences, and problems: searching for general papers to download (the background search exercise) and searching for the complete reference of a specific paper (the target search exercise).

 

This assignment counts 5% towards your grade.

 

Requirements

 

Exercise 1: Background Search

 

You are required to download 10 PostScript or PDF journal/conference/book articles in the area of information retrieval, and to describe how you obtained each article. Depending on your search effort, you may come upon a useful repository of articles and be able to obtain all 10 articles from that single site; sometimes you may have to search several different sites for the 10 articles. Document your experience. For example: Which search engine did you start with? Which keywords did you use? Which site did you start with? Which links did you follow from the first site to the second, from the second to the third, and so on, to the final destination where you obtained the desired articles? How many broken links did you encounter? How many of the articles you found had no electronic copies? How many had only electronic abstracts? Which sites were useful? How many dead ends did you reach? Describe your search strategy. For example, did you start with general search terms such as “information retrieval” and then narrow your search with more specific terms such as “inverted files”? Did you find the citations of the articles first using search engine A and then locate the electronic copies of the articles using site B?

 

Also, you are required to report the time you spent getting the 10 articles: how many minutes did you actually spend searching before you finally had all 10 articles in your account? In addition, report the cumulative number of articles you had downloaded after the first 15 minutes, the first 30 minutes, the first 45 minutes, and so on (in 15-minute increments). The amount of time you spent will not affect your grade on this homework, so you are encouraged to report it truthfully.
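If you note the clock time of each download as you search, the cumulative counts at each 15-minute mark can be tallied by hand or with a short script. The Python sketch below is one way to do it; the function name `cumulative_counts` and the sample timestamps are hypothetical and for illustration only. It assumes that a download landing exactly on a checkpoint counts toward that checkpoint.

```python
from datetime import datetime, timedelta


def cumulative_counts(start, download_times, step_minutes=15):
    """Return (elapsed minutes, articles downloaded so far) at each checkpoint.

    Checkpoints fall every `step_minutes` after `start` and continue until the
    last download is covered. A download exactly on a checkpoint counts toward it.
    """
    if not download_times:
        return []
    step = timedelta(minutes=step_minutes)
    last = max(download_times)
    results = []
    checkpoint = start + step
    while True:
        # Count every download made at or before this checkpoint (cumulative).
        count = sum(1 for t in download_times if t <= checkpoint)
        elapsed = int((checkpoint - start).total_seconds() // 60)
        results.append((elapsed, count))
        if checkpoint >= last:
            break
        checkpoint += step
    return results


if __name__ == "__main__":
    # Hypothetical log: search started at 10:00; ten downloads recorded by clock time.
    start = datetime(2003, 9, 5, 10, 0)
    offsets = [5, 12, 20, 31, 44, 46, 50, 62, 70, 74]
    downloads = [start + timedelta(minutes=m) for m in offsets]
    for elapsed, count in cumulative_counts(start, downloads):
        print(f"first {elapsed} minutes: {count} article(s)")
    total = max(downloads) - start
    print(f"total search time: {int(total.total_seconds() // 60)} minutes")
```

With the hypothetical log above, the report would read 2, 3, 5, 7, and 10 articles after 15, 30, 45, 60, and 75 minutes, with 74 minutes of total search time.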

 

Exercise 2: Target Search

 

You are required to obtain the complete reference of the following journal paper:

 

“Information Retrieval and Artificial Intelligence”

 

published in 1999. That is, you need to find the list of authors, the name of the journal, the volume and issue numbers of the journal, and the page numbers on which the paper appeared.

 

Once again, you are required to describe how you obtained the complete reference for the above paper. Document your experience. For example: Which search engine did you start with? Which keywords did you use? Which site did you start with? Which links did you follow from the first site to the second, from the second to the third, and so on, to the final destination where you obtained the desired information? How many broken links did you encounter? Which sites were useful? How many dead ends did you reach? Describe your search strategy. For example, did you start with general search terms such as “information retrieval” and then narrow your search with more specific terms such as “artificial intelligence”?

 

Also, you are required to report the time you spent getting the complete reference. The amount of time you spent will not affect your grade on this homework, so you are encouraged to report it truthfully.

 

NOTE: You may not be able to obtain the complete reference for the above paper. Even so, it is important to document your search process (strategies and tactics).

 

 

Hand In

 

(1) A report that documents your experience, including the details described above and any other details you think are useful. The report should include:

(a) A list of the articles you downloaded:

i. For journal papers: Author(s), Year, Title, Publication, Volume, and Page Numbers.

ii. For conference proceedings papers: Author(s), Year, Title, Conference Title, Dates, Place, and Page Numbers.

iii. For book chapters: Author(s), Year, Title, Book Title, Author(s) or Editor(s) of the Book, Page Numbers, and Publisher.

(b) A conclusion that describes, for both exercises, your search strategy and your thoughts on the importance of keywords, the roles of the search engines, the usefulness of the sites, and any other insights you gained.

 

IMPORTANT: Submit the PostScript or PDF files of the articles listed in your report to the CSE class handin account.

 

The assignment is due at 9:30 a.m. on September 9, 2003, at the beginning of class. The following table specifies the penalties for late homework.

 

Time Turned In (9/9/2003)       Penalty
9:30 a.m. – 9:35 a.m.           None
9:35 a.m. – 10:45 a.m.          Lose 10%
10:45 a.m. – 5:00 p.m.          Lose 20%
Later than 5:00 p.m.            Not accepted

 

Grading

 

(1) 40% on your report

(2) 15% on the completeness of the list of articles, including the information listed above for each article (Exercise 1)

(3) 5% on the relevance of your articles (Exercise 1)

(4) 5% on the availability of the account directory where your files are stored (Exercise 1)

(5) 10% on the completeness of the reference for the specific journal paper (Exercise 2)

(6) 25% on your conclusion