Interface CorpusI


public interface CorpusI

The interface for an implementation that collects and analyzes a corpus of documents, to be used in a Web Search Application.


Method Summary
 java.util.Collection<WDocI> getCorpus()
          return the corpus of documents previously collected
 int getDepth()
          get the depth to which the web crawler should search when gathering documents for this corpus
 java.lang.String getRootPage()
          return the path to the root page of the corpus
 java.util.Set<java.lang.String> getStopWords()
          return the collection of stopwords used to process the documents in the corpus
 int howManyDocs()
          report the number of documents in the corpus
 void reset()
          reset the corpus.
 void setDepth(int newDepth)
          set the depth to which the web crawler should search when gathering documents for this corpus
 void setRootPage(java.lang.String newRoot)
          set the root page of the corpus
 void setStopWords(java.util.Set<java.lang.String> newStopWords)
          set the stopwords list to be used when processing documents
 

Method Detail

getRootPage

java.lang.String getRootPage()
return the path to the root page of the corpus

Returns:
the root page

setRootPage

void setRootPage(java.lang.String newRoot)
set the root page of the corpus

Parameters:
newRoot - the new root of the corpus

getDepth

int getDepth()
get the depth to which the web crawler should search when gathering documents for this corpus

Returns:
the current search depth

setDepth

void setDepth(int newDepth)
set the depth to which the web crawler should search when gathering documents for this corpus

Parameters:
newDepth - the new depth to which the crawler should search

getStopWords

java.util.Set<java.lang.String> getStopWords()
return the collection of stopwords used to process the documents in the corpus

Returns:
the collection of stopwords

setStopWords

void setStopWords(java.util.Set<java.lang.String> newStopWords)
set the stopwords list to be used when processing documents

Parameters:
newStopWords - the new collection of stopwords
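
The interface does not say how the stopwords are applied during processing. As a minimal sketch of one plausible use, the hypothetical helper below (StopWordFilter is not part of CorpusI) lowercases a document's text, splits it on non-letter characters, and drops every token found in the stopword set:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of CorpusI: one plausible use of a stopword set.
class StopWordFilter {
    // Lowercases the text, splits it on runs of non-letter characters,
    // and keeps only the tokens that are not in stopWords.
    static List<String> filter(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // split() can yield an empty first token when the text starts with a separator
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

This sketch assumes the stopwords are stored in lowercase and that the text is plain ASCII; a real implementation may tokenize differently.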

getCorpus

java.util.Collection<WDocI> getCorpus()
return the corpus of documents previously collected

Returns:
the collection of all documents

howManyDocs

int howManyDocs()
report the number of documents in the corpus

Returns:
the number of docs in the corpus

reset

void reset()
reset the corpus: for the current root, depth, and stopwords, crawl the site, gather the documents, and process them
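
The crawl step of reset can be sketched as a depth-limited, breadth-first traversal. The class below is a hypothetical illustration, not an implementation of CorpusI: WDocI is not shown on this page, so documents are represented by their page paths, the "web" is an in-memory map of links, and stopword processing is left out. It also assumes the root page sits at depth 0, which the interface does not specify.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory corpus used only to illustrate reset().
class SketchCorpus {
    // Stand-in for the web: page path -> paths it links to (assumption for the demo).
    private final Map<String, List<String>> links;
    private String rootPage;
    private int depth;
    private final List<String> docs = new ArrayList<>();

    SketchCorpus(Map<String, List<String>> links) { this.links = links; }

    void setRootPage(String newRoot) { rootPage = newRoot; }
    void setDepth(int newDepth) { depth = newDepth; }
    int howManyDocs() { return docs.size(); }
    Collection<String> getCorpus() { return Collections.unmodifiableList(docs); }

    // Discards the old corpus and re-crawls breadth-first from the root.
    // The root is at depth 0; pages more than `depth` links away are skipped.
    void reset() {
        docs.clear();
        if (rootPage == null) return;
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        seen.add(rootPage);
        frontier.add(rootPage);
        for (int level = 0; level <= depth && !frontier.isEmpty(); level++) {
            Deque<String> next = new ArrayDeque<>();
            for (String page : frontier) {
                docs.add(page);
                for (String target : links.getOrDefault(page, Collections.emptyList())) {
                    // seen.add returns false for pages already queued or visited
                    if (seen.add(target)) next.add(target);
                }
            }
            frontier = next;
        }
    }
}
```

In this sketch the setters only record the new values; the corpus changes when reset() is called. Whether a real implementation re-crawls automatically after a setter call is not stated by the interface.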