Search Engines and Question Answering

Instructor

Giuseppe Attardi

Email: attardi@di.unipi.it

Office: 292

Syllabus

Information Retrieval Models

  1. Boolean and vector-space retrieval models
  2. Ranked retrieval
  3. Text-similarity metrics: TF-IDF (term frequency/inverse document frequency) weighting; cosine similarity
  4. Performance metrics: precision, recall, F-measure
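
As an illustration of the metrics listed above, the following is a minimal sketch, not course material. It assumes documents are already tokenized, uses raw term frequency with an unsmoothed logarithmic IDF (one of several common variants), and takes F-measure to mean the balanced F1. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term frequency times log(N / df).
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    docs = [["web", "search", "engine"], ["web", "crawler"], ["question", "answering"]]
    vecs = tfidf_vectors(docs)
    print(cosine(vecs[0], vecs[1]))
    print(precision_recall_f1([1, 2, 3, 4], [2, 4, 6]))
```

Ranked retrieval then amounts to sorting documents by their cosine similarity to the query vector.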

Indexing & search

  1. Indexing and inverted files
  2. Compression
  3. Postings Lists
  4. Query languages
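
The indexing topics above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the course's implementation: tokenization is plain whitespace splitting, document IDs are list positions, and compression is gap encoding followed by variable-byte coding (one common scheme among several). Function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted postings list of document IDs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc IDs arrive in increasing order
    return dict(index)


def intersect(p1, p2):
    """Boolean AND of two sorted postings lists by linear merge."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def vbyte_encode(postings):
    """Gap-encode a sorted postings list, then variable-byte encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in postings:
        gap = doc_id - prev
        prev = doc_id
        chunk = [gap & 0x7F]
        gap >>= 7
        while gap:
            chunk.append(gap & 0x7F)
            gap >>= 7
        chunk[0] |= 0x80  # high bit marks the final byte of each number
        out.extend(reversed(chunk))  # emit high-order bytes first
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode: rebuild document IDs from encoded gaps."""
    postings = []
    n = 0
    prev = 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:  # final byte of this gap
            prev += n
            postings.append(prev)
            n = 0
    return postings


if __name__ == "__main__":
    index = build_index(["web search engines", "web crawling", "search engines index the web"])
    print(intersect(index["web"], index["search"]))
    print(vbyte_decode(vbyte_encode([3, 7, 200, 100000])))
```

Storing gaps rather than absolute IDs keeps most numbers small, which is what makes the byte-aligned code compact.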

Web Search

  1. Search engine architecture
  2. Crawling: parallel/distributed, focused
  3. Link analysis (Google PageRank)
  4. Scaling
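
To make the link-analysis topic concrete, here is a minimal sketch of PageRank computed by power iteration. It is an illustration under assumptions, not Google's production algorithm: the damping factor of 0.85 and the fixed iteration count are conventional choices, pages without out-links spread their rank uniformly, and the function name and graph format are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a graph given as {page: [linked pages]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                # Each page splits its rank evenly among its out-links.
                new_rank[t] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

At web scale the same iteration is run over a sparse, distributed link matrix rather than in-memory dictionaries.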

Question Answering

  1. Information extraction
  2. Named Entity Recognition
  3. Natural Language Processing
  4. Part of Speech tagging
  5. Question analysis and semantic matching
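
A toy sketch of question analysis follows, assuming a rule-based mapping from question cue words to expected answer types and plain keyword overlap as a stand-in for semantic matching. The rule table, stopword list, and function names are all hypothetical; a real system would rely on the NLP components listed above (POS tagging, named entity recognition).

```python
import re

# Hypothetical rules mapping a leading question cue to an expected answer type.
ANSWER_TYPE_RULES = [
    (r"^who\b", "PERSON"),
    (r"^where\b", "LOCATION"),
    (r"^when\b", "DATE"),
    (r"^how (many|much)\b", "QUANTITY"),
]

# Small illustrative stopword list, not a standard one.
STOPWORDS = {"a", "an", "the", "is", "was", "were", "of", "by", "in", "and",
             "who", "where", "when", "what", "how", "many", "much", "did"}


def expected_answer_type(question):
    """Guess the type of entity that would answer the question."""
    q = question.strip().lower()
    for pattern, answer_type in ANSWER_TYPE_RULES:
        if re.search(pattern, q):
            return answer_type
    return "OTHER"


def keywords(text):
    """Lowercased content words with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_candidates(question, sentences):
    """Order candidate sentences by keyword overlap with the question."""
    q = keywords(question)
    return sorted(sentences, key=lambda s: len(q & keywords(s)), reverse=True)


if __name__ == "__main__":
    question = "Who developed PageRank?"
    candidates = [
        "The crawler fetches pages.",
        "PageRank was developed by Larry Page and Sergey Brin.",
    ]
    print(expected_answer_type(question))
    print(rank_candidates(question, candidates)[0])
```

The expected answer type would then be used to filter named entities found in the top-ranked sentences.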

Lectures

Date

Material

Topic

12/11/03

Introduction

Information Retrieval Models; ranked retrieval; text similarity metrics; TF-IDF weighting; cosine similarity; performance metrics: precision, recall, F-measure.

19/11/03

Indexing and Search
Web Search Engines

Indexing and Search
Web Search

Crawling

Web dynamics; Crawl Strategies; Focused Crawling; Distributed Crawling; Crawler Architecture; Refresh Strategies; Duplicate and Mirror Detection.

Web Mining

Text Analysis.
Categorization: Naive Bayes, Decision Trees, Maximum Entropy, k-Nearest Neighbor, Support Vector Machines

26/11/03

Question Answering

Question Answering

Seminars

Date      Title                        Speaker
14/05/04  Clustering                   Iwona Bialynicka-Birula
14/05/04  Crawling                     Claudio Scordino
26/05/04  State-of-the-art NL Parsing  Razvan-Andrei Popescu
14/05/04  Interpretation as abduction  Maurizio Atzori

Reference Texts

Modern Information Retrieval, R. Baeza-Yates, B. Ribeiro-Neto, Addison Wesley, 2000.

Managing Gigabytes, 2nd Edition, I.H. Witten, A. Moffat, T.C. Bell, Morgan Kaufmann, 1999.

Foundations of Statistical Natural Language Processing, C. Manning and H. Schütze, MIT Press, 1999.

Mining the Web: Analysis of Hypertext and Semi Structured Data, Soumen Chakrabarti, Morgan Kaufmann, 2002.