Calendar

  • 7 May 2008

    15:00-17:00
    Slides

  • 8 May 2008

    9:00-12:00
    Slides

  • 9 May 2008

    9:00-11:00

  • 12 May 2008

    15:00-17:00
    Slides

  • 13 May 2008

    9:00-11:00

  • 14 May 2008

    9:00-11:00
    Slides

  • 15 May 2008

    9:00-12:00
    Slides

  • 16 May 2008

    9:00-11:00
    Slides

  • 19 May 2008

    9:00-11:00
    Slides

Prof. Marcello Federico
Fondazione Bruno Kessler, Trento

Overview

Machine Translation (MT) is one of the oldest challenges undertaken by computer science, and it is still far from solved. The course will present an overview of the history, approaches, progress and difficulties of MT. Its central topic will be the statistical MT (SMT) approach introduced at IBM in the early 1990s. In particular, the following topics will be covered:

  • statistical framework of MT
  • word alignment models
  • phrase-based translation
  • log-linear models
  • training and search algorithms
  • minimum error training algorithms
  • advanced topics:
    • spoken language translation
    • syntax-based SMT
    • efficiency issues
  • performance evaluation
  • overview of publicly available software

Alternative MT approaches will also be discussed, and current research trends in the field will be presented.

The course is part of the activities of the Dottorato in Informatica (PhD in Informatics) at the Università di Pisa.

Bibliography

Exam

Students are required to prepare a written report reviewing one of the journal papers listed above, or alternatively two conference papers on the same topic. Papers must be selected with the advice of the teacher. The report should have the following structure:

  • introduction to the topic (500 words), explaining its importance, difficulty, and possible solutions;
  • description of the approach(es) in the paper(s) (750 words);
  • presentation of the experimental results (750 words): tasks, data sets, evaluation methods, results;
  • conclusions and considerations by the authors (500 words);
  • discussion by the student (500 words), pointing out possible weak and strong points, directions for improving or enhancing the work, and connections with the course material.

Reports must be written in English, using one of the following LaTeX or Word style files, and sent to the teacher as PDF files.