MIDT (Merged Italian Dependency Treebank) is a first attempt to merge two existing Italian treebanks: TUT and ISST-TANL.

Contents

MIDT Specifications

MIDT Mappings

MIDT Resources

Available MIDT resources include data obtained by automatic conversion from

  • ISST-TANL, developed as a joint effort by the Istituto di Linguistica Computazionale (ILC–CNR) and the University of Pisa and originating from the Italian Syntactic–Semantic Treebank or ISST; a previous version (ISST-CoNLL) was used for the CoNLL-2007 Shared Task on multilingual dependency parsing;
  • the Turin University Treebank (TUT), developed by the NLP group of the University of Turin.

A set of around 2000 questions was produced directly in the MIDT format and revised.

The composition is detailed below.

| Original format | Source | Genre | Size in tokens | Size in sentences |
|---|---|---|---|---|
| TUT | Evalita 2011 Dependency parsing, codciv | Legal texts | 30747 | 1250 |
| TUT | Evalita 2011 Dependency parsing, costita | Legal texts | 12703 | 682 |
| TUT | Evalita 2011 Dependency parsing, eudir | Legal texts | 6949 | 201 |
| TUT | Evalita 2011 Dependency parsing, news | Newspaper articles from "Corriere della sera" and periodicals | 19301 | 775 |
| TUT | Evalita 2011 Dependency parsing, vedch | Sentences including the verbs "vedere" and "chiamare" | 12045 | 400 |
| TUT | Evalita 2011 Dependency parsing, wiki | Wikipedia articles | 15813 | 534 |
| ISST-TANL | Evalita 2011 Domain adaptation, isst-tanl | Newspaper articles | 80990 | 4136 |
| MIDT | Texts from various QA competitions | Questions | 21968 | 2228 |
| TOTAL | | | 200516 | 10206 |
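
The figures in the composition table can be cross-checked with a few lines of Python. The sketch below is only an illustration: the token and sentence counts are read from the table above, while the names `COMPOSITION`, `totals` and `by_format` are hypothetical helpers introduced here and are not part of any MIDT distribution.

```python
# Rows of the composition table: (original format, source id, tokens, sentences).
# Figures are read from the table above; the names are illustrative only.
COMPOSITION = [
    ("TUT", "codciv", 30747, 1250),
    ("TUT", "costita", 12703, 682),
    ("TUT", "eudir", 6949, 201),
    ("TUT", "news", 19301, 775),
    ("TUT", "vedch", 12045, 400),
    ("TUT", "wiki", 15813, 534),
    ("ISST-TANL", "isst-tanl", 80990, 4136),
    ("MIDT", "qa-questions", 21968, 2228),
]


def totals(rows):
    """Return (total tokens, total sentences) over all rows."""
    tokens = sum(row[2] for row in rows)
    sentences = sum(row[3] for row in rows)
    return tokens, sentences


def by_format(rows):
    """Aggregate (tokens, sentences) per original format."""
    result = {}
    for fmt, _source, tokens, sentences in rows:
        prev_tokens, prev_sentences = result.get(fmt, (0, 0))
        result[fmt] = (prev_tokens + tokens, prev_sentences + sentences)
    return result


if __name__ == "__main__":
    print("TOTAL:", totals(COMPOSITION))
    for fmt, (tokens, sentences) in by_format(COMPOSITION).items():
        print(fmt, tokens, sentences)
```

Summing the rows this way reproduces the TOTAL line of the table (200516 tokens, 10206 sentences) and shows that the TUT-derived sections and the ISST-TANL section contribute comparable amounts of data.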

MIDT Download

MIDT version 1.0
MIDT version 1.1

References

  • C. Bosco, V. Lombardo, L. Lesmo, and D. Vassallo. 2000. Building a treebank for Italian: a data-driven annotation schema. In Proceedings of LREC’00, Athens, Greece.
  • S. Montemagni, F. Barsotti, M. Battista, N. Calzolari, O. Corazzari, A. Lenci, A. Zampolli, F. Fanciulli, M. Massetani, R. Raffaelli, R. Basili, M. T. Pazienza, D. Saracino, F. Zanzotto, N. Mana, F. Pianesi, and R. Delmonte. 2003. Building the Italian Syntactic-Semantic Treebank. In A. Abeillé, editor, Building and Using syntactically annotated corpora. Kluwer.
  • C. Bosco, S. Montemagni, A. Mazzei, V. Lombardo, F. Dell’Orletta, and A. Lenci. 2009. Evalita’09 parsing task: comparing dependency parsers and treebanks. In Proceedings of Evalita’09, Reggio Emilia, Italy.
  • C. Bosco, S. Montemagni, A. Mazzei, V. Lombardo, F. Dell’Orletta, A. Lenci, L. Lesmo, G. Attardi, M. Simi, A. Lavelli, J. Hall, J. Nilsson, and J. Nivre. 2010. Comparing the influence of different treebank annotations on dependency parsing performance. In Proceedings of LREC 2010, Malta.
  • C. Bosco, S. Montemagni, and M. Simi. 2012. Harmonization and Merging of two Italian Dependency Treebanks. Workshop on Merging of Language Resources, in Proceedings of LREC 2012, Istanbul, May 2012, pp. 23-30.