MIDT (Merged Italian Dependency Treebank) is a first attempt to merge two existing Italian treebanks: TUT and ISST-TANL.


MIDT Specifications

MIDT Mappings

MIDT Resources

Available MIDT resources include data obtained by automatic conversion from:

  • ISST-TANL, developed as a joint effort by the Istituto di Linguistica Computazionale (ILC–CNR) and the University of Pisa and originating from the Italian Syntactic–Semantic Treebank or ISST; a previous version (ISST-CoNLL) was used for the CoNLL-2007 Shared Task on multilingual dependency parsing;
  • the Turin University Treebank (TUT), developed by the NLP group of the University of Turin.

A set of around 2000 questions was produced directly in the MIDT format and then revised.

The composition is detailed below.

| Original format | Source | Genre | Size in tokens | Size in sentences |
|---|---|---|---|---|
| TUT | Evalita 2011 Dependency parsing, codciv | Legal texts | 30747 | 1250 |
| TUT | Evalita 2011 Dependency parsing, costita | Legal texts | 12703 | 682 |
| TUT | Evalita 2011 Dependency parsing, eudir | Legal texts | 6949 | 201 |
| TUT | Evalita 2011 Dependency parsing, news | Newspaper articles from "Corriere della sera" and periodicals | 19301 | 775 |
| TUT | Evalita 2011 Dependency parsing, vedch | Sentences including the verbs "vedere" and "chiamare" | 12045 | 400 |
| TUT | Evalita 2011 Dependency parsing, wiki | Wikipedia articles | 15813 | 534 |
| ISST-TANL | Evalita 2011 Domain adaptation, isst-tanl | Newspaper articles | 80990 | 4136 |
| MIDT | Texts from various QA competitions | Questions | 21968 | 2228 |
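
The table gives sizes per subcorpus only, so the overall size of the resource has to be added up. The following Python sketch does that arithmetic, reading each row's size figures as a token count followed by a sentence count. The names `ROWS`, `totals_by_format` and `grand_total` are illustrative and not part of any MIDT tooling, and the short label `questions` for the last row is an assumption made here for readability.

```python
from collections import defaultdict

# (original format, subcorpus label, tokens, sentences), copied from the table.
# "questions" is a hypothetical label for the row sourced from QA competitions.
ROWS = [
    ("TUT", "codciv", 30747, 1250),
    ("TUT", "costita", 12703, 682),
    ("TUT", "eudir", 6949, 201),
    ("TUT", "news", 19301, 775),
    ("TUT", "vedch", 12045, 400),
    ("TUT", "wiki", 15813, 534),
    ("ISST-TANL", "isst-tanl", 80990, 4136),
    ("MIDT", "questions", 21968, 2228),
]


def totals_by_format(rows):
    """Sum tokens and sentences for each original format."""
    totals = defaultdict(lambda: [0, 0])
    for fmt, _label, tokens, sentences in rows:
        totals[fmt][0] += tokens
        totals[fmt][1] += sentences
    return {fmt: (pair[0], pair[1]) for fmt, pair in totals.items()}


def grand_total(rows):
    """Return (tokens, sentences) summed over all rows."""
    return (sum(r[2] for r in rows), sum(r[3] for r in rows))


if __name__ == "__main__":
    for fmt, (tokens, sentences) in totals_by_format(ROWS).items():
        avg = tokens / sentences
        print(f"{fmt}: {tokens} tokens, {sentences} sentences, "
              f"{avg:.1f} tokens per sentence")
    tokens, sentences = grand_total(ROWS)
    print(f"Total: {tokens} tokens, {sentences} sentences")
```

Under this reading the resource totals 200516 tokens in 10206 sentences, of which the six TUT subcorpora account for 97558 tokens in 3842 sentences.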

MIDT Download

MIDT version 1.0
MIDT version 1.1


