Cross-Language Dependency Parsing (CLaP)

Task description

Cross-Language Dependency Parsing (CLaP) is a cross-lingual transfer parsing task, organized along the lines of the experiments described in McDonald et al. (2013). In this task, participants are asked to run parsers trained on the universal variant of the “Italian Stanford Dependency Treebank” (henceforth “uISDT”) on test sets of other (not necessarily typologically related) languages, annotated according to the Universal Stanford Dependencies (“uSD”).

Data sets

The Italian data set for this task will consist of the universal version of the ISDT resource used for the DPIE task. The data sets for the other languages will be extracted from Version 2.0 of “The Universal Dependency Treebank Project” (https://code.google.com/p/uni-dep-tb/), which contains CoNLL-formatted, uSD-compliant versions of dependency treebanks for several languages, collected and made available by Google. In particular, for the cross-lingual transfer parsing task, participant systems will be provided with:

  • a data set for development (henceforth referred to as “CLaP Development DS”), with basic dependencies in CoNLL format: it will include the universal version of the Italian training set distributed within DPIE and validation sets of about 7,500 tokens for each of the eleven languages of the Universal Dependency Treebank;

    The CLaP Development DS is now available for download

  • test sets for evaluation (henceforth referred to as “CLaP Test DS”), with gold PoS and morphology but without dependency information, one for each language covered; each test set will consist of about 7,500 tokens.

    The CLaP Test DS is now available for download

For this task, the use of external resources (e.g. dictionaries, lexicons, machine translation outputs, etc.) in addition to the uISDT corpus provided for training is allowed. Participants are allowed to focus on a subset of the languages only; however, they are strongly encouraged to perform the analysis of all eleven languages.

Participant results

Each participant can submit multiple runs. The format for submission is the CoNLL 2007 format (the same as used for the development data); see the example below:
  • ten tab-separated columns
  • columns 1-6 as provided by the organizers in the CLaP Test DS
  • parser results will occupy columns 7-8
  • columns 9-10 are not used and must contain "_"
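
For illustration, a sketch of what a few lines of system output might look like (the tokens, tags, and dependency labels below are invented for this example and are not taken from the actual data; in the real files the columns are tab-separated):

    1   Il       il         DET   DET   _   2   det     _   _
    2   cane     cane       NOUN  NOUN  _   3   nsubj   _   _
    3   abbaia   abbaiare   VERB  VERB  _   0   root    _   _
    4   .        .          .     .     _   3   p       _   _

Columns 1-6 (ID, form, lemma, coarse PoS, PoS, features) come from the test file as distributed; the parser fills in columns 7-8 (head index and dependency relation); columns 9-10 remain "_".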

Evaluation

The outputs of participant systems will be evaluated in terms of:

  • standard dependency parsing accuracy measures, i.e. Labelled Attachment Score (LAS, the percentage of tokens assigned both the correct head and the correct dependency label) and Unlabelled Attachment Score (UAS, the percentage of tokens assigned the correct head), for each language;
  • an overall score, computed as the average of the LAS and UAS values obtained by a given system across all languages (see the sketch below). Note that participants focusing on a subset of languages will achieve a lower overall score, since this score is computed over all eleven languages, with LAS and UAS set to 0 for each unanalyzed language.

LAS and UAS will be computed using the official evaluation script of the CoNLL 2007 evaluation campaign (eval07.pl), which will be distributed to participants.
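
As a minimal sketch of the averaging described above (in Python; the function name and data layout are illustrative assumptions, not part of the official tooling, which is the Perl evaluation script):

    # Minimal sketch: overall CLaP score as the average of LAS and UAS
    # over all evaluation languages. Languages a system did not analyze
    # count as LAS = UAS = 0. Names and inputs are illustrative only.
    def overall_score(per_language, languages):
        """per_language maps language -> (LAS, UAS), in percent."""
        pairs = [per_language.get(lang, (0.0, 0.0)) for lang in languages]
        return sum(las + uas for las, uas in pairs) / (2 * len(languages))

    # Example: a system that analyzed only two of three languages.
    print(overall_score({"de": (80.0, 85.0), "es": (78.0, 84.0)},
                        ["de", "es", "fr"]))  # -> 54.5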

Important dates

Important dates for the CLaP subtask are as follows:

  • May 15th: release of CLaP Development DS;
  • September 8th: release of CLaP Test DS;
  • September 19th: system results due to organizers;
  • September 22nd: assessment returned to participants;
  • October 31st: technical reports due to organizers;
  • December 11th: final workshop.