
Dependency parsing for Information Extraction

Task description

The main task is a classical dependency parsing task. Its main novelty is a double evaluation track that simultaneously assesses parser performance and evaluates the suitability of the parser output for information extraction tasks, where the latter is made possible by the design principles underlying the Stanford annotation scheme.

Data sets

For the DPIE task, the following data sets (in CoNLL format) will be distributed to participants:

  • a data set for development (henceforth referred to as “DPIE Development DS”), split into a training set and a validation set, both based on the publicly available portion of ISDT with basic dependencies in CoNLL format; this data set will consist of about 178,500 tokens.

    The data set is now available for DOWNLOAD

  • a test set for evaluation (henceforth referred to as “DPIE Test DS”), with gold PoS and morphology and without dependency information; this data set will consist of about 9,500 tokens.

    The data set is now available for DOWNLOAD

Participant results

Each participant can submit multiple runs. The format for submission is the CoNLL 2007 format (the same used for the development data); a minimal format check is sketched after the list below:
  • ten tab-separated columns
  • columns 1-6 as provided by the organizers in the DPIE Test DS
  • parser results (HEAD and DEPREL) will occupy columns 7-8
  • columns 9-10 are not used and must contain "_"
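
As a sanity check before submitting, a run can be verified against these constraints with a few lines of Python. The following is a minimal, unofficial sketch (the command-line file paths are hypothetical) and is not part of the organizers' tooling:

    # Minimal, unofficial format check for a DPIE run in CoNLL 2007 format.
    # File paths are hypothetical; this is not part of the official evaluation package.
    import sys

    def check_submission(test_path, run_path):
        """Check that a run keeps columns 1-6 from the DPIE Test DS, fills columns 7-8,
        and leaves columns 9-10 as '_'."""
        with open(test_path, encoding="utf-8") as test, open(run_path, encoding="utf-8") as run:
            for n, (t_line, r_line) in enumerate(zip(test, run), start=1):
                if not r_line.strip():
                    # Sentence separator: both files must have a blank line here.
                    assert not t_line.strip(), f"line {n}: sentence boundaries do not match"
                    continue
                t_cols = t_line.rstrip("\n").split("\t")
                r_cols = r_line.rstrip("\n").split("\t")
                assert len(r_cols) == 10, f"line {n}: expected 10 tab-separated columns"
                assert r_cols[:6] == t_cols[:6], f"line {n}: columns 1-6 must be unchanged"
                assert r_cols[6] != "_" and r_cols[7] != "_", f"line {n}: columns 7-8 (HEAD, DEPREL) must be filled"
                assert r_cols[8] == "_" and r_cols[9] == "_", f"line {n}: columns 9-10 must contain '_'"
        print("Submission format looks OK.")

    if __name__ == "__main__":
        check_submission(sys.argv[1], sys.argv[2])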

Please send your submissions by email to: evalita_dpie@ilc.cnr.it

Evaluation

The output of participant systems will be evaluated using two scoring mechanisms, focusing respectively on parsing performance and on suitability for IE:

  • the standard dependency parsing accuracy measures of Labelled Attachment Score (LAS) and Unlabelled Attachment Score (UAS) will be used to evaluate parser performance. This evaluation will be performed against the Stanford Dependency representation with basic and typed dependencies in CoNLL format, where dependency structures are trees spanning all input tokens (henceforth referred to as ISDT.conll). To this end, the official evaluation script of the CoNLL evaluation campaigns (eval07.pl) will be used;
  • measures of Precision, Recall and F1 will be used to assess the suitability of the parser output for information extraction tasks. They will be computed against the collapsed and propagated variant of the Stanford Dependency representation (henceforth ISDT.rels).
    Starting from the participant outputs in CoNLL format, the ISDT.rels version will be produced by a conversion script that collapses dependencies involving prepositions (both single-word and multi-word prepositions such as “grazie_a” or “al_di_sotto_di”) and conjunctions, and that propagates dependencies involving conjuncts. This type of representation, which does not necessarily connect all words in a sentence nor form a tree structure, is expected to be more suitable for relation extraction and shallow language understanding tasks; for this reason it is taken as the basis for the IE-oriented evaluation of the output of participant systems.
    In particular, evaluation will focus on a selection of relations (19 out of a total of 45) chosen according to the following general criteria:
    1. semantic relevance of the relation (e.g. nsubj, dobj, ...);
    2. exclusion of syntactically easy-to-identify relations (e.g. det, aux, ...);
    3. exclusion of sparse and difficult-to-identify relations (e.g. csubj).
  • The conversion script from ISDT.conll to ISDT.rels and the evaluation script for the ISDT.rels format are both provided by the organizers, so that participants can fine-tune their systems (by conversion and evaluation) according to the selected metrics; a simplified, unofficial sketch of both scoring schemes is given after the download note below.

    The evaluation package for DPIE is now available for DOWNLOAD
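
For concreteness, the two scoring schemes can also be illustrated in a few lines of Python. The sketch below computes LAS/UAS by comparing columns 7-8 of a system output against ISDT.conll, and Precision/Recall/F1 over ISDT.rels files that are assumed to contain one "relation(head, dependent)" triple per line; the triple notation, the file layout and the relation subset used here are illustrative assumptions, and the official scripts distributed by the organizers remain the reference.

    # Simplified, unofficial sketch of the two DPIE scoring schemes; the official
    # evaluation scripts distributed by the organizers remain the reference.
    from collections import Counter

    # Hypothetical subset of the selected relations (the official selection has 19).
    SELECTED_RELATIONS = {"nsubj", "dobj", "amod", "advmod", "conj"}

    def read_conll(path):
        """Yield (HEAD, DEPREL) pairs for every token line of a CoNLL 2007 file."""
        with open(path, encoding="utf-8") as f:
            for line in f:
                cols = line.rstrip("\n").split("\t")
                if len(cols) == 10:              # skip blank sentence separators
                    yield cols[6], cols[7]

    def las_uas(gold_path, system_path):
        """LAS/UAS against ISDT.conll: token-level comparison of columns 7-8."""
        total = las = uas = 0
        for (g_head, g_rel), (s_head, s_rel) in zip(read_conll(gold_path), read_conll(system_path)):
            total += 1
            uas += g_head == s_head
            las += g_head == s_head and g_rel == s_rel
        return las / total, uas / total

    def read_rels(path, relations=SELECTED_RELATIONS):
        """Read collapsed/propagated triples (assumed one 'rel(head, dependent)' per line),
        keeping only the relations selected for the IE-oriented evaluation."""
        with open(path, encoding="utf-8") as f:
            triples = Counter(line.strip() for line in f if line.strip())
        return Counter({t: c for t, c in triples.items() if t.split("(", 1)[0] in relations})

    def precision_recall_f1(gold_path, system_path):
        """Precision/Recall/F1 against ISDT.rels, restricted to the selected relations."""
        gold, system = read_rels(gold_path), read_rels(system_path)
        correct = sum((gold & system).values())   # multiset intersection of matching triples
        p = correct / max(sum(system.values()), 1)
        r = correct / max(sum(gold.values()), 1)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1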

Important dates

Important dates for the DPIE task are as follows:

  • May 15th: release of DPIE Development DS;
  • June 20th: release of conversion scripts into RELS and evaluation scripts for RELS;
  • July 15th: release of updated version of DPIE Development DS;
  • September 8th: release of DPIE Test DS;
  • September 19th: system results due to organizers;
  • September 22nd: assessment returned to participants;
  • October 31st: technical reports due to organizers;
  • December 11th: final workshop.