Annotation Guidelines for Dependencies
Clitics
Dependency relations involving clitic pronouns were carefully revised following the guidelines offered in our proposal for handling clitics. These guidelines were further extended on the basis of corpus evidence to cover unforeseen cases. For the reader's convenience, below is a detailed description of how the dependencies were applied.
- ci and vi (tagged as PCnn, with adverbial value) → COMP
- perché vogliono andarci ; non c'entra niente ; vi sono scrittori che...
- ne (with pronominal value) → COMP
- ricordandone sempre ; non ne abbiamo più ; non se ne parla neppure
- Indirect object → COMP_IND
- ci hanno detto la verità
- Direct object → OBJ
- ci vuole deportare ; l'ha ampiamente smentito
- Reflexive pronouns → OBJ
- io mi lavo
- impersonal constructions, pronominal verbs (either transitive or intransitive) → CLIT
- si può cambiare (si impers); mi sono messo la giacca (trans pronom) ; si scusa con te (intrans pronom) ; ci siamo divisi (intrans pronom)
- andarsene → CLIT + CLIT
- andarsene
Dates
According to the specifications of ISST, dates are tagged as follows:
- 1. il 25 dicembre 1999 sono partito
- - mod (partire, 25.<definitezza=+>)
- - mod (25, dicembre)
- - mod (dicembre, 1999)
- 2. nel 2000, sono partito
- - mod (partire, 2000.<introdep=“in”>)
- 3. nel/in maggio, sono partito
- - mod (partire, maggio.<introdep=“in”>)
where the head of the date is the element which selects the preposition or the article preceding the whole temporal expression. In dates consisting of day, month, and year, or any combination of these, only the head should be marked as comp/mod_temp. However, the dates in the corpus were not always in line with these specifications, since the day, month, and year were each tagged as either mod or mod_temp. This was changed so that the head of the date (usually the day) is tagged mod_temp and the other elements (usually the month and year) are tagged mod. Consider the example morte del padre avvenuta il 27 giugno 1939:
…
20 morte Sfs 19 prep
21 del EAms 20 comp
22 padre Sms 21 prep
23 avvenuta Vpsfs 20 mod
24 il RDms 25 det
25 27 N 23 *mod
26 giugno Sms 25 *mod_temp
27 1939 N 26 *mod_temp
In this example the day, 27, is the head of the date, while the month and the year, giugno 1939, depend on it in sequence, as the head attachments show (giugno on 27, 1939 on giugno). The revised version of the example is shown below.
…
20 morte Sfs 19 prep
21 del EAms 20 comp
22 padre Sms 21 prep
23 avvenuta Vpsfs 20 mod
24 il RDms 25 det
25 27 N 23 mod_temp
26 giugno Sms 25 mod
27 1939 N 26 mod
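The retagging described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure actually applied to the corpus: the tuple layout (id, form, pos, head, deprel) and the function name retag_date are hypothetical, and the sketch treats every mod/mod_temp dependent chained below the date head as part of the date.

```python
# Tokens are (id, form, pos, head, deprel) tuples; this layout is an assumption.
def retag_date(tokens, head_id):
    """Tag the date head as mod_temp and every token chained below it as mod."""
    in_date = {head_id}
    changed = True
    # Collect tokens that depend, directly or indirectly, on the date head
    # through mod/mod_temp links (e.g. month on day, year on month).
    while changed:
        changed = False
        for tid, _, _, head, rel in tokens:
            if tid not in in_date and head in in_date and rel in ("mod", "mod_temp"):
                in_date.add(tid)
                changed = True
    out = []
    for tid, form, pos, head, rel in tokens:
        if tid == head_id:
            rel = "mod_temp"
        elif tid in in_date:
            rel = "mod"
        out.append((tid, form, pos, head, rel))
    return out


before = [
    (24, "il", "RDms", 25, "det"),
    (25, "27", "N", 23, "mod"),
    (26, "giugno", "Sms", 25, "mod_temp"),
    (27, "1939", "N", 26, "mod_temp"),
]
after = retag_date(before, head_id=25)
```

The article keeps its det relation because only mod and mod_temp links are followed; a real script would also need a way to tell date elements apart from other modifiers of the head.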
Proper Names
According to ISST guidelines, the last name is the head while the first name modifies the last (this choice was made mainly for retrieval purposes). For example, the dependencies for the segment Frank Sinatra ha casa qui should be tagged as follows:
1 Frank SP 2 mod
2 Sinatra SP 3 subj
3 ha Vip3s 0 ROOT
4 casa Sfs 3 obj
5 qui B 3 mod
Punctuation: A first draft
Proposed Annotation
1. Sentence-Final Punctuation (i.e., “.” “?” “!” tagged as FS)
- These should attach to the ROOT of the sentence. Dependency relation: “punc”.
- La vicenda provocò parecchie polemiche e Oscar Luigi Scalfaro ha reintrodotto il vecchio ordinamento.
1 La RDfs 2 det
2 vicenda Sfs 3 subj
3 provocò Vis3s 0 ROOT
4 parecchie DIfp 5 mod
5 polemiche Sfp 3 obj
6 e CC 3 con
7 Oscar SP 9 mod
8 Luigi SP 9 mod
9 Scalfaro SP 11 subj
10 ha VAip3s 11 aux
11 reintrodotto Vpsms 3 conj
12 il RDms 14 det
13 vecchio Ams 14 mod
14 ordinamento Sms 11 obj
15 . FS 3 punc
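The rule for sentence-final punctuation lends itself to a mechanical check. The following Python sketch is one possible validator, not part of the guidelines; the tuple layout (id, form, pos, head, deprel) and the function name final_punct_errors are assumptions made for the example.

```python
# Tokens are (id, form, pos, head, deprel) tuples; this layout is an assumption.
def final_punct_errors(tokens):
    """Return ids of FS tokens not attached to a ROOT with relation punc."""
    roots = {tid for tid, _, _, head, rel in tokens if head == 0 and rel == "ROOT"}
    return [
        tid
        for tid, _, pos, head, rel in tokens
        if pos == "FS" and (head not in roots or rel != "punc")
    ]


# Abridged version of the sentence above (only the tokens needed for the check).
sentence = [
    (2, "vicenda", "Sfs", 3, "subj"),
    (3, "provocò", "Vis3s", 0, "ROOT"),
    (11, "reintrodotto", "Vpsms", 3, "conj"),
    (15, ".", "FS", 3, "punc"),
]
# Same sentence with the full stop wrongly attached to the second conjunct.
wrong = sentence[:-1] + [(15, ".", "FS", 11, "punc")]

ok_result = final_punct_errors(sentence)
bad_result = final_punct_errors(wrong)
```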
2. Conjunctive Punctuation (also described in Nunberg (1990) as separating punctuation). Unlike the paired adjunctive punctuation (see 3 below), conjunctive punctuation marks occur as singletons, independent of any other punctuation mark or phenomenon. As the name conjunctive punctuation indicates, the function of these punctuation marks is to join together portions of adjacent text in coordinating structures. Marks that can function in this manner are typically the comma, but also the dash and the semi-colon.
- Conjunctive punctuation marks should attach to the head of the first conjunct of the conjoined structure. Dependency relation: “con”.
- […] infestato da rapinatori, profughi dal Ruanda, bande rivali.
17 infestato Vpsms 2 mod
18 da E 17 comp
19 rapinatori Smp 18 prep
20 , FF 19 con
21 profughi Smp 19 conj
22 dal EAms 21 comp_loc
23 Ruanda SP 22 prep
24 , FF 19 con
25 bande Sfp 19 conj
26 rivali Anp 25 mod
27 . FS 4 punc
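As a rough consistency check for conjunctive punctuation, one can verify that every punctuation mark tagged “con” shares its head with a following “conj” dependent, as the commas at 20 and 24 do above. The Python sketch below is an illustration only; the tuple layout, the function name conjunctive_errors, and the assumption that punctuation tags are exactly those starting with “F” are not taken from the guidelines.

```python
# Tokens are (id, form, pos, head, deprel) tuples; this layout is an assumption.
def conjunctive_errors(tokens):
    """Return ids of "con" punctuation marks whose head has no later "conj" dependent."""
    errors = []
    for tid, _, pos, head, rel in tokens:
        # Assumption: punctuation POS tags (FF, FC, ...) all start with "F".
        if pos.startswith("F") and rel == "con":
            has_conjunct = any(
                other_id > tid and other_head == head and other_rel == "conj"
                for other_id, _, _, other_head, other_rel in tokens
            )
            if not has_conjunct:
                errors.append(tid)
    return errors


fragment = [
    (19, "rapinatori", "Smp", 18, "prep"),
    (20, ",", "FF", 19, "con"),
    (21, "profughi", "Smp", 19, "conj"),
    (24, ",", "FF", 19, "con"),
    (25, "bande", "Sfp", 19, "conj"),
]
# Variant with the second comma attached to the second conjunct instead of the first.
misattached = fragment[:3] + [(24, ",", "FF", 21, "con"), fragment[4]]

ok_result = conjunctive_errors(fragment)
bad_result = conjunctive_errors(misattached)
```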
3. Adjunctive Punctuation (also described in Nunberg (1990) as delimiting punctuation). This class includes non-coordinative, more syntactically-contentful punctuation marks. In this case, the role that punctuation marks seem to perform is to mark phrasal boundaries. Some examples follow:
- a. Recently, I went out.
- b. The man, my friend, is here.
- c. The man, with the stick, is here.
- d. His, but not her, dog won the contest.
- e. I met my best friend, Arthur Smith.
Adjunctive punctuation marks include paired commas (b, c and d) as well as the combination of a comma with the sentence-initial capital (a) or the final full stop (e). Balanced punctuation marks (i.e. paired dashes and brackets as well as double quotes) are treated along the same lines. Dependency relation: “punc”.
- 3a. Paired punctuation marks.
- Example 1. The typical case
- The two commas are connected to the head of the delimited phrase
- La coppia, residente a Milano anche se di origini siciliane, stava trascorrendo un periodo di vacanza.
1 La RDfs 2 det
2 coppia Sfs 13 subj
3 , FF 4 punc
4 residente Ans 2 mod
5 a E 4 comp_loc
6 Milano SP 5 prep
7 anche_se CS 4 con
8 di E 4 conj
9 origini Sfp 8 prep
10 siciliane Afp 9 mod
11 , FF 4 punc
12 stava VAii3s 13 modal
13 trascorrendo Vg 0 ROOT
14 un RIms 15 det
15 periodo Sms 13 obj
16 di E 15 comp
17 vacanza Sfs 16 prep
18 . FS 13 punc
- 3b. Combination of a comma and a sentence-initial capital.
- Example 2. The comma is attached to the head of the delimited phrase.
- Subito soccorsa dal coniuge e da alcuni medici presenti nel villaggio, la donna è giunta […]
1 Subito B 2 mod_temp
2 soccorsa Vpsfs 14 mod
3 dal EAms 2 comp
4 coniuge Sns 3 prep
5 e CC 3 con
6 da E 3 conj
7 alcuni DImp 8 mod
8 medici Smp 6 prep
9 presenti Anp 8 mod
10 nel EAms 9 comp_loc
11 villaggio Sms 10 prep
12 , FF 2 punc
13 la RDfs 14 det
14 donna Sfs 16 subj
15 è VAip3s 16 aux
16 giunta Vpsfs 0 ROOT
17 cadavere Sms 16 pred
18 all' EAms 16 comp_loc
19 ospedale Sms 18 prep
20 di E 19 comp_loc
21 Campobello SP 20 prep
22 di E 21 comp
23 Mazara SP 22 prep
24 . FS 16 punc
- 3c. Combination of a comma and a final full stop (or other sentence-final punctuation mark).
- Example 3. The comma is attached to the head of the delimited phrase. However, the final full stop is attached to the sentence root.
- Sul posto sono intervenuti anche i carabinieri della compagnia di Venaria, che hanno compiuto accertamenti.
1 Sul EAms 4 comp
2 posto Sms 1 prep
3 sono VAip3p 4 aux
4 intervenuti Vpsmp 0 ROOT
5 anche B 7 mod
6 i RDmp 7 det
7 carabinieri Smp 4 subj
8 della EAfs 7 comp
9 compagnia Sfs 8 prep
10 di E 9 comp
11 Venaria SP 10 prep
12 , FF 15 punc
13 che PRnn 15 subj
14 hanno VAip3p 15 aux
15 compiuto Vpsms 7 mod_rel
16 accertamenti Smp 15 obj
17 . FS 4 punc
- 3d. Balanced punctuation.
- Balanced punctuation comprises brackets of various types, quotation marks, and paired dashes. These should attach to the head of the delimited phrase.
- (a) I quattro lavoravano nello Zaire per conto di “Mondo Giusto”
1 I RDmp 2 det
2 quattro N 3 subj
3 lavoravano Vii3p 0 ROOT
4 nello EAms 3 comp_loc
5 Zaire SP 4 prep
6 per E 3 comp
7 conto Sms 6 prep
8 di E 7 comp
9 " FB 10 punc
10 Mondo SP 8 prep
11 Giusto SP 10 mod
12 " FE 10 punc
- (b) La tragedia è avvenuta sabato mattina (ma la notizia è giunta in Italia solo ieri)
1 La RDfs 2 det
2 tragedia Sfs 4 subj
3 è VAip3s 4 aux
4 avvenuta Vpsfs 0 ROOT
5 sabato Sms 4 mod_temp
6 mattina Sfs 5 mod_temp
7 ( FB 12 punc
8 ma CC 4 con
9 la RDfs 10 det
10 notizia Sfs 12 subj
11 è VAip3s 12 aux
12 giunta Vpsfs 4 conj
13 in E 12 comp_loc
14 Italia SP 13 prep
15 solo B 16 mod
16 ieri B 12 mod_temp
17 ) FE 12 punc
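For balanced punctuation (3d), the requirement that both marks attach to the head of the delimited phrase implies that an opening mark (FB) and its closing mark (FE) share the same head. The Python sketch below pairs the marks with a stack and reports pairs whose heads differ; the tuple layout and the function name mismatched_pairs are assumptions, and the sketch does not verify that the shared head really is the head of the delimited phrase.

```python
# Tokens are (id, form, pos, head, deprel) tuples; this layout is an assumption.
def mismatched_pairs(tokens):
    """Return (open_id, close_id) for FB/FE pairs that do not share a head."""
    stack, bad = [], []
    for tid, _, pos, head, _ in tokens:
        if pos == "FB":
            stack.append((tid, head))
        elif pos == "FE" and stack:
            # Pair each closing mark with the most recent unmatched opening mark.
            open_id, open_head = stack.pop()
            if open_head != head:
                bad.append((open_id, tid))
    return bad


# Abridged version of example (b): both brackets attach to 12, giunta.
example_b = [
    (4, "avvenuta", "Vpsfs", 0, "ROOT"),
    (7, "(", "FB", 12, "punc"),
    (12, "giunta", "Vpsfs", 4, "conj"),
    (17, ")", "FE", 12, "punc"),
]
# Variant with the closing bracket wrongly attached to the sentence root.
wrong = example_b[:3] + [(17, ")", "FE", 4, "punc")]

ok_result = mismatched_pairs(example_b)
bad_result = mismatched_pairs(wrong)
```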
Statistics
The revisions required to the CONLL-ISST annotation concern cases 1 and 3.
Case 1: sentence-final punctuation (“.”, “!”, “?”, singleton “-”)
- 581 colons
- 3312 full stops
- 160 question marks
- 6 exclamation marks
- 289 “-” (this figure also includes paired dashes)
Case 3: adjunctive punctuation
- 3185 commas
- 290 opening brackets (sic!)
- 305 closing brackets (sic!)
- 1739 double quotes
- 289 dashes (this figure also includes singleton dashes)
A Concrete Annotation Example
A complex annotation case is reported below, combining the different punctuation types discussed throughout the “Proposed Annotation” section. Note that the single dash in position 2 and the colon in position 15 are both treated as phrasal boundaries and are attached to the ROOT of the preceding (sub)tree.
1 GOTEBORG SP 0 ROOT
2 - FC 1 punc
3 È VAip3s 4 aux
4 stata Vpsfs 0 ROOT
5 la RDfs 6 det
6 giornata Sfs 4 pred
7 del EAms 6 comp
8 doppio Ams 9 mod
9 oro Sms 7 prep
10 italiano Ams 9 mod
11 ai EAmp 4 comp
12 Mondiali SP 11 prep
13 di E 12 comp
14 atletica Sfs 13 prep
15 : FC 4 punc
16 ha VAip3s 17 aux
17 cominciato Vpsms 0 ROOT
18 Michele SP 19 mod
19 Didoni SP 17 subj
20 , FF 21 punc
21 milanese Ans 19 mod
22 di E 21 comp
23 Quarto SP 22 prep
24 Oggiaro SP 23 mod
25 , FF 21 punc
26 con E 17 comp
27 il RDms 28 det
28 titolo Sms 26 prep
29 nella EAfs 28 comp
30 20 N 29 prep
31 km SA 30 mod
32 di E 31 comp
33 marcia Sfs 32 prep
34 ( FB 43 punc
35 De SP 36 concat
36 Benedictis SP 43 subj_pass
37 , FF 38 punc
38 terzo NOms 36 mod
39 , FF 38 punc
40 è VAip3s 41 aux
41 stato VApsms 43 aux
42 poi B 43 mod
43 squalificato Vpsms 17 conj
44 ) FE 43 punc
45 , FF 17 con
46 ha VAip3s 47 aux
47 finito Vpsms 17 conj
48 la RDfs 49 det
49 splendida Afs 51 mod
50 Fiona SP 51 mod
51 May SP 47 subj
52 , FF 53 punc
53 londinese Ans 51 mod
54 ora B 55 mod
55 italiana Afs 51 mod
56 grazie_al EAms 55 comp
57 matrimonio Sms 56 prep
58 con E 57 comp
59 l' RDns 60 det
60 astista Sms 58 prep
61 lunghista Ans 60 mod
62 Gianni SP 63 mod
63 Iapichino SP 60 mod
64 , FF 53 punc
65 con E 47 comp
66 la RDfs 67 det
67 vittoria Sfs 65 prep
68 nel EAms 67 comp
69 lungo Sms 68 prep
70 femminile Ans 69 mod
71 . FS 17 punc
Critical Cases
The examples that follow are problematic cases for which the current annotation guidelines give no unambiguous indication of how to annotate. In particular, in the reported cases a comma can be governed by the head of the phrase occurring either before or after it. To resolve these critical cases, a general strategy must be defined for use when in doubt. Our working hypothesis is a General Rule such that the default attachment for ambiguous punctuation marks is to the left. Examples follow.
I. Multiple Roots
- Comma used in a phrase with two roots: should the punctuation mark in 28 connect to 26, Grazie, or to 31, dato?
- Example 1: "Grazie Italia, ti ho dato l’oro".
25 " FB 26 punc
26 Grazie I 0 ROOT
27 Italia SP 26 comp_ind
28 , FF 25 punc → 26
29 ti PC2ns 31 comp_ind
30 ho VAip1s 31 aux
31 dato Vpsms 0 ROOT
32 l' RDns 33 det
33 oro Sms 31 obj
34 " FE 26 punc
35 . FS 31 punc
- According to the General Rule hypothesised above, the punctuation mark in 28 should be attached to 26, Grazie, since its dependency is ambiguous and by default should be attached to the left. In addition, the period in 35 should be attached to 31, dato, since it is the first ROOT to the left of the punctuation mark.
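For Example 1, the General Rule can be sketched as a search for the nearest ROOT to the left of the punctuation mark. The Python fragment below is an illustration only; the tuple layout and the function name nearest_root_to_left are assumptions. In this example both attachments reduce to picking a ROOT, but in general the phrase to the left of an ambiguous comma need not be headed by a ROOT.

```python
# Tokens are (id, form, pos, head, deprel) tuples; this layout is an assumption.
def nearest_root_to_left(tokens, punct_id):
    """Return the id of the closest ROOT preceding punct_id, or None if there is none."""
    roots = [tid for tid, _, _, head, _ in tokens if head == 0 and tid < punct_id]
    return max(roots) if roots else None


# Abridged version of Example 1 (only roots and the punctuation marks in question).
example_1 = [
    (26, "Grazie", "I", 0, "ROOT"),
    (28, ",", "FF", 25, "punc"),
    (31, "dato", "Vpsms", 0, "ROOT"),
    (35, ".", "FS", 31, "punc"),
]

comma_head = nearest_root_to_left(example_1, 28)
period_head = nearest_root_to_left(example_1, 35)
```

With these inputs the comma in 28 is assigned to 26, Grazie, and the period in 35 to 31, dato, matching the attachments proposed above.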
II. Cascading Delimited Phrases
- There are many examples of “cascading delimited phrases”, that is, delimited phrases that follow one after the other. The guidelines specify how to handle individual and even embedded delimited phrases; however, the dependencies are unclear for delimited phrases that occur in sequence. Example 2 below is a segment of the sentence in Example 3 and demonstrates the problem with cascading delimited phrases.
- Example 2: Vittima, Gabriella F., 20 anni, disoccupata, […]
1 Vittima Sfs 3 mod
2 , FF 1 punc → 1
3 Gabriella SP 0 ROOT
4 F. SP 3 mod
5 , FF 7 punc → 7
6 20 N 7 mod
7 anni Smp 3 mod
8 , FF 7 punc → 7
9 disoccupata Afs 3 mod
10 , FF 19 punc → 9
- It is clear from the guidelines that the punctuation mark in 2 should connect with 1, Vittima. However, the attachments for the punctuation marks in 5, 8, and 10 are not as clear. The following options have been proposed as possible solutions, where A is the main phrase and DPn are the delimited phrases.
- Option 1: Forward-Moving Attachments
A , DP1 , DP2 , DP3 , […A…]
Attachment of each comma, left to right: → DP1, → DP2, → DP3, ← DP3
- Option 2: Backward-Moving Attachments
A , DP1 , DP2 , DP3 , […A…]
Attachment of each comma, left to right: → DP1, ← DP1, ← DP2, ← DP3
- The General Rule suggests that the solution for such critical cases is Option 2 since the dependencies are attached to the left of the punctuation marks. The implementation of Option 2 is shown in Example 2 (marked with →).
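The two options can be compared with a small Python sketch. It reads the schematic literally: n delimited phrases are separated and followed by n+1 commas, and each comma is assigned the head of one delimited phrase. The function name comma_attachments and the list representation are assumptions made for illustration, not part of the proposal.

```python
def comma_attachments(dp_heads, option):
    """Return the head chosen for each of the n+1 commas around DP1..DPn.

    dp_heads lists the heads of DP1..DPn in order (an assumed representation).
    """
    if not dp_heads:
        return []
    if option == 1:
        # Forward-moving: each comma attaches to the DP on its right;
        # the last comma has no DP to its right and closes DPn.
        return list(dp_heads) + [dp_heads[-1]]
    if option == 2:
        # Backward-moving: the first comma opens DP1;
        # every later comma attaches to the DP on its left.
        return [dp_heads[0]] + list(dp_heads)
    raise ValueError("option must be 1 or 2")


forward = comma_attachments(["DP1", "DP2", "DP3"], option=1)
backward = comma_attachments(["DP1", "DP2", "DP3"], option=2)
# Commas 5, 8 and 10 of Example 2, with 20 anni (head 7) and disoccupata (head 9).
example_2 = comma_attachments([7, 9], option=2)
```

Under this reading, Option 2 applied to the heads 7 and 9 yields the attachments 7, 7 and 9, which are the ones marked with → for the commas in 5, 8 and 10 of Example 2.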
III. Relative Clause with Cascading Delimited Phrases
- In addition to the complexity of dependencies with cascading phrases, consider the dependencies for the punctuation used in Example 3 in which punctuation for a relative clause complicates the dependencies for the commas.
- Example 3: Vittima, Gabriella F., 20 anni, disoccupata, che poco prima delle 6 di ieri mattina si trovava in un parco con il fidanzato Paolo F., 27 anni, rappresentante di articoli di cartoleria, all’uscita di un locale vicino all’aeroporto di Linate.
1 Vittima Sfs 3 mod
2 , FF 1 punc → 1
3 Gabriella SP 0 ROOT
4 F. SP 3 mod
5 , FF 7 punc → 7
6 20 N 7 mod
7 anni Smp 3 mod
8 , FF 7 punc → 7
9 disoccupata Afs 3 mod
10 , FF 19 punc → 9
11 che PRnn 19 subj
12 poco B 13 mod
13 prima_delle EAfp 19 comp_temp
14 6 N 13 prep
15 di E 14 comp_temp
16 ieri B 15 prep
17 mattina Sfs 16 mod_temp
18 si PC3nn 19 clit
19 trovava Vii3s 3 mod_rel
20 in E 19 comp_loc
21 un RIms 22 det
22 parco Sms 20 prep
23 con E 19 comp
24 il RDms 25 det
25 fidanzato Sms 23 prep
26 Paolo SP 25 mod
27 F. SP 26 mod
28 , FF 19 punc → 30
29 27 N 30 mod
30 anni Smp 26 mod
31 , FF 32 punc → 30
32 rappresentante Sns 26 mod
33 di E 32 comp
34 articoli Smp 33 prep
35 di E 34 comp
36 cartoleria Sfs 35 prep
37 , FF 32 punc → 32
38 all' EAfs 19 comp_loc
39 uscita Sfs 38 prep
40 di E 39 comp
41 un RIms 42 det
42 locale Sms 40 prep
43 vicino_all' Ens 42 comp_loc
44 aeroporto Sms 43 prep
45 di E 44 comp
46 Linate SP 45 prep
47 . FS 3 punc
- As you can see, the comma in 28 is attached to 19, trovava, which is the head of the relative clause; the annotation therefore treats the relative clause as a delimited phrase, introduced by the comma in 10. However, if we apply either option for cascading delimited phrases, the comma in 10 will no longer open a delimited phrase but close one (it will attach to 9 instead). This example shows the several levels of complexity that need to be worked out before the corpus can be annotated.
- The dependencies according to the General Rule and Option 2 are shown for the critical cases in Example 3 (marked with →).