Annotation Guidelines for Dependencies

From Medialab

Clitics

Dependency relations involving clitic pronouns were carefully revised following the guidelines offered in our proposal for handling clitics. These guidelines were further extended on the basis of corpus evidence to cover unforeseen cases. For the reader's convenience, below is a detailed description of how the dependencies were applied.

  • ci and vi (tagged as PCnn, with adverbial value) → COMP
perché vogliono andarci ; non c'entra niente ; vi sono scrittori che...
  • ne (with pronominal value) → COMP
ricordandone sempre ; non ne abbiamo più ; non se ne parla neppure
  • Indirect object → COMP_IND
ci hanno detto la verità
  • Direct object → OBJ
ci vuole deportare ; l'ha ampiamente smentito
  • Reflexive pronouns → OBJ
io mi lavo
  • Impersonal constructions and pronominal verbs (either transitive or intransitive) → CLIT
si può cambiare (si impers); mi sono messo la giacca (trans pronom) ; si scusa con te (intrans pronom) ; ci siamo divisi (intrans pronom)
  • andarsene → CLIT + CLIT
andarsene

Dates

According to the specifications of ISST, dates are tagged as follows:

1. il 25 dicembre 1999 sono partito
- mod (partire, 25.<definitezza=+>)
- mod (25, dicembre)
- mod (dicembre, 1999)
2. nel 2000, sono partito
- mod (partire, 2000.<introdep=“in”>)
3. nel/in maggio, sono partito
- mod (partire, maggio.<introdep=“in”>)

Here the head of the date is the element that selects the preposition or the article preceding the whole temporal expression. In dates consisting of day, month, and year, or any combination of these, only the head should be marked as comp/mod_temp. However, the dates in the corpus did not always follow these specifications: the day, month, and year were each tagged as either mod or mod_temp. The annotation was revised so that the head of the date (usually the day) is tagged mod_temp and the other elements (usually the month and year) are tagged mod. Consider the example morte del padre avvenuta il 27 giugno 1939, shown first as originally annotated (asterisks mark the labels that were changed):

…
20	morte		Sfs	19	prep
21	del		EAms	20	comp
22	padre		Sms	21	prep
23	avvenuta	Vpsfs	20	mod
24	il		RDms	25	det
25	27		N	23	*mod
26	giugno		Sms	25	*mod_temp
27	1939		N	26	*mod_temp

In this example the day, 27, is the head of the date, while the month and the year, giugno 1939, depend on it in sequence (giugno on 27, 1939 on giugno), as the head column shows. The revised version of the example is shown below.

…
20	morte		Sfs	19	prep
21	del		EAms	20	comp
22	padre		Sms	21	prep
23	avvenuta	Vpsfs	20	mod
24	il		RDms	25	det
25	27		N	23	mod_temp
26	giugno		Sms	25	mod
27	1939		N	26	mod
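The revision above can be pictured as a small relabeling pass. The following is only a minimal sketch, not the procedure actually applied to the corpus: it assumes tokens are (id, form, pos, head, deprel) tuples, that the caller already knows which token ids make up the date, and that the head of the date is the only member whose own head lies outside the date. The helper name relabel_date is hypothetical, and the comp_temp variant of the head label is not handled.

```python
from typing import List, Tuple

Token = Tuple[int, str, str, int, str]  # (id, form, pos, head, deprel)

def relabel_date(tokens: List[Token], date_ids: List[int]) -> List[Token]:
    """Label the head of a date as mod_temp and its other members as mod.

    Assumption: the head of the date is the only member whose own head
    lies outside the date span.
    """
    members = set(date_ids)
    result = []
    for tid, form, pos, head, deprel in tokens:
        if tid in members:
            # Members governed from inside the date are plain modifiers;
            # the member governed from outside is the temporal head.
            deprel = "mod" if head in members else "mod_temp"
        result.append((tid, form, pos, head, deprel))
    return result

# Tokens 23-27 of the example, with the original (pre-revision) labels
tokens = [
    (23, "avvenuta", "Vpsfs", 20, "mod"),
    (24, "il", "RDms", 25, "det"),
    (25, "27", "N", 23, "mod"),
    (26, "giugno", "Sms", 25, "mod_temp"),
    (27, "1939", "N", 26, "mod_temp"),
]
revised = relabel_date(tokens, [25, 26, 27])
for token in revised:
    print(*token, sep="\t")
```

Tokens outside the date span (here avvenuta and il) are left untouched.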

Proper Names

According to ISST guidelines, the last name is the head while the first name modifies the last (this choice was made mainly for retrieval purposes). For example, the dependencies for the segment Frank Sinatra ha casa qui should be tagged as follows:

1	Frank	SP	2	mod
2	Sinatra	SP	3	subj
3	ha	Vip3s	0	ROOT
4	casa	Sfs	3	obj
5  	qui	B	3	mod
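As a rough illustration of the rule, the sketch below attaches every non-final token of a personal name to the last token with relation mod. It assumes the caller supplies the token ids of one personal name in surface order; the function name attach_person_name is hypothetical, and the relation of the last name itself is left to the sentence context.

```python
def attach_person_name(name_ids):
    """Return {token_id: (head_id, deprel)} for the non-final tokens of a name.

    Assumption: name_ids lists the tokens of one personal name in order,
    so the last id is the last name (the head).
    """
    if not name_ids:
        raise ValueError("a name needs at least one token")
    *given, last = name_ids
    return {tid: (last, "mod") for tid in given}

frank_sinatra = attach_person_name([1, 2])
print(frank_sinatra)
```

With three tokens, such as ids 7, 8, 9, both given names attach directly to the last name rather than to each other.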

Punctuation: A first draft

Proposed Annotation

1. Sentence-Final Punctuation (i.e., “.” “?” “!” tagged as FS)

These should attach to the ROOT of the sentence. Dependency relation: “punc”.
La vicenda provocò parecchie polemiche e Oscar Luigi Scalfaro ha reintrodotto il vecchio ordinamento.
1       La		RDfs	2	det
2       vicenda		Sfs	3	subj
3       provocò		Vis3s	0	ROOT
4       parecchie		DIfp	5	mod
5       polemiche	Sfp	3	obj
6       e		CC	3	con
7       Oscar		SP	9	mod
8       Luigi		SP	9	mod
9       Scalfaro		SP	11	subj
10      ha		VAip3s	11	aux
11      reintrodotto	Vpsms	3	conj
12      il		RDms	14	det
13      vecchio		Ams	14	mod
14      ordinamento	Sms	11	obj
15      .		FS	3	punc
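The attachment of sentence-final punctuation could be sketched as follows. This is an illustration under stated assumptions, not the tool used for the corpus: tokens are dicts with the keys id, form, pos, head and deprel, the sentence has exactly one ROOT, and the function name attach_sentence_final is hypothetical. The sample sentence is an abbreviated version of the one above.

```python
def attach_sentence_final(tokens):
    """Attach every FS token to the sentence ROOT with relation 'punc'.

    Assumption: the sentence has exactly one ROOT (head == 0).
    """
    roots = [t["id"] for t in tokens if t["head"] == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one ROOT, found {len(roots)}")
    for t in tokens:
        if t["pos"] == "FS":
            t["head"], t["deprel"] = roots[0], "punc"
    return tokens

# Abbreviated sentence; the full stop starts out unattached
sentence = [
    {"id": 1, "form": "La", "pos": "RDfs", "head": 2, "deprel": "det"},
    {"id": 2, "form": "vicenda", "pos": "Sfs", "head": 3, "deprel": "subj"},
    {"id": 3, "form": "provocò", "pos": "Vis3s", "head": 0, "deprel": "ROOT"},
    {"id": 4, "form": "polemiche", "pos": "Sfp", "head": 3, "deprel": "obj"},
    {"id": 5, "form": ".", "pos": "FS", "head": None, "deprel": None},
]
attach_sentence_final(sentence)
print(sentence[-1])
```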


2. Conjunctive Punctuation (also described in Nunberg (1990) as separating punctuation). Unlike the paired adjunctive punctuation (see 3 below), conjunctive punctuation marks occur as singletons, independent of any other punctuation mark or phenomenon. As the name conjunctive punctuation indicates, the function of these punctuation marks is to join together portions of adjacent text in coordinating structures. Marks that can function in this manner are typically the comma, but also the dash and the semi-colon.

Conjunctive punctuation marks should attach to the head of the first conjunct of the conjoined structure. Dependency relation: “con”.
[…] infestato da rapinatori, profughi dal Ruanda, bande rivali.
17      infestato	Vpsms   	2	mod
18      da 		E       	17	comp
19      rapinatori 	Smp     	18	prep
20      ,		FF      	19	con
21      profughi	Smp     	19	conj    
22      dal 		EAms    	21	comp_loc
23      Ruanda 	SP      	22	prep
24      ,		FF      	19	con
25      bande 	Sfp     	19	conj
26      rivali 	Anp     	25	mod
27      .		FS      	4	punc
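One way to picture the rule is the sketch below. It relies on assumptions that go beyond the text: conjuncts are already labeled conj and attached to the head of the first conjunct, conjunctive commas carry the tag FF, punctuation tags start with F, and a comma counts as conjunctive when it immediately precedes the left edge of a conjunct's subtree. The function name attach_conjunctive_commas is hypothetical.

```python
from collections import defaultdict

def attach_conjunctive_commas(tokens):
    """Attach a comma preceding a conjunct to the head of the first conjunct."""
    by_id = {t["id"]: t for t in tokens}
    children = defaultdict(list)
    for t in tokens:
        if not t["pos"].startswith("F"):  # ignore punctuation when building subtrees
            children[t["head"]].append(t["id"])

    def left_edge(tid):
        # Smallest token id in the subtree rooted at tid
        return min([tid] + [left_edge(c) for c in children[tid]])

    for t in tokens:
        if t["deprel"] == "conj":
            before = by_id.get(left_edge(t["id"]) - 1)
            if before is not None and before["pos"] == "FF":
                before["head"], before["deprel"] = t["head"], "con"
    return tokens

# Tokens 17-26 of the example; the commas start out unattached
rows = [
    (17, "infestato", "Vpsms", 2, "mod"),
    (18, "da", "E", 17, "comp"),
    (19, "rapinatori", "Smp", 18, "prep"),
    (20, ",", "FF", None, None),
    (21, "profughi", "Smp", 19, "conj"),
    (22, "dal", "EAms", 21, "comp_loc"),
    (23, "Ruanda", "SP", 22, "prep"),
    (24, ",", "FF", None, None),
    (25, "bande", "Sfp", 19, "conj"),
    (26, "rivali", "Anp", 25, "mod"),
]
tokens = [dict(zip(("id", "form", "pos", "head", "deprel"), r)) for r in rows]
attach_conjunctive_commas(tokens)
```

Both commas end up attached to token 19, rapinatori, the head of the first conjunct.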


3. Adjunctive Punctuation (also described in Nunberg (1990) as delimiting punctuation). This class includes non-coordinative, more syntactically-contentful punctuation marks. In this case, the role that punctuation marks seem to perform is to mark phrasal boundaries. Some examples follow:

a. Recently, I went out.
b. The man, my friend, is here.
c. The man, with the stick, is here.
d. His, but not her, dog won the contest.
e. I met my best friend, Arthur Smith.

Adjunctive punctuation marks include paired commas (b, c and d) as well as the combination of a comma and the sentence-initial capital (a) or the final full stop (e). Balanced punctuation marks (i.e. paired dashes and brackets, as well as double quotes) are treated along the same lines. Dependency relation: “punc”.

3a. Paired punctuation marks.
Example 1. The typical case
The two commas are connected to the head of the delimited phrase.
La coppia, residente a Milano anche se di origini siciliane, stava trascorrendo un periodo di vacanza.
1       La 		RDfs    		2	det
2       coppia 	Sfs     		13	subj
3       ,       		FF      		4	punc
4       residente 	Ans     		2	mod
5       a 		E       		4	comp_loc
6       Milano 	SP      		5	prep
7       anche_se 	CS      		4	con
8       di 		E       		4	conj
9       origini 	Sfp     		8	prep
10     siciliane 	Afp     		9	mod
11      , 		FF      		4	punc
12      stava 	VAii3s  		13	modal
13      trascorrendo 	Vg     		0	ROOT
14      un 		RIms    		15	det
15      periodo 	Sms     		13	obj
16      di 		E       		15	comp
17      vacanza 	Sfs     		16	prep
18      .		FS      		13	punc
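The typical case can be sketched as follows, assuming the positions of the two delimiters are known in advance. The head of the delimited phrase is taken to be the only token between the delimiters whose own head lies outside them; this is a simplifying assumption, not part of the guidelines, and the function names are hypothetical. When the criterion does not single out one token, the sketch leaves the delimiters unattached. That happens, for instance, with the bracketed clause in example (b) of 3d, where both ma and giunta are governed from outside the brackets.

```python
def delimited_phrase_head(tokens, open_id, close_id):
    """Return the id of the unique token inside the delimiters whose head
    lies outside them, or None if there is no unique candidate."""
    inside = [t for t in tokens if open_id < t["id"] < close_id]
    candidates = [
        t["id"] for t in inside
        if t["head"] is not None and not (open_id < t["head"] < close_id)
    ]
    return candidates[0] if len(candidates) == 1 else None

def attach_paired(tokens, open_id, close_id):
    """Attach both delimiters to the head of the phrase they enclose."""
    head = delimited_phrase_head(tokens, open_id, close_id)
    if head is not None:
        for t in tokens:
            if t["id"] in (open_id, close_id):
                t["head"], t["deprel"] = head, "punc"
    return tokens

# Tokens 3-11 of the example; the commas start out unattached
rows = [
    (3, ",", None), (4, "residente", 2), (5, "a", 4), (6, "Milano", 5),
    (7, "anche_se", 4), (8, "di", 4), (9, "origini", 8),
    (10, "siciliane", 9), (11, ",", None),
]
tokens = [{"id": i, "form": f, "head": h, "deprel": None} for i, f, h in rows]
attach_paired(tokens, 3, 11)
```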


3b. Combination of a comma and a sentence-initial capital.
Example 2. The comma is attached to the head of the delimited phrase.
Subito soccorsa dal coniuge e da alcuni medici presenti nel villaggio, la donna è giunta […]
1       Subito 	B       	2	mod_temp
2       soccorsa 	Vpsfs   	14	mod
3       dal 		EAms    	2	comp
4       coniuge 	Sns     	3	prep
5       e 		CC      	3	con
6       da 		E       	3	conj
7       alcuni  	DImp    	8	mod
8       medici  	Smp     	6	prep
9       presenti 	Anp     	8	mod
10      nel 		EAms    	9	comp_loc
11      villaggio 	Sms     	10	prep
12      , 		FF      	2	punc
13      la 		RDfs    	14	det
14      donna 	Sfs     	16	subj
15      è 		VAip3s  	16	aux
16      giunta 	Vpsfs   	0	ROOT
17      cadavere 	Sms     	16	pred
18      all' 		EAms    	16	comp_loc
19      ospedale 	Sms     	18	prep
20      di 		E       	19	comp_loc
21      Campobello 	SP      	20	prep
22      di 		E       	21	comp
23      Mazara 	SP      	22	prep
24      . 		FS      	16	punc


3c. Combination of a comma and a final full stop (or other sentence-final punctuation mark).
Example 3. The comma is attached to the head of the delimited phrase. However, the final full stop is attached to the sentence root.
Sul posto sono intervenuti anche i carabinieri della compagnia di Venaria, che hanno compiuto accertamenti.
1       Sul     	EAms    		4	comp
2       posto   	Sms     		1	prep
3       sono    	VAip3p  		4	aux
4       intervenuti     	Vpsmp   		0	ROOT
5       anche   	B       		7	mod
6       i       		RDmp    		7	det
7       carabinieri     	Smp     		4	subj
8       della   	EAfs    		7	comp
9       compagnia       	Sfs     		8	prep
10      di      		E       		9	comp
11      Venaria 	SP      		10	prep
12      ,       		FF      		15	punc
13      che     	PRnn    		15	subj
14      hanno   	VAip3p  		15	aux
15      compiuto        	Vpsms   		7	mod_rel
16      accertamenti    	Smp     		15	obj
17      .       		FS      		4	punc


3d. Balanced punctuation.
Balanced punctuation marks comprise brackets (of different types), quotation marks, and paired dashes. These should attach to the head of the delimited phrase.
(a) I quattro lavoravano nello Zaire per conto di “Mondo Giusto”
1       I		RDmp    		2       det
2       quattro	N		3       subj
3       lavoravano	Vii3p   		0       ROOT
4       nello 		EAms		3       comp_loc
5       Zaire		SP      		4       prep
6       per 		E       		3       comp
7       conto		Sms     		6       prep
8       di 		E       		7       comp
9       "		FB		10      punc
10      Mondo 	SP       		8       prep
11      Giusto 	SP      		10      mod
12      "		FE      		10      punc
(b) La tragedia è avvenuta sabato mattina (ma la notizia è giunta in Italia solo ieri)
1       La 		RDfs		2	det
2       tragedia 	Sfs     		4	subj
3       è		VAip3s		4	aux
4       avvenuta 	Vpsfs		0	ROOT
5       sabato	Sms     		4	mod_temp
6       mattina 	Sfs     		5	mod_temp
7       (		FB		12	punc
8       ma 		CC      		4	con
9       la 		RDfs    		10	det
10      notizia 	Sfs     		12	subj
11      è 		VAip3s  		12	aux
12      giunta 	Vpsfs   		4	conj
13      in 		E       		12	comp_loc
14      Italia 	SP      		13	prep
15      solo 		B       		16	mod
16      ieri 		B       		12	mod_temp
17      ) 		FE      		12	punc

Statistics

The revisions required to the CoNLL-ISST annotation concern cases 1 and 3.

Case 1: sentence-final punctuation (“.”, “!”, “?”, “:”, singleton “-”)

581 colons
3312 full stops
160 question marks
6 exclamation marks
289 “-” (this figure also includes paired dashes)

Case 3: adjunctive punctuation

3185 commas
290 opening brackets (sic!)
305 closing brackets (sic!)
1739 double quotes
289 dashes (this figure also includes singleton dashes)
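Counts of this kind could be gathered with a few lines of code. The sketch below is not the script that produced the figures above. It assumes whitespace-separated lines with the columns id, form, part-of-speech tag, head and relation, and it assumes that every punctuation tag starts with F (as FS, FF, FC, FB and FE do in the examples). The function name count_punctuation is hypothetical.

```python
from collections import Counter

def count_punctuation(lines):
    """Count punctuation tokens by (tag, form) in CoNLL-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or incomplete lines
        form, pos = fields[1], fields[2]
        if pos.startswith("F"):  # assumption: punctuation tags start with F
            counts[(pos, form)] += 1
    return counts

# Tokens 8-12 of the "Mondo Giusto" example
sample = [
    '8\tdi\tE\t7\tcomp',
    '9\t"\tFB\t10\tpunc',
    '10\tMondo\tSP\t8\tprep',
    '11\tGiusto\tSP\t10\tmod',
    '12\t"\tFE\t10\tpunc',
]
counts = count_punctuation(sample)
for (pos, form), n in sorted(counts.items()):
    print(pos, form, n)
```

Keying the counter on both tag and form keeps opening and closing marks apart, which is how a mismatch such as the one between opening and closing brackets would surface.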


A Concrete Annotation Example

A complex annotation case is reported below, combining the different punctuation types discussed throughout the “Proposed Annotation” section. Note that the single dash in position 2 and the colon in position 15 are both treated as phrasal boundaries and are attached to the ROOT of the preceding (sub)tree.

1       GOTEBORG 	SP      		0	ROOT    
2       -       		FC      		1	punc    
3       È       		VAip3s  		4	aux     
4       stata   	Vpsfs   		0	ROOT    
5       la      		RDfs    		6	det     
6       giornata        	Sfs     		4	pred    
7       del     		EAms    		6	comp    
8       doppio  	Ams     		9	mod     
9       oro     	Sms     		7	prep    
10      italiano        	Ams     		9	mod     
11      ai      		EAmp    		4	comp    
12      Mondiali        	SP      		11	prep    	
13      di      		E       		12	comp    
14      atletica        	Sfs     		13	prep    
15      :       		FC      		4	punc    
16      ha      	VAip3s  		17	aux     
17      cominciato      	Vpsms   		0	ROOT    
18      Michele 	SP      		19	mod     
19      Didoni  	SP      		17	subj    
20      ,       		FF      		21	punc    
21      milanese        	Ans     		19	mod     
22      di      		E       		21	comp    
23      Quarto  	SP      		22	prep    
24      Oggiaro 	SP      		23	mod     
25      ,       		FF      		21	punc    
26      con     	E       		17	comp    
27      il      		RDms    		28	det     
28      titolo  	Sms     		26	prep    
29      nella 		EAfs    		28	comp    
30      20      	N       		29	prep    
31      km 		SA      		30	mod     
32      di 		E       		31	comp    
33      marcia  	Sfs     		32	prep    
34      (       		FB      		43	punc    
35      De      	SP      		36	concat  
36      Benedictis      	SP      		43	subj_pass       
37      ,       		FF      		38	punc    
38      terzo   	NOms    		36	mod     
39      ,       		FF      		38	punc    
40      è       		VAip3s  		41	aux     
41      stato   	VApsms  		43	aux     
42      poi     	B       		43	mod     
43      squalificato    	Vpsms   		17	conj    
44      )       		FE      		43	punc    
45      ,       		FF      		17	con    
46      ha      	VAip3s  		47	aux     
47      finito  	Vpsms   		17	conj    
48      la      		RDfs    		49	det     
49      splendida       	Afs     		51	mod     
50      Fiona   	SP      		51	mod     
51      May     	SP      		47	subj    
52      ,       		FF      		53	punc    
53      londinese       	Ans     		51	mod     
54      ora     	B       		55	mod     
55      italiana        	Afs     		51	mod     
56      grazie_al       	EAms    		55	comp    
57      matrimonio      	Sms     		56	prep    
58      con     	E       		57	comp    
59      l'      		RDns    		60	det     
60      astista 	Sms     		58	prep    
61      lunghista       	Ans     		60	mod     
62      Gianni  	SP      		63	mod     
63      Iapichino       	SP      		60	mod     
64      ,       		FF      		53	punc    
65      con     	E       		47	comp    
66      la      		RDfs    		67	det     
67      vittoria        	Sfs     		65	prep    
68      nel     	EAms    		67	comp    
69      lungo   	Sms     		68	prep    
70      femminile       	Ans     		69	mod     
71      .       		FS      		17	punc    

Critical Cases

The examples that follow are problematic cases for which the current annotation guidelines give no unambiguous indication of how to annotate. In particular, in the cases reported here a comma can be governed by the head of the phrase occurring either before or after the punctuation mark. To resolve these critical cases, a general strategy is needed for use when in doubt. Our working hypothesis is a General Rule whereby the default attachment for ambiguous punctuation marks is to the left. Examples follow.

I. Multiple Roots

Comma used in a phrase with two roots: should the punctuation mark in 28 connect to 26, Grazie, or to 31, dato?
Example 1: "Grazie Italia, ti ho dato l’oro".
25	"	FB	26	punc
26	Grazie	I	0	ROOT
27	Italia	SP	26	comp_ind
28	,	FF	25	punc		→ 26
29	ti	PC2ns	31	comp_ind
30	ho	VAip1s	31	aux
31	dato	Vpsms	0	ROOT
32	l'	RDns	33	det
33	oro	Sms	31	obj
34	"	FE	26	punc
35	.	FS	31	punc
According to the General Rule hypothesised above, the punctuation mark in 28 should be attached to 26, Grazie: its dependency is ambiguous, so by default it attaches to the left. In addition, the period in 35 should be attached to 31, dato, since this is the first ROOT to the left of the punctuation mark.
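For the multiple-root case, the choice of "the first ROOT to the left" can be sketched as a simple lookup. The sketch covers only this case, not the General Rule as a whole. It assumes tokens are given as (id, head) pairs with head 0 marking a ROOT, and the function name nearest_root_to_left is hypothetical.

```python
def nearest_root_to_left(tokens, punct_id):
    """Return the id of the closest ROOT preceding punct_id, or None."""
    roots = [tid for tid, head in tokens if head == 0 and tid < punct_id]
    return max(roots) if roots else None

# (id, head) pairs for Example 1; punctuation heads are left out (None)
example_1 = [
    (25, None), (26, 0), (27, 26), (28, None), (29, 31), (30, 31),
    (31, 0), (32, 33), (33, 31), (34, None), (35, None),
]
print(nearest_root_to_left(example_1, 35))
print(nearest_root_to_left(example_1, 28))
```

For the period in 35 the lookup returns 31, and for the comma in 28 it returns 26, matching the attachments proposed above.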

II. Cascading Delimited Phrases

There are many examples of “cascading delimited phrases”, that is, delimited phrases that follow one after the other. The guidelines specify how to handle individual and even embedded delimited phrases; however, the dependencies are unclear for consecutive delimited phrases. Example 2 below is a segment of the sentence in Example 3 and demonstrates the problem with cascading delimited phrases.
Example 2: Vittima, Gabriella F., 20 anni, disoccupata, […]
1	Vittima		Sfs	3	mod
2	,		FF	1	punc		→ 1
3	Gabriella		SP	0	ROOT
4	F.		SP	3	mod
5	,		FF	7	punc		→ 7
6	20		N	7	mod
7	anni		Smp	3	mod
8	,		FF	7	punc		→ 7
9	disoccupata		Afs	3	mod
10	,		FF	19	punc		→ 9
It is clear from the guidelines that the punctuation mark in 2 should connect with 1, Vittima. However, the attachments for the punctuation marks in 5, 8, and 10 are not as clear. The following options have been proposed as possible solutions, where A is the main phrase and DPn are the delimited phrases.


Option 1: Forward-Moving Attachments
	A	,	DP1	,	DP2	,	DP3	,	[…A…]
		→ DP1
				→ DP2
						→ DP3
								← DP3 


Option 2: Backward-Moving Attachments
	A	,	DP1	,	DP2	,	DP3	,	[…A…]
		→ DP1
				← DP1
						← DP2
								← DP3 
The General Rule suggests that the solution for such critical cases is Option 2, since it attaches the dependencies to the left of the punctuation marks. The implementation of Option 2 is shown in Example 2 (marked with →).
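The two options differ only in which delimited phrase each comma is paired with. The sketch below follows the two diagrams. It assumes a cascade of n delimited phrases separated by n + 1 commas, with the commas and the phrase heads supplied by the caller; the function name attach_cascade is hypothetical.

```python
def attach_cascade(comma_ids, phrase_heads, option=2):
    """Map each comma to the head of a delimited phrase.

    Option 1 (forward-moving): each comma attaches to the phrase it opens;
    the last comma closes the last phrase.
    Option 2 (backward-moving): the first comma opens the first phrase;
    each later comma attaches to the phrase it closes.
    """
    if not phrase_heads or len(comma_ids) != len(phrase_heads) + 1:
        raise ValueError("expected n phrase heads and n + 1 commas")
    if option == 1:
        targets = list(phrase_heads) + [phrase_heads[-1]]
    elif option == 2:
        targets = [phrase_heads[0]] + list(phrase_heads)
    else:
        raise ValueError("option must be 1 or 2")
    return dict(zip(comma_ids, targets))

# Abstract cascade: A , DP1 , DP2 , DP3 , [...A...]
commas = [1, 2, 3, 4]
phrases = ["DP1", "DP2", "DP3"]
forward = attach_cascade(commas, phrases, option=1)
backward = attach_cascade(commas, phrases, option=2)
print(forward)
print(backward)
```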


III. Relative Clause with Cascading Delimited Phrases

In addition to the complexity of dependencies with cascading phrases, consider the dependencies for the punctuation used in Example 3 in which punctuation for a relative clause complicates the dependencies for the commas.
Example 3: Vittima, Gabriella F., 20 anni, disoccupata, che poco prima delle 6 di ieri mattina si trovava in un parco con il fidanzato Paolo F., 27 anni, rappresentante di articoli di cartoleria, all’uscita di un locale vicino all’aeroporto di Linate.
1	Vittima		Sfs	3	mod
2	,		FF	1	punc		→ 1
3	Gabriella		SP	0	ROOT
4	F.		SP	3	mod
5	,		FF	7	punc		→ 7
6	20		N	7	mod
7	anni		Smp	3	mod
8	,		FF	7	punc		→ 7
9	disoccupata		Afs	3	mod
10	,		FF	19	punc		→ 9
11	che		PRnn	19	subj
12	poco		B	13	mod
13	prima_delle		EAfp	19	comp_temp
14	6		N	13	prep
15	di		E	14	comp_temp
16	ieri		B	15	prep
17	mattina		Sfs	16	mod_temp
18	si		PC3nn	19	clit
19	trovava		Vii3s	3	mod_rel
20	in		E	19	comp_loc
21	un		RIms	22	det
22	parco		Sms	20	prep
23	con		E	19	comp
24	il		RDms	25	det
25	fidanzato		Sms	23	prep
26	Paolo		SP	25	mod
27	F.		SP	26	mod
28	,		FF	19	punc		→ 30
29	27		N	30	mod
30	anni		Smp	26	mod
31	,		FF	32	punc		→ 30
32	rappresentante	Sns	26	mod
33	di		E	32	comp
34	articoli		Smp	33	prep
35	di		E	34	comp
36	cartoleria		Sfs	35	prep
37	,		FF	32	punc		→ 32
38	all'		EAfs	19	comp_loc
39	uscita		Sfs	38	prep
40	di		E	39	comp
41	un		RIms	42	det
42	locale		Sms	40	prep
43	vicino_all'		Ens	42	comp_loc
44	aeroporto		Sms	43	prep
45	di		E	44	comp
46	Linate		SP	45	prep
47	.		FS	3	punc
As the annotation shows, the comma in 28 is attached to 19, trovava, which is the head of the relative clause; the relative clause is therefore treated as a delimited phrase introduced by the comma in 10. However, if we apply either option for cascading delimited phrases, the comma in 10 will no longer open a delimited phrase but close one (it will attach to 9 instead). This example demonstrates the many levels of complexity that need to be worked out before the corpus can be annotated.
The dependencies according to the General Rule and Option 2 are shown for the critical cases in Example 3 (marked with →).