Each node in the dependency tree can be thought of as an attribute-value
matrix, i.e., a bundle of features with values. All values must be set for each
node in the tree. This will require checking each node before finishing the
analysis. Here is a list of features:
Position (wpos). The linear position of the word in the sentence. This feature should not be modified or annotated, except for new empty nodes created by the annotator. An empty node must always be given a wpos value that places it where it would appear if it were not empty.
Word (lex). This is the inflected word form associated with the node. It is almost always correctly displayed already. Example: "anunció" (he/she announced)
Part-of-Speech (POS). This is the lexical class, taken from a short list. Example: verb. Specific options:
o V -- verbs, but not auxiliary verbs (=Aux)
o N -- common nouns
o PN -- proper nouns
o Adj -- adjectives
o Adv -- adverbs
o P -- prepositions and subordinating conjunctions
o Pron -- pronouns
o Conj -- coordinating conjunctions, but not subordinating conjunctions; also includes the comma used in enumerations as a substitute for "y" (and) or "o" (or)
o Det -- determiners
o Aux -- auxiliary verbs
o Pun -- punctuation marks, but not commas used as conjunctions
o Sym -- various symbols (%, =, and the like)
o Uh -- speech-specific sounds, even if meaningful (such as /ui/)
o Misc -- everything else, including greetings (hola, -Hello-) and interjections (vale -O.K.-)
Citation form. This is the base form (lexeme) of the inflected form. A first "guess" will be provided, which must be checked and, if necessary, corrected. Example: for the form “demostró” (he/she showed), the citation form is <demostrar> (show).
num -- number: singular (sg) or plural (pl).
gen -- gender: masculine (m), feminine (f), or common (c). Common gender is used for nouns, such as estudiante (student), that may be either masculine or feminine; many of these end in “-e”.
det -- determiner: definite or indefinite. The definite article, el|la|los|las, always precedes the noun in Spanish. It may also precede an infinitive, e.g., el liberar (freeing). The indefinite article, un|una|unos|unas, also precedes the noun.
tense -- present (pres): demuestro (I prove); preterit (pret): demostró (he proved); future (fut): demostrará (he will prove).
aspect -- progressive (prog): estar realizando (be carrying out); or perfect (perf): haber dejado (have left).
voice -- active: alquilo (I rent); or passive (pas): fueron alquilados (they were rented).
mood -- indicative (ind), subjunctive (sub), conditional (cond) or imperative (imp).
num -- number: singular (sg) or plural (pl).
per -- person: first (1), second (2) or third (3). The subject is often omitted in Spanish but the ending of the verb indicates the person and number. So, in Igualmente señaló (He/She also pointed out), we can assume that the subject is a third person singular.
DOclitic -- direct object clitic pronoun: lo|los|la|las (him|them|her|them). In Spanish, the direct object is sometimes doubled. For example, in the clause "Esta inversión ... el gobierno la considera ..." ("This investment ... the government considers it ..."), both esta inversión (this investment) and la (it) fulfill the function of direct object (Obj). In such cases, we remove the node for the clitic, lo (him), la (her), or los|las (them), and add "DOclitic: lo|la|los|las" to the features of the verb. If the direct object is not doubled, the clitic stays as a separate argument of the verb.
IOclitic -- indirect object clitic pronoun: le or les. In Spanish, the indirect object is often doubled, usually for emphasis. For example, in the clause "[...] a los Arabigos todavía les queda un camino por recorrer" (the Arabic ones still have a way to go), both a los Arabigos (to the Arabic ones) and les (them) fulfill the function of indirect object (Obj2). In such cases, we remove the node for le (him/her/it) or les (them) and add "IOclitic: le|les" to the features of the verb. If the indirect object is not doubled, the clitic stays as a separate argument of the verb.
refle -- reflexive: se instalaron (they installed themselves)
none -- use "none" when the word is not inflected (e.g., adjectives in the base form); this does not apply to infinitive verbs
Deep Syntactic Role (DSyntRole). The DSyntRole reflects the argument pattern of a verb in its default active form. Thus, it indicates the role of a daughter (argument) node with respect to its mother (predicate) node in a somewhat abstract representation.
o Root. This is the main word (usually the verb) in a sentence.
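As an illustration only, the attribute-value matrix described above can be sketched as a small Python data structure. The class `Node`, the helper functions, and the table `REQUIRED` (which features must be filled for each POS) are hypothetical; the guidelines define neither a file format nor such a table. The sketch also shows the clitic-doubling rule: a doubled clitic node is removed and recorded on the verb as DOclitic or IOclitic.

```python
from dataclasses import dataclass, field

# Hypothetical IL0 node: one attribute-value matrix per word (or empty node).
@dataclass
class Node:
    wpos: float   # a float, so that empty nodes can be placed between words
    lex: str      # inflected word form
    pos: str      # POS tag from the list above (V, N, Pron, ...)
    lexeme: str   # citation form
    features: dict = field(default_factory=dict)
    deps: list = field(default_factory=list)

# Assumption: features that must be filled for a given POS (illustrative subset).
REQUIRED = {
    "V": {"tense", "mood", "num", "per"},
    "N": {"num", "gen"},
}

def missing_features(node):
    """Return the required features that are still unset on this node, sorted."""
    return sorted(REQUIRED.get(node.pos, set()) - set(node.features))

def fold_doubled_clitic(verb, clitic, feature_name):
    """Remove a doubled clitic dependent and record it as DOclitic/IOclitic on the verb."""
    if feature_name not in ("DOclitic", "IOclitic"):
        raise ValueError(f"unexpected clitic feature: {feature_name}")
    verb.deps.remove(clitic)
    verb.features[feature_name] = clitic.lex
```

For the example "el gobierno la considera" with esta inversión as the doubled object, `fold_doubled_clitic(verb, la_node, "DOclitic")` drops the node for la and sets DOclitic to "la" on considera.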
Verbs are heads of sentences and clauses.
The head of any complete clausal utterance is the main verb. Incomplete utterances (NPs, PPs, greetings) should have as their head the usual head for that type of phrase.
Auxiliary verbs (ser/estar -be- and haber -have-) are deleted, and their meaning is represented as features on the main verb (for example, aspect: progressive). Modals (poder -can-, deber -should-, tener que -have to, must-) are syntactically very similar to auxiliaries, but for semantic reasons they are kept in IL0 as dependents of the main verb, which in Spanish is always in the infinitive after a modal. In all cases, when the main verb is missing, as in VP ellipsis, an empty verb node should be created and used as the head of the entire clause.
Sequences of auxiliary verbs (había sido alquilado -it had been rented-, podría haber estado lloviendo -it could have been raining-) should be annotated with the main verb as the head: all auxiliaries are removed, and modals are represented as dependents of the main verb.
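A minimal sketch of this collapsing step, assuming the verb group arrives as a list of citation forms in surface order with the main verb last. The mapping of each auxiliary to a feature (estar to progressive aspect, haber to perfect aspect, ser to passive voice) follows the feature list above. Treating every ser in the chain as a passive auxiliary, storing aspect as a list so that perfect and progressive can co-occur, and handling only single-word modals (tener que is not covered) are simplifying assumptions. Tense and mood, which come from the finite form, are not computed here.

```python
from dataclasses import dataclass, field

# Assumed auxiliary-to-feature mapping, following the feature list above.
AUX_FEATURES = {
    "estar": ("aspect", "prog"),
    "haber": ("aspect", "perf"),
    "ser": ("voice", "pas"),
}
MODALS = {"poder", "deber"}  # single-word modals only (assumption)

@dataclass
class VerbNode:
    lexeme: str
    features: dict = field(default_factory=dict)
    deps: list = field(default_factory=list)

def collapse_verb_group(lexemes):
    """Make the last (main) verb the head; auxiliaries become features, modals dependents."""
    *chain, main = lexemes
    head = VerbNode(main)
    for lexeme in chain:
        if lexeme in AUX_FEATURES:
            name, value = AUX_FEATURES[lexeme]
            head.features.setdefault(name, []).append(value)
        elif lexeme in MODALS:
            head.deps.append(VerbNode(lexeme))
        else:
            raise ValueError(f"not an auxiliary or modal: {lexeme}")
    return head
```

For podría haber estado lloviendo, `collapse_verb_group(["poder", "haber", "estar", "llover"])` yields llover as head with aspect perf and prog, and poder as its only dependent.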
When the main verb is a form of the copula, the head of the clause will be the predicate. There are only two kinds of copular structures in Spanish: equative and predicative. In predicative copular constructions, two Spanish verbs correspond to the English copula, ser and estar, so the copula used is listed under the features of the head word. In equative copular constructions, only “ser” can be used.
Arguments vs. Adjuncts
In distinguishing between arguments and adjuncts, consistency is the most important thing. This distinction will matter most for annotating empty categories. In addition, each argument will be annotated with a feature encoding its grammatical role. All non-arguments, including function words, will be annotated as modifiers.
The only NPs that will be considered arguments for annotation purposes are:
NPs that never appear with a preposition;
NPs that appear with the preposition "a" (personal "a"). In Spanish, the preposition "a" precedes an animate direct object with many verbs: prevenir a (warn), enfrentar a (face), etc.;
NPs that appear as part of an obligatory prepositional complement (e.g., 552.4 millones de dólares serán destinados a proyectos de inversión... -52.4 million dollars will be allocated to investment projects-; or Se está pensando en un proceso gradual... -We are thinking about a gradual process...-).
The role of each argument (subject, object, indirect object) must be annotated as a feature of its node. The deep grammatical relations should be annotated particularly when there is a functional role reversal, i.e. a mismatch between surface subject and deep subject. There are three possible cases:
A) In an impersonal construction (e.g., se están realizando proyectos / projects are being carried out), the surface subject should be annotated as the deep object, and a created node, "<pro>", becomes the deep subject. We delete the node for the pronoun "se" and include "imper" as a feature of the verb.
B) For a reflexive construction, we need to distinguish between two different cases: "real" reflexive verbs (the subject and the object are the same person), such as matarse (kill oneself), and "inherent" reflexives. For real reflexives, the surface subject should be annotated as both deep subject and deep object. Inherent reflexives are verbs that are conjugated like reflexive verbs but have a different subject and object, e.g., enfrentarse a -to face- in "[...] sus grandiosos proyectos fracasaron tras enfrentarse a interlocutores cambiantes y a menudo adversarios..." -[...] their grandiose projects failed upon encountering changing and frequently hostile interlocutors-. For these, we should not include a separate node for the pronoun se; instead, we attach it to the verb's citation form, e.g., <enfrentar+se>, and annotate the verb with the feature "refle".
C) In a passive construction (e.g., fueron alquilados -they were rented-), the surface subject should be annotated as the deep object. If the logical subject is present in the form of a por phrase (e.g., "por el presidente Gonzalo Sanchez de Lozada" -by President Gonzalo Sanchez de Lozada-), it should be annotated as the deep subject. If it is absent, an empty <pro> node should be created for the deep subject.
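The role changes in cases A, B (inherent reflexives) and C can be sketched as one function over a simplified clause. This is an illustration under assumptions, not a prescribed tool: the classes are hypothetical, "Subj" stands in for the subject role label (not named in this section), and "Agent" is a hypothetical surface label for the por phrase. Real reflexives are left out because the text does not say what happens to the se node in that case.

```python
from dataclasses import dataclass, field

@dataclass
class Dep:
    lex: str
    role: str  # surface role on input, deep role on output

@dataclass
class Clause:
    verb: str  # citation form of the head verb
    features: set = field(default_factory=set)
    deps: list = field(default_factory=list)

def to_deep_roles(clause, construction):
    """construction: 'impersonal', 'passive' or 'inherent_reflexive'."""
    if construction not in ("impersonal", "passive", "inherent_reflexive"):
        raise ValueError(f"unknown construction: {construction}")
    verb, feats, deps, agent = clause.verb, set(clause.features), [], None
    reversal = construction in ("impersonal", "passive")
    for d in clause.deps:
        if d.lex == "se" and construction != "passive":
            continue  # the node for se is deleted
        if construction == "passive" and d.role == "Agent":
            agent = Dep(d.lex, "Subj")  # overt por phrase becomes the deep subject
        elif reversal and d.role == "Subj":
            deps.append(Dep(d.lex, "Obj"))  # surface subject becomes the deep object
        else:
            deps.append(d)
    if reversal:
        # Missing deep subject is filled by an empty <pro> node.
        deps.insert(0, agent or Dep("<pro>", "Subj"))
    if construction == "impersonal":
        feats.add("imper")
    if construction == "inherent_reflexive":
        verb += "+se"
        feats.add("refle")
    return Clause(verb, feats, deps)
```

Applied to se están realizando proyectos, the sketch deletes se, makes proyectos the deep object, adds <pro> as deep subject, and marks the verb "imper".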
See the general discussion under Empty Nodes.
Raising verbs will not have a missing category. Instead, annotate the surface subject as a direct dependent of the lower verb. The reason is that in a raising construction, it is the lower verb that imposes the selectional restrictions on the subject of the whole clause.
Verbs (and adjectives) that will be regarded as raising predicates here include parecer hacer algo (seem to do something), necesitar hacer algo (need to do something), soler hacer algo (tend to do something), empezar a/comenzar a hacer algo (start to do something), resultar estar/ser/tener (turn out to be/to have), ir a hacer algo (be going to do something, "gonna"), continuar haciendo algo (continue doing something), estar seguro/-a de ser/estar/tener (be certain about being/having), ser probable (be likely), acabar de hacer algo (have just done something), venir haciendo algo (have been doing something).
1. La compra parece <estar> excluida de momento... (The purchase seems to be ruled out for the moment...)
2. [...] las economías latinoamericanas comienzan a plantearse modelos con mayor pragmatismo. ([...] the Latin American economies are beginning to consider models with greater pragmatism.)
Control structures should have an empty node included as the subject of the lower verb.
Subject control structures, such as those having intentar hacer algo (try to do something) as their head, are easy to confuse with raising structures (e.g. headed by parecer hacer algo -seem to do something-) because they appear to be the same on the surface.
...Sánchez de Lozada intentará mejorar la distribución de recursos... (Sanchez de Lozada will try to improve the distribution of resources...)
Juan parece desatender sus obligaciones. (John seems to neglect his duties).
Some common subject control verbs/adjectives are intentar hacer algo (try to do something), esperar hacer algo (hope/expect to do something), querer hacer algo (want/wanna do something), estar deseando hacer algo (be keen to do something), estar ansioso/-a por hacer algo (be eager to do something), desear hacer algo (wish to do something), decidir hacer algo (decide to do something), ser tonto por hacer algo (be silly to do something), ser dichoso/-a por hacer algo (be lucky to do something).
Object control verbs include persuadir (persuade) and forzar (force). An empty node must be included as the subject of the lower verb.
Note that querer (want) is a subject control verb only when its subject is the same as the subject of the embedded clause; when the subject of the lower clause is different, it is not a control verb.
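The raising/control contrast can be summarized in a short sketch. The verb sets are abbreviated from the lists in this section, the return format is hypothetical, and the empty node is labeled with the co-referent lexeme in angle brackets, following the big-PRO convention in the Empty Nodes section. The function only covers infinitival complements; querer with a different lower subject falls outside it.

```python
# Abbreviated from the lists above (citation forms of the matrix verb).
RAISING = {"parecer", "necesitar", "soler", "empezar", "comenzar",
           "resultar", "ir", "continuar", "acabar", "venir"}
SUBJECT_CONTROL = {"intentar", "esperar", "querer", "desear", "decidir"}
OBJECT_CONTROL = {"persuadir", "forzar"}

def lower_clause_subject(matrix_verb, subject, obj=None):
    """Return what fills the matrix subject slot and the lower-verb subject slot."""
    if matrix_verb in RAISING:
        # No empty node: the surface subject attaches directly to the lower verb.
        return {"matrix_subject": None, "lower_subject": subject}
    if matrix_verb in SUBJECT_CONTROL:
        # Empty node co-referent with the matrix subject.
        return {"matrix_subject": subject, "lower_subject": f"<{subject}>"}
    if matrix_verb in OBJECT_CONTROL:
        if obj is None:
            raise ValueError("object control verb needs a matrix object")
        # Empty node co-referent with the matrix object.
        return {"matrix_subject": subject, "lower_subject": f"<{obj}>"}
    raise ValueError(f"not a known raising/control verb: {matrix_verb}")
```

So parecer leaves compra as the lower verb's own subject, while intentar keeps Sánchez de Lozada on the matrix verb and gives mejorar an empty co-referent subject.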
Non-finite clauses (present participial, past participial, or infinitival) can appear with subjects (1) or without subjects (2 and 3).
1. Cediendo la inflación, en los últimos tres años ya no se habla de crisis. (With inflation easing, in the last three years there has been no more talk of crisis.)
3. Fundada la empresa "Russian Real Estate" en 1989, recientemente realizó un estudio ... (Founded in 1989, the firm "Russian Real Estate" recently carried out a study...)
When they appear without subjects, an empty "<pro>" node should be included as a dependent of the verb. In general, non-finite clauses will be dependents of main verbs.
Small clause complements will be analyzed with the predication as the head of the small clause and dependent on the head verb. The predication may be nominal, prepositional, or adjectival. In the following, the small clause is bracketed:
1. Esta inversión, según Cossio, aun cuando no es significativa el gobierno la considera [la única manera de asegurar crecimiento]... (This investment, according to Cossio, although not significant, the government considers it [the only way to ensure growth]...)
2. ... y la duración de los trabajos que el informe estima [de tres a cinco veces superior]... (...and the duration of the works, which the report estimates [to be three to five times greater]...)
The analysis of small clauses is identical to that of predicative copular constructions, since the overt copula is omitted at IL0 anyway.
When the predication is headed by a past participle, the participle should also be tagged as a verb, and its missing argument (the deep subject) needs to be added.
In Spanish, the verb "haber" (there is, there are) is conjugated in the 3rd person singular to indicate existence.
1. [...] hubo la convicción... ([…] there was always the conviction…)
2. No habrá liberación masivas del Café retenido… (There Will Not Be A Massive Release of Stockpiled Coffee)
In this type of structure, the form of the verb “haber” is the head, and the surface subject is also the deep subject.
As with declarative clauses, the head of a question will be its main/lexical verb. The interrogative pronoun will be a dependent of the main verb like any other argument.
When the interrogative pronoun is part of a long-distance dependency, it will not be a dependent of the highest main verb, but rather of the embedded main verb heading the clause in which the interrogative pronoun originated. The linear order will allow a reconstruction of the pronoun's surface position. In cases of long-distance dependencies, there may be "crossing arcs"; this is acceptable.
In an imperative, if an overt subject is not present, as in (1), include an empty noun as its subject; otherwise, an imperative has the same analysis as a declarative sentence.
A relative clause will be a dependent of whatever it modifies, in most cases a noun. As with other clauses, its main verb will be its own head. The relativizer will be a dependent of the main verb like any other argument or adjunct.
In long-distance dependencies (e.g., Éste es el presupuesto que el ministro creyó que el parlamento había aprobado. -This is the budget that the Minister thought the Parliament had approved-), the relativizer will not be a dependent of the highest main verb, but of the embedded main verb heading the clause in which it originated. The linear order will allow a reconstruction of its surface position.
Reduced relative clauses (e.g., Según un estudio realizado recientemente por ...; According to a study recently performed by ...) are analyzed like regular relative clauses without an overt relative pronoun. Only an empty object node is inserted; neither an empty complementizer nor an empty auxiliary is added.
Reduced relative clauses look similar to non-finite past or present participial clauses and may be difficult to distinguish from them. However, they always depend on a nominal rather than a verbal head. To decide whether the clause modifies the verb or a noun, test whether you can insert mientras (while) or siendo (being) at its beginning without changing the meaning. If yes, it modifies the VP; otherwise, it is a dependent of the NP.
The surface vs. deep subject of a passive construction is indicated through the features. The grammatical subject (usually the patient) is annotated as the deep object. The underlying subject (usually the agent), if expressed, is annotated as the deep subject; if it is not expressed, an empty node should be included.
The node for the auxiliary ser (be) is deleted, but the auxiliary is recorded under the features of the participle.
1. [...] los 300 metros cuadrados del tercer piso... fueron alquilados esta mañana... ([…] the 300 square meters on the third floor […] were rented this morning…)
2. Las mismas fueron inscritas en las ofertas de la Unión Europea en la Ronda Uruguay. (The same agreements were recorded in the European Union's proposals at the Uruguay Round.)
3. El presupuesto nacional de Bolivia... fue promulgado recién este viernes por el presidente Gonzalo Sanchez de Lozada. (The Bolivian national budget… was promulgated only this Friday by President Gonzalo Sanchez de Lozada.)
VP ellipsis should be annotated with an empty verbal head as the root node. Any auxiliaries and the subject will be dependents of this node. No missing arguments should be added. See also the section on empty nodes.
The head of a noun phrase is the head noun. A definite determiner (el|la|los|las) or an indefinite one (un|una|unos|unas) will be included in the features of the head noun; any other determiner is a dependent of the head noun. Adjectives are dependents separate from determiners. If there are multiple adjectives, the default structure simply has each adjective as a direct dependent of the noun. The same holds for multiple determiners.
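A minimal sketch of this noun-phrase rule, assuming the phrase is given as (word, POS) pairs and the index of the head noun is already known. The values "def" and "indef" for the det feature are hypothetical codes; the text names the two values but not their abbreviations.

```python
DEFINITE = {"el", "la", "los", "las"}
INDEFINITE = {"un", "una", "unos", "unas"}

def build_np(tokens, head_index):
    """tokens: list of (word, POS) pairs; returns the head noun with features and dependents."""
    head = {"lex": tokens[head_index][0], "features": {}, "deps": []}
    for i, (word, pos) in enumerate(tokens):
        if i == head_index:
            continue
        lowered = word.lower()
        if pos == "Det" and lowered in DEFINITE:
            head["features"]["det"] = "def"    # article folded into a feature
        elif pos == "Det" and lowered in INDEFINITE:
            head["features"]["det"] = "indef"
        else:
            head["deps"].append(word)          # other determiners and each adjective
    return head
```

Every non-article word becomes a direct (flat) dependent of the noun, which matches the default structure described above.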
Proper nouns should have the value PN for feature POS. They are treated largely like nouns, except that compound proper nouns are not analyzed syntactically as if they were common nouns. So in América Latina, América is the head, has POS PN, and carries the other features of this proper noun. Latina is a dependent on América (with SRole Mod), and also has POS PN.
In a noun phrase consisting of only a quantifier, the quantifier should be the head of the NP. Any modifying phrases depend directly on it.
1. Los Doce acordaron una tregua... (The Twelve agreed to a truce…)
Adjectives and adverbs are coded in much the same way as nouns and verbs, following the same procedure.
General
Adverbs and adjectives depend on the lexemes they modify: adjectives on nouns, adverbs on verbs. For example, in the phrase sus grandiosos proyectos (their grandiose projects), the adjective grandiosos modifies the noun proyectos by characterizing the projects. In "distribuir más equitativamente" (distribute more equitably), the adverb "equitativamente" modifies the verb by specifying the manner in which the action is performed.
Degree
The degree of modification can be specified by modifying adverbs such as muy (very), as in muy demandados (in great demand), or bien (well), as in bien presentes (well established).
In addition, degree can be expressed by way of comparative and superlative constructions. For the comparative, the modifiers más (more) and menos (less) precede the adjective in its positive form: soy más alto que Juan (I’m taller than Juan); soy menos inteligente que tú (I’m less intelligent than you). In the superlative, el|la|los|las (the) and más or menos precede the adjective: el más nuevo (the newest one).
In order to simplify the lookup procedure in Omega, and to allow for a common interlingual representation of degree, adjectives and adverbs will be shown in their base form (called their "positive degree"). If they appear in the text as comparatives or superlatives, that will be indicated as a feature of their node.
In Spanish, there are a few irregular comparative forms. These will also be represented in the parse tree in their base form. Below is a short list, with the positive form in capitals, followed by the irregular comparative and superlative forms.
BUENO/BUENA/BUENOS/BUENAS/BIEN (good/well) -- mejor (better) -- el/la mejor; los/las mejores (the best)
MALO/MALA/MALOS/MALAS/MAL (bad/badly) -- peor (worse) -- el/la peor; los/las peores (the worst)
JOVEN/JÓVENES (young) -- menor (younger) -- el/la menor; los/las menores (the youngest)
VIEJO/VIEJA/VIEJOS/VIEJAS (old) -- mayor (older) -- el/la mayor; los/las mayores (the oldest)
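The base-form lookup and degree marking can be sketched as follows, assuming the input is just the adjective phrase (optional article, optional más/menos, adjective). The degree values "pos", "comp" and "super" are hypothetical codes. Simplifications: más and menos both yield "comp", so superiority and inferiority are not distinguished; adjectives are not lemmatized for gender or number; and mejor/peor are mapped to the adjectives bueno/malo, although they can also be the comparatives of bien/mal.

```python
ARTICLES = {"el", "la", "los", "las"}
# Irregular comparative/superlative forms mapped to their base form (from the list above).
IRREGULAR = {
    "mejor": "bueno", "mejores": "bueno",
    "peor": "malo", "peores": "malo",
    "menor": "joven", "menores": "joven",
    "mayor": "viejo", "mayores": "viejo",
}

def analyze_degree(phrase):
    """Return (base form, degree) for a phrase like 'el más nuevo' or 'mejor'."""
    words = phrase.lower().split()
    has_article = bool(words) and words[0] in ARTICLES
    if has_article:
        words = words[1:]
    if len(words) == 2 and words[0] in ("más", "menos"):
        base = words[1]
    elif len(words) == 1 and words[0] in IRREGULAR:
        base = IRREGULAR[words[0]]
    elif len(words) == 1:
        return words[0], "pos"
    else:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    # An article before the comparative marks the superlative.
    return base, "super" if has_article else "comp"
```

With this sketch, "el más nuevo" gives ("nuevo", "super") and "mejor" gives ("bueno", "comp").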
Participial adjectives
Participial forms of verbs, i.e., those ending in "-ando" or "-iendo" (present participle) or in "-ado" or "-ido" (past participle), often show up in the same syntactic positions as adjectives. In addition, some adjectives have the form of past participles, e.g., cerrada (closed), inesperados (unexpected).
These participles and participial adjectives can appear:
(a) in post-nominal position:
* interlocutores cambiantes (changing interlocutors)
* una tienda cerrada (a closed store)
(b) in copular position:
* Los resultados fueron acumulando. (The results were accumulating.)
* Las cortinas están descoloridas. (The curtains are faded.)
The semantic distinction between participles and adjectives is that participles refer directly to the event denoted by the verb and cast the referent of the modified noun into one of the roles of that event. Adjectives, on the other hand, refer to a state that characterizes the referent of the modified noun. It is not always easy to tell them apart, but here are some tests that can help:
For now, there will be separate nodes for V and Prep. Annotators will annotate each with the correct concept; if the verb's concept already conflates the meaning of the preposition, e.g., enfrentarse a (be faced with), visitar a (visit with), acabar de (have just finished), then mark the preposition as "EMPTY".
At IL2, the preposition will disappear.
There are two verbs in Spanish that correspond to the English copula, ser and estar, and the copula used is listed under the features of the head word. Sentences whose main verb is a copula fall into two types: equative and predicative. The equative use of the copula equates two entities ([...] lo más importante es mantener un equilibrio fiscal...; [...] the most important thing is to maintain a fiscal balance...), while the predicative use asserts that the post-verbal predicate holds of the deep subject (Juan es médico; John is a doctor). In equative copular constructions in Spanish, only “ser” can be used.
Both equative and predicative copular constructions have the predicate (noun, adjective, or preposition) as their head. Note that the grammatical role of the predicate reflects the role of the whole predicative construction in the sentence. In [...] pero <los proyectos> están lejos de ser satisfactorios... (but <the works> are far from being satisfactory...), ser (be) is a modifier because it depends on lejos (far).
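A sketch of the copula rule for a simple clause, assuming subject, copula and predicate have already been identified. The feature names "copula" and "copular_type" and the role label "Subj" are hypothetical; the restriction of equative constructions to ser comes from the text above.

```python
COPULAS = {"ser", "estar"}

def copular_clause(subject, copula, predicate, kind):
    """Make the predicate the head and record the deleted copula as a feature."""
    if copula not in COPULAS:
        raise ValueError(f"not a copula: {copula}")
    if kind not in ("equative", "predicative"):
        raise ValueError(f"unknown copular type: {kind}")
    if kind == "equative" and copula != "ser":
        raise ValueError("equative copular constructions use only ser")
    return {
        "head": predicate,
        "features": {"copula": copula, "copular_type": kind},
        "deps": [(subject, "Subj")],
    }
```

For Juan es médico, the sketch returns médico as head, with copula ser recorded as a feature and Juan as its dependent.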
Conjunction has its own part-of-speech (Conj). The conjunction (y, o, pero, etc.; and, or, but, etc.) is placed as a dependent of the first conjunct with role Mod, and the second conjunct is a dependent of the conjunction with role Obj.
If a comma acts as a conjunction, it is treated as such (given part-of-speech Conj and analyzed as in the above paragraph). Consider "[...] la región comienza a mostrar avances significativos, y prueba de ello es que en los últimos tres años no se habla ya de crisis, la inflación está cediendo..." (the region is beginning to show significant advances, and proof of that is that in the last three years there has no longer been any talk of a crisis, inflation is yielding…). In "[...], y prueba de ello es que..." ([...], and proof of that is…), the comma does not serve as a conjunction (since there is an explicit "y" -and-), and it is removed at IL0. The last comma, however, does serve as a conjunction.
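The coordination rule, including commas tagged Conj, can be sketched as a list of dependency arcs. The input is assumed to alternate conjunct, conjunction, conjunct, and so on. For chains of more than two conjuncts, attaching each conjunction to the conjunct immediately before it is an assumption, since the text only describes the two-conjunct case.

```python
def coordination_arcs(items):
    """items: [conjunct, conj, conjunct, conj, conjunct, ...].
    Returns (head_index, dependent_index, role) arcs, using indices into items."""
    if len(items) < 3 or len(items) % 2 == 0:
        raise ValueError("expected: conjunct (conj conjunct)+")
    arcs = []
    for i in range(1, len(items), 2):
        arcs.append((i - 1, i, "Mod"))  # conjunction depends on the preceding conjunct
        arcs.append((i, i + 1, "Obj"))  # following conjunct depends on the conjunction
    return arcs
```

Indices are used instead of words so that repeated tokens, such as two commas, stay distinct.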
An empty node is a node that does not correspond to a word (or other graphical manifestation such as a punctuation mark) in the input string.
In all cases, when you create an empty node, give it a wpos feature so that it ends up in a position that roughly corresponds to its grammatical function (i.e., if it is a subject, to the left of its governing verb, and so on).
There are (at least) two types of empty nodes.
Big-PRO, and related cases
These are cases of empty nodes where the meaning can be derived from the syntactic context:
Big-PRO is the missing subject in embedded infinitivals. For example, in "[...] un joint venture ruso-turco acaba de terminar la construcción de un edificio..." ([…] a Russian-Turkish joint venture has just finished construction on a small executive office building…), the implicit subject of "terminar" is (co-referential with) "un joint venture ruso-turco".
VP ellipsis is the term for cases in which the main verb is replaced by the adverb también (too), as in 1, or by a modal, as in 2:
VP ellipsis requires an empty verbal head; the adverb or the modal is deleted in the usual manner and replaced by features as needed. In addition, add all missing arguments (but not adjuncts), as described above. In these cases, we introduce an empty node and identify the node with which it is co-referential. We then copy the co-referential node's word and lexeme values to the empty node, but add angle brackets around the value: "<venture>".
Gapping. In gapping, a verb is deleted in a conjunction (Francisca comió un melocotón y Elisa, un albaricoque. Francis ate a peach, and Elise, an apricot.). In the second conjunct, the verb must be restored as an empty node, as with VP ellipsis.
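Little-pro
These are cases of empty nodes whose meaning cannot be derived from the syntactic context. They include: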
missing por phrases in passives. For example, in "[...] los 300 metros cuadrados... fueron alquilados esta mañana..." ([…] the 300 square meters… were rented this morning…), we know that there is another argument role which is not explicitly mentioned, namely the person who rented the space.
arbitrary empty subjects in adjunct clauses. For example, in "[...] el liberar los mercados permitiría un aumento en el nivel de vida..." ([…] freeing up the markets would allow for an increase in the standard of living…), the subject of "liberar" (freeing up) is not specified.
In these cases, we cannot tell the understood missing item syntactically, but only pragmatically. We introduce an empty node and label both its lexeme and its word feature "<pro>". In case of doubt (i.e., whether it is big-PRO, such as "<venture>", or little-pro, "<pro>"), ask yourself: can I tell from syntax alone what this node means? If not, use "<pro>". If so, fill in a copy of the co-referent lexeme.
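The two decisions made when creating an empty node, its label and its wpos, can be sketched as follows. Fractional wpos values, placing subjects just left of their governor and every other role just right of it, and the role label "Subj" are assumptions for illustration; the guidelines only require that the position roughly match the grammatical function.

```python
def empty_node_lex(antecedent_lexeme=None):
    """Big-PRO copies the syntactic antecedent in angle brackets; otherwise little-pro."""
    if antecedent_lexeme is None:
        return "<pro>"
    return f"<{antecedent_lexeme}>"

def empty_node_wpos(governor_wpos, role, taken):
    """Pick an unused position next to the governor: left for subjects, right otherwise."""
    step = -0.5 if role == "Subj" else 0.5
    pos = governor_wpos + step
    while pos in taken:
        step /= 2  # move closer to the governor until a free slot is found
        pos = governor_wpos + step
    return pos
```

For an empty subject of a verb at wpos 3, the sketch proposes 2.5, or 2.75 if 2.5 is already used by another empty node.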
Remove all punctuation except meaningful punctuation. Keep:
Quotes -- leave them (open and closed) attached to the constituent that is quoted.
Commas that act as conjunctions (see Conjunction)
Do remove:
All non-conjunction commas.
All sentence-final punctuation.
All dashes and similar marks.
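The punctuation rule can be sketched as a filter over (word, POS) pairs, relying on the POS tags defined earlier: commas acting as conjunctions are tagged Conj, so only tokens tagged Pun are candidates for removal. The set of quote characters is an assumption, and the sketch only filters; keeping quotes attached to the quoted constituent is left to tree building.

```python
# Assumed set of quote characters that count as meaningful punctuation.
QUOTES = {'"', "“", "”", "«", "»"}

def strip_punctuation(tokens):
    """Drop tokens tagged Pun unless they are quotes; commas tagged Conj are kept."""
    return [(word, pos) for word, pos in tokens if pos != "Pun" or word in QUOTES]
```

Non-conjunction commas, sentence-final marks, and dashes are all tagged Pun and therefore removed.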