IAMTC

Hindi IL0 Manual  

Version 0.2.1, July 22, 2004

Compiled by: Advaith Siddharthan




Features on All Nodes 

Note: this section needs work!

Each node in the dependency tree can be thought of as an attribute-value matrix, i.e., a bundle of features with values. All values must be set for each node in the tree. This will require checking each node before finishing the analysis. Here is a list of features:


Verbs 

Verbs are heads of sentences and clauses.


Verbs and Auxiliaries: choosing a head


The head of any complete clausal utterance is the main verb. Incomplete utterances (NPs, PPs, Greetings) should have as their head the usual head for that type of phrase.

Auxiliary verbs (do, have, had, auxiliary-be) are deleted. Their meaning is represented as features on the main verb (for example, tense:fut). For example, jaa raha hai (is going), jaayega (will go) and jata hai (goes) should all be represented by the verb jaa (go), with tense set to present-continuous, future and present respectively. Modals (sakta (can)) are syntactically very much like auxiliaries, but they are included in IL0 for semantic reasons as dependents of the main verb. In all cases, when the main verb is missing, as in VP ellipsis, an empty verb node should be created and used as the head of the entire clause.
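As a rough illustration of folding auxiliaries into features, the sketch below maps the three surface forms mentioned above onto a single verb node. The lookup table, the function name fold_auxiliaries, and the exact feature values are assumptions made for this example only; they are not part of the IL0 tools, and a real workflow would start from a morphological analysis, not a hand-written table.

```python
# Hypothetical lookup table covering only the examples in this section.
# Keys are surface verb groups; values are (main verb lemma, tense value).
SURFACE_TO_NODE = {
    "jaa raha hai": ("jaa", "present-continuous"),
    "jaayega": ("jaa", "future"),
    "jata hai": ("jaa", "present"),
}


def fold_auxiliaries(verb_group):
    """Return a verb node (as a dict) with the auxiliary reduced to a feature."""
    lemma, tense = SURFACE_TO_NODE[verb_group]
    return {"lex": lemma, "POS": "V", "tense": tense}


for group in SURFACE_TO_NODE:
    print(group, "->", fold_auxiliaries(group))
```

All three verb groups end up as the same lexeme jaa, differing only in the tense feature.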

When a form of the copula is present in a sentence, the head of the clause will vary depending on the type of copular sentence. Predicative copular constructions will have the predicate as their head. Equative copular constructions will have the copula as their head.

 


Arguments and adjuncts


Arguments vs. Adjuncts

In distinguishing between arguments and adjuncts, consistency is the most important thing. This distinction will matter most for annotating empty categories. In addition, each argument will be annotated with a feature encoding its grammatical role. All non-arguments will be annotated as adjuncts, including function words.

The only NPs that will be considered arguments for annotation purposes are

  1. NPs that never appear with a preposition;

  2. NPs that are obligatory (e.g. Y in X ko Y per rakh (put X on Y)).

    A list of argument patterns of common verbs can be consulted for questionable cases.


Grammatical relations


The role of each argument (subject, object, indirect object) must be annotated as a feature of its node. See the features page for a more detailed description.

Both deep and surface grammatical relations should be annotated. For Hindi, these are one and the same.


Empty categories and missing constituents


See the general discussion in the Empty Nodes section.


Light Verbs


There is a small class of Hindi verbs that function as light verbs in verb compounds. The main light verbs are ja/gaya (go/went), le (take), de (give), daal (put). These verbs are semantically void and should be deleted. Their function is similar to that of auxiliary verbs in that light verbs decide the agreement features of the arguments of the verb compound; the arguments themselves, however, are determined solely by the main verb.

Examples:

There are some tricky cases where what appears to be a light verb is actually not semantically void. In these cases, they should not be removed.

Examples:

In the above examples, ja (go) is not functioning as a light verb. In the first instance, the kar clitic indicates sequencing; hence this is an example of a missing subject for ja (see also Empty Nodes). In the second example, ja contributes meaning to the sentence and should be preserved as a node. In such cases, the supposed light verb should be treated as the head of a verb group, and the other verb (in this case, kha (eat)) should be a dependent of it.



Raising Verbs


I cannot find an example of raising verbs in Hindi. This doesn't necessarily mean that it doesn't happen, but I can't find documentation that explicitly says it is impossible.


Control Structures


Control structures should have an empty node included as the subject of their lower verb.

Some common subject control verbs/adjectives are koshish (try), aasha (hope), chaahna or aakaansha (want, desire), utsuk (keen), nirnay lena or nishchay karna (take a decision, decide), murkh hona (be silly), bhagyavaan honaa (be lucky).

Object control verbs include: tell, tempt, force, persuade, appeal to. As with subject control verbs, object control constructions cannot be used with expletives or non-thematic subjects of sentential idioms. Here too an empty node must be included as the dependent of the lower verb. Just like subject control verbs can be confused with raising, object control verbs can be confused with ECM verbs. Using an expletive object is generally a good test to distinguish between the two, as shown here with the control verb decide and the ECM verb believe.

  1. ? I decided there to be a problem.

  2. ? I decided the shoe to be on the other foot.

  3. I believed there to be a problem.

  4. I believed the shoe to be on the other foot.

Note that although want is a subject control verb, when it appears with a second NP, it is an ECM verb. In addition, it can appear with an infinitival for-complement. An empty node should only be included in its subject control version. The case with for should be analyzed as an ECM construction, differing only in the fact that for appears as a complementizer dependent of the embedded verb.

  1. I want to leave.

  2. * There wants to be a solution.

  3. I want him to leave.

  4. I want there to be a solution.

  5. I want for him to win the race.

Here are some more examples to motivate the different treatment of the two constructions.

  1. That seems to be my husband.

  2. ?? That tried to be my husband. (sounds like an insult to whoever the deictic pronoun refers to)

  3. I believe that to be my husband.

  4. ?? I persuaded that to be my husband. (sounds like an insult to whoever the deictic pronoun refers to)

In English, we cannot use that as a deictic pronoun to refer to people without a derogatory effect (since the designated person becomes an object, that being used only for objects): ??that  (= George) likes apples or ??I work with that (= Hardy).  The pronoun that  can, however, be used to refer to something deictically in order to predicate of  it that it is a (particular) person: that is my husband  or that is my co-worker.  Here, that  does not refer to a person, but to unformed sense data, which is then identified as being a person.  The data above shows exactly the same pattern: that can be used felicitously (without derogatory effect) as a subject of a predication (1, 3), even if that subject has raised to surface subject (1) or surface object (3) position of another verb.  This is because in raising (1) and ECM verbs (3), the argument is not an argument of the higher verb.  However, that cannot be used to refer to a person (without derogatory effect) in any other argument position -- in (2), that  is not only subject of the lower predication, but also of the higher verb (subject control verb), and in (4), it is not only subject of the lower predication, but also object of the higher verb (object control verb).  Thus the odd effect comes from the use of control verbs and, as a consequence, the that participating in the higher verb's argument structure.


Exceptional Case Marking Verbs


In an exceptional case marking (ECM, also known as AcI "Akkusativ cum Infinitiv" or "raising-to-object") construction, the NP that morphologically appears to be a direct object is really the subject of the lower verb. That is, it will have as its head not the ECM verb, but the lower verb.

Common ECM verbs include expect, assume, believe, forbid, know, let, need.

As with raising verbs, the best tests are to use expletive there and non-thematic subject idioms.

  1. I believe there to be a problem.

  2. I believe the shoe to be on the other foot.

  3. I need there to be a solution.

  4. I need the cat to be out of the bag.

  5. He let there be light.

ECM constructions may be confused with object control. See Control for a discussion of this matter.

Exceptional case marking constructions with for as in (1-2) below should be analyzed as a subordinate clause with for as a complementizer dependent on the subordinate clause's main verb:

  1. For me to eat Crispy Critters would be unprecedented.

  2. I want for you to eat only Crispy Critters.

Some ECM verbs (need) subcategorize for either an NP and an infinitive or an NP and a past participle. In the case of the latter, the analysis will be the same as that of the small clause complement analysis. The past participle will be tagged as an adjective.

  1. John needs me to solve the problem.

  2. John needs the problem solved.


Non-finite clauses


When non-finite verb phrases appear without subjects, an empty noun node should be included as a dependent of the verb. If a subject noun phrase is present and part of the VP, as in (1) below, an empty node should not be included. Instead, that head noun (and its dependents if any) should be a dependent of the non-finite verb.

  1. Norma ki sab pe shikaayat karna mujhe hamesha sataata hai (Norma 's everyone on complaint doing to-me always annoys is) “Norma's complaining about everyone always annoys me”.

  2. Sab pe shikaayat karna hamesha auron ko sataata hai (Everyone on complaint doing always others to annoy is) “Complaining about everyone always annoys others”.

  3. Abhi jaane se sab bhang ho jaayega (now leaving from everything disrupt will go) “Leaving now would disrupt everything”.

  4. Parinaam se dukhi hokar Uli ne mehnat karna chod diya (results from sad happened-then Uli did effort doing cease gave) “Depressed by the results, Uli ceased to make an effort”.

  5. Jaane se pehele Max ne Mike ko bulaya (Leaving from before Max did Mike to call) “Before leaving, Max called Mike”.

In general, non-finite clauses will be dependents of main verbs. Exceptions are reduced relative clauses, if they modify nouns. In cases that are not clear, the default choice of a head should be the verb.

 


Small clauses

Small clause complements will be analyzed with the predication as the head of the small clause and dependent on the head verb. The predication may be nominal, prepositional, or adjectival.  In the following, the small clauses are bracketed:

  1. Prabhandkarta [Ernie ko sangat ke liye mehetvapoorna] samajhte hain (Manager Ernie to company for importance considers is) “The manager considers [Ernie an asset to the company]”.

  2. Adhikari [us mamale ko hamare charche ke chaukhat ke bahar] samajhte hain (officer that issue to our discussion 's scope 's outside considers is) “The agent considers [that issue outside the scope of our discussion]”.

  3. Hum [is samasya ko mushkil] samajhte hain (We this problem to difficult consider is) “We consider [the problem difficult]”.

The analysis of small clauses is identical to predicative copular constructions, since the overt copula is omitted anyway at IL0.

In the case of a past participle-headed predication, like the following, the participle should be tagged as a verb as well. The missing argument (the deep subject) needs to be added.

  1. Hum [is samasya ko samadhan huve] samajhte hai (We this problem to solution happened consider is) “We consider [the problem solved]”.

  2. Hum [is gaadi ko marammat kiye huve] chahte hain (We this car repair done to need is) “We need [the car repaired]”.


Wh-questions


Not quite “wh” in Hindi, but this section deals with questions containing kya (what), kaun (who), kaunsa (which), kab (when), kaise (how) etc. As with other full clauses, the head of a wh-question will be its main/lexical verb. The wh-word will be a dependent of the main verb like any other argument.

When the wh-word is part of a long-distance dependency, it will not be a dependent of the highest main verb, but of the embedded main verb heading the clause in which the wh-word originated. The linear order will allow a reconstruction of the wh-word's surface position. In cases of long-distance dependencies, there may be "crossing arcs". This is ok.
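Since crossing arcs are expected with long-distance dependencies, it can help to list them and confirm that each one really comes from such a dependency. The following is a minimal sketch, assuming each arc is given as a pair of word positions (head position, dependent position); the function name crossing_arcs and the positions in the example are hypothetical and not taken from any IL0 file.

```python
def crossing_arcs(arcs):
    """Return every pair of arcs whose spans cross in the linear order.

    arcs is a list of (head_position, dependent_position) pairs.
    Two arcs cross when exactly one endpoint of the second arc lies
    strictly inside the span of the first.
    """
    spans = [tuple(sorted(arc)) for arc in arcs]
    crossings = []
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (a, b), (c, d) = spans[i], spans[j]
            if a < c < b < d or c < a < d < b:
                crossings.append((arcs[i], arcs[j]))
    return crossings


# Hypothetical positions: an arc from 30 back to 10 crosses an arc from 20 to 40.
print(crossing_arcs([(30, 10), (20, 40)]))
```

An arc that is nested inside another one, such as (20, 30) inside (10, 40), is not reported.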

 


Imperatives


If an overt subject is not present, as in (1), include an empty noun; otherwise an imperative will have the same analysis as a declarative sentence.

  1. Mujhe akele chod! (Leave me alone!)

  2. Tu mujhe akele chod! (You leave me alone!)

 


Relative clauses


A relative clause will be the dependent of whatever it modifies, in most cases a noun. The arc is labeled MOD. As with other clauses, its main verb will be its own head. The relativizer will be a dependent of the main verb like any other argument (or adjunct, in cases such as woh jagah jahaan usne machli dekha (that place where he-did fish see) “the place where he saw the fish”).

In long-distance dependencies, the relativizer will not be a dependent of the highest main verb, but of the embedded main verb heading the clause in which it originated. The linear order will allow a reconstruction of its surface position.

Reduced relative clauses (aapse chuna huva udaan (you-from choose has-been flight) “the flight chosen by you”) are analyzed like regular relative clauses without overt relative pronoun. They have only an empty subject inserted, but not an empty complementizer, nor an empty auxiliary.

Reduced relative clauses appear similar to non-finite past or present participial clauses and may be difficult to distinguish from these. However, they will always depend on a nominal rather than a verbal head. Although most reduced relative clauses are postnominal, it seems that they can be preposed as in (1) below. When they are sentence-initial, it may be difficult to decide what they depend on. If it is clear that they modify a noun phrase (as in (1) below), choose the noun; otherwise choose the verb as their default head: have in (2) and (3), sang in (4). Note that world knowledge needs to be used when making these decisions.

  1. [Staying at the Palace Hotel], you can use the gym.

  2. [Returning on the eleventh], I have a couple flights, the first one departing Baltimore at twelve forty p.m.

  3. The lowest rate I have for a car [using your discount number] is going to be Avis.

  4. [Playing in the yard], the boy sang happily.

Two tests to use to decide whether the clause is modifying the verb or a noun:


VP ellipsis


VP-ellipsis should be annotated with an empty verbal head as the root node. Any auxiliaries and the subject will be dependents of this node. No missing arguments should be added. Also see the section on Empty Nodes.

 


Nouns and Proper Nouns 

 

Nominal modifiers

The head of a noun phrase is the head noun. Any determiner is a dependent. Adjectives are separate dependents from determiners. If there are multiple adjectives, the default structure will simply have each adjective as a direct dependent of the noun. This is the case for multiple determiners also.

Adverbial noun modifiers can be dependents of the determiner or the noun in the phrase they modify. For example, lagbagh (approximately), n bartaav se (practically), jyada-se-jyada (at most), only can depend on cardinals or some quantifiers; kum-se-kum (at least), sirf (only), bus (just), even can depend on nouns (i.e. modify entire noun phrases). These classes have some overlap; the default head choice in cases of ambiguity should be the noun.

Compound nouns

Compound noun phrases, when clear, can have multiple noun phrases as dependents. For example, chaubis ghanta samachar seva (twentyfour hour news service) will have seva (service) as the head and samachar (news) and ghanta (hour) as its direct dependents. Chaubis (twentyfour) will be a dependent of ghanta (hour). A good test for this is to remove each noun in turn, to see if the phrase still retains part of its original sense. Because a 24 hour news service is a news service and a 24-hour service, this analysis is the one we want.

In cases where it's not clear whether or which nouns modify each other, the default compound structure will have all modifying nouns as direct dependents on the rightmost noun.

http://www.cis.upenn.edu/~creswell/dependency/compound.gif
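The default compound structure can be sketched as a small function that attaches every modifying noun directly to the rightmost noun. This is only an illustration of the fallback rule; the function name default_compound_arcs and the (dependent, head) pair representation are assumptions for this example.

```python
def default_compound_arcs(nouns):
    """Attach every modifying noun directly to the rightmost noun.

    Returns a list of (dependent, head) pairs. This is the fallback
    structure for unclear cases only; clear compounds are analyzed
    according to which noun actually modifies which.
    """
    if not nouns:
        return []
    head = nouns[-1]
    return [(modifier, head) for modifier in nouns[:-1]]


print(default_compound_arcs(["ghanta", "samachar", "seva"]))
```

For ghanta samachar seva this yields ghanta and samachar as direct dependents of seva, which happens to coincide with the analysis given above for that phrase.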

Proper Nouns

Proper nouns should have the value PN for feature POS. They are treated largely like nouns, except that compound proper nouns are not analyzed syntactically as if they were common nouns, but rather given right-branching structures. (The intuition is that they are really fixed phrases.) So in British Airways, British is the head, has POS PN, and carries the other features of this proper noun (in American English, singular number). Airways is a dependent on British (with SRole Adj), and also has POS PN. In Hindi, proper nouns cannot be identified by capitalization (which doesn't exist). Hence all nouns or compound nouns that are the names of companies, organizations, locations, people or animals etc. should be marked as proper nouns. Note how this differs from the English Manual, where Heathrow Airport would be marked as PN PN, while Heathrow airport would be marked PN NN. In Hindi, heathrow hawaii adda (Heathrow air terminal) should be marked PN PN PN for standardization.

Quantifier headed NPs

In a noun phrase consisting of only a quantifier, the quantifier should be the head of the NP. Any modifying phrases are directly dependent on it.

  1. Sab aa gaye (All come went) “All came”



Adjectives and Adverbs

Adjectives and adverbs will be coded in much the same way that nouns and verbs are coded. The same procedure is followed.

General

Adverbs and adjectives point to modifying concepts -- adjectives for nouns, adverbs for verbs. For example, in the phrase neela kitab (blue book), the adjective neela (blue) modifies kitab (book) by identifying the color of the book. In woh sundar nachchti hai (she gracefully dances is) “she dances gracefully”, the adverb sundar (gracefully) modifies the verb by specifying the manner in which the action was performed.

Degree

The degree of the modification can be specified by other modifiers, such as bahut (very) or halka (light). These degree modifiers are also adverbs.

In addition, there are two kinds of degree specification that you probably know as the comparative and superlative forms. In Hindi, the comparative is achieved by using jyada (more) and the superlative by using sabse jyada (of all more) “most”.

In order to simplify the lookup procedure in Omega, and to allow for a common interlingual representation of degree, adjectives and adverbs will be shown in their base form (called their "positive degree"). If they are in the text as comparatives or superlatives, that will be indicated as a feature of their node.  
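The reduction to the positive degree can be sketched as follows. This is a minimal example that handles only the jyada and sabse jyada patterns described above; the function name normalize_degree, the token-list input, and the degree values "comparative" and "superlative" are assumptions for illustration, and other degree modifiers such as bahut are deliberately left out.

```python
def normalize_degree(tokens):
    """Strip jyada / sabse jyada and return (base_form, degree).

    tokens is a non-empty list of words ending in the adjective or adverb.
    """
    if len(tokens) >= 3 and tokens[0] == "sabse" and tokens[1] == "jyada":
        return tokens[2], "superlative"
    if len(tokens) >= 2 and tokens[0] == "jyada":
        return tokens[1], "comparative"
    return tokens[-1], "positive"


for phrase in (["sundar"], ["jyada", "sundar"], ["sabse", "jyada", "sundar"]):
    print(phrase, "->", normalize_degree(phrase))
```

In every case the node keeps the base form sundar, and the degree travels as a feature value.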

Participial Adjectives

Quite often participial forms of a verb will show up in syntactic positions also occupied by adjectives. Some adjectives also have the form of participles. The present participle of a verb ends in "-ing," e.g., eating, buying; the past-participle ends in "-ed," e.g., loved, believed.

These participles and participial adjectives can show up

(a) in pre-nominal position:

The semantic distinction between participles and adjectives is that participles refer directly to the event denoted by the verb and cast the referent of the modified noun into one of the roles of that event. Adjectives, on the other hand, refer to a state that characterizes the referent of the modified noun.

It is not always easy to tell the difference. Here are some clues / tests to tell the difference:

(1) If there is no corresponding verb, it must be an adjective. E.g., unexpected, talented, down-hearted, diseased.

(2) If you can add the adverb "very" in front of the participial form, then it is probably an adjective. For this test to work, however, the adjective must be scalar or gradable. For example, the adjective blue is scalar and thus intensifiers like very can be added easily, and comparative and superlative forms exist: very blue, bluer, bluest. The adjective, triangular, however, is not gradable. The intensified and comparative forms sound funny: very triangular, the most triangular, etc.

  1. The very smiling man. (bad, and thus a verb)

  2. The very frost-bitten man. (good, and thus an adjective)

  3. The very heart-breaking results. (good, and thus an adjective)

  4. The very quail-hunting vice-president. (bad, but maybe because hunting is not gradable?)

(3) If there are dependents on the participial form (a direct object, or an agent), then it is likely that it is a verb. Thus most postnominal modifiers will be verbs, since their position almost guarantees the presence of additional dependents.

(4) If the word is not listed in an on-line dictionary like Merriam Webster as an adjective, it is likely to be a verb.

(5) When in doubt, make your best guess and discuss the issue with Owen.

Participles, which are sometimes coded as adjectives, are generally coded here as verbs. Participles are the -ing and -ed forms of the verb, and are not main verbs. For example, in the sentence "The man eating the eggplant is old." the word "eating" is a present participle and modifies or specifies the "man". Similarly, in the sentence, "The man killed yesterday by police was buried today." "killed" is a past participle and again modifies or specifies "man". Since these are coded as verbs, they will also assign semantic roles.

 

Predicative Use of Adjectives -- N-Copula-Adj Constructions

See the manual section on copular constructions for how to handle such sentences as The book is blue.

 


Prepositions and Particles

IN PROGRESS.....

Prepositions and particles

Note that Penn TreeBank did something arbitrary, but consistent, across verbs. What we have decided to do for this project is mainly for the sake of consistency, not out of any strong theoretical bias.

For now, there will be separate nodes for V and Prep. Annotators will annotate each with the correct concept; if the concept chosen for the verb already conflates the meaning of the preposition, mark the preposition as "EMPTY".


Predicative Use of Prepositions with a Copula

See the manual section on copular constructions for how to handle such sentences as The book is in the tub.



Copular constructions 


Sentences whose main verb is a form of to be fall into several types, mainly existential, equative, and predicative. Existential use of hai asserts existence (chand par aadmi hai (moon on man is) “there is a man on the moon”), equative use of hai equates two entities (John woh doshi hai (John the culprit is) “John is the culprit”), while the predicative use asserts that the post-verbal predicate holds of the deep subject (John doshi hai (John culprit is) “John is guilty/John is a culprit”). These three constructions are treated in two different ways: existential in one way, and equative and predicative in another way. In general, the use of a definite determiner on the second argument suggests an equative use, while the lack of a definite determiner suggests a predicative use (Hindi does not have an indefinite determiner).

In the case of existential hai (be), the head of the sentence is the verb hai, with any prepositional phrase as an adjunct. The meaning of the existential construction is that the existence of the subject is asserted. Any PP is modifying the existence assertion.

In the case of equative and predicative hai, the predicate (Obj) of the verb hai is the sentence head, with one deep syntactic Subj. The verb hai is treated as an auxiliary, and thus deleted. See verbs and auxiliaries: choosing a head. This analysis makes predicative and equative copula constructions look just like small clauses. Note that the grammatical role of the predicate reflects the role of the predicative construction in the sentence. In Mary laal hai (Mary red is) “Mary is all red”, laal (red) is the root of the sentence, while in Vaidya hote huve John aftar khoon dekhta tha (Doctor being is John often blood see was) “Being a doctor, John often saw blood”, vaidya (doctor) depends on dekhna (see) and is a MOD.

The meaning of a predicative construction is the assertion that the predicate holds of the subject. The meaning of the equative construction is that the identity of the two arguments is asserted.

If you are having trouble determining whether the use of to be is existential or predicative, use the following rules (also see Expletive subjects and there-insertion):

Gulab bahut laal hai (Rose very red is) “This rose is very red” (predicative)
Yeh phool gulab hai (This flower rose is) “This flower is a rose” (predicative)
Yeh phool woh phool hai jo main utha raha tha jab maine Pat ko pehle bar dekha (This flower that flower is that I pick-ing was when I-did Pat to first time see) “This flower is the flower that I was picking when I first saw Pat”.  (equative)
Phool guldaan main tha (Flower vase in was) “The flower was in the vase”. (predicative)

Sarvaagat chetna hai (Universal consciousness is) “There is universal consciousness”. (existential)


Conjunction 

Conjunction has its own part-of-speech (Conj). The conjunction (aur (and), ya (or), lekin (but), etc) is placed as a dependent of the first conjunct with role Mod, and the second conjunct is a dependent of the conjunction with role Obj.

If a comma acts as a conjunction, it is treated as such (given part-of-speech Conj and analyzed as in the above paragraph). However, note that in "chicken, ducks, and geese", the second (last) comma does not serve as a conjunction (since there is an explicit "and"), and it is removed at IL0. The first comma does serve as a conjunction.
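The attachment pattern for conjunctions can be sketched as a list of labeled arcs. This is only an illustration: the function name coordinate and the (dependent, head, role) triple format are hypothetical, and chaining the pattern pairwise for three or more conjuncts is an assumption of this example, since the text above only spells out the two-conjunct case.

```python
def coordinate(conjuncts, conjunctions):
    """Build (dependent, head, role) arcs for a coordination.

    conjuncts: the n coordinated items, in order.
    conjunctions: the n-1 items that join them (words or commas acting
    as conjunctions), with non-conjunction commas already removed.
    """
    if len(conjunctions) != len(conjuncts) - 1:
        raise ValueError("expected exactly one conjunction between each pair")
    arcs = []
    for i, conj in enumerate(conjunctions):
        # The conjunction depends on the conjunct to its left with role Mod.
        arcs.append((conj, conjuncts[i], "Mod"))
        # The conjunct to its right depends on the conjunction with role Obj.
        arcs.append((conjuncts[i + 1], conj, "Obj"))
    return arcs


# "chicken, ducks, and geese": the second comma has been removed at IL0.
for arc in coordinate(["chicken", "ducks", "geese"], [",", "and"]):
    print(arc)
```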

 


Empty Nodes 


This section discusses cases in which the annotator must add an empty node to the tree. An empty node is a node which does not correspond to a word (or other graphical manifestation such as a punctuation mark) in the input string.

New empty nodes are created using the "new" option under "Node" in the TrEd tree editing tool. The new node should have feature POS set to N (most cases) or V (if VP ellipsis). Give the new node a wpos feature so that it ends up in a position that roughly corresponds to its grammatical function (i.e., if it is a subject, to the left of its governing verb, and so on). When the fs files come out of the parser, the nodes have wpos features in increments of 10, so there are enough unused positions to place new nodes where they belong. Never reuse an already used position.
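Choosing a wpos value for a new node can be sketched as a search for the nearest unused position next to the governing node. This is a minimal illustration under stated assumptions: the function name free_wpos, its parameters, and the idea of stepping one position at a time are all hypothetical, and the sketch does not read or write fs files or interact with TrEd.

```python
def free_wpos(used, governor_wpos, side="left"):
    """Return the nearest unused position on the given side of the governor.

    used: set of wpos values already taken in the tree.
    governor_wpos: wpos of the governing node (for example, the verb).
    side: "left" for a subject, "right" for material following the governor.
    """
    step = -1 if side == "left" else 1
    candidate = governor_wpos + step
    # Never reuse a position that is already taken.
    while candidate in used:
        candidate += step
    return candidate


used_positions = {10, 20, 30}
subject_wpos = free_wpos(used_positions, governor_wpos=30)
used_positions.add(subject_wpos)
print(subject_wpos)  # 29
```

Adding the chosen position to the used set straight away keeps a second empty node from landing on the same value.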

For the lex feature, first identify which node this node is coreferential with. This is usually straightforward. Then copy the coreferential node's word and lexeme values to the empty node, but add angle brackets around the value, for example "<Dominic>". Alternatively, use lex feature "<pro>" if it is hard to identify the correct coreferential node.

Sometimes, it is not clear what the reference of the empty node is ("arbitrary pro"). Arbitrary empty subjects can usually be found in adjunct clauses. For example, in Jicama khaana bachchaon ke mashtisk vikas ke liye achcha hai (Jicama eating children 's brain growth for good is) “Eating jicama is good for children's brain growth”, the subject of "khaana (eating)" is not specified. In these cases, we label both the lexeme and the word feature of the new node "<pro>". In case of doubt ("<pro>" or "<child>"), ask yourself: can I tell from syntax alone what this node means? If no, "<pro>". If yes, fill in the lexeme. In the example sentence Eating jicama is good for children's brain growth, the syntax does not determine the identity of the missing subject (compare Buying Microsoft products is good for Bill Gates's wealth growth).
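The choice between a bracketed antecedent and "<pro>" can be sketched in a few lines. The function name empty_node_lex is hypothetical; the decision about whether syntax alone identifies the antecedent remains with the annotator and is simply passed in here as an optional argument.

```python
def empty_node_lex(antecedent_lex=None):
    """Return the lex value for an empty node.

    antecedent_lex: lexeme of the coreferential node, or None when the
    reference cannot be determined from syntax alone (arbitrary pro).
    """
    if antecedent_lex:
        return "<" + antecedent_lex + ">"
    return "<pro>"


print(empty_node_lex("Dominic"))  # <Dominic>
print(empty_node_lex())           # <pro>
```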

We now list cases of missing nodes.

John likes beans, and so does Mary
Henry thought he could jump over that wall, but Jules knew he couldn't
VP ellipsis requires an empty verbal head; the auxiliary is deleted in the usual manner and replaced as needed by features. In addition, add all missing arguments (but not any adjuncts!), as described above. The lexeme and word of the empty head should be filled in from the antecedent between brackets, e.g. "<play>" for Mary plays with cats and so does Tony.
John ne do kutte khareede aur Mary ne teen (John did two dogs buy and Mary did three) “John bought two dogs, and Mary bought three”
Here, put in an empty noun head (in this case, for kutta (dog)).

 



Punctuation

Remove all punctuation, except meaningful punctuation. Examples:

Do remove: