Monday, October 8, 2007

Annotating text - questions raised

Hey, it's been almost 2 weeks.
But I do have some results for you. They look like this:

"Leaving one’s own king under attack, exposing one’s own king to attack and also ’capturing’ the opponent’s king are not allowed."

This is the original text from the specification.
In order to transform it into a graph, a thematic (theta) role has to be assigned to each necessary part of the sentence. Redundant or disposable words are simply "sharpened-out" by marking them with a #.

The annotated output for this sentence would look something like this:

[ { [ Leaving|ACT one`s1|POSS #own king|{HAB,STATII} under_attack|STAT, ] , [ exposing|ACT one`s2|POSS #own king|{HAB,STATII} to_attack|STAT ] , [ #and #also capturing|ACT #the opponent`s|POSS king|HAB ] }|MODII #are not_allowed|MOD. ]
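To make the notation a bit more concrete, here is a minimal Python sketch of how the annotated line above could be read back in by a program. It assumes only the conventions visible in the example: chunks are separated by spaces, word|ROLE or word|{ROLE1,ROLE2} attaches roles to a word, and a leading # marks a disposable word. The names Token, parse_token and parse_annotation are hypothetical, and group-level roles such as }|MODII are simply skipped here rather than attached to the bracketed group.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Token:
    word: str          # surface form, e.g. "king" or "under_attack"
    roles: List[str]   # thematic role labels, e.g. ["HAB", "STATII"]
    dropped: bool      # True if the word was marked with '#'

ANNOTATION = (
    "[ { [ Leaving|ACT one`s1|POSS #own king|{HAB,STATII} under_attack|STAT, ] , "
    "[ exposing|ACT one`s2|POSS #own king|{HAB,STATII} to_attack|STAT ] , "
    "[ #and #also capturing|ACT #the opponent`s|POSS king|HAB ] }|MODII "
    "#are not_allowed|MOD. ]"
)

def parse_token(raw: str) -> Optional[Token]:
    """Parse one space-separated chunk; return None for pure structure."""
    # Strip leading group brackets and trailing sentence punctuation.
    body = raw.lstrip("[]{}").rstrip(",.")
    # '[', ']', ',' and group-level roles such as '}|MODII' carry no word.
    if not body or body.startswith("|"):
        return None
    dropped = body.startswith("#")
    if dropped:
        body = body[1:]
    word, _, role_part = body.partition("|")
    if role_part.startswith("{") and role_part.endswith("}"):
        roles = role_part[1:-1].split(",")   # several roles: {HAB,STATII}
    elif role_part:
        roles = [role_part]                  # a single role: ACT, POSS, ...
    else:
        roles = []                           # '#'-words carry no role
    return Token(word, roles, dropped)

def parse_annotation(text: str) -> List[Token]:
    """Return the word-level tokens of an annotated sentence, in order."""
    tokens = (parse_token(raw) for raw in text.split())
    return [t for t in tokens if t is not None]

if __name__ == "__main__":
    kept = [t for t in parse_annotation(ANNOTATION) if not t.dropped]
    for t in kept:
        print(t.word, t.roles)
```

Filtering out the #-words leaves the twelve role-bearing words of the sentence, which are the natural candidates for nodes in the graph.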

one`s1 <= They
one`s2 <= They
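One possible way to get from such an annotation to an actual graph, sketched here purely as an assumption and not as a settled design, is to treat the ACT word of each clause as the hub, link a POSS word to the next word carrying HAB, hang every other role directly off the verb, and replace placeholders like one`s1 with their resolved referent (They). The first clause is entered by hand below; clause_to_edges and its linking rules are hypothetical.

```python
from typing import Dict, List, Tuple

Clause = List[Tuple[str, List[str]]]   # (word, roles) pairs, '#'-words removed
Edge = Tuple[str, str, str]            # (source, role, target)

# First clause of the example, entered by hand from the annotation.
CLAUSE_1: Clause = [
    ("Leaving", ["ACT"]),
    ("one`s1", ["POSS"]),
    ("king", ["HAB", "STATII"]),
    ("under_attack", ["STAT"]),
]

# Coreference table from the annotation: one`s1 <= They, one`s2 <= They.
COREF: Dict[str, str] = {"one`s1": "They", "one`s2": "They"}

def clause_to_edges(clause: Clause, coref: Dict[str, str]) -> List[Edge]:
    """Turn one annotated clause into (source, role, target) triples.

    Hypothetical rules: the ACT word is the hub, a POSS word is linked to
    the next word carrying HAB, and every other role hangs off the ACT word.
    """
    acts = [word for word, roles in clause if "ACT" in roles]
    if len(acts) != 1:
        raise ValueError("this sketch expects exactly one ACT word per clause")
    verb = acts[0]
    edges: List[Edge] = []
    for i, (word, roles) in enumerate(clause):
        word = coref.get(word, word)  # resolve placeholders such as one`s1
        for role in roles:
            if role == "ACT":
                continue
            if role == "POSS":
                # The possessed entity: the next word that carries HAB.
                owned = next((w for w, r in clause[i + 1:] if "HAB" in r), None)
                if owned is not None:
                    edges.append((word, "POSS", owned))
                    continue
            edges.append((verb, role, word))
    return edges

if __name__ == "__main__":
    for edge in clause_to_edges(CLAUSE_1, COREF):
        print(edge)
```

For the first clause this yields They -POSS-> king plus three edges out of Leaving: to king (HAB and STATII) and to under_attack (STAT). Whether STATII should really attach the king to the verb rather than to the state is exactly the kind of question such a sketch leaves open.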

This might look a little confusing at first, but it is also quite impressive how easily we humans understand the relations and concepts in a sentence. Sitting down and annotating the text by hand, however, quickly shows that our brain processes many things implicitly, and those things are actually quite hard to put on paper.

In short:
Reading the sentence above does not make you think of possessors, habitums and stati right away, does it? We recognize verbs and nouns; the rest just seems to come "naturally".

This is, in my opinion, the biggest obstacle when it comes to machine understanding.

Anyway, several questions were raised, especially concerning reasoning. Those were:
  • How will we be dealing with numerals after all?
  • When is a word a numeral, when an article?
    • e.g.: "one can find ..." or "you can move with only one player"
      • one == same?
      • one == 1?
      • one == one/you?
  • How can relations between numerals be detected?
    • e.g.: "The chessboard has 8x8 fields. Those 64 fields ..."
  • What happens to prepositions that seem unnecessary during annotation but actually do, or can, change the semantics of the sentence?
    • e.g.: "the near corner square to the right of the player is white"
    • "to the right of the player" (a location) is different from "the right of the player", which could also mean a right in the legal sense
  • Distinguishing verbs by tense:
    • e.g.: "checkmate" vs. "checkmated", which mean different things
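Two of the numeral questions lend themselves to quick heuristics, sketched below in Python under loudly stated assumptions. The word lists MODALS and QUANTIFIERS are my own guesses, not anything from the specification; the hypothetical classify_one only separates the generic pronoun ("one can find ...") from the numeral ("only one player") and answers "unknown" otherwise, so the "one == same" reading is not handled at all. The equally hypothetical link_products only links an "AxB" expression to a later bare number equal to the product, as in the 8x8 / 64 example.

```python
import re
from typing import List, Tuple

# Assumed word lists (my own guesses, not taken from the specification).
MODALS = {"can", "could", "may", "might", "must", "should", "will", "would"}
QUANTIFIERS = {"only", "exactly", "just"}

def classify_one(tokens: List[str], i: int) -> str:
    """Guess the sense of tokens[i] == 'one': 'pronoun', 'numeral' or 'unknown'."""
    clean = [t.lower().strip('.,;:!?"') for t in tokens]
    if clean[i] != "one":
        raise ValueError("tokens[i] must be the word 'one'")
    prev_word = clean[i - 1] if i > 0 else ""
    next_word = clean[i + 1] if i + 1 < len(clean) else ""
    if next_word in MODALS:
        return "pronoun"   # "one can find ..." -> generic one/you
    if prev_word in QUANTIFIERS:
        return "numeral"   # "only one player" -> 1
    return "unknown"       # e.g. the one == same reading is not handled

# "8x8" or "8 x 8": two integers joined by an ASCII 'x'.
DIM_RE = re.compile(r"\b(\d+)\s*x\s*(\d+)\b")
NUM_RE = re.compile(r"\b\d+\b")

def link_products(text: str) -> List[Tuple[str, str]]:
    """Link each 'AxB' expression to later bare numbers equal to A*B."""
    links = []
    for dim in DIM_RE.finditer(text):
        product = int(dim.group(1)) * int(dim.group(2))
        # Only look at numbers that appear after the AxB expression.
        for num in NUM_RE.finditer(text, dim.end()):
            if int(num.group()) == product:
                links.append((dim.group(), num.group()))
    return links

if __name__ == "__main__":
    print(classify_one("one can find ...".split(), 0))
    print(classify_one("you can move with only one player".split(), 5))
    print(link_products("The chessboard has 8x8 fields. Those 64 fields ..."))
```

Neither function understands the text; they only show how far simple surface patterns get before real reasoning is needed.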
Well, a lot of new stuff to think about, I guess ...
