AutoModel - Processing natural language

Monday, October 29, 2007

Summing up my work in 19 slides

I just finished summing up the work of the last 12 months into 19 slides for my PhD-Meeting tomorrow.
I will upload the files as soon as the web server I can put them on is back online.

Tomorrow will be another "Try to get used to working with (Research-)Cyc" day. That thing is so powerful and yet so awkward to use at times ... and the stubby Java interface they ship with it doesn't help much.
Well, I guess I'll have to deal with it.

I will give you a short summary of my slides later tonight, after I have refined them once more.

So long.

Wednesday, October 24, 2007

Preparing for PhD Meeting

I have a talk with my fellow PhD students and my professor on Tuesday evening.
There I will present my results: what I have done so far, what I have achieved and what the future possibly holds for me.
This also includes getting feedback on my work and directions for the next few weeks or months of work.

I will offer my presentation as download here as soon as it's done. I should be done with it this Sunday. At least that's my plan.

See you guys soon.

Tuesday, October 9, 2007

Bordering aspects

The system we dream of is supposed to generate program code from natural language.
So far, so good.

My job is to put enough "common sense" into the processing task so that most of the mistakes machines still make can be avoided.
Or, to paraphrase Voltaire here: the problem with common sense is that it is not so common.

But what if we succeed in this task? Somebody will still have to work with the piece of software we generated. Somebody will have to use it. So what are the interfaces? How can our software interact? How easy will it be to understand that piece of software?
Will it make up for the time we saved generating the code? What do we have to look out for?

To sum it up: What happens around our scenario? Do we have to address that or can we just ignore how to deal with what the machine is supposed to deliver? I don't know yet.

Questions upon questions, which I will try to address in the next weeks as well.
Thanks to my friend Georg for this valuable tip.

Monday, October 8, 2007

Annotating text - questions raised

Hey, it's been almost 2 weeks.
But I do have some results for you. They look like this:

"Leaving one’s own king under attack, exposing one’s own king to attack and also ’capturing’ the opponent’s king are not allowed."

This would be the original text from the specification.
In order to transform this into a graph, thematic (theta) roles have to be given to each necessary part of the sentence. The redundant/disposable words are just "sharpened-out" by marking them with a #.

The output of this sentence would be something like this:

[ { [ Leaving|ACT one`s1|POSS #own king|{HAB,STATII} under_attack|STAT, ] , [ exposing|ACT one`s2|POSS #own king|{HAB,STATII} to_attack|STAT ] , [ #and #also capturing|ACT #the opponent`s|POSS king|HAB ] }|MODII #are not_allowed|MOD. ]

one`s1 <= They
one`s2 <= They
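To make the notation a bit more tangible, here is a minimal Python sketch that reads such an annotated sentence back into a flat list of words with their roles. It is only an illustration under my own assumptions, not part of the AutoModel tooling: the names `Token` and `parse_tokens` are hypothetical, the bracket nesting is ignored, and group-level roles such as `}|MODII` are simply skipped.

```python
from typing import List, NamedTuple

# The annotated sentence from above, copied verbatim.
ANNOTATED = (
    "[ { [ Leaving|ACT one`s1|POSS #own king|{HAB,STATII} under_attack|STAT, ] , "
    "[ exposing|ACT one`s2|POSS #own king|{HAB,STATII} to_attack|STAT ] , "
    "[ #and #also capturing|ACT #the opponent`s|POSS king|HAB ] }|MODII "
    "#are not_allowed|MOD. ]"
)


class Token(NamedTuple):
    """Hypothetical container for one annotated word."""
    word: str
    roles: List[str]   # e.g. ["ACT"] or ["HAB", "STATII"]
    disposable: bool   # True if the word was marked with '#'


def parse_tokens(annotated: str) -> List[Token]:
    """Flatten an annotated sentence into word tokens (nesting is ignored)."""
    tokens = []
    for chunk in annotated.split():
        chunk = chunk.rstrip(",.")          # drop trailing punctuation
        if not chunk or chunk[0] in "[]{}":
            continue                        # brackets and group roles are skipped
        if chunk.startswith("#"):
            tokens.append(Token(chunk[1:], [], True))
            continue
        word, _, role_part = chunk.partition("|")
        roles = role_part.strip("{}").split(",") if role_part else []
        tokens.append(Token(word, roles, False))
    return tokens


if __name__ == "__main__":
    for token in parse_tokens(ANNOTATED):
        print(token)
```

For the sentence above this gives 18 word tokens, 6 of which are marked as disposable. Building the actual graph would additionally need the bracket structure, which this sketch deliberately leaves out.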

This might look a little confusing at first, but it is also quite impressive how easily we humans understand the relations and concepts in a sentence. Sitting there and annotating the text by hand quickly shows that many things are processed by our brain implicitly and are actually quite hard to put on paper.

To put it briefly:
Reading the above sentence does not make you think of possessors, habitums and stati at once, does it? We recognize verbs and nouns, the rest just seems to come "naturally".

This is in my opinion the biggest obstacle when it comes to machine understanding.

Anyway, several questions, especially concerning reasoning, were raised. They were:

How will we be dealing with numerals after all?
When is a word a numeral, when an article?

e.g.: "one can find ..." or "you can move with only one player"

one == same?

one == 1?

one == one/you?
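Just to illustrate what a first, very naive answer could look like: the sketch below guesses the reading of "one" from its direct neighbours. The cue lists and the function name `reading_of_one` are purely my own assumptions for illustration. A real solution would need proper parsing and probably some reasoning on top.

```python
from typing import List

# Hypothetical cue words, chosen only for this illustration.
MODALS = {"can", "may", "must", "should", "cannot"}
QUANTITY_CUES = {"only", "exactly", "just"}


def reading_of_one(words: List[str], index: int) -> str:
    """Guess how the word 'one' at words[index] is used, from its neighbours."""
    prev_word = words[index - 1].lower() if index > 0 else ""
    next_word = words[index + 1].lower() if index + 1 < len(words) else ""
    if next_word in MODALS:
        return "pronoun"    # "one can find ..."
    if prev_word in QUANTITY_CUES:
        return "numeral"    # "... with only one player"
    return "unknown"        # no cue found, leave the decision open


if __name__ == "__main__":
    print(reading_of_one("one can find".split(), 0))
    print(reading_of_one("you can move with only one player".split(), 5))
```

The "unknown" branch is the interesting one: it is exactly where common-sense knowledge would have to step in.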

How can relations between numerals be detected?

e.g.: "The chessboard has 8x8 fields. Those 64 fields ..."
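For this particular example, one simple idea is to evaluate products like "8x8" and check whether the result shows up again later in the text. The following Python sketch does just that. It is an assumption about how one might start, not a description of how AutoModel handles it, and the name `linked_numerals` is made up for the example.

```python
import re
from typing import List, Tuple

# A product written as "<number>x<number>", e.g. "8x8".
PRODUCT = re.compile(r"\b(\d+)\s*x\s*(\d+)\b")
# A standalone number, e.g. "64".
NUMBER = re.compile(r"\b\d+\b")


def linked_numerals(text: str) -> List[Tuple[str, str]]:
    """Pair each product with later numbers in the text that equal its value."""
    links = []
    for product in PRODUCT.finditer(text):
        value = int(product.group(1)) * int(product.group(2))
        # Only look at numbers that appear after the product expression.
        for number in NUMBER.finditer(text, product.end()):
            if int(number.group()) == value:
                links.append((product.group(), number.group()))
    return links


if __name__ == "__main__":
    print(linked_numerals("The chessboard has 8x8 fields. Those 64 fields ..."))
```

This only covers numerals written as digits. Spelled-out numbers ("sixty-four") or other relations (sums, halves) would need more work.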

What happens to prepositions which seem unnecessary during annotation but actually do or can change the semantics of the sentence?

e.g.: "the near corner square to the right of the player is white"
"to the right of the player" (shows a location) is different from "the right of the player" (could also mean a right in the legal sense)

Difference between verbs and their tense:

e.g.: "checkmate" vs. "checkmated" which mean something different

Well, a lot of new stuff to think about I guess ...

Wednesday, September 26, 2007

ResearchCyc as the ontology of choice

Hi folks,

first of all, I have to say that I won my battle against TeX. The initial setup for the dissertation with which I will start annotating the papers is done.
I wrote short summaries of all papers and marked which parts of the articles I would like to mention to motivate my goals.

Next week we're gonna sit down with a student working on the concept of annotating theta-roles to textual specifications. This is going to be extremely interesting since we do not quite know what to expect.
Many concepts just take place in a human's brain while reading text, but when one actually has to mark the words with the right roles which apply in the given context... well, that seems to be something different and quite strenuous. Let's see what insight we gain during this test. This will most likely affect the direction in which the research in our area will head in the next couple of weeks/months.

In addition to all that, I jumped back on the (Research-)Cyc ontology for reasoning purposes. It might already help while annotating text. But it could also be very helpful on the transformation side, when the sentence has already been parsed into a graph. Well, I guess we'll find out.

Saturday, September 22, 2007

Thematic roles introduction

As I promised you a couple of days ago - here's the explanation of the graph transformation based on thematic roles and how we plan to reason on these models.

Thematic roles - also known as theta-roles - are best described in the following two articles here and here. I do not want to go too deep into explaining that, since the articles are quite voluminous.
If you can read German, you might even find a more detailed, easier to understand and more complete treatment here and here.

By Monday at the latest, I will add a simple example of how text is annotated.

Other than that, I've been struggling with TeX quite a bit summing up the knowledge of about 55 articles which might be of use for the "state of the art/related work" part of the dissertation.

Well, I'll be back on Monday with the news I promised you. Have a safe weekend.

Tuesday, September 18, 2007

A short explanation of how this is all supposed to work

Hey folks,

for those of you who have followed my website, here is an update of what the solution is going to look like.
For those of you who see this for the first time - well, be glad you don't have to witness the omnipotence of change in our business *smile*.

The slide above shows a rough sketch of how the tools are supposed to interconnect, exchange their data and finally lead to the UML/Program code of our choice.
The orange "NLP" box represents all possibilities to process natural language. As I already mentioned, there are many - one of which will be taken into closer consideration for AutoModel. We are still comparing the various methods, trying to find the one that works best for us. That's going to be a student thesis.

At this moment, textual transformation (and therefore understanding) takes place by annotating the thematic/theta-roles in the given text. This is still a process which needs a lot of manual labour, but we are already working on an automatic approach to that.

What thematic roles look like, what this is all about and how we later transform these into graphs (all parts of Tom's work), I will tell you in the next post.

Another thing (and quite an annoying one, I have to admit) I was struggling with today was creating the initial dissertation template in LaTeX so that I can already start putting my thoughts on paper.
Like probably everybody else on this planet using Windows, I use MiKTeX and TeXnicCenter to get this job done. I still haven't managed to include my JabRef bib files as the literature list. Well, I guess Rome wasn't built in a day either.

That's it for tonight.