Wednesday, September 26, 2007

ResearchCyc as the ontology of choice

Hi folks,

first of all, I have to say that I won my battle against TeX. The initial setup for the dissertation, with which I will start annotating the papers, is done.
I wrote short summaries of all papers and marked which parts of the articles I would like to mention to motivate my goals.

Next week we're gonna sit down with a student working on annotating textual specifications with theta-roles. This is going to be extremely interesting since we do not quite know what to expect.
Many of these concepts just take place in a human's brain while reading text, but when one actually has to mark the words with the right roles that apply in the given context... well, that seems to be something different and quite strenuous. Let's see what insight we gain during this test. It will most likely affect the direction in which the research in our area heads in the next couple of weeks/months.

In addition to all that, I jumped back on the (Research-)Cyc ontology for reasoning purposes. It might already help while annotating text. But it could also be very helpful on the transformation side, once the sentence has already been parsed into a graph. Well, I guess we'll find out.

Saturday, September 22, 2007

Thematic roles introduction

As I promised you a couple of days ago - here's the explanation of how the graph transformation builds on thematic roles and how we plan to reason on these models.

Thematic roles - also known as theta-roles - are best described in the following two articles here and here. I do not want to go too deep into explaining them, since the articles are quite voluminous.
If you're capable of understanding German, you might even find more detailed, easier-to-understand and better-covered explanations here and here.

By Monday at the latest, I will add a simple example of how text is annotated and what this looks like.

Other than that, I've been struggling with TeX quite a bit while summing up the knowledge of about 55 articles which might be of use for the "state of the art/related work" part of the dissertation.

Well, I'll be back on Monday with the news I promised you. Have a safe weekend.

Tuesday, September 18, 2007

A short explanation of how this is all supposed to work

Hey folks,

for those of you who have followed my website, here's an update on what the solution is going to look like.
For those of you who see this for the first time - well, be glad you don't have to witness the omnipresence of change in our business *smile*.

The slide above shows a rough sketch of how the tools are supposed to interconnect, exchange their data and finally lead to the UML/Program code of our choice.
The orange "NLP" box represents all possibilities to process natural language. As I already mentioned, there are many - one of which will be taken into closer consideration for AutoModel. We are still comparing the various methods, trying to find the one that works best for us. That's going to be a student thesis.

At this moment, textual transformation (and therefore understanding) takes place by annotating the thematic/theta-roles in the given text. This is still a process which needs a lot of manual labour, but we are already working on an automatic approach to that.
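To make this a bit more concrete: here's a toy sketch of what a theta-role annotation might look like, using a sentence from the chess domain. The role inventory (Agent, Predicate, Patient) is a common one from the literature, not necessarily our final tag set, and the data layout is purely my illustration.

```python
# A toy theta-role annotation: each entry maps a text span to a role.
sentence = "The pawn captures the bishop."

annotations = [
    {"span": "The pawn",   "role": "Agent"},      # the entity acting
    {"span": "captures",   "role": "Predicate"},  # the action itself
    {"span": "the bishop", "role": "Patient"},    # the entity acted upon
]

# Index the annotations by role for later processing.
roles = {a["role"]: a["span"] for a in annotations}
print(roles["Agent"])  # -> The pawn
```

Doing this by hand for a whole specification is exactly the manual labour mentioned above.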

What thematic roles look like, what this is all about and how we later transform them into graphs (all part of Tom's work) - I will tell you in the next post.

Another thing (and quite annoying, I have to admit) I was struggling with today was creating the initial dissertation template in LaTeX so that I can already start casting my thoughts on paper.
Like probably everybody else on this planet using Windows, I use MiKTeX and TeXnicCenter to get this job done. I still haven't managed to include my JabRef bib-files as the bibliography. Well, I guess Rome wasn't built in a day either.
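For what it's worth, JabRef stores plain BibTeX files, so a minimal skeleton along these lines usually does the trick (the file name `dissertation.bib` and the citation key are placeholders):

```latex
\documentclass{book}
\begin{document}

Some chapter text with a citation \cite{someKey2007}.

% JabRef's output is ordinary BibTeX, so it can be included directly:
\bibliographystyle{plain}
\bibliography{dissertation} % expects dissertation.bib next to this file

\end{document}
```

Remember to run latex, then bibtex, then latex twice more so the citations resolve.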

That's it for tonight.

It's papers, papers and guess what? - Papers.

Hey, haven't written much in the last 5 days.
They were all about reading papers concerning NLP.

Next up, I will tell you about our approach and how it differs from others so far (thematic roles vs. well-known NLP approaches).
After that I might just explain our graph theory approach and where the challenge lies.

The other questions are:
  • What's the state of the art?
  • Where are the gaps?
  • Which are the big questions?
  • Where do we have to narrow our field (since we do not claim to have a solution which is universal)?
All these questions have to be collected and cast into papers. And once you have a bunch of papers you align 'em and make your dissertation out of those.
Well, it's not quite that easy, but that is the approximate approach.

More info to come these days. Have a good one.

Tuesday, September 11, 2007

Mindmapping the papers, ideas for an article

I spent the last two days gaining an overview of another 10 or so papers about commonsense reasoning on natural language and processing of natural language and possible user interaction.
Quite a wide field.
I also talked to my colleague Tom and we have several approaches which we need to address in the next couple of weeks:
  • First of all we found the perfect set of words/text which we would like to interpret. It follows strict rules, it makes sense, everybody knows it and can relate to it, and we will not have to bother too much with ambiguities and weird word usages.
    What am I talking about? Well, I am talking about the official rules of chess.
  • We will come up with an article concerning the state-of-the-art of processing natural language and converting into program code or anything similar.
  • We will then add our thoughts and extensions which we mean to introduce in the next couple of months to round off the complete picture.
So far, there are many approaches of dealing with natural language.
One uses the semantics of the English language and transfers those into programmatic semantics. Others rely only on controlled languages and specified domains. Some out there intertwine their concepts with MDA, and some have started to reason/infer on natural sentences.

All of these ideas bring us closer to what we want, but the complete picture is now clearer:
We want to give the program the chess specification, and we shall receive a UML model from which code could be generated that can actually "play chess".

Tom's approach with graphs (I will explain that in a later post) differs from many other solutions because it initially relies on thematic roles. From then on, it's all graph transformations, including reasoning. The latter will be my part. No predefined objects etc. are necessary once the initial prose has been annotated.

The disadvantage of many approaches so far is that they mostly rely on the specifics of the English language. We understand that this whole concept has to work with any language out there, or at least a great deal of them.

The steps to be fulfilled and realized therefore are:

  1. Annotation:
    (Half-)Automatically annotate the initial text with its thematic roles
  2. Processing
    Process the annotated text and create an initial graph
    Use graph transformation to create an initial UML model
  3. Reasoning
    Use reasoning to get rid of ambiguities or duplicate objects which belong together.
    Also use reasoning to split obvious "objects" into other objects with certain properties, e.g. "the cold bottle" could be one object. But what if a "warm bottle" comes around the corner later? Is this a new object, or do you just have an object "bottle" which has the property "temperature" with its possible values "cold, warm"?
    Good question, huh? Well, we'll try to do the latter - it just makes more sense.
    Reasoning will supposedly also take place through graph transformations alone.
  4. Processing
    Process the results of reasoning again and create the new UML model.
    Then transfer this model into code using any of the popular methods to create code from UML.
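The bottle example from step 3 can be sketched as a tiny merge operation. Everything here - the function name, the data layout, the choice of a dictionary of property values - is my own illustration of the idea, not our actual implementation:

```python
from collections import defaultdict

# Noun phrases as they might come out of step 1 (annotation).
mentions = [
    {"head": "bottle", "modifier": "cold"},
    {"head": "bottle", "modifier": "warm"},
    {"head": "board",  "modifier": None},
]

def reason(mentions):
    """Step 3 in miniature: merge mentions with the same head noun into
    one object and collect the modifiers as values of a property."""
    model = defaultdict(set)
    for m in mentions:
        props = model[m["head"]]      # creates the object on first sight
        if m["modifier"]:
            props.add(m["modifier"])  # e.g. temperature: cold/warm
    return dict(model)

uml_objects = reason(mentions)
# One object "bottle" carrying the values {"cold", "warm"},
# instead of two separate objects "cold bottle" and "warm bottle".
```

The real pipeline of course works on graphs and uses proper graph transformations, but the core decision - one object with a property rather than two objects - is the same.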
That's it for today - more at the end of the week. I still have loads of papers in front of me which I have to read ...

Monday, September 10, 2007

Finally back from my trip to Australia/USA

Alright.
I am finally back from my 7-week trip which led me all the way up and down the Australian east coast and the outback. After another 9-day stopover in California, I eventually arrived back here in "kind-a-cold" but good old Germany.
This week I will try to get myself back up to speed with what I'd been dealing with before I left 2 months ago.
So far I need to have another look at natural language interpretations and representations from other colleagues around the world. Some medical informatics guys have already achieved quite a lot when dealing with natural language reports on patients (see here, here and here).

I also have to meet up with my friend and co-worker Tom to get our targets aligned again. This will include involving, teaching and mentoring some graduate students who want to support us in our work. Well, I'll give you an update as soon as I know more.