AutoModel - Processing natural language

Saturday, March 1, 2008

Orchestration of Concepts

The paper we need to get ready is finally out of its initial stages.
So far we have addressed the related work and, most importantly, the reasoning part. The latter still has to be written down in TeX for the paper, but that should not take longer than a day.
I have focused on RCyc for the last 4 days and I think I am pretty savvy with the system(s) now.
The only thing I haven't gotten up and running so far is the newest Cyc version (KB 7130, that is), which still throws numerous exceptions when I use it. I assume there's something wrong with the Cyc-World (Ontology).

The next steps are to list the results and progress in a table, which then has to be described in sufficient detail for the paper. At the moment we plan to submit the paper on March 12th. That's still some time to go and I think we can make the deadline.

I just found out that the ICSC conference extended its submission deadline to March 16th. So we might even want to choose which conference to submit the paper to. We'll see.

Just to show you how the information gathering during the model creation process works, here are two illustrations that should explain it.

The text (text.txt) is annotated manually. The annotated text (now text.sale) is then automatically transformed into a GrGen graph-rewriting script (text.grs).
With the graph model for the specific domain provided (UML in our example), the GrShell is then used to create the initial graph.
To give you a glimpse of what this looks like, check out the next picture, which shows example data in the various files.
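
For readers without the pictures, here is a minimal sketch of the annotate-then-generate idea in Python. Everything in it is an assumption made for illustration: the inline `[text|role]` markup is a hypothetical stand-in for the SALE annotations, and the `node` / `edge` lines are placeholder output, not real GrShell syntax. It only shows the shape of the step from annotated text to a script that builds a graph.

```python
import re

# Hypothetical inline annotation: [surface text|role].
# The real SALE syntax is not shown in this post.
ANNOTATION = re.compile(r"\[([^|\]]+)\|([^\]]+)\]")


def parse_annotations(annotated):
    """Return (text, role) pairs in order of appearance."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in ANNOTATION.finditer(annotated)]


def to_script(pairs):
    """Emit placeholder commands (not GrShell syntax) for each annotation.

    A method is attached to the most recently seen class.
    """
    lines = []
    current_class = None
    for text, role in pairs:
        if role == "class":
            current_class = text
            lines.append(f"node {text!r} : Class")
        elif role == "method" and current_class is not None:
            lines.append(f"node {text!r} : Method")
            lines.append(f"edge {current_class!r} -owns-> {text!r}")
    return lines


annotated = "The [user|class] can [log in|method] and [place an order|method]."
script = to_script(parse_annotations(annotated))
for line in script:
    print(line)
```

In the real tool chain the generated file would contain GrShell commands that match the UML graph model; the sketch leaves that part out on purpose.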

And that's it for now. I'll come back to this later. I'll explain the whole orchestration in more detail once we have the paper for the conference.

Wednesday, February 13, 2008

Getting GrGen to dump data

Worked on the GrGen-Graph-Models today.
So far I've started to get familiar with the procedures necessary to work the tool. And yes, I have to admit, it is fast.
I already know which data to fish from the SALE models we already pushed into GrGen. The problem is that GrGen is not behaving deterministically. I guess I'll have to ring the maintainer of the project tomorrow to find out how this can be resolved.
If I can answer all those questions by tomorrow evening, I see a very good chance that the paper for the ICSC could become a success. I am sure we have to start writing it at the end of this week, otherwise the night shifts might kill us (again).

So long, I am out of here for today, gotta be back early tomorrow morning.

Tuesday, February 5, 2008

Using reasoning to disambiguate natural language

Natural language is always ambiguous. There is no denying the fact.
Now, when transforming specifications written in natural language into UML, one wants to make sure that misinterpretations and misunderstandings are as rare as possible.
But what if the words "user" and "client" are used to describe the same object? How might an electronic system grasp the similarity and understand it? How do I tell the system to combine the actions (or methods, if you want to call them that) and properties into one object (or class, to stay in OO-code-speak)?
Using an underlying ontology, we will try to show that specifications can be made better, more correct and considerably more elaborate with the "insight" of common sense.
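
To make the "user" vs. "client" question concrete, here is a minimal sketch in Python. The `CANONICAL` table is an invented stand-in for an ontology lookup (a real system would ask the ontology whether two terms denote the same concept), and the method and property names are made up for the example. It is not how our system does it, just the idea of folding synonymous words into one class.

```python
# Toy stand-in for an ontology lookup: word -> canonical concept.
# These entries are invented for illustration only.
CANONICAL = {"user": "User", "client": "User", "order": "Order"}


def merge_classes(extracted):
    """Merge extracted elements whose words map to the same concept.

    extracted: list of (word, methods, properties) tuples.
    Returns a dict: concept -> {"methods": set, "properties": set}.
    """
    merged = {}
    for word, methods, properties in extracted:
        # Unknown words fall back to their own capitalized form.
        concept = CANONICAL.get(word.lower(), word.capitalize())
        entry = merged.setdefault(
            concept, {"methods": set(), "properties": set()}
        )
        entry["methods"].update(methods)
        entry["properties"].update(properties)
    return merged


extracted = [
    ("user", ["login"], ["name"]),
    ("client", ["placeOrder"], ["address"]),
    ("order", ["cancel"], ["total"]),
]
classes = merge_classes(extracted)
for name, members in sorted(classes.items()):
    print(name, sorted(members["methods"]), sorted(members["properties"]))
```

Here "user" and "client" end up as one class that carries the methods and properties of both, while "order" stays separate. The hard part, of course, is getting a trustworthy answer to "are these the same thing?", which is exactly what the ontology is for.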
The first pass at this work has to be finished in the next 4 weeks since we want to publish it at the ICSC 2008 conference. Once again - no time to lean back. Full steam ahead!
This week I will try to get the necessary information from the system.
Next week's job will be to integrate the APIs so that GrGen, the graph modeling framework we're using to transform annotated text into UML, will be able to take advantage of the common sense we retrieved.
As always - it stays exciting!

Tuesday, January 29, 2008

Done with paper - next paper

Hi Folks,

We survived last week's deadline and delivered the paper (as was to be expected!) right on time. Sure, we had to sneak in some night shifts, which led to approximately 4h of sleep per day. But hey - the weekend was close and, as I already mentioned, we made it. Let's see what the committee has to say about our approach.
I leave you the paper for download here.
As you can see, it was a co-operation of a couple of people working here: Mathias, whose study thesis is closely tied to this article; Tom and I, who supervised Mathias and contributed to the work necessary for this paper; and finally Andreas - the man for the statistics. Since he is an empirical researcher in software engineering, we concluded that it would be best to consult him when it comes to making statistically relevant statements.

But as the title of this post shows, we're heading right to the next conference paper, which will be due in mid-March. We'll try to put the plan on the rails this week and then go from there. I hope we can get the train up to speed by then and deliver this paper about reasoning on annotated English natural language. There are several issues to address, a number of which I have to focus on for the next paper.
I am excited about where this might lead us.

Saturday, January 19, 2008

Preparing a paper for MiSE 2008, Leipzig

Hi Folks,

It's been quite a while since my last post.
I have put most of my effort into http://searchingforsense.blogspot.com/ which is not a public blog.
At the moment we are working on a paper for the MiSE in Leipzig this May. The submission date is January 24th, so there's still quite some stuff to do.
Until then there won't be much happening here.
But the week after next should be the big comeback of this blog. I am revisiting the mind maps and layouts for the research project and adapting the concepts to newly gained knowledge. Let's see what we find out and which direction we'll be heading in from next week on.

I'll tell you more soon.

Thursday, November 29, 2007

Another week - another success

Finally got Cyc to work just like I wanted it to!
I contacted Larry from Cyc and he was able to give me a version which did actually work the way I wanted it to.
Now I am back on track.
Did lots of jobs for the "Searching for Sense" blog which is private, so you won't be able to see those results.

As a matter of fact, I will now check my reasoning ideas in the live environment of my Cyc-world.
The game of Ludo will serve as a template for potential problems.

But first I will have to look into a few more papers to get this straight.

For those who've been following my blog:
Annotation of text is necessary to "transfer" text into something the computer can process. This process does not work that well when fully automated. Parsers cannot cope with semantics and therefore make many mistakes.
My goal is to minimize those mistakes by applying knowledge/reasoning to the texts that are processed. So far the average detection rate is close to 30%.
If my reasoning can lift this rate to 50% or higher, my goal will be reached.

Human beings tend to regularly hit the 99-100% mark, by the way.
But then they don't evaluate a 200-page document in 15 minutes; it takes them more like 5 weeks.

Have a great weekend.

Wednesday, November 21, 2007

Damn I was busy ...

Hey folks,

Sorry for not having posted in almost 2 weeks.
I was busy doing stuff which does not directly concern AutoModel.

I finally managed to get Cyc back on track, but my machine is going to kill me. It just takes forever to process. I need more RAM. And now comes the fun part: my 10-month-old machine can only handle 2GB of RAM. Well, that is anything but sufficient for what ResearchCyc demands from my main memory. Too bad.
The 8-core servers in our institute are of no use since Cyc does not run properly under win2k3 or Ubuntu. It needs Win2k/XP or Suse9.x.
Again - too bad.

I have updated my mindmap and added some words to my diss.

As for what I am working on in the next few days, that would be:

gaining deeper insight into handling inferences with Cyc
using the Cyc Java API to handle requests and publish the results in a very rudimentary way
finding out which queries/requests would be most helpful while annotating the text.

That should be it for today.