English questions

9 comments, last by Kylotan 18 years, 10 months ago
In building Alpha (please contact me if you want to join), I've got a few little problems. I need to recognise and differentiate between statements, questions, and comments.
Statements are just that: 'the sky is blue'.
Questions are things like 'what colour is the sky'.
Comments are things like 'hi'.
The reason is that statements need to be recorded 'specially', questions don't need to be remembered, and comments should be shoved in a separate 'bin'. Can anyone help me with this? From, Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
No punctuation? In that case, you can't distinguish between "The sky is blue." and "The sky is blue?" But, putting that aside... Most questions will feature a 'wh-' word (who, what, where, when, why, or how) followed by a verb, and then a noun phrase describing the object of the question. Some other questions are merely affirmative statements re-ordered ("Was I awake?" contains the same words as "I was awake.") and some are tag-questions (eg. "Kylotan is great, isn't he?") which are affirmative statements with a contradictory tag appended.
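As a rough sketch of those three question shapes (wh-questions, re-ordered statements, and tag questions), something like the following could be a starting point. The word lists, the regular expression, and the name `question_form` are assumptions made up for illustration, and they are far from complete.

```python
import re

# Hypothetical, deliberately small word lists.
WH_WORDS = {"who", "what", "where", "when", "why", "how"}
AUXILIARIES = {"is", "are", "was", "were", "am", "do", "does", "did",
               "can", "could", "will", "would", "should", "has", "have", "had"}
# A tag such as ", isn't he" or ", are they" at the very end of the sentence.
TAG_RE = re.compile(r",\s*(\w+(?:n't)?)\s+(i|you|he|she|it|we|they)\s*\??$")


def question_form(sentence):
    """Return 'wh', 'tag', 'inverted', or None if no question shape is found."""
    text = sentence.strip().lower()
    words = re.findall(r"[a-z']+", text)
    if not words:
        return None
    if words[0] in WH_WORDS:
        return "wh"
    tag = TAG_RE.search(text)
    if tag and tag.group(1).replace("n't", "") in AUXILIARIES:
        return "tag"
    if words[0] in AUXILIARIES:
        # Verb-first order, as in "Was I awake?"
        return "inverted"
    return None


for s in ["What colour is the sky", "Was I awake?",
          "Kylotan is great, isn't he?", "The sky is blue?"]:
    print(s, "->", question_form(s))
```

Note that "The sky is blue?" comes back as None: word order alone cannot catch it, which is the point about punctuation made above.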
I don't know anything about your "Alpha" project, so sorry if this post is irrelevant to it.

If your program will understand the grammatical structure of sentences, the grammar should start with something like

Sentence ::= Statement | Question | Comment

Then you can have something like

Statement ::= NominativeStatement | TransitiveStatement | IntransitiveStatement
NominativeStatement ::= Subject NominativeVerb NominativeObject
Subject ::= NominalGroup
NominativeObject ::= NominalGroup | Adjective
NominalGroup ::= Pronoun | NounAndModifiers
NounAndModifiers ::= Determiner Adjective* Noun
Question ::= WH_Question | DirectQuestion
DirectQuestion ::= NominativeVerb Subject NominativeObject "?"
...
This grammar would recognize the sentence "The sky is blue" as a Statement and "Is the sky blue?" as a Question. Even if you erase punctuation the grammar can still deduce that it is a question by the order of the words.

Of course this is a very difficult way to process natural language, but I guess if you really want to understand sentences a grammar of this sort is unavoidable. There are statistical approaches to resolving ambiguities, which generally work well, although in some hard cases "true understanding" is required to be able to parse a sentence correctly. Things get even worse when you try to understand what pronouns refer to.
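To make the grammar idea concrete, here is a minimal hand-written recogniser for just the nominative statement and direct question rules. The tiny lexicon and every function name are assumptions for the sake of the example; a real system would need a proper dictionary or a tagger, and many more rules.

```python
# Hypothetical toy lexicon; anything outside it is simply not recognised.
DETERMINERS = {"the", "a", "an"}
ADJECTIVES = {"blue", "big", "great"}
NOUNS = {"sky", "dog", "children"}
PRONOUNS = {"i", "it", "he", "she", "they", "we", "you"}
NOMINATIVE_VERBS = {"is", "are", "was", "were", "am"}


def parse_nominal_group(words, i):
    """NominalGroup ::= Pronoun | determiner, adjectives, noun.
    Returns the index just past the group, or None if there is no match."""
    if i < len(words) and words[i] in PRONOUNS:
        return i + 1
    if i < len(words) and words[i] in DETERMINERS:
        j = i + 1
        while j < len(words) and words[j] in ADJECTIVES:
            j += 1
        if j < len(words) and words[j] in NOUNS:
            return j + 1
    return None


def parse_object(words, i):
    """The object slot after the verb: a bare adjective or a nominal group."""
    if i < len(words) and words[i] in ADJECTIVES:
        return i + 1
    return parse_nominal_group(words, i)


def classify(sentence):
    # Punctuation is thrown away on purpose: only word order is used.
    words = sentence.lower().strip(" .?!").split()
    # Statement: Subject, verb, object
    j = parse_nominal_group(words, 0)
    if j is not None and j < len(words) and words[j] in NOMINATIVE_VERBS:
        if parse_object(words, j + 1) == len(words):
            return "Statement"
    # Question: verb, Subject, object
    if words and words[0] in NOMINATIVE_VERBS:
        j = parse_nominal_group(words, 1)
        if j is not None and parse_object(words, j) == len(words):
            return "Question"
    return "Comment"


print(classify("The sky is blue"))   # Statement
print(classify("Is the sky blue"))   # Question
print(classify("hi"))                # Comment
```

Anything the toy lexicon does not cover falls through to "Comment", which shows how much this approach leans on having a dictionary.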
Quote:Original post by Kylotan
No punctuation? In that case, you can't distinguish between "The sky is blue." and "The sky is blue?" But, putting that aside... Most questions will feature a 'wh-' word (who, what, where, when, why, or how) followed by a verb, and then a noun phrase describing the object of the question. Some other questions are merely affirmative statements re-ordered ("Was I awake?" contains the same words as "I was awake.") and some are tag-questions (eg. "Kylotan is great, isn't he?") which are affirmative statements with a contradictory tag appended.


I could use punctuation, but I'm not sure it would always work (sometimes people forget a ? or something).

I would also like not to be constrained to a dictionary (which, of course, does nothing to help with misspellings).

Currently, my idea is this.

For questions,
check to see if it has one of the interrogatives, or a ?, or one of a fixed set of clauses at the end, like "aren't they", etc.

For example:
"Where are the lollies?"
"the lollies, where are they?"
"Ain't it great?"

For statements,
I get rid of the first and last words.
I then look for words that end in s, es, ies (but not 's), or ed, and if it has one, then it is a statement (this is meant to find verb endings and participles).
I also look for words like 'is' or 'are',
or the word 'will', or a full stop.

"C++ is great"
"Bob picks daisies"
"Bob will pick daisies"
"Bob picked daisies"

For comments, they're just anything else (anything that isn't a statement or question).

The problem is that there might be some cases where it bugs up (which is why punctuation overrides the other checks).
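A minimal sketch of the rules above could look like this. The exact word lists, the order of the checks, and the choice to look for 'is', 'are' and 'will' anywhere in the sentence are assumptions; the post doesn't pin those details down.

```python
import re

# Hypothetical word lists; extend as needed.
INTERROGATIVES = {"who", "what", "where", "when", "why", "how"}
QUESTION_TAGS = ("aren't they", "isn't it", "are they", "is it")
STATEMENT_WORDS = {"is", "are", "will"}


def classify(sentence):
    text = sentence.strip().lower()
    # Punctuation overrides everything else.
    if text.endswith("?"):
        return "question"
    if text.endswith("."):
        return "statement"
    words = re.findall(r"[a-z+']+", text)  # '+' is kept so "c++" stays one word
    if INTERROGATIVES & set(words) or text.endswith(QUESTION_TAGS):
        return "question"
    # Drop the first and last words, then look for s/es/ies/ed endings (not 's).
    middle = words[1:-1]
    has_suffix = any(w.endswith(("s", "ed")) and not w.endswith("'s")
                     for w in middle)
    if has_suffix or STATEMENT_WORDS & set(words):
        return "statement"
    return "comment"


for s in ["C++ is great", "Bob picks daisies", "Bob will pick daisies",
          "Bob picked daisies", "the lollies, where are they", "hi"]:
    print(s, "->", classify(s))
```

Since 'es' and 'ies' both end in 's', one suffix test covers all three. The obvious weak spot is that any word ending in 's' in the middle of a sentence counts as a verb.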

From,
Nice coder

[Edited by - Nice Coder on June 16, 2005 4:34:18 AM]
Quote:Original post by alvaro
I don't know anything about your "Alpha" project, so sorry if this post is irrelevant to it.

If your program will understand the grammatical structure of sentences, the grammar should start with something like

Sentence ::= Statement | Question | Comment

Then you can have something like

Statement ::= NominativeStatement | TransitiveStatement | IntransitiveStatement
NominativeStatement ::= Subject NominativeVerb NominativeObject
Subject ::= NominalGroup
NominativeObject ::= NominalGroup | Adjective
NominalGroup ::= Pronoun | NounAndModifiers
NounAndModifiers ::= Determiner Adjective* Noun
Question ::= WH_Question | DirectQuestion
DirectQuestion ::= NominativeVerb Subject NominativeObject "?"
...
This grammar would recognize the sentence "The sky is blue" as a Statement and "Is the sky blue?" as a Question. Even if you erase punctuation the grammar can still deduce that it is a question by the order of the words.

Of course this is a very difficult way to process natural language, but I guess if you really want to understand sentences a grammar of this sort is unavoidable. There are statistical approaches to resolving ambiguities, which generally work well, although in some hard cases "true understanding" is required to be able to parse a sentence correctly. Things get even worse when you try to understand what pronouns refer to.


That would be a very difficult way to parse it (mainly since I'm trying to be as dictionary-free as possible).

From,
Nice coder
Quote:Original post by Nice Coder
"the lollies, where are they?"


Compare with:
"The children, who are lost."

All I can suggest is that you run it on a lot of test data and look for anomalies. I expect you can get 90% accuracy with what you've got, and 95-97% with a bit of tweaking.
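A test run of that kind can be as simple as the sketch below. The function name `evaluate`, the stand-in classifier `naive`, and the four labelled sentences are assumptions for illustration; swap in the real classifier and a much larger set of hand-labelled sentences.

```python
def evaluate(classify, labelled):
    """Run classify over (sentence, expected_label) pairs.
    Returns the accuracy and the list of anomalies to inspect by hand."""
    misses = []
    for sentence, expected in labelled:
        got = classify(sentence)
        if got != expected:
            misses.append((sentence, expected, got))
    accuracy = 1 - len(misses) / len(labelled)
    return accuracy, misses


def naive(sentence):
    # Stand-in classifier: a question mark means question, otherwise statement.
    return "question" if sentence.strip().endswith("?") else "statement"


# Hypothetical hand-labelled test data.
LABELLED = [
    ("the sky is blue", "statement"),
    ("what colour is the sky", "question"),
    ("hi", "comment"),
    ("How did the children become worried?", "question"),
]

accuracy, misses = evaluate(naive, LABELLED)
print(f"accuracy: {accuracy:.0%}")
for sentence, expected, got in misses:
    print(f"  {sentence!r}: expected {expected}, got {got}")
```

The list of misses is the useful part: each one is either a mislabelled test sentence or a rule that needs tweaking.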

[Edited by - Kylotan on June 17, 2005 6:50:51 AM]
The children, who are lost ... is not a sentence ... but it is part of a sentence that would be a statement ... such as

The children, who are lost, ran around in circles.

Also, endings like -ed aren't always good indicators. Here are some loosely similar example sentences to throw at your parser to see how it does.

The children we're worried by your behavior.
This caused the children to become worried about their future.
This caused the children to become worried about their future?
How the children became worried, nobody knows.
I don't know the children became worried.
How did the children become worried?
Who knows how the children became worried?

- Who is running around in circles?
- The children, who are lost.

Well, that might not be a complete sentence, but people don't always speak in complete sentences, especially in response to a question.

"The children we're worried by your behavior."
What does this mean? That made *my* English parser fail.

"This caused the children to become worried about their future?"
I don't know if people actually speak like this, but I was taught that sentence should be:
"Did this cause the children to become worried about their future?"

People do speak that way. Turn on your TV and watch any investigation-type show, and you'll hear it at least three times in one hour. :p And everyone takes after TV these days. :p
Quote:Original post by Kylotan
Quote:Original post by Nice Coder
"the lollies, where are they?"


Compare with:
"The children, who are lost."

All I can suggest is that you run it on a lot of test data and look for anomalies. I expect you can get 90% accuracy with what you've got, and 95-97% with a bit of tweaking.


Maybe if I look for "are they"s, etc., and use those for the questions.

Basically:

Who/What/When/Where followed by is it/are they/they are/it is

Bob, Who it is?

Would this work better?

(I'm looking at 100% accuracy, or as close as I can get.)
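As a sketch, the pattern above can be written as one regular expression. The name `has_wh_clause` and the exact word lists are assumptions, and this only tests the idea, it doesn't prove it.

```python
import re

# who/what/when/where followed directly by one of the four clauses.
WH_CLAUSE = re.compile(
    r"\b(who|what|when|where)\s+(is it|are they|they are|it is)\b",
    re.IGNORECASE,
)


def has_wh_clause(sentence):
    return WH_CLAUSE.search(sentence) is not None


for s in ["the lollies, where are they?", "Bob, Who it is?",
          "The children, who are lost.", "Where are the lollies"]:
    print(s, "->", has_wh_clause(s))
```

It does tell "the lollies, where are they?" apart from "The children, who are lost.", but it misses the plain "Where are the lollies", so it would have to sit alongside the other checks rather than replace them.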

From,
Nice coder