Nice Coder

English questions


In building Alpha (please contact me if you want to join), I've got a few little problems. I need to recognise and differentiate between statements, questions, and comments.

Statements are just that: 'the sky is blue'.
Questions are 'what colour is the sky'.
Comments are 'hi'.

The reason is that statements need to be recorded 'specially', questions don't need to be remembered, and comments should be shoved in a separate 'bin'.

Can anyone help me with this?

From,
Nice coder

No punctuation? In that case, you can't distinguish between "The sky is blue." and "The sky is blue?" But, putting that aside...

Most questions will feature a 'wh-' word (who, what, where, when, why, or how) followed by a verb, and then a noun phrase describing the object of the question. Some other questions are merely affirmative statements re-ordered ("Was I awake?" contains the same words as "I was awake."), and some are tag questions (e.g. "Kylotan is great, isn't he?"), which are affirmative statements with a contradictory tag appended.
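
A rough sketch of how those three question shapes might be spotted without relying on the final "?". The word lists, the regular expression, and the function name question_kind are all assumptions made for illustration; they are far from complete and will misfire on many real sentences.

```python
import re

WH_WORDS = {"who", "what", "where", "when", "why", "how"}

# Hypothetical, deliberately small list of verbs that can open an
# inverted question such as "Was I awake?".
AUXILIARIES = {"is", "are", "was", "were", "am", "do", "does", "did",
               "can", "could", "will", "would", "should", "has", "have", "had"}

# A tag question: a comma, an (optionally negated) auxiliary and a pronoun
# at the very end of the sentence, e.g. ", isn't he?".
TAG_RE = re.compile(
    r",\s*(?:is|are|was|were|do|does|did|can|will|has|have)(?:n't)?\s+"
    r"(?:i|you|he|she|it|we|they)\s*\??\s*$",
    re.IGNORECASE,
)


def question_kind(sentence):
    """Return "tag", "wh", "inverted", or None if no question shape is found."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return None
    if TAG_RE.search(sentence):
        return "tag"
    if words[0] in WH_WORDS:
        return "wh"
    if words[0] in AUXILIARIES:
        return "inverted"
    return None


print(question_kind("What colour is the sky"))       # wh
print(question_kind("Was I awake?"))                 # inverted
print(question_kind("Kylotan is great, isn't he?"))  # tag
print(question_kind("The sky is blue"))              # None
```

Note that "The sky is blue?" still comes back as None here, which is exactly the ambiguity described above: word order alone cannot tell it apart from the statement.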

I don't know anything about your "Alpha" project, so sorry if this post is irrelevant to it.

If your program will understand the grammatical structure of sentences, the grammar should start with something like

Sentence ::= Statement | Question | Comment

Then you can have something like

Statement ::= NominativeStatement | TransitiveStatement | IntransitiveStatement
NominativeStatement ::= Subject NominativeVerb NominativeObject
Subject ::= NominalGroup
NominativeObject ::= NominalGroup | Adjective
NominalGroup ::= Pronoun | NounAndModifiers
NounAndModifiers ::= Determiner Adjective* Noun
Question ::= WH_Question | DirectQuestion
DirectQuestion ::= NominativeVerb Subject NominativeObject "?"?
...
This grammar would recognize the sentence "The sky is blue" as a Statement and "Is the sky blue?" as a Question. Even if you erase punctuation the grammar can still deduce that it is a question by the order of the words.
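
To make that concrete, here is a minimal sketch of a hand-written recogniser for just the NominativeStatement and DirectQuestion rules. Everything in it is an assumption for illustration: the tiny LEXICON, the function names, treating the trailing "?" as optional, and falling back to "Comment" when nothing else matches (the rest of the grammar is elided above, so it is not modelled here).

```python
# Hypothetical toy lexicon; a real system would need a far larger one.
LEXICON = {
    "the": "Determiner", "a": "Determiner",
    "sky": "Noun", "grass": "Noun",
    "blue": "Adjective", "green": "Adjective", "big": "Adjective",
    "is": "NominativeVerb", "are": "NominativeVerb",
    "it": "Pronoun", "they": "Pronoun",
}


def tag(sentence):
    """Map each word to its category (None for unknown words)."""
    words = sentence.lower().replace("?", " ").replace(".", " ").split()
    return [LEXICON.get(w) for w in words]


def nominal_group(tags, i):
    """Return the index just past a NominalGroup starting at i, or None."""
    if i < len(tags) and tags[i] == "Pronoun":
        return i + 1
    if i < len(tags) and tags[i] == "Determiner":
        j = i + 1
        while j < len(tags) and tags[j] == "Adjective":  # Adjective*
            j += 1
        if j < len(tags) and tags[j] == "Noun":
            return j + 1
    return None


def nominative_object(tags, i):
    """NominalGroup | Adjective, starting at i."""
    end = nominal_group(tags, i)
    if end is not None:
        return end
    if i < len(tags) and tags[i] == "Adjective":
        return i + 1
    return None


def classify(sentence):
    tags = tag(sentence)
    n = len(tags)
    # NominativeStatement: Subject NominativeVerb NominativeObject
    end = nominal_group(tags, 0)
    if end is not None and end < n and tags[end] == "NominativeVerb":
        if nominative_object(tags, end + 1) == n:
            return "Statement"
    # DirectQuestion: NominativeVerb Subject NominativeObject
    if n and tags[0] == "NominativeVerb":
        end = nominal_group(tags, 1)
        if end is not None and nominative_object(tags, end) == n:
            return "Question"
    return "Comment"  # assumed fallback, not part of the grammar above


print(classify("The sky is blue"))  # Statement
print(classify("Is the sky blue"))  # Question
```

The sketch only works because every word is in the lexicon, which is the main cost of the grammar approach.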

Of course this is a very difficult way to process natural language, but I guess if you really want to understand sentences a grammar of this sort is unavoidable. There are statistical approaches to resolving ambiguities, which generally work well, although in some hard cases "true understanding" is required to be able to parse a sentence correctly. Things get even worse when you try to understand what pronouns refer to.

Quote:
Original post by Kylotan
No punctuation? In that case, you can't distinguish between "The sky is blue." and "The sky is blue?" But, putting that aside... Most questions will feature a 'wh-' word (who, what, where, when, why, or how) followed by a verb, and then a noun phrase describing the object of the question. Some other questions are merely affirmative statements re-ordered ("Was I awake?" contains the same words as "I was awake."), and some are tag questions (e.g. "Kylotan is great, isn't he?"), which are affirmative statements with a contradictory tag appended.


I could use punctuation, but I'm not sure it would always work (sometimes people forget a ? or something).

I would also prefer not to be constrained to a dictionary (which, of course, does nothing to help with misspellings).

Currently, my idea is this.

For questions:
Check whether it has one of the interrogatives or a ?, or one of a set number of clauses at the end, like "aren't they", etc.

For example:
"Where are the lollies?"
"the lollies, where are they?"
"Ain't it great?"

For statements:
I get rid of the first and last words.
I then look for words that end in s, es, ies (but not 's), or ed; if it has one, then it is a statement (this finds verb forms such as participles).
I also look for words like 'is' or 'are',
or the word 'will', or a full stop.

"C++ is great"
"Bob picks daisies"
"Bob will pick daisies"
"Bob picked daisies"

Comments are just anything else (anything that isn't a statement or a question).

The problem is that there might be some cases where it bugs up (hence the rule that punctuation overrides everything else).
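
The rules above could be sketched roughly like this. It is only one reading of the idea: the tokenising regex, the contents of END_CLAUSES, and the order in which the checks run (punctuation first, then question cues, then statement cues) are assumptions.

```python
import re

INTERROGATIVES = {"who", "what", "where", "when", "why", "how"}
# Hypothetical examples of question clauses that may end a sentence.
END_CLAUSES = {"aren't they", "isn't it", "isn't he", "isn't she", "don't you"}
STATEMENT_WORDS = {"is", "are", "will"}


def classify(text):
    """Classify text as "question", "statement" or "comment"."""
    stripped = text.strip()
    words = re.findall(r"[a-z0-9+']+", stripped.lower())

    # Punctuation overrides everything else.
    if stripped.endswith("?"):
        return "question"
    if stripped.endswith("."):
        return "statement"

    # Question cues: an interrogative anywhere, or a known clause at the end.
    if INTERROGATIVES & set(words) or " ".join(words[-2:]) in END_CLAUSES:
        return "question"

    # Statement cues: ignore the first and last words, then look for
    # is/are/will or a word ending in s or ed (but not 's).
    # Endings "es" and "ies" are already covered by the "s" check.
    for w in words[1:-1]:
        if w in STATEMENT_WORDS:
            return "statement"
        if w.endswith(("s", "ed")) and not w.endswith("'s"):
            return "statement"

    return "comment"


for sample in ["C++ is great", "Bob picks daisies", "Bob will pick daisies",
               "Bob picked daisies", "what colour is the sky", "hi"]:
    print(sample, "->", classify(sample))
```

Because an interrogative anywhere in the sentence counts as a question cue, any unpunctuated statement containing "who" or "where" will be misfiled as a question.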

From,
Nice coder

[Edited by - Nice Coder on June 16, 2005 4:34:18 AM]

Quote:
Original post by alvaro
I don't know anything about your "Alpha" project, so sorry if this post is irrelevant to it.

If your program will understand the grammatical structure of sentences, the grammar should start with something like

Sentence ::= Statement | Question | Comment

Then you can have something like

Statement ::= NominativeStatement | TransitiveStatement | IntransitiveStatement
NominativeStatement ::= Subject NominativeVerb NominativeObject
Subject ::= NominalGroup
NominativeObject ::= NominalGroup | Adjective
NominalGroup ::= Pronoun | NounAndModifiers
NounAndModifiers ::= Determiner Adjective* Noun
Question ::= WH_Question | DirectQuestion
DirectQuestion ::= NominativeVerb Subject NominativeObject "?"?
...
This grammar would recognize the sentence "The sky is blue" as a Statement and "Is the sky blue?" as a Question. Even if you erase punctuation the grammar can still deduce that it is a question by the order of the words.

Of course this is a very difficult way to process natural language, but I guess if you really want to understand sentences a grammar of this sort is unavoidable. There are statistical approaches to resolving ambiguities, which generally work well, although in some hard cases "true understanding" is required to be able to parse a sentence correctly. Things get even worse when you try to understand what pronouns refer to.


That would be a very difficult way to parse it (mainly since I'm trying to be as dictionary-free as possible).

From,
Nice coder

Quote:
Original post by Nice Coder
"the lollies, where are they?"


Compare with:
"The children, who are lost."

All I can suggest is that you run it on a lot of test data and look for anomalies. I expect you can get 90% accuracy with what you've got, and 95-97% with a bit of tweaking.
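
Something like the following could serve as the test loop. It is a sketch only: evaluate, toy_classify and the hand-labelled SAMPLES are made up for illustration, and toy_classify is a stand-in that looks at nothing but the final character.

```python
def evaluate(classify, labelled):
    """Run classify over (text, expected) pairs.

    Returns (accuracy, anomalies), where anomalies lists every
    (text, expected, actual) triple the classifier got wrong.
    """
    anomalies = []
    for text, expected in labelled:
        actual = classify(text)
        if actual != expected:
            anomalies.append((text, expected, actual))
    accuracy = (len(labelled) - len(anomalies)) / len(labelled)
    return accuracy, anomalies


def toy_classify(text):
    # Stand-in classifier: only checks for a trailing "?".
    return "question" if text.rstrip().endswith("?") else "statement"


# Hand-labelled illustrative samples, not real test data.
SAMPLES = [
    ("The sky is blue.", "statement"),
    ("What colour is the sky?", "question"),
    ("the lollies, where are they?", "question"),
    ("Where are the lollies", "question"),  # "?" left off on purpose
]

accuracy, anomalies = evaluate(toy_classify, SAMPLES)
print(accuracy)   # 0.75
for text, expected, actual in anomalies:
    print(f"{text!r}: expected {expected}, got {actual}")
```

Reading the anomalies list by hand is the useful part; the accuracy figure only tells you whether a tweak helped or hurt overall.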

[Edited by - Kylotan on June 17, 2005 6:50:51 AM]

"The children, who are lost" is not a sentence, but it is part of a sentence that would be a statement, such as:

The children, who are lost, ran around in circles.

Also, endings like "ed" aren't always good indicators. Here are some slightly similar example sentences to throw at your parser to see how it does.

The children we're worried by your behavior.
This caused the children to become worried about their future.
This caused the children to become worried about their future?
How the children became worried, nobody knows.
I don't know the children became worried.
How did the children become worried?
Who knows how the children became worried?

- Who is running around in circles?
- The children, who are lost.

Well, that might not be a complete sentence, but people don't always speak in complete sentences, especially in response to a question.

"The children we're worried by your behavior."
What does this mean? That made *my* English parser fail.

"This caused the children to become worried about their future?"
I don't know if people actually speak like this, but I was taught that sentence should be:
"Did this cause the children to become worried about their future?"

People do speak that way. Turn on your TV and watch any investigation-type show, and you'll hear it at least three times in one hour. :p And everyone takes after TV these days. :p

Quote:
Original post by Kylotan
Quote:
Original post by Nice Coder
"the lollies, where are they?"


Compare with:
"The children, who are lost."

All I can suggest is that you run it on a lot of test data and look for anomalies. I expect you can get 90% accuracy with what you've got, and 95-97% with a bit of tweaking.


Maybe if I look for "are they"s, etc., and use those for the questions.

Basically

Who/What/when/where is it/are they/they are/it is

Bob, Who it is?

Would this work better?
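
As a quick way to try the idea, the pattern could be written as a regular expression. This is a sketch under assumptions: PATTERN and looks_like_question are hypothetical names, and the two word lists are copied straight from the line above.

```python
import re

# An interrogative immediately followed by one of the listed verb phrases.
PATTERN = re.compile(
    r"\b(who|what|when|where)\s+(is it|are they|they are|it is)\b",
    re.IGNORECASE,
)


def looks_like_question(text):
    """True if text contains e.g. "where are they" or "who it is"."""
    return PATTERN.search(text) is not None


print(looks_like_question("Bob, Who it is?"))               # True
print(looks_like_question("the lollies, where are they?"))  # True
print(looks_like_question("The children, who are lost."))   # False
print(looks_like_question("Where are the lollies"))         # False
```

So the narrower pattern does reject "The children, who are lost.", but on its own it also misses "Where are the lollies", so it would have to be combined with the other cues rather than replace them.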

(I'm looking for 100% accuracy, or as close as I can get.)

From,
Nice coder
