Jump to content
  • Advertisement
Sign in to follow this  
etothex

non-regular languages

This topic is 5467 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Well, this isn't homework (not directly at least, the homeworks in the future may have topics related to this) but I do have a computer sciencey type question. The problem is my professor and TA disagree on the answer, so while they research it I thought I'd ask here. The question is, given any two languages L1 and L2 that are subsets of sigma*, is the intersection of L1 and L2 closed? That is, is the intersection of L1 and L2 a language? I understand it's closed for regular languages, but is it closed for all languages, regular and non-regular? My inclination is to believe my professor, who says that it's not guaranteed to be a language, though my TA makes a good logical point that since all subsets of sigma* are languages, then the intersection would have to be a language, if you include the empty language. So anyone have any insight?

Share this post


Link to post
Share on other sites
Advertisement
This is not necessarily true for any language. Consider context-free languages. Let L1 = {a^nb^nc^m | m, n >= 0} and L2 = {a^mb^nc^n | m, n >= 0}. These are context-free languages. The first has the same number of a's and b's, the second the same number of b's and c's. Then the intersection is L3 = {a^nb^nc^n | n >= 0}.

L3 is a classic example of a non-context free language. Thus, context-free languages are not closed under intersection.

Magius

Share this post


Link to post
Share on other sites
It will be a language (a subset of Σ*), though, as Magius pointed out, it may not be of the same class as L1 ∩ L2.

Share this post


Link to post
Share on other sites
Quote:
Original post by Magius
This is not necessarily true for any language. Consider context-free languages. Let L1 = {a^nb^nc^m | m, n >= 0} and L2 = {a^mb^nc^n | m, n >= 0}. These are context-free languages. The first has the same number of a's and b's, the second the same number of b's and c's. Then the intersection is L3 = {a^nb^nc^n | n >= 0}.

L3 is a classic example of a non-context free language. Thus, context-free languages are not closed under intersection.

Magius


Ah, I see. There's another level though, the way I understood the question was, is the intersection of two languages(any two kinds) always a language(of any kind, the same or different as L1 and L2)?

EDIT: thanks Fruny, I started writing before you posted. :)

Share this post


Link to post
Share on other sites
I have to say that I am totally confused, :). I'll allow the thread to stay open, since it appears to be related to logic, which has a place in gaming, even though it is school related. (Although if it is logic, then perhaps the AI forum would be a better place.)

I've never taken a logic course. Would someone mind giving me a brief overview, e.g., what is the definition of a language (regular, non-regular, context-free, etc.) in the context of this thread?

Share this post


Link to post
Share on other sites
Quote:
Original post by grhodes_at_work
I've never taken a logic course. Would someone mind giving me a brief overview, e.g., what is the definition of a language (regular, non-regular, context-free, etc.) in the context of this thread?


A language is a set of strings.
Σ is a set of symbol - your alphabet
* is Kleene's closure, thus Σ* is the set of all finite strings you can make with Σ

Regular, context-free, context-sensitive, unbounded languages are classes from the Chomsky Hierarchy of languages

A regular language is a language that can be recognized by a finite state machine - represented by a regular expression.
A context-free language is a language that can be recognized by a push-down automaton (finite state machine + stack).
A context-sensitive language is a language that can be recognized by a linear bounded automaton (FSM + finite list)
An unrestricted language is a language that can be recognized by a Turing Machine (FSM + semi-infinite tape)

Share this post


Link to post
Share on other sites
Political Flamebait Note for people who didn't catch the name dropping: right wing blowhards get busted for drug problems. Left wing blowhards (like Noam Chomsky) are at the bleeding edge of linguisting sciences.

Share this post


Link to post
Share on other sites
We start with an alphabet, which is called Σ ("sigma"). That's just a set of symbols (think letters).

A word is a finite sequence of letters. Any length is fine, including 0 (the empty word). We refer to the set of words as Σ*.

A language is a subset of Σ*.

Some examples of languages, using lowercase letters as the alphabet:
(1) The empty language.
(2) The language whose only word is "gamedev".
(3) The words that end in "z" (like "powjoeifjleinwukrjogijrz").
(4) The words formed with "a" and "b" only.
(5) The words that have the form "abababab...ab".

There are operations that build languages from other languages. If A and B are languages:
- A* is the language of words that are concatenations of any number of words from A. For instance, if A={"a","b"}, A* is the set of words formed with "a" and "b" only. This is consistent with calling the set of all words Σ*.
- A ∪ B, also refered to as A | B, is the union of words in A and B. You can think of it as "a word from either A or B".
- AB is the concatenation of a word from A and a word from B.

If we use a single letter as the language that consist of only one word, which is a word of length one whose only letter is the one that gives the name to the language, then we can express language (2) as `gamedev' (concatenation of the seven languages consisting of individual letters), language (3) as `Σ*z' or `(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)*z, language (4) as `(a|b)*' and language (5) as `(ab)*'.

All languages that can be built from individual letters using the operators repetition (*), union (|) and concatenation are called regular languages.

A more general way to describe a language is through a grammar, which is a set of generating rules that allow replacing symbols with other symbols. The alphabet is extended with abstract symbols, and the language generated by the grammar is the set of words generated by the rules consisting only of non-abstract symbols. Some examples (capital letters represent abstract symbols):

(6) rules:
S -> aSb
S -> c

(7) rules:
S -> TaS
S -> T
T -> b
T -> c

Let's see what languages 6 and 7 represent. We start with "S" and apply one of the rules. For instance, in language 6, we can apply the second rule and get "c". "c" consists only of non-abstract symbols, so it is a word of the language. We start again with "S" and apply the first rule. We get "aSb". We apply the first rule again to the `S' found in the middle, and we get "aaSbb". Now we apply the second rule to that `S' in the middle and we get "aacbb", which is another word in the language. We can easily see that language (6) consists of words that are "aaa...acbbb...b", with the same number of `a's and `b's.

In language (7), we can start with "S", apply the first rule, getting "TaS". The we apply the first rule again to that `S', and we get "TaTaS". Now we apply the second rule and we get "TaTaT". We apply the third rule to the first `T', getting "baTaT". Then the fourth rule to the twice to get "bacac". You can see that language (7) consists of words that have an odd length, `b' or `c' in the odd-order places and `a' in the even-order places. We could have expressed this as the regular language `((b|c)a)*(b|c)'.

Some languages can be described by a machine that determines whether a word is in the language or not. For different types of machines we get different classes of languages. For instance, the languages that are recognized by finite state machines are exactly the regular languages (this is a theorem). The languages that are recognized by a stack machine are the context independant languages (I think, I studied this too long ago).

This is a very quick overview of what I remember from my class on formal languages in college. I hope it helps Graham understand what we are talking about.

Share this post


Link to post
Share on other sites
Quote:
Original post by grhodes_at_work
I have to say that I am totally confused, :). I'll allow the thread to stay open, since it appears to be related to logic, which has a place in gaming, even though it is school related. (Although if it is logic, then perhaps the AI forum would be a better place.)

I've never taken a logic course. Would someone mind giving me a brief overview, e.g., what is the definition of a language (regular, non-regular, context-free, etc.) in the context of this thread?


Languages are technically mathematics, as they are completely mathematical constructs which are closely related to(but not the same as) set theory.

However, languages and the associated automata are pretty much used only in computer science, so it's only taught to CS majors. Languages and automata, combined with grammars, etc., are useful in compiler theory, since all possible programming languages are (essentially) context-free languages.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!