# Regular expressions to tokenize

Hi everyone.

I hope you can give me a hand with this:

I have a list of tuples like this:

```python
table = [
    ("aaa", "a", "b", 0),
    ("aaa", "a", "c", 1),
    ("aaa", "b", "a", 2),
    ("aaa", "b", "c", 3),
    ("aaa", "c", "a", 4),
    ("aaa", "c", "b", 5),
    ("aaa", "a", "*", 6),
    ("aaa", "b", "*", 7),
    ("aaa", "c", "*", 8),
    # ...
]
```


More or less like that. It's huge.
Now, the * means one digit.
So, as you can see, this table holds some kind of (not so) regular expressions.
The idea is that if you write "aaa a b" you get 0. If you write "aaa c 1" you get 8.
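The behaviour described above can be sketched as a linear scan over the table. This is a guess at how such a program might work, not the original code: it assumes the input is split on whitespace and that a `*` entry matches exactly one digit. `lookup` and `token_matches` are hypothetical names, and `TABLE` is an illustrative subset.

```python
# Illustrative subset of the table from the question.
TABLE = [
    ("aaa", "a", "b", 0),
    ("aaa", "a", "c", 1),
    ("aaa", "b", "a", 2),
    ("aaa", "b", "c", 3),
    ("aaa", "c", "a", 4),
    ("aaa", "c", "b", 5),
    ("aaa", "a", "*", 6),
    ("aaa", "b", "*", 7),
    ("aaa", "c", "*", 8),
]


def token_matches(pattern, token):
    """'*' matches exactly one digit; any other pattern must match literally."""
    if pattern == "*":
        return len(token) == 1 and token in "0123456789"
    return pattern == token


def lookup(text):
    """Return the number of the first row matching `text`, or None."""
    tokens = text.split()
    for *patterns, value in TABLE:
        if len(patterns) == len(tokens) and all(
            token_matches(p, t) for p, t in zip(patterns, tokens)
        ):
            return value
    return None


print(lookup("aaa a b"))  # 0
print(lookup("aaa c 1"))  # 8
```

Every query walks the list until a row matches, so lookup time grows with the size of the table, and rows are tried in order.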

The program actually works, but I want to change it to use Python regular expressions.

I managed to write a regular expression that matches the strings and keeps the parts in groups:

```python
r'(?P<matcha>aaa[\s]+(a|b|c)[\s]+(a|b|c|[\d]))'
```

This matches all the tuples in the example table.
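For illustration, here is what the pattern from the question hands back. A match object only carries substrings of the input, so the number from the table cannot come out of the match itself; it still has to be looked up separately. `fullmatch` is used so the whole input must match, and the input strings are just examples.

```python
import re

# The pattern from the question: "aaa", then two whitespace-separated tokens.
PATTERN = re.compile(r'(?P<matcha>aaa[\s]+(a|b|c)[\s]+(a|b|c|[\d]))')

m = PATTERN.fullmatch("aaa c 1")

# Every group holds a substring of the input; none of them is the 8 from the table.
print(m.group("matcha"))  # aaa c 1
print(m.groups())         # ('aaa c 1', 'c', '1')

# The pattern is also looser than the rows shown: "aaa a a" matches,
# although the excerpt of the table has no such row.
print(PATTERN.fullmatch("aaa a a") is not None)  # True
```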

My question:

Is there a way to get a specific integer from a match (like the one in the table)?

Or can I maybe translate the match into the "table regular expression" format?

---
It looks like some sort of assignment, but why not throw a PLY scanner at it?
That does all the hard work for you.

Otherwise, I am not quite convinced that a RE is a good solution for sequence recognition when you're in a hurry.

---

I think I'm missing something here.

If you want to retrieve the fourth element of the tuple based on the other three, is there any reason for not using a good ol' dictionary and having "aaa a b" as the key and 0 as the value?
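A minimal sketch of the dictionary approach, assuming the existing tuples are kept as the source data and the first three fields joined by single spaces form the key. `rows` is an illustrative subset of the question's table.

```python
# A few rows from the question's table (illustrative subset).
rows = [
    ("aaa", "a", "b", 0),
    ("aaa", "a", "c", 1),
    ("aaa", "c", "*", 8),
]

# Build the lookup once: "aaa a b" -> 0, "aaa a c" -> 1, "aaa c *" -> 8.
table = {" ".join(row[:3]): row[3] for row in rows}

print(table["aaa a b"])      # 0
print(table.get("aaa x y"))  # None: no such key
```

Lookups are then a single hash access instead of a scan over the whole list. Input typed with irregular spacing would need to be normalized first, e.g. with `" ".join(text.split())`.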

Also, how is a regular expression that matches all the possible strings you are using going to help to retrieve the number associated with a specific string?

---

Try to avoid regular expressions whenever possible. They are very powerful for what they are designed for (ensuring a text matches a pattern), but they are often overused, hurting performance and readability. Your case is not what regex is for: there is no source text and no pattern to match. You will have problems later trying to extend your solution or debugging bizarre edge cases.

---

> Try to avoid regular expressions whenever possible.

That's an error in the other direction. Use regex when it's the right tool and avoid it when it's not. This is a case where it's definitely not.

I think Avalander is on the right path here with just having an associative container, but I think OP may have a very wrong idea about how regex is typically used.

---

Ok. Thanks for the advice.

I think I will use regular expressions for matching some of the "generic characters" (turning numbers into *, etc.).

I thought I could use `re` to get the number.

By the way, thanks for helping me see that a dict is far better than a list of tuples.
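A sketch of that combination, assuming a single digit is the only "generic character" to handle: `re.sub` replaces a standalone one-digit token with `*`, and the result is used as a dictionary key. The helpers `normalize` and `lookup` and the small `table` are hypothetical, for illustration only.

```python
import re

# Illustrative subset of the table, already converted to a dict.
table = {
    "aaa a b": 0,
    "aaa c a": 4,
    "aaa a *": 6,
    "aaa c *": 8,
}

# A digit with no non-space neighbour, i.e. a token that is exactly one digit.
# Tokens such as "12" or "a1" are left untouched.
DIGIT_TOKEN = re.compile(r"(?<!\S)\d(?!\S)")


def normalize(text):
    """Collapse runs of whitespace, then replace one-digit tokens with '*'."""
    return DIGIT_TOKEN.sub("*", " ".join(text.split()))


def lookup(text):
    """Return the table value for `text`, or None if there is no entry."""
    return table.get(normalize(text))


print(lookup("aaa c 1"))   # 8
print(lookup("aaa  a b"))  # 0
print(lookup("aaa c 12"))  # None
```

Here the regex only does the normalization step it is good at, and the dictionary does the actual retrieval.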

---

> That's an error in the other direction

Yes, but I've just seen it too many times. A dev learns about regex, then goes "Whoa! Shiny! I can do so many things with that!" And you get abominations like parsing HTML with regex to get the page title, bloated beyond repair to eliminate false positives in headers, comments and JS. Thanks, but no thanks :) I would rather err in this direction: use old-fashioned search if it's viable, and use regex only when I actually gain something, since it always sacrifices readability.
