
Regular expressions to tokenize

This topic is 402 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


Hi everyone.

 

I hope you can give me a hand with this:

 

I have a list of tuples like this:

table = [
("aaa", "a", "b", 0),
("aaa", "a", "c", 1),
("aaa", "b", "a", 2),
("aaa", "b", "c", 3),
("aaa", "c", "a", 4),
("aaa", "c", "b", 5),
("aaa", "a", "*", 6),
("aaa", "b", "*", 7),
("aaa", "c", "*", 8),
...
]

More or less like that. It's huge.
Now, the * stands for a single digit.
So, as you can see, this table holds some kind of (not so) regular expressions.
The idea is that if you write "aaa a b" you get 0. If you write "aaa c 1" you get 8.
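
To make the lookup rule concrete, here is a minimal sketch of how such a table could be scanned without regular expressions. It is not the original program: the names `token_matches` and `lookup` and the whitespace splitting are assumptions for illustration, and only the rows shown above are included.

```python
# Only the rows shown in the post; the real table is much larger.
table = [
    ("aaa", "a", "b", 0),
    ("aaa", "a", "c", 1),
    ("aaa", "b", "a", 2),
    ("aaa", "b", "c", 3),
    ("aaa", "c", "a", 4),
    ("aaa", "c", "b", 5),
    ("aaa", "a", "*", 6),
    ("aaa", "b", "*", 7),
    ("aaa", "c", "*", 8),
]


def token_matches(pattern, token):
    """A "*" in the table stands for exactly one digit; anything else is literal."""
    if pattern == "*":
        return len(token) == 1 and token in "0123456789"
    return pattern == token


def lookup(text):
    """Return the integer of the first row that matches the text, or None."""
    tokens = text.split()
    for *patterns, value in table:
        if len(tokens) == len(patterns) and all(
            token_matches(p, t) for p, t in zip(patterns, tokens)
        ):
            return value
    return None


print(lookup("aaa a b"))
print(lookup("aaa c 1"))
```

With the rows above, the two calls print 0 and 8, matching the examples in the post.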

 

The program actually works, but I want to change it to use Python regular expressions.

I managed to write a regular expression that matches the strings and captures the parts in groups:

r'(?P<matcha>aaa[\s]+(a|b|c)[\s]+(a|b|c|[\d]))'

This matches all the tuples in the example table.
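
As a quick check of what this pattern actually returns, the sketch below runs it with the standard `re` module. The use of `fullmatch` and the sample inputs are assumptions for illustration. Note that the named group `matcha` is group 1, so the two inner tokens are groups 2 and 3.

```python
import re

pattern = re.compile(r'(?P<matcha>aaa[\s]+(a|b|c)[\s]+(a|b|c|[\d]))')

m = pattern.fullmatch("aaa c 1")
print(m.group("matcha"))       # the whole matched text
print(m.group(2), m.group(3))  # the two inner tokens

# The pattern is looser than the table: it also accepts "aaa a a",
# which is not one of the rows shown above.
print(pattern.fullmatch("aaa a a") is not None)

# A string outside the pattern gives no match object at all.
print(pattern.fullmatch("aaa d a"))
```

The match object carries the matched text and its groups, but nothing that identifies a table row or its integer.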

My question:

Is there a way to get a specific integer from a match (like the one in the table)?

Or maybe to translate the match into the "table-regular-expression-format"?

 


I think I'm missing something here.

 

If you want to retrieve the fourth element of the tuple based on the other three, is there any reason for not using a good ol' dictionary and having "aaa a b" as the key and 0 as the value?
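
A minimal sketch of that dictionary idea, assuming the key is the exact input string with single spaces. The name `codes` is hypothetical, and only the literal rows from the original post are included; the "*" rows need extra handling that this sketch does not attempt.

```python
# Keys are the full strings, values are the integers from the table.
codes = {
    "aaa a b": 0,
    "aaa a c": 1,
    "aaa b a": 2,
    "aaa b c": 3,
    "aaa c a": 4,
    "aaa c b": 5,
}

print(codes["aaa a b"])      # direct lookup
print(codes.get("aaa x y"))  # .get returns None for an unknown key
```

A dictionary lookup takes roughly constant time on average, whereas scanning a list of tuples grows with the size of the table.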

 

Also, how is a regular expression that matches all the possible strings you are using going to help to retrieve the number associated with a specific string?


Try to avoid regular expressions whenever possible. They are very powerful for what they are designed for (ensuring a text matches a pattern), but they are often overused, hurting performance and readability. Your case is not what regex is for: there is no source text and no pattern to match. You will have problems later when trying to extend your solution or debug bizarre edge cases.


Try to avoid regular expressions whenever possible.

That's an error in the other direction. Use regex when it's the right tool and avoid it when it's not. This is a case where it's definitely not.

 

I think Avalander is on the right path here with just having an associative container, but I think OP may have a very wrong idea about how regex is typically used.


Ok. Thanks for the advice.

I think I will use regular expressions for matching some of the "generic characters": numbers to *, etc.

I thought I could use re to get the number.

By the way, thanks for helping me see that a dict is far better than a list of tuples.
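
One way the two ideas could be combined is sketched below: a small regular expression turns each single-digit token into "*", and the result is used as a dictionary key. The names `codes`, `normalize` and `lookup`, and the choice of tuple keys, are assumptions for illustration; only the rows shown in the first post are included.

```python
import re

# Tuple keys in the table's own format, including the "*" rows.
codes = {
    ("aaa", "a", "b"): 0,
    ("aaa", "a", "c"): 1,
    ("aaa", "b", "a"): 2,
    ("aaa", "b", "c"): 3,
    ("aaa", "c", "a"): 4,
    ("aaa", "c", "b"): 5,
    ("aaa", "a", "*"): 6,
    ("aaa", "b", "*"): 7,
    ("aaa", "c", "*"): 8,
}


def normalize(token):
    """Map a token that is exactly one ASCII digit to "*"; keep anything else."""
    return "*" if re.fullmatch(r"[0-9]", token) else token


def lookup(text):
    """Translate the input into the table's format, then do a plain dict lookup."""
    key = tuple(normalize(token) for token in text.split())
    return codes.get(key)


print(lookup("aaa a b"))
print(lookup("aaa c 1"))
```

Here the regular expression only classifies a single token, and the dictionary does the actual retrieval of the number.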


That's an error in the other direction


Yes, but I've just seen it too many times. A dev learns about regex, then: "Wooaa! Shiny! I can do so many things with that!" And you get abominations like parsing HTML with a regex to get the page title, bloated beyond repair to eliminate false positives in headers, comments and JS. Thanks, but no thanks :) I would rather err in this direction: use an old-fashioned search if it's viable, and use regex only when I actually gain something from it, since it always costs readability.
