Parsing a file name from URL with regex
Hi folks, I'm working on something where I have a list of URLs and need to extract the file name without the extension. I'm not experienced enough with regular expressions to get just what I need, but here is what I have so far:
Given the URL: http://kraid/folder/someotherfolder/Guide/2562.html
[^/]*$ this will find the file name and extension.
\.[^\.]*$ this will find the extension and the period.
I could use both of these and it would probably work, but all I really need is the "2562" of the html. I cannot match against the rest of the URL (hard-code it) because I need to reuse this for multiple directories.
It would be great if I could match the inverse of [^/]*$ and replace it with "" because then I could just match against the extension and have my magic number.
I don't believe this is too difficult to solve, but it's been hard to find a solid resource that provides examples...
It's kind of sad. I could easily do it with C-string functions.
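For what it's worth, the two patterns from the question can simply be chained in two steps. Here is a minimal sketch in Python, assuming Python's re module (the thread doesn't name a language, and file_stem is just a made-up helper name):

```python
import re

def file_stem(url):
    # Step 1: [^/]*$ grabs everything after the last "/" (name plus extension).
    name = re.search(r"[^/]*$", url).group(0)
    # Step 2: \.[^\.]*$ matches the last "." and the extension; replace it with "".
    return re.sub(r"\.[^\.]*$", "", name)

print(file_stem("http://kraid/folder/someotherfolder/Guide/2562.html"))  # 2562
```

Nothing about the directory part is hard-coded, so the same helper should work for other directories too.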
You just made me want to know regexp a bit better.
(Sorry if I couldn't help you; I am trying at the moment :P)
Edit:
I cannot do it in one step, but
[0123456789] gets you all numbers - in vim.
It could also be [0123456789]*.
If you use this as the second regexp, it should work - given there is no number in the extension.
[Edited by - hydroo on May 15, 2008 1:48:12 PM]
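A rough Python sketch of that second step, assuming the file name has already been isolated with [^/]*$ (the variable names are only for illustration). It uses + instead of * so the pattern cannot succeed on an empty match:

```python
import re

name = "2562.html"  # what [^/]*$ returns for the example URL
# First run of digits in the name; fine as long as the name starts with the number.
digits = re.search(r"[0-9]+", name).group(0)
print(digits)  # 2562
```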
Another way to look at it is I need to match the section between the last occurrence of "/" and the last occurrence of ".". The problem is that they are matched from left to right, so it ends up matching the first "/" and getting everything up to the extension.
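The "between the last / and the last ." idea can also be written without any regex. A small Python sketch using str.rfind (stem_between is a hypothetical name, and it assumes the URL carries no query string or fragment):

```python
def stem_between(url):
    # Index just past the last "/" (rfind returns -1 if absent, so start becomes 0).
    start = url.rfind("/") + 1
    # Last "." in the string; ignore it if it sits before the last "/".
    dot = url.rfind(".")
    end = dot if dot >= start else len(url)
    return url[start:end]

print(stem_between("http://kraid/folder/someotherfolder/Guide/2562.html"))  # 2562
```

Searching from the right sidesteps the left-to-right matching problem entirely.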
I don't know a lot about regex, too, but this might help you:
www.regular-expressions.info
PCRE Workbench
If you have a library that enables you to get not only the matching text but also the so-called "capture groups", you can use the following regex (might be pretty dumb ^^) and take the capture group #1, it results in "2562".
.*/([^./]*)\.*
The only capture group in this regex is the part which stands in parentheses!
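In case it helps, here is how reading capture group #1 might look in Python, assuming the re module (any library that exposes capture groups should behave similarly):

```python
import re

url = "http://kraid/folder/someotherfolder/Guide/2562.html"
# The greedy .*/ consumes everything up to the last "/";
# the parentheses then capture the characters before the first "." or "/".
match = re.search(r".*/([^./]*)\.*", url)
stem = match.group(1)
print(stem)  # 2562
```

Note that the group stops at the first dot of the file name, so a name like "the.result.html" would give "the" rather than "the.result".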
(?<magic>[^/]+)\..+$
I think that will capture what you want in the "magic" group.
That's .NET regex syntax; I'm not sure what other regex flavors use for named capturing groups.
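For comparison, a sketch of the same pattern in Python, where named groups are spelled (?P<name>...) instead of (?<name>...). The second URL is the dotted example used elsewhere in this thread:

```python
import re

# Same pattern as above, with Python's named-group spelling.
pattern = re.compile(r"(?P<magic>[^/]+)\..+$")

m1 = pattern.search("http://kraid/folder/someotherfolder/Guide/2562.html")
print(m1.group("magic"))  # 2562

# The pattern is not anchored on the left, so the first dot found wins.
m2 = pattern.search("http://mysite.com/path/to/the/target.directory/test/the.result.html")
print(m2.group("magic"))  # mysite
```

So it works for the example URL, but a host name containing a dot gets captured instead of the file name.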
I love regex, honestly, that's just so much fun! :)
Yeah, for your problem you may need capture groups, which are really useful and powerful.
Example:
http://[^/]+/(.+/)([^/]+)\.(.+)$
Applied to your URL, and using the following string for replacement:
Path: \1 | File: \2 | Extension: \3
The result:
Path: folder/someotherfolder/Guide/ | File: 2562 | Extension: html
So \1 gives what has been captured by the first pair of parentheses, \2 by the second one, etc.
Applied to:
http://mysite.com/path/to/the/target.directory/test/the.result.html
(Watch out for those dots inside the URL!)
Result:
Path: path/to/the/target.directory/test/ | File: the.result | Extension: html
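The search-and-replace above can be reproduced with Python's re.sub, which happens to use the same \1, \2, \3 backreference syntax in the replacement string. This is only a sketch; the tool used in the post isn't stated:

```python
import re

pattern = r"http://[^/]+/(.+/)([^/]+)\.(.+)$"
template = r"Path: \1 | File: \2 | Extension: \3"

urls = [
    "http://kraid/folder/someotherfolder/Guide/2562.html",
    "http://mysite.com/path/to/the/target.directory/test/the.result.html",
]
# The pattern spans the whole URL, so the whole URL is replaced by the template.
results = [re.sub(pattern, template, url) for url in urls]
for line in results:
    print(line)
# Path: folder/someotherfolder/Guide/ | File: 2562 | Extension: html
# Path: path/to/the/target.directory/test/ | File: the.result | Extension: html
```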
Quote: Original post by hydroo
I cannot do it in one step, but
[0123456789] gets you all numbers - in vim.
It could also be [0123456789]*.
If you use this as the second regexp, it should work - given there is no number in the extension.
[0-9] is faster to type than [0123456789]. :)
[Edited by - Splo on May 16, 2008 3:24:04 AM]
Would a greedy dot solve the leading slash problem?
I've only really used regexes much in .NET, so if this isn't the same format as what you need, I apologize. Also, I'm not sure whether the backslash is required inside a character class to escape a dot.
.*/(?<capture>[^\\.]+)
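A quick way to check both questions, sketched in Python (so the named group is written (?P<capture>...) here). As far as I know, a dot inside [...] is literal in the common regex flavors, so the backslash is optional; written as [^\\.], the class excludes a literal backslash as well as the dot, which is harmless for URLs:

```python
import re

url = "http://kraid/folder/someotherfolder/Guide/2562.html"

# The greedy .* runs to the end of the string and backs off to the LAST "/".
m = re.search(r".*/(?P<capture>[^\\.]+)", url)
print(m.group("capture"))  # 2562

# Inside a character class the dot is literal with or without the backslash.
plain = re.findall(r"[^.]+", "2562.html")
escaped = re.findall(r"[^\.]+", "2562.html")
print(plain, escaped)  # ['2562', 'html'] ['2562', 'html']
```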
Here's a simpler version that takes care of filenames with dots before the extension (such as "foo.bar.html"):
.*/([^/]+)\.[^\.]+$
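A short Python sketch of this last pattern, assuming the re module. The example.com URLs are made up for illustration:

```python
import re

pattern = re.compile(r".*/([^/]+)\.[^\.]+$")

urls = [
    "http://kraid/folder/someotherfolder/Guide/2562.html",
    "http://example.com/docs/foo.bar.html",  # hypothetical URL with a dotted name
]
# Group 1 is everything between the last "/" and the last ".".
stems = [pattern.search(url).group(1) for url in urls]
print(stems)  # ['2562', 'foo.bar']

# Caveat: with no dot in the last segment, the match backtracks to the host.
no_ext = pattern.search("http://example.com/docs/readme").group(1)
print(no_ext)  # example
```

If extension-less URLs can show up in the list, changing the final class to [^./]+ should make the pattern fail on them instead of returning part of the host.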