# Parsing a file name from URL with regex


## Recommended Posts

Hi folks, I'm working on something where I have a list of URLs and need to extract the file name without the extension. I am not experienced enough with regular expressions to get exactly what I need, but here is what I have so far:
Given the URL: http://kraid/folder/someotherfolder/Guide/2562.html

[^/]*$

This will find the file name and extension.

\.[^\.]*$

This will find the extension and the period.
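One way to combine the two patterns above is to apply them in sequence: take the last path segment first, then strip the extension from it. This is only a minimal sketch in Python (the thread does not say which language or tool is being used), and the variable names are made up for illustration.

```python
import re

url = "http://kraid/folder/someotherfolder/Guide/2562.html"

# Step 1: everything after the last slash (file name plus extension).
name_with_ext = re.search(r"[^/]*$", url).group(0)

# Step 2: remove the final period and whatever follows it.
stem = re.sub(r"\.[^\.]*$", "", name_with_ext)

print(name_with_ext)  # 2562.html
print(stem)           # 2562
```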

##### Share on other sites

www.regular-expressions.info
PCRE Workbench

If you have a library that gives you not only the matching text but also the so-called "capture groups", you can use the following regex (might be pretty dumb ^^) and take capture group #1, which results in "2562".

.*/([^./]*)\.*

The only capture group in this regex is the part which stands in parentheses!
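As a rough illustration of reading capture group #1, here is the regex above run through Python's `re` module. Python is an assumption here; any library that exposes capture groups should behave similarly.

```python
import re

# The single pair of parentheses is capture group #1.
pattern = re.compile(r".*/([^./]*)\.*")

url = "http://kraid/folder/someotherfolder/Guide/2562.html"
match = pattern.search(url)

whole_match = match.group(0)  # the text matched by the entire regex
stem = match.group(1)         # only the part inside the parentheses

print(stem)  # 2562
```

Because the group excludes dots, a name such as "the.result.html" would yield only "the" with this pattern.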

(?<magic>[^/]+)\..+$

I think that will capture what you want in the "magic" group. That's .NET Regex syntax, not sure what other regex's use for capturing groups.

##### Share on other sites

I love regex, honestly, that's just so much fun! :) Yeah for your problem, you may need to use capture groups which are really useful and powerful. Example:

http://[^/]+/(.+/)([^/]+)\.(.+)$

Applied to your URL, and using the following string for replacement:
Path: \1 | File: \2 | Extension: \3

The result:
Path: folder/someotherfolder/Guide/ | File: 2562 | Extension: html
So \1 gives what has been captured by the first pair of parentheses, \2 by the second one, etc.

Applied to:
http://mysite.com/path/to/the/target.directory/test/the.result.html
(Watch out for the dots inside the URL!)
Result:
Path: path/to/the/target.directory/test/ | File: the.result | Extension: html
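Both results above can be reproduced with a short script. This sketch assumes Python's `re.sub`, which happens to accept the same \1, \2, \3 backreferences in the replacement string. The poster did not say which tool was used.

```python
import re

pattern = re.compile(r"http://[^/]+/(.+/)([^/]+)\.(.+)$")
template = r"Path: \1 | File: \2 | Extension: \3"

urls = [
    "http://kraid/folder/someotherfolder/Guide/2562.html",
    "http://mysite.com/path/to/the/target.directory/test/the.result.html",
]

# The regex matches the whole URL, so sub() returns just the filled-in template.
results = [pattern.sub(template, url) for url in urls]

for line in results:
    print(line)
# Path: folder/someotherfolder/Guide/ | File: 2562 | Extension: html
# Path: path/to/the/target.directory/test/ | File: the.result | Extension: html
```

The greedy (.+/) runs to the last slash and the greedy ([^/]+) runs to the last dot, which is why the dots in "target.directory" and "the.result" end up in the right groups.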

Quote:
 Original post by hydroo
 I cannot do it in one step, but [0123456789] gets you all numbers - in vim. It could also be [0123456789]*. If you use this as the second regexp, it should work - given there is no number in the extension.
[0-9] is faster to type than [0123456789]. :)

[Edited by - Splo on May 16, 2008 3:24:04 AM]
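The two-step idea from the quote can be sketched like this. It is written in Python rather than vim, and it uses `+` instead of `*` so that an empty match is never returned; both choices are assumptions made for this illustration.

```python
import re

url = "http://kraid/folder/someotherfolder/Guide/2562.html"

# First regexp: the last path segment.
last_segment = re.search(r"[^/]*$", url).group(0)

# Second regexp: the digits. Both spellings of the character class are equivalent.
digits_long = re.search(r"[0123456789]+", last_segment).group(0)
digits_short = re.search(r"[0-9]+", last_segment).group(0)

print(digits_long, digits_short)  # 2562 2562
```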

##### Share on other sites
Would a greedy dot solve the leading slash problem?

I've only really used a lot of Regex's in .Net, so if this isn't the same format as what you need, I apologize. Also, I'm not sure if the backslash is required inside a character selector to escape a dot, either.

.*/(?<capture>[^\\.]+)
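For comparison, here is a sketch of the same idea in Python, which is an assumption since the post targets .NET. Python's `re` module spells named groups `(?P<name>...)`. On the escaping question: inside a character class a dot is already literal, so `[^.]` works without a backslash.

```python
import re

# Greedy ".*/" runs to the last slash; the named group then takes
# everything up to (but not including) the first dot after it.
pattern = re.compile(r".*/(?P<capture>[^.]+)")

url = "http://kraid/folder/someotherfolder/Guide/2562.html"
match = pattern.match(url)
stem = match.group("capture")

print(stem)  # 2562
```

As with any pattern that stops at the first dot, a name like "the.result.html" would give "the" rather than "the.result".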

##### Share on other sites
Here's a simpler version that takes care of filenames with dots before the extension (such as "foo.bar.html"):
.*/([^/]+)\.[^\.]+$
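A quick check of this pattern, with the end-of-string anchor written as a plain `$`, might look like the following in Python. The helper name `file_stem` and the "foo.bar.html" URL are made up for illustration.

```python
import re

pattern = re.compile(r".*/([^/]+)\.[^\.]+$")

def file_stem(url):
    """Return the file name without its final extension, or None if nothing matches."""
    match = pattern.match(url)
    return match.group(1) if match else None

print(file_stem("http://kraid/folder/someotherfolder/Guide/2562.html"))  # 2562
print(file_stem("http://kraid/folder/foo.bar.html"))                     # foo.bar
```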
