[Python] string parsing, locating specific chars

This topic is 3876 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

In Python, how can I get the position of every occurrence of a character x in a string? string.find and string.index seem to find only the first instance; I want a list of all of them. For example, given the string "Hello World" and the character 'l', I would want [2, 3, 9] (counting from zero). Are there any built-in functions that can do this? Or a simple way?

Never mind, my god I love Python! Sorry for even asking...


def contains(string, char):
    # Collect the zero-based index of every occurrence of char in string.
    positions = []
    for i in range(len(string)):
        if string[i] == char:
            positions.append(i)
    return positions



A more idiomatic (and shorter!) way would be using a list comprehension:

def indices(string, char):
    return [i for i in range(len(string)) if string[i] == char]



EDIT: Another one, just for fun:

def indices(string, char):
    # list() is needed on Python 3, where filter returns a lazy iterator
    return list(filter(lambda i: string[i] == char, range(len(string))))

Different language, different idioms :)

Anyway, in C++, an explicit loop would probably be the cleanest solution, although one *might* be tempted to write something like:


#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

#include <boost/iterator/counting_iterator.hpp>
#include <boost/lambda/lambda.hpp>

std::vector<int> indices(std::string str, char ch)
{
    using namespace boost;

    std::vector<int> result;

    // Copy every index i in [0, str.size()) for which str[i] == ch.
    std::remove_copy_if(counting_iterator<int>(0),
                        counting_iterator<int>(static_cast<int>(str.size())),
                        std::back_inserter(result),
                        lambda::var(str)[lambda::_1] != ch);

    return result;
}



(Yes, it works. No, it's not pretty :P)

Quote:

Original post by Sharlin
A more idiomatic (and shorter!) way would be using a list comprehension:
def indices(string, char):
    return [i for i in range(len(string)) if string[i] == char]



Instead of range, you can use enumerate, which yields (index, value) pairs:

def indices(string, char):
    return [i for i, c in enumerate(string) if c == char]



Quote:
Original post by Oluseyi
string.find and string.index both take start and end indices within which to search (they behave like slices).

Strings are also iterable in Python, so if you merely wish to iterate over the elements (letters) of the string:
for c in string:
    ...


Yeah, that's what I was doing originally, and it's why I had problems: I wanted the position of each occurrence of the char, and iterating only told me that I had found an occurrence, not where it was.
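
For what it's worth, the start index mentioned in the quote is enough to get positions without iterating character by character. A minimal sketch, assuming nothing beyond the built-in str.find; the name find_all is made up for illustration:

```python
def find_all(string, char):
    """Return the zero-based index of every occurrence of char in string."""
    positions = []
    start = string.find(char)
    while start != -1:  # str.find returns -1 once nothing more is found
        positions.append(start)
        start = string.find(char, start + 1)  # resume just past the last hit
    return positions


print(find_all("Hello World", "l"))  # [2, 3, 9]
```

Because str.find also accepts multi-character substrings, the same loop locates (possibly overlapping) occurrences of a longer needle.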

Alright, so now I can't help but wonder if there's a better way to do everything I'm doing while I'm writing Python code. I'm writing a simple XML parser, so I use the above function to find the instances of '<' and '>' in the file, and now I want to match them up. I have a node class that I want to save everything between each pair to. This is what I was doing. How could I do this better? More idiomatically?


def BuildNodes(string):
    startlist = contains(string, '<')  # use the previous function to find '<'
    endlist = contains(string, '>')

    nodelist = []

    for i in range(len(startlist)):
        n = node()
        n.start = startlist[i]
        n.end = endlist[i]

        # the slice starts at '<' and stops just before the matching '>'
        n.content = string[n.start:n.end]

        nodelist.append(n)

    return nodelist
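
One possible way to shorten the pairing step is zip, which walks both index lists in lockstep. This is only a sketch: the Node class here is a hypothetical stand-in for the node class mentioned above, build_nodes is an illustrative name, and the slice assumes "everything between each pair" means the brackets themselves are excluded.

```python
class Node:
    """Hypothetical stand-in for the node class mentioned in the post."""

    def __init__(self, start, end, content):
        self.start = start
        self.end = end
        self.content = content


def indices(string, char):
    # Zero-based position of every occurrence of char.
    return [i for i, c in enumerate(string) if c == char]


def build_nodes(string):
    # zip pairs the n-th '<' with the n-th '>'. It stops at the shorter list,
    # so an unmatched bracket is silently dropped instead of raising IndexError.
    return [
        Node(start, end, string[start + 1:end])  # text strictly between < and >
        for start, end in zip(indices(string, "<"), indices(string, ">"))
    ]


nodes = build_nodes("<a>text</a>")
print([n.content for n in nodes])  # ['a', '/a']
```

Note that this still only pairs brackets by position; it does not check that the input is well-formed.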


Quote:
Original post by glBender
I'm writing a simple xml parser...

First question: why? Python comes with an Expat parser, and there are a number of additional parsers readily available.
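
As a rough illustration of the bundled Expat binding (the xml.parsers.expat module), here is a minimal sketch that collects the text of each element. The function name parse_elements and the (tag, text) result shape are choices made for this example, not anything the library prescribes.

```python
import xml.parsers.expat


def parse_elements(xml_text):
    """Return a list of (tag, text) pairs, one per element, in closing order."""
    results = []
    stack = []  # one (tag, text_chunks) entry per currently open element

    def start_element(name, attrs):
        stack.append((name, []))

    def char_data(data):
        # Expat may deliver the text of one element in several pieces.
        if stack:
            stack[-1][1].append(data)

    def end_element(name):
        tag, chunks = stack.pop()
        results.append((tag, "".join(chunks)))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.CharacterDataHandler = char_data
    parser.EndElementHandler = end_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return results


print(parse_elements("<root><a>hello</a><b>world</b></root>"))
# [('a', 'hello'), ('b', 'world'), ('root', '')]
```

Malformed input raises xml.parsers.expat.ExpatError, which is one thing a hand-rolled bracket matcher would have to handle on its own.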

Quote:
...so I use the above function to find the instances of '<' and '>' in the file, and now I want to match them up. I have a node class that I want to save everything between each pair to. This is what I was doing. How could I do this better? more idiomatically?

Before we even get to Python idiom, large-scale text pattern matching is best done using regular expressions. Python provides the re module for regular expression processing:

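import re

# cross-check the regex syntax, and consult the documentation for the re module
# Raw string, so that \1 is a backreference to the tag name rather than chr(1).
# Group 1 is the tag name, group 2 the text between the opening and closing tag.
regex = re.compile(r"<([\w-]+)>(.*?)</\1>", re.DOTALL)
for m in regex.finditer(text):
    tag, content = m.groups()
    ...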
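
To make that concrete, here is a self-contained sketch run on a made-up sample string. The pattern is a simplified assumption: it only handles tags without attributes, and a nested element shows up only as part of its parent's content. The match object's start() and end() give the positions that the node class was storing by hand.

```python
import re

# Raw string: in a plain string literal "\1" would be the character chr(1),
# not a backreference to the first group.
TAG_PAIR = re.compile(r"<([\w-]+)>(.*?)</\1>", re.DOTALL)


def find_elements(text):
    """Return (tag, content, start, end) for each simple <tag>...</tag> pair."""
    return [
        (m.group(1), m.group(2), m.start(), m.end())
        for m in TAG_PAIR.finditer(text)
    ]


sample = "<name>Bob</name><age>42</age>"
print(find_elements(sample))
# [('name', 'Bob', 0, 16), ('age', '42', 16, 29)]
```

finditer is used rather than search because search stops at the first match, and it simply yields nothing when the text contains no tag pairs.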

