[Python] string parsing, locating specific chars

Started by
11 comments, last by glBender 16 years, 11 months ago
Alright so now I can't help but wonder if theres a better way to do everything I'm doing while I'm writing python code. I'm writing a simple xml parser, so I use the above function to find the instances of '<' and '>' in the file, and now I want to match them up. I have a node class that I want to save everything between each pair to. This is what I was doing. How could I do this better? more idiomatically?

def BuildNodes(string):    startlist = contains(string,'<') #use previous function to find '<'    endlist = contains(string,'>')    nodelist = []    for i in range(0,len(startlist)):        n = node()        n.start = starlist        n.end = endlist        n.content = string[n.start:n.end]        nodelist += [n]    return nodelist
Advertisement
Quote:Original post by glBender
I'm writing a simple xml parser...

First question: why? Python comes with an Expat parser, and there are a number of additional parsers readily available.

Quote:...so I use the above function to find the instances of '<' and '>' in the file, and now I want to match them up. I have a node class that I want to save everything between each pair to. This is what I was doing. How could I do this better? more idiomatically?

Before we even get to Python idiom, large scale text pattern matching is best done using regular expressions. Python provides the re module for regular expression processing:
import re# cross-check the regex syntax, and consult the documentation for the re moduleregex = re.compile("(<[\w\d_-]+>)(.*?)(<\/\1>)")m = regex.search(text)for g in m.groups():    ...
No matter what I do... Python always has a better way :) thanks Olu

This topic is closed to new replies.

Advertisement