def BuildNodes(string):
    startlist = contains(string, '<')  # use previous function to find '<'
    endlist = contains(string, '>')
    nodelist = []
    for i in range(0, len(startlist)):
        n = node()
        n.start = startlist[i]  # index of the i-th '<'
        n.end = endlist[i]      # index of the matching '>'
        n.content = string[n.start:n.end]
        nodelist += [n]
    return nodelist
[Python] string parsing, locating specific chars
Alright, so now I can't help but wonder if there's a better way to do everything I'm doing while I'm writing Python code. I'm writing a simple XML parser, so I use the above function to find the instances of '<' and '>' in the file, and now I want to match them up. I have a node class that I want to save everything between each pair to. This is what I was doing. How could I do this better? More idiomatically?
Quote:Original post by glBender
I'm writing a simple xml parser...
First question: why? Python comes with an Expat parser, and there are a number of additional parsers readily available.
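For reference, here is a minimal sketch of what using the bundled Expat parser can look like. The function name parse_events and the tuple layout are made up for illustration; only the xml.parsers.expat calls come from the standard library.

```python
import xml.parsers.expat

def parse_events(xml_text):
    """Collect start/text/end events from an XML string (hypothetical helper)."""
    events = []

    def on_start(name, attrs):
        # attrs is a dict of attribute name -> value
        events.append(("start", name, attrs))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Skip whitespace-only chunks between tags
        if data.strip():
            events.append(("text", data.strip()))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events
```

Calling parse_events on a document gives you the tags already matched up, with malformed input raising xml.parsers.expat.ExpatError instead of silently producing bad pairs. Note that Expat may deliver long text in several chunks, so real code would probably want to join them.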
Quote:...so I use the above function to find the instances of '<' and '>' in the file, and now I want to match them up. I have a node class that I want to save everything between each pair to. This is what I was doing. How could I do this better? more idiomatically?
Before we even get to Python idiom, large-scale text pattern matching is best done using regular expressions. Python provides the re module for regular expression processing:
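import re
# cross-check the regex syntax, and consult the documentation for the re module
# raw string so that \1 is a backreference; group 1 captures the tag name only
regex = re.compile(r"<([\w-]+)>(.*?)</\1>", re.DOTALL)
m = regex.search(text)
if m:  # search() returns None when nothing matches
    for g in m.groups():
        ...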
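To tie this back to the node class in the question, a rough sketch of collecting every match rather than just the first might look like the following. The Node dataclass and build_nodes are stand-ins I made up, not the original poster's code, and the pattern only handles attribute-free tags with no nesting of the same name.

```python
import re
from dataclasses import dataclass

@dataclass
class Node:
    # Hypothetical stand-in for the poster's node class
    tag: str
    start: int    # index of the opening '<'
    end: int      # index just past the closing '>'
    content: str  # text between the opening and closing tags

# Group 1 is the tag name, group 2 the content; \1 requires a matching close tag
TAG_PAIR = re.compile(r"<([\w-]+)>(.*?)</\1>", re.DOTALL)

def build_nodes(text):
    """Return one Node per non-overlapping <tag>...</tag> pair found in text."""
    return [
        Node(m.group(1), m.start(), m.end(), m.group(2))
        for m in TAG_PAIR.finditer(text)
    ]
```

finditer yields match objects in order, so the list comprehension replaces the index loop and the manual list appends. Elements nested inside a match are not returned separately, which is one more reason to reach for a real parser once the input gets complicated.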