String Split With Different Multi-Character Delimiters

Started by
3 comments, last by Glass_Knife 9 years, 5 months ago

When attempting to parse an XML file, I have run into the issue of how to efficiently extract strings.

Using pure Java, without 3rd party libraries, what would be the best solution for extracting the string between a pair of tags like this:


<Foo> Hello Every Foo One </Foo>

Thanks in advance.




Why not use regular expressions? You can use capturing groups: http://www.javamex.com/tutorials/regular_expressions/capturing_groups.shtml

In your case, the regex would be something like "<Foo>([^<]*)</Foo>", where the parenthesized part, ([^<]*), is the capturing group.
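A minimal sketch of that capturing-group approach, using only java.util.regex from the standard library. The class and method names (TagTextRegex, extract) are made up for illustration, and the pattern assumes the element contains no nested tags, since [^<]* stops at the first '<'.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTextRegex {
    // Group 1 captures everything between the tags, provided it contains no '<'.
    private static final Pattern FOO = Pattern.compile("<Foo>([^<]*)</Foo>");

    /** Returns the text between <Foo> and </Foo>, or null if nothing matches. */
    static String extract(String input) {
        Matcher m = FOO.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = extract("<Foo> Hello Every Foo One </Foo>");
        System.out.println("[" + text + "]");        // [ Hello Every Foo One ]
        System.out.println("[" + text.trim() + "]"); // [Hello Every Foo One]
    }
}
```

Note that the captured text keeps the spaces next to the tags, so call trim() if you don't want them.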

Whether this is the most performant way, I don't know. Regular expressions are not known for being very fast. Although they are certainly powerful and flexible, they can run slower than hand-written alternatives, so if you need performance and still want to use regular expressions, look into ways to make them run faster.

For example: http://www.javapractices.com/topic/TopicAction.do?Id=104
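One common way to keep regex cost down, sketched here as an assumption rather than a summary of the linked page, is to compile the Pattern once and reuse it, instead of recompiling on every call (which is what String.matches and String.split do internally for most patterns). The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledFooPattern {
    // Compiled once when the class is loaded; Pattern objects are immutable
    // and safe to share between threads.
    private static final Pattern FOO = Pattern.compile("<Foo>([^<]*)</Foo>");

    /** Collects the trimmed text of every <Foo>...</Foo> pair found in the input. */
    static List<String> extractAll(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = FOO.matcher(input); // Matchers are cheap, but not thread-safe
        while (m.find()) {
            results.add(m.group(1).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(extractAll("<Foo> a </Foo><Bar>x</Bar><Foo>b</Foo>")); // [a, b]
    }
}
```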

If you already know all that and are looking for a different approach, I'm not sure... maybe split twice, using > and then < as the split characters?
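Here is a rough sketch of the split-twice idea, plus an indexOf/substring variant that works directly with multi-character delimiters such as "<Foo>" and "</Foo>". The indexOf variant is my own addition, not something suggested above, and all names are hypothetical. Both assume a single, simple element with no nested tags.

```java
class TagTextSplit {
    /** Split on '>' first, then on '<'. Throws if the input contains no '>'. */
    static String bySplit(String input) {
        String afterOpenTag = input.split(">")[1]; // " Hello Every Foo One </Foo"
        return afterOpenTag.split("<")[0];         // " Hello Every Foo One "
    }

    /** Returns the text between the first 'open' and the next 'close', or null. */
    static String byIndexOf(String input, String open, String close) {
        int start = input.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length(); // skip past the opening delimiter
        int end = input.indexOf(close, start);
        return end < 0 ? null : input.substring(start, end);
    }

    public static void main(String[] args) {
        String xml = "<Foo> Hello Every Foo One </Foo>";
        System.out.println("[" + bySplit(xml) + "]");                     // [ Hello Every Foo One ]
        System.out.println("[" + byIndexOf(xml, "<Foo>", "</Foo>") + "]"); // [ Hello Every Foo One ]
    }
}
```

The indexOf version avoids the regex engine entirely (String.split takes a regex), which may matter if raw speed is the concern, but neither version understands XML features like attributes, entities, or CDATA.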


Using pure Java, without 3rd party libraries

Java is not only a language but also a framework, and a DOM (Document Object Model) parser for reading XML files has been part of the Java core since JDK 1.4. Take a look here.

There are lots of possible answers to this, depending on what else you need to be able to handle.

One robust answer is to use JAXP:


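// Needs: javax.xml.parsers.DocumentBuilderFactory, org.w3c.dom.Document, org.xml.sax.InputSource, java.io.StringReader
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader("<Foo> Hello Every Foo One </Foo>")));
// getTextContent() keeps the whitespace next to the tags; call trim() to drop it.
System.out.println(doc.getDocumentElement().getTextContent());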
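For a slightly larger document, the same JAXP calls can be wrapped up roughly like this. It is only a sketch: the <Root> wrapper, the second <Foo> element, and the FooDomReader/readFooTexts names are invented for illustration, and checked parser exceptions are rethrown as IllegalArgumentException to keep the example short.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FooDomReader {
    /** Parses the XML string and returns the trimmed text of every <Foo> element. */
    static List<String> readFooTexts(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("Foo");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root><Foo> Hello Every Foo One </Foo><Foo>Second &amp; last</Foo></Root>";
        System.out.println(readFooTexts(xml)); // [Hello Every Foo One, Second & last]
    }
}
```

Unlike the regex or split approaches, the parser decodes entities such as &amp; and rejects malformed input instead of silently returning the wrong text.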

Use the XML parser like dmatter and Ashaman73 said. The reason you use a standard like XML is to take advantage of the libraries and tools available.

