
Sadistic library authors (my rant about Xerces for C++)


27 replies to this topic

#1 wack   Members   -  Reputation: 1331

Posted 24 March 2012 - 04:49 PM

I am going to rant about my experiences with the xerces XML library for C++. I would also like to hear your own personal rants about your most hated software libraries, so feel free to post them here. Well here goes...

First of all, I should mention that I'm not new to programming. I have seen a lot of hairy stuff. I have been using C++ for almost 15 years now. I am experienced in lots of related semi-obscure technologies such as COM, DCOM, CORBA, ODBC...

... But as over-engineered as all of the above technologies are, I have found that they have absolutely nothing on xerces.

It all started with a hobby project, where I needed to read and write XML files. I started looking around for something that would fit my requirements of:
  • Fast loading of files from disk
  • Validation against a schema during load
  • Writing of XML files to disk
  • Cross-platform
The only one that seemed to fit the bill was xerces. So I downloaded a copy, and the first thing I noticed was that the compiled, optimized xerces DLL is a full 2.41 MB. OK, it's not gigantic, but it seemed pretty hefty for something that is basically just manipulating text files.

Soon enough, I started seeing why. Reading files with the SAX parser turned out to be multiple inheritance galore. And the strings... they have rolled their own string class too. After pulling my hair out for a long time, I finally managed to get it to read my XML files, but no thanks to their documentation, which is utter gibberish and assumes you already have a PhD in xerces.

Just before I started writing this rant, I spent a few hours trying to figure out how to get xerces to write an XML file. I finally gave up when I realized it would involve all the junk seen here: http://stackoverflow...nd-c-on-windows
And that's just for saving the file; the DOM tree that contains the actual data needs to be generated separately.

Realizing it would gain me nothing to use xerces for writing the file as well (it doesn't perform validation, etc. when saving), I just gave up and started writing an implementation that writes it directly to a C++ stream object instead.
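
For illustration, writing XML directly to a C++ stream could look roughly like the sketch below. It is not the code from the actual project: the `Item` struct, the element and attribute names, and the helper functions are all hypothetical, and the only real work is escaping the five characters XML reserves.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Escape the five characters that XML reserves in text and attribute values.
// Assumption: the input is already valid UTF-8 without control characters.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical record type, used only for this illustration.
struct Item {
    std::string name;
    int quantity;
};

// Write one <item> element; the element and attribute names are made up.
void writeItem(std::ostream& os, const Item& item) {
    os << "  <item name=\"" << escapeXml(item.name) << "\">"
       << item.quantity << "</item>\n";
}

// Write a whole document: XML declaration, root element, one line per item.
void writeInventory(std::ostream& os, const std::vector<Item>& items) {
    os << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<inventory>\n";
    for (const Item& item : items) {
        writeItem(os, item);
    }
    os << "</inventory>\n";
}
```

Calling `writeInventory(std::cout, items)` or passing a `std::ofstream` is enough to produce a file. Nothing here validates against a schema, so keeping the output well-formed is entirely up to the calling code.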

My final words on this topic are: Screw you xerces, I hate you. And I hate your documentation even more.


#2 Antheus   Members   -  Reputation: 2397

Posted 24 March 2012 - 05:33 PM

Xerces was written by Java enterprise architects and later also provided with the same API in C++. It doesn't take into consideration any common C++ design idioms, since it has to be somewhat identical between the two.

It's also a standard-conforming XML parser, meaning all that cruft needs to be there.

But as over-engineered as all of the above technologies are, I have found that they have absolutely nothing on xerces.


CORBA wins that competition hands down.

Apache projects are best avoided unless you work with Java. With the possible exception of the server.

Just before I started writing this rant, I spent a few hours trying to figure out how to get xerces to write an XML file. I finally gave up when I realized it would involve all the junk seen here: http://stackoverflow...nd-c-on-windows


Is it wrong that I glanced over that and thought: "what's wrong with that"?

It's a disaster, but there's a perverted reason why that design makes sense. It's just somewhat less verbose in Java. It also shows that it is a product of architects.

Xerces is a nice reminder of the prime time of Java architecture astronauts. People who had absolutely no clue of actual coding, but could suddenly build software architectures. While there are nuggets of good practices in there (such as external memory allocation and passing in factories to create types), the final result is a mess.

#3 DoctorGlow   Members   -  Reputation: 824

Posted 24 March 2012 - 06:52 PM

Had good experience with http://www.grinninglizard.com/tinyxml/

#4 Promit   Moderators   -  Reputation: 7342

Posted 24 March 2012 - 07:24 PM

Had good experience with http://www.grinninglizard.com/tinyxml/

Which can't validate, unfortunately. God help the OP if he needs XSLT.

Xerces is a nice reminder of the prime time of Java architecture astronauts. People who had absolutely no clue of actual coding, but could suddenly build software architectures.

To a large extent this is the fault of XML, itself the result of architecture astronauts who are solving fanciful hypothetical problems rather than real ones. Xerces may just be an honest expression of the total fusion of Java and XML.

It's days like these I sympathize with the guys who swear by C. Not because it's a good idea to use C everywhere, but because so much of the world went this route.

#5 SiCrane   Moderators   -  Reputation: 9629

Posted 24 March 2012 - 07:34 PM

There have also been a couple of wrapper libraries for Xerces that greatly simplify common operations. XMLDOM seems to be the only one that's still around though (or at least it's the only one that I can find with a quick look). I don't know if it meets all the OP's needs, but it's worth a look.

#6 Eelco   Members   -  Reputation: 301

Posted 25 March 2012 - 04:15 AM

Someone recently asked me if I knew how to do string manipulation in C. I replied: why would I want to know how to clip my nails with a sledgehammer?

This sounds like an analogous situation.

There are so many languages in which things like this are completely painless. If you really must use C(pp) (and there are plenty of good reasons), use it as an extension module to a program written in a language that will not have you pulling your hair out over such trivial things.

My rule of thumb: if you are writing int main(){} anywhere in your code, you are abusing C++ and doing it wrong.

#7 wack   Members   -  Reputation: 1331

Posted 25 March 2012 - 04:45 AM

Is it wrong that I glanced over that and thought: "what's wrong with that"?


Posted Image



Someone recently asked me if I knew how to do string manipulation in C. I replied: why would I want to know how to clip my nails with a sledgehammer?

This sounds like an analogous situation.

There are so many languages in which things like this are completely painless. If you really must use C(pp) (and there are plenty of good reasons), use it as an extension module to a program written in a language that will not have you pulling your hair out over such trivial things.

My rule of thumb: if you are writing int main(){} anywhere in your code, you are abusing C++ and doing it wrong.


While that sounds good in theory, my experience is that few things in C++ are difficult enough to warrant the complexities in debugging and glue code that combining multiple languages always involves. My rant is mostly about xerces making things more difficult than they need to be, in any language.


There have also been a couple of wrapper libraries for Xerces that greatly simplify common operations. XMLDOM seems to be the only one that's still around though (or at least it's the only one that I can find with a quick look). I don't know if it meets all the OP's needs, but it's worth a look.


Interesting. Though it looks like it's been a while since that one was updated, too. From a quick look, it seems to use the DOM model only. One strong reason for choosing xerces in the first place is that it also supports the SAX method, which doesn't need to create the whole XML tree in memory, and is a lot faster. I am mostly done with the XML handling of my app now anyway and just wanted to vent a little, but I am pissed off enough to perhaps learn xerces well, write my own wrapper around it and release it on the unsuspecting public. If I do, I am going to call it Leonidas.
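
To make the SAX versus DOM point concrete, here is a toy sketch of the event-driven style. It is not the Xerces API: `EventHandler`, `scan` and `DepthCounter` are hypothetical names, and the scanner only understands bare `<name>`, `</name>` and plain text (no attributes, entities, comments or error reporting). What it shows is that the handler sees each piece of the document once, as it streams past, and keeps only the state it cares about instead of a full tree.

```cpp
#include <cstddef>
#include <string>

// Callback interface in the spirit of SAX: the scanner pushes events to the
// handler while reading and never builds a tree. All names are hypothetical.
struct EventHandler {
    virtual ~EventHandler() = default;
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy scanner: handles only <name>, </name> and plain text.
void scan(const std::string& xml, EventHandler& handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos) {
                return;  // unterminated tag: this toy just stops
            }
            if (pos + 1 < close && xml[pos + 1] == '/') {
                handler.endElement(xml.substr(pos + 2, close - pos - 2));
            } else {
                handler.startElement(xml.substr(pos + 1, close - pos - 1));
            }
            pos = close + 1;
        } else {
            std::size_t next = xml.find('<', pos);
            if (next == std::string::npos) {
                next = xml.size();
            }
            handler.characters(xml.substr(pos, next - pos));
            pos = next;
        }
    }
}

// Example handler: counts elements and tracks nesting depth using three
// integers, regardless of how large the document is.
struct DepthCounter : EventHandler {
    int depth = 0;
    int maxDepth = 0;
    int elements = 0;
    void startElement(const std::string&) override {
        ++elements;
        ++depth;
        if (depth > maxDepth) {
            maxDepth = depth;
        }
    }
    void endElement(const std::string&) override { --depth; }
    void characters(const std::string&) override {}
};
```

A DOM-style API would instead return the whole document as a tree of nodes before the application gets to look at any of it, which is easier to navigate but costs memory proportional to the file size.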

#8 Eelco   Members   -  Reputation: 301

Posted 25 March 2012 - 05:10 AM


While that sounds good in theory, my experience is that few things in C++ are difficult enough to warrant the complexities in debugging and glue code that combining multiple languages always involves. My rant is mostly about xerces making things more difficult than they need to be, in any language.

I dunno; not having such libraries be part of the language ecosystem seems like a typical C++ problem that you'd run into time and time again. The fact that there apparently is no sensible library available for C++ is also a testament to its shortcomings as a language.

Coming from a Python and .NET perspective, the glue code has never been much of a barrier, but I suppose those are the exceptions.

#9 wack   Members   -  Reputation: 1331

Posted 25 March 2012 - 05:53 AM



I dunno; not having such libraries be part of the language ecosystem seems like a typical C++ problem that you'd run into time and time again. The fact that there apparently is no sensible library available for C++ is also a testament to its shortcomings as a language.

Coming from a Python and .NET perspective, the glue code has never been much of a barrier, but I suppose those are the exceptions.


I think neither Python nor C# would help in this case, since as far as I know, both of them only have DOM parsers, no SAX parsers. So perhaps Promit is right. It's XML itself that is the problem. Or would you say it's a testament to the shortcomings of C# and Python that there is no SAX parser?

Edit: it seems Python has a SAX parser after all, but calling it from C++ would frankly involve a lot more code than just using xerces.

#10 Eelco   Members   -  Reputation: 301

Posted 25 March 2012 - 06:32 AM




I think neither Python nor C# would help in this case, since as far as I know, both of them only have DOM parsers, no SAX parsers. So perhaps Promit is right. It's XML itself that is the problem. Or would you say it's a testament to the shortcomings of C# and Python that there is no SAX parser?

Edit: it seems Python has a SAX parser after all, but calling it from C++ would frankly involve a lot more code than just using xerces.

My point is to not write your high-level application in C++. I don't know of your other constraints obviously, so these are more theoretical than practical musings, but if at all possible, I'd write my main application in Python, do my parsing from there (which is probably just a tidy wrapper around some efficient C library), and use boost::python to effortlessly integrate any C++ code that I'd need to write.

If the Python ecosystem does not provide a specific kind of parser, yes, I'd consider that a failure of that language's ecosystem, but of all the languages I've worked with, Python has the strongest ecosystem, hands down. Unless you are trying to do something very arcane, I'd be very surprised if Python couldn't do it. A quick google seems to indicate this functionality is in the standard library, even.

#11 wack   Members   -  Reputation: 1331

Posted 25 March 2012 - 06:41 AM





My point is to not write your high-level application in C++. I don't know of your other constraints obviously, so these are more theoretical than practical musings, but if at all possible, I'd write my main application in Python, do my parsing from there (which is probably just a tidy wrapper around some efficient C library), and use boost::python to effortlessly integrate any C++ code that I'd need to write.

If the Python ecosystem does not provide a specific kind of parser, yes, I'd consider that a failure of that language's ecosystem, but of all the languages I've worked with, Python has the strongest ecosystem, hands down. Unless you are trying to do something very arcane, I'd be very surprised if Python couldn't do it. A quick google seems to indicate this functionality is in the standard library, even.


Well, let's just say we disagree. I have combined multiple languages many times, and maintained apps where it has been done by others. Almost always you end up regretting it and cursing the debugging horror that appears. Most of the time it is better to select one language that has the features you need and stick with it, even if some of the features are better implemented in other languages.

#12 Antheus   Members   -  Reputation: 2397

Posted 25 March 2012 - 07:16 AM


Had good experience with http://www.grinninglizard.com/tinyxml/

Which can't validate, unfortunately. God help the OP if he needs XSLT.

Xerces is a nice reminder of the prime time of Java architecture astronauts. People who had absolutely no clue of actual coding, but could suddenly build software architectures.

To a large extent this is the fault of XML, itself the result of architecture astronauts who are solving fanciful hypothetical problems rather than real ones. Xerces may just be an honest expression of the total fusion of Java and XML.

It's days like these I sympathize with the guys who swear by C. Not because it's a good idea to use C everywhere, but because so much of the world went this route.


I actually take that back.

Xerces was designed a long time ago, before C++ standardization, before new design ideas like Alexandrescu's. It's an example of evolved C with classes.

ACE suffers from a similar problem, but we can compare it to asio. With XML, being verbose as it is, nobody is probably going to bother to write a modernized version.

Most standard Java libraries suffer from the same problem. They were designed and grew as Java got adopted. But over 15 years, many things changed, and most APIs would be approached differently today, benefiting from the insight gained.

C# and .Net are actually making use of these experiences and evolving both the language as well as APIs.

As an example: ORMs are flawed by design. They map 1:1 table->Table, row->Row, database->Database... Why build abstractions that don't abstract anyway? LINQ is the meaningful step forward; it focuses on what one really wants to do with data, namely query and mangle it.

So even if xerces does not necessarily have an equivalent counterpart, these examples demonstrate that software design has improved.

#13 Eelco   Members   -  Reputation: 301

Posted 25 March 2012 - 08:25 AM

Well, let's just say we disagree. I have combined multiple languages many times, and maintained apps where it has been done by others. Almost always you end up regretting it and cursing the debugging horror that appears. Most of the time it is better to select one language that has the features you need and stick with it, even if some of the features are better implemented in other languages.

It depends on the languages involved, I suppose. But mixing C into Python is really as easy as breathing, and in the latest .NET the interop is almost completely transparent as well. But again, afaik these are indeed the exceptions; it can get pretty messy for other languages.

Indeed, debugging external code can be hard, though in my experience the extensions are usually fairly isolated and small bits of code. boost::python supports automated translation of C++ exceptions into Python exceptions; I haven't used that myself, but it sounds nice in theory at least.

#14 wack   Members   -  Reputation: 1331

Posted 25 March 2012 - 09:14 AM


Well, let's just say we disagree. I have combined multiple languages many times, and maintained apps where it has been done by others. Almost always you end up regretting it and cursing the debugging horror that appears. Most of the time it is better to select one language that has the features you need and stick with it, even if some of the features are better implemented in other languages.

It depends on the languages involved, I suppose. But mixing C into Python is really as easy as breathing, and in the latest .NET the interop is almost completely transparent as well. But again, afaik these are indeed the exceptions; it can get pretty messy for other languages.

Indeed, debugging external code can be hard, though in my experience the extensions are usually fairly isolated and small bits of code. boost::python supports automated translation of C++ exceptions into Python exceptions; I haven't used that myself, but it sounds nice in theory at least.


I do in fact plan on using Python in my app, but as an extension language to write small extensions, and not as the main language.

I hate to say this, and will probably be flamed by almost everyone for it, but there are good reasons why Python (and similar languages) will never be popular for writing large applications. It's easy to write stuff in, but when you start getting into large-scale stuff that will need to be maintained for years or even decades, the type system of Python will cause your app to become an unstable mess. There are many reasons for this, including:
  • People will come and go on the project over the years, and quite honestly, most of them will be morons. There is no static type system that will catch their errors early. You will have to run all of the app to ensure it's stable. Over and over and over again. Anyone who has been involved in testing apps with a few million lines of code knows that just doing one test run is very time consuming, and will not even cover all scenarios.
  • Writing automated tests is often proposed as a solution for this, but it rarely works in practice, because (including, but not limited to):
      • Most people are morons, and can't be trusted to properly evaluate how to write a sufficient test for the code.
      • It is a gross waste of time to write tests for stuff that the compiler would easily catch if a statically typed language was used.
      • It is extremely common that projects go on for longer than planned, because of unforeseen difficulties or bad estimates. When time starts running short, the things that are not "strictly necessary" are skipped. Yes, this means automated testing.
In Python, it is too easy to make a change that breaks your app in interesting ways, without anybody noticing it until much later. Maybe only when it's too late.

So, in summary, Python is fine for small stuff, but attempting to do anything large is doomed to fail, even if I'm sure someone can manage to find examples where people have succeeded against all odds.

#15 Eelco   Members   -  Reputation: 301

Posted 25 March 2012 - 01:46 PM

Dunno, I've never much worked on huge applications maintained over long periods. If that's the aim, C++ seems like the stuff of nightmares though. Nor have I ever missed typing in Python for anything other than code completion. And of course, enthought.traits gives type checking in Python plus more. Stupid programmers are not going to be saved by using another language; least of all C++.

I've never heard the developers of Mercurial complain about their work being impossible. And I suppose the reason why you don't see much Python in commercial applications is that it is so easy to reverse engineer.

But true, I don't really know what I'm talking about from experience.

Still, if you are looking for a non-quirky and strict language to serve as a long-term backbone for the high-level structure of your project, how is C# for instance not a far better choice than C++?

#16 wack   Members   -  Reputation: 1331

Posted 25 March 2012 - 02:39 PM

Still, if you are looking for a non-quirky and strict language to serve as a long-term backbone for the high-level structure of your project, how is C# for instance not a far better choice than C++?


Basically, as I see it, there are three somewhat sane language choices for larger projects today. They are C++, Java and C#. The D language that once looked promising seems to have failed. The similarities between these languages are far greater than the differences, so a lot of it comes down to personal preference. Since the thing I'm working on is still a personal project, that did have something to do with the choice. But I did use a set of pro/con criteria that are important to me when selecting the language also:

C#
  • +The syntax and language features have improved a lot lately. The only thing I still hate is the exception handling (which, to be fair, is equally terrible in C++)
  • +Extensive standard libraries.
  • -Not actually cross platform; it would be pretty stupid to use C# if you even suspect you need to run on non-Microsoft environments now, or at any point in the future. In particular, the server part of my app is intended to run on Linux, without using the sub-par Mono environment.
  • Long term future: Uncertain. As you can see, Microsoft is losing traction in lots of areas that are not desktop computing. Microsoft also has a proven track record of dumping languages when it suits them. Remember VB6?
Java
  • +Run-time environments are available for all major platforms.
  • -Language and associated libraries are starting to feel quite dated.
  • -Integration with the underlying platform is clumsy, users usually feel there is something "different".
  • Long term future: Uncertain. If there is one thing that can be counted on, it's that having anything to do with Oracle will come back and bite you in the ass somehow.
C++
  • +Flexible language; the new features in C++11 add a lot of things that have been sorely missing.
  • +Cross platform, if you want it to be.
  • +No run-time environments required.
  • -Standard libraries contain the bare essentials only, you will likely need third-party libs.
  • Long term future: Seems stable. There are multiple good implementations, and no single company controlling the language. The standards committee has a proven track record of making sure to break as few things as possible between new versions of the standard.
But generally speaking, depending on what you are trying to accomplish, any of the three is a fine choice. It is easy enough to find people who can program in them, but the people who know C++ tend to be better programmers in general than those who know only Java for instance.

As you can see, the "famous speed" of C++ wasn't even a deciding factor in this particular project, it's just a nice bonus.

#17 Eelco   Members   -  Reputation: 301

Posted 25 March 2012 - 03:34 PM

I don't think Microsoft deserves a bad rap for support. And no way they will drop C#, considering its widespread use in some fields. But if cross-platform is important to you, yeah... Mono seems cute, but now there is some support I don't trust. That said, if MS ever dropped the ball on C# for inexplicable reasons, the momentum behind Mono would quickly swell.

Java, I wouldn't want to use. Indeed the libraries suck, the language is clunky, and breaking out of safe code and writing some C is a major pain in the butt.

D is quite nice actually. I used it a lot when it was still under development, and stopped using it eventually, but the recent releases are very stable and functional, and the toolchain has improved a lot too. The only thing that still sucks is library availability, which is what drove me away in the first place...
That said, it does have XML parsing in the stdlib, and not only that, but it blows the best C++ parser out of the water in terms of performance: http://dotnot.org/blog/archives/2008/03/10/xml-benchmarks-updated-graphs-with-rapidxml/

Still, I can't imagine doing a large project in C++. The build times make my compile-and-correct coding style completely impossible, and it seriously pisses me off to repeat myself in a header file, performing the kind of automatable tasks that computers were invented to perform in the first place. And then there is the ecosystem, which got us started here; there are large gaps in functionality, and what is out there often takes longer to configure, compile and reverse-engineer than it takes to roll your own solution. Ugh.

#18 wack   Members   -  Reputation: 1331

Posted 26 March 2012 - 12:01 PM

D is quite nice actually. I used it a lot when it was still under development, and stopped using it eventually, but the recent releases are very stable and functional, and the toolchain has improved a lot too. The only thing that still sucks is library availability, which is what drove me away in the first place...
That said, it does have xml parsing in the stdlib, and not only that, but it blows the best C++ parser out of the water, in terms of performance : http://dotnot.org/bl...-with-rapidxml/


The D language has completely failed to gain any traction whatsoever since it started. It seems well on its way to becoming a minor footnote in programming history at this point. Which is a shame, because it did indeed seem promising.

The failure of D is, as far as I can see (having followed it from a safe distance), mainly due to two reasons:
  • No support from any of the large OS vendors, who saw greater benefit in peddling their own stuff instead.
  • Internal bickering. Instead of making a decision and sticking with it, there are now two different standard libraries for it. Whoopee.


There are many XML-parsers out there that are faster than Xerces, but they are mostly toy XML-parsers for people who often have no idea of why they are using XML in the first place. The one benchmarked in D seems to be one of them.

#19 Promit   Moderators   -  Reputation: 7342

Posted 26 March 2012 - 12:56 PM

I'd like to remind all of you that we are talking about XML, Xerces, and libraries. Not languages. If this becomes a language thread, I will end it.

#20 Matt328   Members   -  Reputation: 240

Posted 26 March 2012 - 06:51 PM

What's wrong with TinyXml++? It's a few source files you can drop into your project, and code written with it is about as concise as parsing XML can be. The few times I've had to parse XML in C++ it's worked pretty well for me.

It might not be the fastest thing there is, but the decision to use XML pretty much admits that you favor human readability over performance anyway.

Edit: D'oh, I see you were looking for something that validates against a schema, which I don't think TinyXml++ does. If validation is something that you decide can fall off the cart, check out TinyXml++.



