Why XML is all the rage now?

65 replies to this topic

#21nevS  Members   -  Reputation: 150

Posted 02 August 2013 - 05:26 AM

IMHO XML is the better-suited format for representing data. JSON is worthwhile when you use JavaScript or want to save some bits (AJAX traffic). JSON comes at a cost - the lost flexibility and the technology which XML offers:

XSD - you can validate the data easily before reading it with your application.

XSLT - you can transform the data into almost any other representation. You can even create JSON out of your XML with a pretty small XSL-Transformation, or merge multiple XML-Files to create a new one. There are almost no limits!

XPath - you can search/access single/multiple fields of data. This can be used in your code, in XSLTs or just by other tools (editors, IDEs for example).
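As a rough illustration of the XPath point, here is a minimal Python sketch. The document string is hypothetical, and note that the standard library's ElementTree only implements a subset of XPath, so this shows the idea rather than the full language.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
DOC = """
<someobject arg0="value" arg1="value">
  <subobject arg0="value"/>
  <subobject arg0="anothervalue"/>
</someobject>
"""

root = ET.fromstring(DOC)

# Select every <subobject> child of the root element.
all_subobjects = root.findall("./subobject")

# Select only the children whose arg0 attribute has a given value.
matching = root.findall("./subobject[@arg0='anothervalue']")

print(len(all_subobjects))
print([e.get("arg0") for e in matching])
```

The same path expressions can be reused in XSLT or in an editor's search box, which is the portability being argued for here.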

XML takes a lot of space! - Use compression. After compression, the size difference compared to JSON is pretty small.
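One way to check the compression claim on your own data is to measure it. The sketch below is an assumption-laden example: the record layout and element names are made up, and the resulting ratio depends entirely on the data, so it only measures and does not prove anything in general.

```python
import json
import zlib
import xml.etree.ElementTree as ET

# Hypothetical data set: 200 records with two string fields each.
records = [{"arg0": f"value{i}", "arg1": f"other{i}"} for i in range(200)]

# The same records serialized as JSON...
json_bytes = json.dumps({"objects": records}).encode("utf-8")

# ...and as attribute-style XML.
root = ET.Element("objects")
for rec in records:
    ET.SubElement(root, "object", rec)
xml_bytes = ET.tostring(root, encoding="utf-8")


def deflated_size(data: bytes) -> int:
    """Size in bytes after zlib (Deflate) compression at the default level."""
    return len(zlib.compress(data))


for name, raw in (("JSON", json_bytes), ("XML", xml_bytes)):
    print(f"{name}: {len(raw)} bytes raw, {deflated_size(raw)} bytes deflated")
```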

XML is painful to edit! - Use a proper editor with auto-completion and auto-validation. JSON with a deep hierarchy is also not easy to edit due to a lot of brackets. JSON also requires escaping more characters than XML, which can cause a lot of problems when editing by hand.

#22Sik_the_hedgehog  Crossbones+   -  Reputation: 2148

Posted 02 August 2013 - 06:31 AM

XML is overly complicated, redundant, bloated, etc...? Read again the last paragraph. You need not look at it if you don't like it. You need not edit it, The Program will read/write its data just fine without you interfering.

It matters though. If you make the structure complex, the program will become just as complex. Granted, the issue is not so much with XML, but rather with the fact it got abused like crazy, but that still makes the point stand. Just because something can be relegated to a program doesn't mean it's going to be easier to maintain.

XML takes way too much storage space? Wait, did you hear that? That's the world's saddest song playing on the world's smallest violin. Seriously, you have an office package installed that takes half a gigabyte of disk space only for a text editor and a spreadsheet, you have 2 TiB of MP3s on your harddisk, and you worry whether a puny XML file is 4 kiB or 8 kiB? Tell you what, there is WinZIP if you need to worry about 4KiB. Right, the XML files in your content pipeline aren't precisely 4 kiB, they're more like 40 MiB. Good grief, I'm shocked.

Just checked. About 6GB for 8385 files, and I know there's some redundancy there. If you have 2TB in music you probably have other matters to worry about (especially since 2TB are some of the largest hard disks available - 3TB is not that common yet).

And that kind of thinking is what results in modern computers feeling just as crap as early ones even though they're thousands of times more powerful (or in the case of memory, millions of times). I know some stuff does indeed require more power, but this idea that we should waste resources just because we can waste is just plain stupid.

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#23swiftcoder  Senior Moderators   -  Reputation: 13680

Posted 02 August 2013 - 06:47 AM

And that kind of thinking is what results in modern computers feeling just as crap as early ones even though they're thousands of times more powerful (or in the case of memory, millions of times). I know some stuff does indeed require more power, but this idea that we should waste resources just because we can waste is just plain stupid.

QFE.

My quad-core i7 should be able to launch Microsoft Word faster than a 386 in the mid-90's. And yet... it takes 10x longer.

How much of that is due to picking inferior approaches just "because"?

Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]

#24Olof Hedman  Crossbones+   -  Reputation: 3836

Posted 02 August 2013 - 07:14 AM

What is so horribly bloated with:

<someobject arg0="value" arg1="value"/>

the ":s?  the slash?

Not much to argue about I think...

Sure, a bit more bloat if the object needs a variable amount of sub-objects (like an array of whatever):

<someobject arg0="value" arg1="value">

<subobject arg0="value"/>

<subobject arg0="anothervalue"/>

</someobject>

But I think it is rather easy-to-read...  And any sane editor will help you with closing tags and such...

If you only need the JSON-style data definitions, you don't really need much more from XML than that...

Then of course you can fuck up and make something like:

<obj type="someobject">
  <argument name="arg0" value="value"/>
  <argument name="arg1" value="value"/>
  <subobjects>
    <obj type="subobjecttype">
      <argument name="arg0" value="value"/>
    </obj>
    <obj type="subobjecttype">
      <argument name="arg0" value="anothervalue"/>
    </obj>
  </subobjects>
</obj>

But that's just stupid... and not really XML's fault...

Edited by Olof Hedman, 02 August 2013 - 07:23 AM.

#25Bacterius  Crossbones+   -  Reputation: 11386

Posted 02 August 2013 - 08:19 AM

My quad-core i7 should be able to launch Microsoft Word faster than a 386 in the mid-90's. And yet... it takes 10x longer.

How much of that is due to picking inferior approaches just "because"?

Honestly that's because most programmers are just lazy and would rather commit code as soon as the unit test bar turns green and receive their "most productive employee of the month" award than read over their code and spend some time reviewing it to see what could be made better. Back a few decades software was optimized because it had to be. Now that computers are a lot faster, this problem doesn't exist any longer, so anything that works is "good enough". And if someone complains that it's unusably slow even though it does essentially the same thing it did ten years ago, no problem, just argue that it "actually does a lot more behind the hood" or "it will be faster in 18 months" or "it's the operating system's fault" or the good old "if you're not happy with the product then don't use it". Another popular one these days, especially with games, is "it's an alpha", as if that somehow excused everything.

Usability (for end users) is ultimately a measure of work done over time taken. Notice that doesn't include development time. The end user doesn't give a crap about how you whipped up that program in twenty minutes, he only cares that it gets the job done quickly enough. By sacrificing usability for development time, you are not doing your user a favor. And the instant the user's priorities take a back seat is the instant you have failed. Your job is to write software for other people to use, not to pump out lines of code as quickly as possible. Don't lose sight of what actually matters.

The current trend towards mediocrity and complacency is rather depressing. This "fast-paced" world sucks. Whatever happened to craftsmanship. Anyway, rant over.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

#26swiftcoder  Senior Moderators   -  Reputation: 13680

Posted 02 August 2013 - 08:22 AM

Whatever happened to craftsmanship. Anyway, rant over.

I feel like I give this talk daily in my workplace. Sometimes people even listen.

#27BGB  Crossbones+   -  Reputation: 1554

Posted 02 August 2013 - 09:04 AM

reworking to a slightly more concise syntax:

[obj type="someobject"
  [argument name="arg0" value="value"]
  [argument name="arg1" value="value"]
  [subobjects
    [obj type="subobjecttype"
      [argument name="arg0" value="value"]]
    [obj type="subobjecttype"
      [argument name="arg0" value="anothervalue"]]]]

so, yeah, not a huge difference...

a bigger saving is (for performance) eliminating things like free-floating text, omitting support for full namespaces, and adding support for explicit numeric values (this is essentially what my XML-based compiler AST formats did, though retaining the normal external syntax). also sometimes useful is options for encoding raw binary data (in ASCII form, generally dumped out as Base64 or a Base85 variant). a lot here depends on the exact in-memory node representation, ...
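for the binary-data option, a minimal sketch of the Base64 round trip in Python. the `<blob>` element and its `encoding` attribute are hypothetical names picked for the example, not part of any of the formats mentioned here.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes(range(16))  # hypothetical binary blob

# Store the blob as Base64 text inside a hypothetical <blob> element.
node = ET.Element("blob", {"encoding": "base64"})
node.text = base64.b64encode(payload).decode("ascii")
serialized = ET.tostring(node, encoding="unicode")

# Reading it back: parse the XML, then decode the element's text content.
restored = base64.b64decode(ET.fromstring(serialized).text)

print(serialized)
```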

as for reducing size (via compression/serialization), there are a few options:
XML+Deflate, which is relatively straightforward, and compresses fairly well, but is slightly more expensive to encode/decode;
WBXML, basically works, but has some limitations, and results in bigger files than XML+Deflate;
EXI, never got around to fully evaluating, compresses well but I found the spec difficult to make much sense out of;
...

I had a few of my own variants, one example was SBXE, which was a "slightly improved" alternative to WBXML (slightly more compact, and more features).
SBXE+Deflate was generally slightly more compact than XML+Deflate, but the difference was fairly small.

another was related to the (never fully implemented or used) XML-coding mode of my "BSXRP" protocol, which would have used Huffman compression and VLC coding for values. (as-is, the protocol is mostly used for encoding S-Expression like data...).

both formats were loosely based (in concept) on LZP, in particular the encoding tries to predict the following tag or attribute (based on recent history), allowing this case to be coded more efficiently (and does not depend on the use of a schema), otherwise (should this prediction fail) there is the option of reusing a recently-coded value, or (as-needed) explicitly encoding the tag or attribute name (as a string). SBXE used an LZP variant for strings, whereas BSXRP used LZ77 (and an otherwise Deflate-like data representation).

#28samoth  Crossbones+   -  Reputation: 5920

Posted 02 August 2013 - 10:30 AM

And that kind of thinking is what results in modern computers feeling just as crap as early ones even though they're thousands of times more powerful (or in the case of memory, millions of times). I know some stuff does indeed require more power, but this idea that we should waste resources just because we can waste is just plain stupid.

QFE.

My quad-core i7 should be able to launch Microsoft Word faster than a 386 in the mid-90's. And yet... it takes 10x longer.

How much of that is due to picking inferior approaches just "because"?

But that's not because some guy in the content pipeline used a tool that uses XML to lay out a dialog or such. Even the fact that they use zip-compressed XML in their office documents now doesn't bog things down (except if you use LibreOffice, which for some reason totally stinks importing these).

It's because Office first compiles a ton of C#, then loads three dozen libraries, half of which probably aren't needed at all, while it shows an animation that nobody wants to see, then connects to Live.Microsoft.com and Facebook and whatnot, and because every single thing goes through 4 or 5 layers of legacy code and libraries.

And you know what? Nobody cares. Companies buy Office, and will continue to buy Office, so all is good. The next incantation (Office 360) will be even worse when everything is "cloud only". And again, nobody will care, because "cloud is cool", and nobody wants to be less cool than everyone else.

It's like Windows 8, which is on all accounts much worse than Windows 7. Nobody cares. People will buy.

#29swiftcoder  Senior Moderators   -  Reputation: 13680

Posted 02 August 2013 - 10:48 AM

The next incantation (Office 360) will be even worse when everything is "cloud only".

Maybe. Google docs launches in 3.5 seconds for me, though...

#30achild  Crossbones+   -  Reputation: 2069

Posted 02 August 2013 - 01:22 PM

YAML is taking off? I have yet to see anything use it. JSON though, yes, it seems that lately everybody and their dog is using JSON. Probably because we're in full HTML5 mode, and JSON data is valid JavaScript, so using it is a no-brainer as you don't even need a parser (I wonder if anybody understands the implications of loading data as code though).
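To make the "data as code" concern concrete, here is a small Python analogy (the quoted point is about JavaScript, so treat this as an illustration in a different language). A real JSON parser only builds data and refuses anything else, whereas evaluating the text as code would run whatever it contains. The hostile payload string is hypothetical.

```python
import json

# A well-formed JSON document parses into plain data structures.
good = json.loads('{"name": "someobject", "args": ["value", "anothervalue"]}')

# Text that would do something if evaluated as code is simply rejected,
# because a JSON parser never executes its input.
hostile = '__import__("os").getcwd()'  # hypothetical hostile payload
try:
    json.loads(hostile)
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(good["args"], rejected)
```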

Personally, I prefer INI files anyway (well, INI-like at least). Yeah, call me old-fashioned, but they're a lot easier to deal with. XML is good when you need tree-style nesting, but most of the time you don't, really (and even then, those using XML more often than not abuse it resulting in ridiculously complex formats for no real reason).

Added systems for localization AND fairly comprehensive theming in a single night for our next release... using ini files. Still need a good sit-down or two to link remaining text, gui elements, etc to these systems, but the functionality itself is complete and works great.

Honestly I've been paranoid that the design choice was naive or missing something critical because it was so darn simple and ini files are all but unheard of these days - but it works like a charm! Not to mention it takes like 30 lines of code to have a cross platform ini parser.
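For a sense of scale, a parser along those lines might look like the sketch below. The dialect is an assumption ([section] headers, key=value pairs, comment lines starting with ';' or '#'), and the sample file contents are made up; Python also ships a `configparser` module for the same job.

```python
def parse_ini(text):
    """Parse INI-like text into {section: {key: value}}.

    Assumed dialect: [section] headers, key=value pairs, and comment lines
    starting with ';' or '#'. Keys before any header go under "".
    """
    sections = {}
    current = sections.setdefault("", {})
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in ";#":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
        else:
            raise ValueError(f"line {lineno}: expected 'key=value', got {raw!r}")
    return sections


SAMPLE = """
; hypothetical theme file
[window]
background = #202020
title = Hello

[strings]
greeting = Welcome back
"""

config = parse_ini(SAMPLE)
print(config["window"]["background"])
```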

I can definitely see the need to support human readable hierarchies in some cases, but it seems to happen way too often when it is unneeded.

I hold the firm belief that given time, the open-source world will achieve its ultimate goal of reducing every piece of software in the world down to operations on a key/value store (see the rise of plist, JSON, Lua, and NoSQL).

Then we can resurrect the INI file, and be done with it.

Haha.

#31alnite  Crossbones+   -  Reputation: 2598

Posted 02 August 2013 - 02:50 PM

<obj type="someobject">
  <argument name="arg0" value="value"/>
  <argument name="arg1" value="value"/>
  <subobjects>
    <obj type="subobjecttype">
      <argument name="arg0" value="value"/>
    </obj>
    <obj type="subobjecttype">
      <argument name="arg0" value="anothervalue"/>
    </obj>
  </subobjects>
</obj>

In YAML:

someobject:
  arg0: value
  arg1: value
  subobjects:
    - arg0: value
    - arg0: anothervalue


In JSON:

{ "someobject" : { "arg0": "value", "arg1": "value", "subobjects": [ { "arg0": "value" }, { "arg0": "anothervalue" } ] } }


#32swiftcoder  Senior Moderators   -  Reputation: 13680

Posted 02 August 2013 - 03:03 PM

And just to reinforce my point, in INI:
[obj]
id=someobject
arg0=value
arg1=value

[obj]
parent=someobject
arg0=value

[obj]
parent=someobject
arg0=anothervalue
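
Reading that layout back takes a little care, because the section name [obj] repeats (Python's `configparser` rejects duplicate sections in its default strict mode). A minimal hand-rolled sketch, assuming parents are always declared before their children and that `id`/`parent` are the linking keys as in the snippet above:

```python
def parse_sections(text):
    """Return a list of (section_name, {key: value}) pairs, keeping repeats."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


INI = """
[obj]
id=someobject
arg0=value
arg1=value

[obj]
parent=someobject
arg0=value

[obj]
parent=someobject
arg0=anothervalue
"""

# Rebuild the hierarchy by linking each section to the one its parent key names.
by_id = {}
roots = []
for _name, fields in parse_sections(INI):
    node = {"fields": fields, "children": []}
    if "id" in fields:
        by_id[fields["id"]] = node
    parent = by_id.get(fields.get("parent"))
    (parent["children"] if parent else roots).append(node)

print(len(roots), len(roots[0]["children"]))
```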


#33alnite  Crossbones+   -  Reputation: 2598

Posted 02 August 2013 - 03:09 PM

And just to reinforce my point, in INI:

Just when I thought "maybe I should do the INI too!"

#34patrrr  Members   -  Reputation: 1171

Posted 02 August 2013 - 03:12 PM

In YAML:

someobject:
  arg0: value
  arg1: value
  subobjects:
    - arg0: value
    - arg0: anothervalue


In JSON:

{ "someobject" : { "arg0": "value", "arg1": "value", "subobjects": [ { "arg0": "value" }, { "arg0": "anothervalue" } ] } }


Hm, that doesn't seem right. Shouldn't it be:

arguments:
  arg0: value
  arg1: value
subobjects:
  - arguments:
      arg0: value
  - arguments:
      arg0: anothervalue


Edited by patrrr, 02 August 2013 - 03:16 PM.

#35frob  Moderators   -  Reputation: 29905

Posted 02 August 2013 - 04:23 PM

Points about JSON > XML aside...

As a game developer, the big deal to me is that it's a flexible standardized text format which means:

• I don't have to create my own libraries to read, write, or navigate it.
• At least for the purposes of developing and debugging tools that use, generate, or convert it, it's human readable.
• It's diff-able and potentially merge-able, which to me makes it first-class revisionable.

^^ THIS!

It is a shame this is a Lounge post and we cannot upvote.

As a person who remembers the bad old days of the late '80s and the '90s, I easily recall when most tools and technologies relied entirely on binary formats.

I don't care one bit if the language is XML, JSON, YAML, or Maya Ascii or Collada or anything else.  That does not matter.

I absolutely care about factors like those above.

I absolutely care that as a developer I can understand and interpret the file without a binary-file parser.  I can crack open a file, run a text search, and find the data I need.  Even better, sometimes I can run find-and-replace on that file using simple tools.

I absolutely care that I can run diff against two versions of the file and see the difference. I can open a diff of two revisions with an artist sitting next to me, and we can both glance at the file and see that they moved a joint a little to the left.
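As a small illustration of that workflow, here is a Python sketch using the standard `difflib` module. The two revisions and the `<joint>` markup are hypothetical stand-ins for a text-based asset file.

```python
import difflib

# Two hypothetical revisions of a text-based asset file.
old = """<joint name="elbow" x="1.00" y="2.00"/>
<joint name="wrist" x="1.50" y="2.75"/>
""".splitlines(keepends=True)

new = """<joint name="elbow" x="0.90" y="2.00"/>
<joint name="wrist" x="1.50" y="2.75"/>
""".splitlines(keepends=True)

# A unified diff shows exactly which line changed between the revisions.
diff = list(difflib.unified_diff(old, new, fromfile="rev1", tofile="rev2"))
print("".join(diff), end="")
```

With a binary format the same comparison would need a format-specific tool; here any generic diff does the job.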

I don't care about the exact format, but I strongly care that these human-readable, standardized, diff-able files are used.  There was a time when storage space was expensive and everything needed to be encoded.  We are past those days.

Edited by frob, 02 August 2013 - 04:23 PM.

Check out my book, Game Development with Unity, aimed at beginners who want to build fun games fast.

Also check out my personal website at bryanwagstaff.com, where I write about assorted stuff.

#36Sik_the_hedgehog  Crossbones+   -  Reputation: 2148

Posted 02 August 2013 - 04:24 PM

Add some newlines to that JSON example O_O

#37Olof Hedman  Crossbones+   -  Reputation: 3836

Posted 03 August 2013 - 12:55 PM

In YAML:
someobject:
  arg0: value
  arg1: value
  subobjects:
    - arg0: value
    - arg0: anothervalue
In JSON:
{ "someobject" : { "arg0": "value", "arg1": "value", "subobjects": [ { "arg0": "value" }, { "arg0": "anothervalue" } ] } }

Yeah, and in XML, the sane definition would be:

<someobject arg0="value" arg1="value">

<subobject arg0="value"/>

<subobject arg0="anothervalue"/>

</someobject>

Granted, there is some repetition of "subobject" and "someobject", but the bloat isn't really that much is it?
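
To show that the attribute-style XML carries the same information as the JSON version, here is a rough Python sketch that converts one into the other. The mapping (attributes become keys, child elements go into a "subobjects" list) is an assumption chosen to match the examples in this thread, not a general XML-to-JSON rule.

```python
import json
import xml.etree.ElementTree as ET

XML = """
<someobject arg0="value" arg1="value">
  <subobject arg0="value"/>
  <subobject arg0="anothervalue"/>
</someobject>
"""


def to_dict(elem):
    """Map attributes to keys and child elements to a 'subobjects' list."""
    data = dict(elem.attrib)
    children = [to_dict(child) for child in elem]
    if children:
        data["subobjects"] = children
    return data


root = ET.fromstring(XML)

# indent=2 gives the JSON one value per line instead of a single long line.
as_json = json.dumps({root.tag: to_dict(root)}, indent=2)
print(as_json)
```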

#38slicer4ever  Crossbones+   -  Reputation: 4812

Posted 03 August 2013 - 02:21 PM

In YAML:
someobject:
  arg0: value
  arg1: value
  subobjects:
    - arg0: value
    - arg0: anothervalue
In JSON:
{ "someobject" : { "arg0": "value", "arg1": "value", "subobjects": [ { "arg0": "value" }, { "arg0": "anothervalue" } ] } }

Yeah, and in XML, the sane definition would be:

<someobject arg0="value" arg1="value">

<subobject arg0="value"/>

<subobject arg0="anothervalue"/>

</someobject>

Granted, there is some repetition of "subobject" and "someobject", but the bloat isn't really that much is it?

this is why i like xml over yaml. json is interesting, but i'm sticking to xml personally. yeah, there's a bit of bloat, but imo xml's minor bloat is made up for in terms of readability. of course xml is obviously more easily abusable (as was pointed out above), but that shouldn't be a reason not to use something.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

#39Sik_the_hedgehog  Crossbones+   -  Reputation: 2148

Posted 03 August 2013 - 03:10 PM

Granted, there is some repetition of "subobject" and "someobject", but the bloat isn't really that much is it?

Nah, and as a bonus it'll be also easier to parse for the program, so everybody wins (it's easier to hand-edit and the program is easier to maintain).

The problem is that XML tends to be abused in the worst ways possible x_x; It's like programmers see it's a tree so they need to turn everything into as deep a tree as possible no matter what, just in case (some may argue it's for expandability). Ugh. Sadly, I wouldn't be surprised if that also reflects the complexity of the programs themselves... (one reason why I always have trouble with third-party code: more often than not it's way more complex than it needed to be)

#40phantom  Moderators   -  Reputation: 8718

Posted 03 August 2013 - 05:30 PM

Honestly that's because most programmers are just lazy and would rather commit code as soon as the unit test bar turns green and receive their "most productive employee of the month" award than read over their code and spend some time reviewing it to see what could be made better.

Really?
You are going to pull the 'programmers are lazy' bullshit out of your arse?

Seriously, I expect this on a 'gamer' forum but on a developer one? jesus...

Has it ever occurred to you that the reason the code goes in when it is ready and isn't gone over to hell and back is because said programmer has various managers higher up the chain breathing down their neck to get shit done yesterday?

You ask what happened to craftsmanship? People want shit yesterday that's what happened.

Plenty of talented people out there are still trying to do the best they can but don't get a chance because higher up the chain 'good enough' is what they want and it's on to the next feature. I've lost track of the number of things I've had to check in where I know I could have improved it but the time wasn't there because 'feature Y' needs to be done in a week now. You fight the battle; sometimes you win, and more often than not you lose.

Christ...
