TheChubu

Why XML is all the rage now?

brief history (roughly the past 14 years):

at one point, I wrote a Scheme interpreter, and it naturally uses S-Expressions.

later on, this project partly imploded (at the time, the code became mostly unmaintainable, and Scheme fell a bit short in a few areas).

by its later stages, it had migrated to a form of modified S-Expressions, where essentially:

macros were expanded; built-in operations used operation-numbers rather than symbols; lexical variables were replaced with variable-indices; ...
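For anyone unfamiliar with the "lexical variables replaced with variable-indices" step: it is usually called lexical addressing. Below is a minimal Python sketch, not the original interpreter's code. It assumes S-expressions are nested Python lists, `lambda` (with a single body expression) is the only binding form, and a resolved reference becomes a hypothetical `["lref", depth, index]` node, where depth counts scopes outward and index is the position in that scope's parameter list.

```python
def address(expr, env=()):
    """Replace bound variable names with ["lref", depth, index] nodes.

    env is a tuple of parameter lists, innermost scope first. Symbols not
    bound by any enclosing lambda are left alone (treated as globals).
    """
    if isinstance(expr, str):
        for depth, frame in enumerate(env):
            if expr in frame:
                return ["lref", depth, frame.index(expr)]
        return expr
    if isinstance(expr, list) and expr:
        if expr[0] == "lambda":
            params, body = expr[1], expr[2]
            # Only the parameter count survives; the names are no longer needed.
            return ["lambda", len(params), address(body, (params,) + env)]
        return [address(e, env) for e in expr]
    return expr  # numbers and other literals pass through unchanged


if __name__ == "__main__":
    # (lambda (x y) (lambda (z) (+ x z y)))
    src = ["lambda", ["x", "y"], ["lambda", ["z"], ["+", "x", "z", "y"]]]
    print(address(src))
```

Once every reference is a (depth, index) pair, variable lookup becomes array indexing instead of a name search, which is the usual reason for doing this pass.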

there was also a backend which would spit out Scheme code compiled to globs of C.

 

Quite the array of language projects you have there!

 

I too am fond of the use of S-expressions over that of XML, and have had experience using them for data and DSLs in a number of projects. You can't beat the terseness and expressive power, and it's not hard to roll your own parser to handle them.
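To give a sense of how little code a hand-rolled S-expression reader needs, here's a minimal Python sketch. It assumes a simplified syntax (lists, integers, floats, double-quoted strings with backslash escapes, and bare symbols; no comments, quote shorthand or dotted pairs), and the `Symbol` class is just an illustrative way to tell symbols apart from string literals.

```python
import re

# One token: "(", ")", a double-quoted string, or a run of other non-space chars.
TOKEN = re.compile(r'\s*(?:(\()|(\))|("(?:[^"\\]|\\.)*")|([^\s()"]+))')
INT = re.compile(r'[-+]?\d+\Z')
FLOAT = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+)(?:[eE][-+]?\d+)?\Z')


class Symbol(str):
    """A bare identifier, kept distinct from string literals."""


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(m.lastindex))  # whichever alternative matched
        pos = m.end()
    return tokens


def atom(tok):
    if tok.startswith('"'):
        # Drop the quotes; a backslash escapes the next character (simplified).
        return re.sub(r'\\(.)', r'\1', tok[1:-1])
    if INT.match(tok):
        return int(tok)
    if FLOAT.match(tok):
        return float(tok)
    return Symbol(tok)


def parse(text):
    """Parse exactly one S-expression into nested Python lists."""
    tokens, pos = tokenize(text), 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            pos += 1  # skip ')'
            return items
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        return atom(tok)

    result = read()
    if pos != len(tokens):
        raise SyntaxError("trailing input after expression")
    return result
```

For example, `parse('(define (square x) (* x x))')` yields nested lists of `Symbol`s that compare equal to plain strings.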

 

I share many of the opinions from: http://c2.com/cgi/wiki?XmlIsaPoorCopyOfEssExpressions

 

As for my own projects, I've also built a custom R6RS parser in C++, and have done some interesting things with it. For specifying data as maps/sets/vectors, I added support for handling special forms which yield new data-structure semantics, added Clojure-like syntactic sugar to the lexer/parser where braces and square brackets can be used to define such data structures, and added a quick tree-rewriting pass to the data compiler to convert from the internal list AST node representation to the appropriate container type.
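A rough Python sketch of that kind of tree-rewriting pass, assuming the reader has already turned `[a b]` into a `(vector a b)` list and `{k v}` into a `(hash-map k v)` list, with sets written as a `(hash-set a b)` form. Those head names are hypothetical (borrowed from Clojure's naming), not the ones used in the C++ parser.

```python
def to_containers(node):
    """Rewrite tagged list nodes into native containers, bottom-up.

    Assumed (hypothetical) reader output:
      [a b c]        -> ["vector", a, b, c]      becomes a list
      {k1 v1 ...}    -> ["hash-map", k1, v1]     becomes a dict
      (hash-set a b) -> ["hash-set", a, b]       becomes a set
    Map keys and set members must be hashable (strings, numbers).
    Any other list stays a list, with its elements rewritten.
    """
    if not isinstance(node, list) or not node:
        return node
    head = node[0]
    args = [to_containers(a) for a in node[1:]]
    if head == "vector":
        return args
    if head == "hash-map":
        if len(args) % 2:
            raise ValueError("hash-map needs an even number of forms")
        return dict(zip(args[0::2], args[1::2]))
    if head == "hash-set":
        return set(args)
    return [to_containers(head)] + args


if __name__ == "__main__":
    # {name "goblin" stats [25 2 5] tags (hash-set melee)} after reading:
    src = ["hash-map", "name", "goblin",
           "stats", ["vector", 25, 2, 5],
           "tags", ["hash-set", "melee"]]
    print(to_containers(src))
```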

 

For simple data, sometimes I just go with simple key-value text files if I can get away with it (less is more! strtok_r does the job well enough), and I've recently been experimenting with using parsing expression grammar generators to quickly create parser combinators for custom DSLs that generate more complex data or code as s-expressions or C++.
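A minimal Python take on the simple key-value text file approach (a strtok_r loop would be the C counterpart). The format is an assumption: one `key = value` per line, `#` starts a comment, blank lines are ignored, and a repeated key overwrites the earlier value. Values stay as strings; converting them is left to the caller.

```python
def parse_kv(text):
    """Parse 'key = value' lines into a dict of strings.

    '#' starts a comment (so values can't contain '#'), blank lines are
    skipped, and a repeated key overwrites the earlier value.
    """
    result = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"line {lineno}: expected 'key = value'")
        result[key] = value.strip()
    return result


if __name__ == "__main__":
    config = """
    # window settings
    width = 800
    height=600   # pixels
    title = My Game
    """
    print(parse_kv(config))
```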

 

A shame that many of the "big iron" game studios still use XML for a lot of things, although I've managed to convince a number of people that it's time to move on. I dread the days when I am tasked with working on anything touching the stuff.

 

In short, if you're still using XML, you're needlessly wading through an endless swamp of pain, suffering and obtuse complexity. Things can be better.

 

 

I was working at the time with R5RS.

by the time R6RS came out, I had mostly stopped using Scheme, and looking at it briefly, it looked like a bit of a jump from what R5RS was.

 

the AST format later used for BGBScript was based partly on R5RS, but differs in a lot of ways, mostly in ones that make it a better fit for an HLL with a more C/JS/AS3/...-like syntax: different special-forms for defining things, special-forms representing control-flow constructs (for/while/switch/...), ...

also, it generally moved to using explicit special-forms for things like function calls and operators, ...
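To illustrate what explicit special-forms for function calls and operators means in practice, here's a toy Python evaluator where a call is `(funcall f args...)` and an operator is `(binop op lhs rhs)` rather than being implied by list position, plus a `while` form for C-style control flow. All form names (`ref`, `set!`, `binop`, `funcall`, `begin`, `while`) are hypothetical, since BGBScript's actual AST forms aren't shown in this thread, and the sketch uses one flat environment with no scoping.

```python
import operator

BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "<": operator.lt, "==": operator.eq}


def evaluate(node, env):
    """Evaluate a toy AST in which every operation is an explicit form."""
    if not isinstance(node, list):
        return node                                  # literal value
    tag = node[0]
    if tag == "ref":                                 # (ref name)
        return env[node[1]]
    if tag == "set!":                                # (set! name expr)
        env[node[1]] = evaluate(node[2], env)
        return env[node[1]]
    if tag == "binop":                               # (binop op lhs rhs)
        lhs, rhs = evaluate(node[2], env), evaluate(node[3], env)
        return BINOPS[node[1]](lhs, rhs)
    if tag == "funcall":                             # (funcall fn arg...)
        fn = evaluate(node[1], env)
        return fn(*[evaluate(arg, env) for arg in node[2:]])
    if tag == "begin":                               # (begin stmt...)
        result = None
        for stmt in node[1:]:
            result = evaluate(stmt, env)
        return result
    if tag == "while":                               # (while cond body...)
        while evaluate(node[1], env):
            for stmt in node[2:]:
                evaluate(stmt, env)
        return None
    raise ValueError(f"unknown form: {tag!r}")


if __name__ == "__main__":
    # i = 0; total = 0; while (i < 5) { total = total + i; i = i + 1; } str(total)
    program = ["begin",
               ["set!", "i", 0],
               ["set!", "total", 0],
               ["while", ["binop", "<", ["ref", "i"], 5],
                ["set!", "total", ["binop", "+", ["ref", "total"], ["ref", "i"]]],
                ["set!", "i", ["binop", "+", ["ref", "i"], 1]]],
               ["funcall", ["ref", "str"], ["ref", "total"]]]
    print(evaluate(program, {"str": str}))
```

Because the head of every list names the operation, the evaluator (or a compiler) dispatches on one tag instead of guessing whether a list is a call, an operator use, or a statement.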

 

some elements of Scheme also were worked into the HLL design as well (tail-calls / tail-position, implicit return values, lists, ...).

early on, both Self and Erlang were also influences for the language design.

later on, Java, C#, and AS3 became influences.

 

basically, while it started out dynamic and prototype based, static-types, classes, packages, ... were later glued on, partly for performance reasons, and also because they are more effective for a lot of use-cases (can do stronger compile-time checking, ...).

 

though, the language still retains most of its dynamic funkiness (including a Self-derived scoping model, scoping semantics are fun in my language...). not going to try to explain the type-system and scoping model here though.

 

 

for parsers, I have most often used hand-written recursive-descent.

I started out with RD, and pretty much every non-trivial syntax I have encountered seems to work fine with RD.
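A compact Python example of the recursive-descent style, one function per grammar rule, emitting S-expression-style nested lists. It's an illustrative sketch (integers, names, `+ - * /` and parentheses only, no unary minus), not code from any of the projects mentioned.

```python
import re


def parse_expr(src):
    """Recursive-descent parser: one function per grammar rule.

    Produces S-expression-style lists, e.g. "1 + 2 * x" -> ['+', 1, ['*', 2, 'x']].
    """
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", src)
    if "".join(tokens) != "".join(src.split()):
        raise SyntaxError("unrecognised character in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expr():    # expr   := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            node = [take(), node, term()]   # left-associative
        return node

    def term():    # term   := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            node = [take(), node, factor()]
        return node

    def factor():  # factor := NUMBER | NAME | '(' expr ')'
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        if tok in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return result
```

Operator precedence falls out of the call structure: `term` binds tighter than `expr` simply because `expr` calls `term`.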

 

 

XML and S-Expressions both have some use-cases.

 

granted, my XML APIs have since diverged somewhat from DOM, becoming generally a lot more operation-centric, and much less about treating XML nodes as objects (and generally, the "Document" metaphor is all but absent in-use). basically, the API focuses a lot more on composition and decomposition of data, rather than on node manipulation. ironically, it isn't used much at all with external tools (typically about the only time most of this is actually seen is in debugging dumps).
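A small Python sketch of what an operation-centric XML API might look like: one call composes an element from a tag, attributes and children, and another decomposes it back into plain tuples, so calling code never manipulates nodes directly. It sits on the standard library's `xml.etree.ElementTree`; the `compose`/`decompose` names are hypothetical, and mixed content (text between child elements) is not handled.

```python
import xml.etree.ElementTree as ET


def compose(tag, *children, **attrs):
    """Build an element from a tag, child elements/values and attributes.

    Non-element children are converted to str and appended to the element's
    leading text; mixed content (text between child elements) isn't handled.
    """
    elem = ET.Element(tag, {key: str(value) for key, value in attrs.items()})
    for child in children:
        if isinstance(child, ET.Element):
            elem.append(child)
        else:
            elem.text = (elem.text or "") + str(child)
    return elem


def decompose(elem):
    """Flatten an element into (tag, attributes, children) tuples.

    Leading text (if not just whitespace) becomes the first child; text
    after child elements (tails) is ignored in this sketch.
    """
    children = [decompose(child) for child in elem]
    if elem.text and elem.text.strip():
        children.insert(0, elem.text.strip())
    return elem.tag, dict(elem.attrib), children


if __name__ == "__main__":
    entity = compose("entity",
                     compose("origin", x=-240, y=416, z=48),
                     compose("name", "goblin"),
                     id=7)
    text = ET.tostring(entity, encoding="unicode")
    print(text)
    print(decompose(ET.fromstring(text)))
```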

 

theoretically, it could also matter if/when I needed to interact with other things which use XML, or if by some off-chance I decide to use XML-RPC again (currently unlikely...).

 

 

granted, from an ease-of-use perspective, lists are hard to beat, as they are generally a lot easier to work with and need a lot less code.

granted, my approach to this (C-side) has been to build a big chunk of Lisp-like APIs in C (basically, a bunch of Lisp and CLOS-like stuff glued onto C).

 

granted, it took several iterations before really settling on a usable set of tradeoffs (getting something that is both usable and performs well).

 

a lot of the infrastructure is shared between my script-language and C parts of the project.

 

 

I had considered (binary) XML for my network protocol, but ended up opting for lists instead.

 

basically, my network protocol consists of large nested list structures, generally passed along to/from specific "targets" (such as between client-side and server-side versions of an entity, ...). initial versions had used Deflated textual serializations, but I later implemented a direct entropy-coded binary serialization.
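A minimal Python sketch of the earlier "Deflated textual serialization" idea: print the nested list as S-expression text, then run it through Deflate via the standard `zlib` module (which wraps the Deflate stream in a small header and checksum). It assumes atoms are integers or symbol names without spaces or parentheses, and the message contents are made up; the entropy-coded binary format that replaced it isn't described in the post, so it isn't attempted here.

```python
import zlib


def to_text(msg):
    """Print a nested list message as S-expression text."""
    if isinstance(msg, list):
        return "(" + " ".join(to_text(item) for item in msg) + ")"
    return str(msg)


def from_text(text):
    """Read the text back (atoms: ints, or symbols without spaces/parens)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        if tokens[i] == "(":
            items, i = [], i + 1
            while tokens[i] != ")":
                item, i = read(i)
                items.append(item)
            return items, i + 1
        try:
            return int(tokens[i]), i + 1
        except ValueError:
            return tokens[i], i + 1

    value, _ = read(0)
    return value


def pack(msg):
    """Textual serialization followed by Deflate (zlib container)."""
    return zlib.compress(to_text(msg).encode("utf-8"), 9)


def unpack(blob):
    return from_text(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    msg = ["entity-update", ["origin", -240, 416, 48], ["health", 100]]
    blob = pack(msg)
    print(len(to_text(msg)), len(blob), unpack(blob))
```

Short messages can come out larger after compression because of the zlib header; repetitive, list-heavy messages are where Deflate pays off.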

 

this protocol is also used for my voxel-terrain, though it is sort of a hybrid (generally, the actual voxel-chunk data is passed using large byte arrays, with the chunk-data being flattened out and RLE compressed). partly this is because passing every voxel as a list-based message would be a bit of a stretch...

 

(chunk-delta (origin -240  416 48) (size 16 16 16) ... (voxeldata (voxel :type dirt :aux 0 :slight 240 :vlight 0 ...) (voxel :type dirt ...) ...))

 

it is basically a problem of 16x16x16 * 32*32*8 * 4 * ... which would take some fairly absurd numbers of cons-cells...
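Taking the factors in that estimate at face value (the post doesn't say what each one stands for; reading 16x16x16 as voxels per chunk, 32*32*8 as a chunk count and 4 as a per-voxel multiplier is an assumption), the part before the trailing "..." already comes to:

```latex
\underbrace{16 \times 16 \times 16}_{4096 \,=\, 2^{12}}
\times \underbrace{32 \times 32 \times 8}_{8192 \,=\, 2^{13}}
\times 4
= 2^{12+13+2} = 2^{27} = 134{,}217{,}728
```

so well over a hundred million items even before the elided factors.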

 

so, passing the chunk data in a byte-serialized format seemed like a "reasonable" compromise here.

 

so, instead it is something more like:

(wdelta
    ...
    (voxdelta ...
        (rgndelta ... #Ah( ... ))
        (rgndelta ...)
        ...)
    ...)

 

where wdelta=world-delta, voxdelta=voxel-delta, rgndelta=region-delta, and #Ah(...) is a 1D byte array.
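For the byte-array part (the `#Ah(...)` payload holding flattened, RLE-compressed chunk data), here's a minimal Python sketch of one possible layout. Everything about the format is an assumption: one byte per voxel (a block id), y outermost in the flattening order, and a plain (run_length, value) byte-pair encoding with runs capped at 255. The post doesn't describe the actual layout.

```python
SIZE = 16  # chunk edge length, as in the (size 16 16 16) example


def flatten(block_at):
    """Flatten a 16x16x16 chunk into bytes, y outermost, then z, then x.

    block_at(x, y, z) must return a block id in 0..255 (one byte per voxel is
    an assumption of this sketch).
    """
    return bytes(block_at(x, y, z)
                 for y in range(SIZE) for z in range(SIZE) for x in range(SIZE))


def rle_encode(data):
    """Encode as (run_length, value) byte pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        value, run = data[i], 1
        while i + run < len(data) and data[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def rle_decode(blob):
    """Expand (run_length, value) pairs back into the flat byte array."""
    out = bytearray()
    for j in range(0, len(blob), 2):
        run, value = blob[j], blob[j + 1]
        out += bytes([value]) * run
    return bytes(out)


if __name__ == "__main__":
    # Bottom half dirt (id 1), top half air (id 0).
    chunk = flatten(lambda x, y, z: 1 if y < 8 else 0)
    packed = rle_encode(chunk)
    print(len(chunk), len(packed), rle_decode(packed) == chunk)
```

In this layout, a half-dirt, half-air chunk (4096 bytes) shrinks to 36 bytes, while noisy data with few repeats can grow to twice its size, which is the usual RLE tradeoff.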

Edited by cr88192


Then, honestly, I'd argue your pipeline is a tad broken ;)

It's more a matter of our process being broken :(

 

We don't sit next to our artists (nor even in the same time zone), so if an asset comes through buggy, you either learn to use the DCC tool, wait 6 hours for a fresh edition of the asset, or patch the XML up by hand. The latter option wins surprisingly often (hint: programmers mostly don't like using DCC tools).


Honestly, I think XML might be hated a bit too religiously these days. Many of the biggest things make the biggest targets for criticism. Unless you're going full-featured in your XML usage, it's plenty readable and if you make proper use of attributes, it isn't that much bigger than things like JSON.
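A quick way to check the size claim for a flat record, sketched in Python with the standard `json` and `xml.etree.ElementTree` modules. The monster record is made-up sample data, and values are kept as strings on both sides since XML attributes are strings.

```python
import json
import xml.etree.ElementTree as ET

# Made-up flat record; values are strings on both sides for a fair comparison.
monster = {"id": "1", "name": "Goblin", "hp": "250", "attack": "20", "defense": "5"}

as_json = json.dumps(monster, separators=(",", ":"))            # compact JSON

as_attr_xml = ET.tostring(ET.Element("monster", monster), encoding="unicode")

elem = ET.Element("monster")                                     # one child per field
for key, value in monster.items():
    ET.SubElement(elem, key).text = value
as_child_xml = ET.tostring(elem, encoding="unicode")

for label, text in (("JSON", as_json),
                    ("XML (attributes)", as_attr_xml),
                    ("XML (child elements)", as_child_xml)):
    print(f"{label:22}{len(text):4}  {text}")
```

For a flat record like this, the attribute form and compact JSON come out at essentially the same length, while the child-element form is noticeably longer.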


I consider XML to be in the same category as COLLADA, i.e., designed to reliably convey data between different systems, and nothing else.

  • They are both text-based and inefficient at storing data when compared to binary formats.
  • They can have complex structures that lead to slow parsing especially on larger files.
  • Despite what people say (and the original design intention), XML is absolutely not human readable.

The only difference I see is that people realised what COLLADA was intended for and treated it accordingly, whereas XML was (and still is) abused.

 

I've worked on a project where the original authors thought it would be a good idea to create an entire XML document on the fly using string concatenation, pass it to a stored procedure and query it as a table to pull out a few parameters.

 

Guess what brought down the entire system?... "&".

 

Granted, that's not XML's fault, but still...
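For reference, here's the "&" failure and two standard fixes, sketched in Python (the original system used a stored procedure, which isn't reproduced here). `xml.sax.saxutils.escape` escapes `&`, `<` and `>`; building the document with `xml.etree.ElementTree` escapes text automatically. The sample value is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

company = "Smith & Sons"

# Broken: naive concatenation produces '<param>Smith & Sons</param>',
# which is not well-formed XML ('&' must be written as '&amp;').
naive = "<params><param>" + company + "</param></params>"

# Fix 1: escape text before concatenating.
escaped = "<params><param>" + escape(company) + "</param></params>"

# Fix 2: let an XML library build the document.
root = ET.Element("params")
ET.SubElement(root, "param").text = company
built = ET.tostring(root, encoding="unicode")

for label, doc in (("naive", naive), ("escaped", escaped), ("built", built)):
    try:
        value = ET.fromstring(doc).find("param").text
        print(f"{label}: parsed OK -> {value!r}")
    except ET.ParseError as err:
        print(f"{label}: ParseError: {err}")
```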


Well, COLLADA is an XML application so it makes sense they are similar. I agree very much that in most respects XML is best left on the near side of your build, with the exception being when you actually need human-readable markup as part of your program's content (or arguably, it's not the worst choice you could make for configuration data *if* you've already taken the dependency anyways).

 

I tend to disagree that XML isn't human readable though -- the language itself is plenty readable, but many of its applications are too complex and/or verbose for that to be true in practice. Another sin some XML applications commit is not using the language correctly -- using attributes when children would be more apropos (or vice-versa), introducing too-many/not-enough "container" elements, improper use of namespaces, or failing to provide a means of validation for the application.

 

A straight-forward, well-designed, and well-supported XML application is usually a joy to use, modify, and build tooling around.


Edit: Never mind. Accidentally replied to something months old. I'm an idiot.

Edited by ambershee

Honestly, JSON is my preferred data serializer. I use it in everything in lieu of XML.

 

From my experience trying to integrate them with C++, the existing YAML parsers are pretty bad or incomplete haha, but it's also a good format when it's working. YAML 1.2 is designed to be a superset of JSON, which is cool (but I haven't tested it). I tend to prefer YAML for config-type files, and JSON for just about everything else (data stores, data transfer, web services, etc.).


From a strictly professional/commercial standpoint: I do EDI development as my day job. Things such as EDIFACT, X12, HL7, FIX, etc.

 

Several years ago, many of these large, business-type data standards decided to try to push the market from length-encoded textual files to XML markup. It went horribly. Those that implemented it probably wish they hadn't, and those that didn't still have to deal with those that did. Here's an example of HL7v2 and HL7v3 (XML-based). Which one would you rather troubleshoot and view data in? I pick option #1. I honestly wish XML would die.

 

HL7v2

MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^^STATESVILLE^OH^35292||(206)3345232|(206)752-121||||AC555444444||67-A4335^OH^20030520
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730|||||||||555-55-5555^PRIMARY^PATRICIA P^^^^MD^^|||||||||F||||||444-44-4444^HIPPOCRATES^HOWARD H^^^^MD
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105|H|||F<cr>

HL7v3

 <POLB_IN224200 ITSVersion="XML_1.0" xmlns="urn:hl7-org:v3"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> 
<id root="2.16.840.1.113883.19.1122.7" extension="CNTRL-3456"/>
<creationTime value="200202150930-0400"/>
<!-- The version of the datatypes/RIM/vocabulary used is that of May 2006 -->
<versionCode code="2006-05"/>
<!-- interaction id= Observation Event Complete, w/o Receiver Responsibilities -->
<interactionId root="2.16.840.1.113883.1.6" extension="POLB_IN224200"/>
<processingCode code="P"/>
<processingModeCode nullFlavor="OTH"/>
<acceptAckCode code="ER"/>
<receiver typeCode="RCV">
   <device classCode="DEV" determinerCode="INSTANCE">
     <id extension="GHH LAB" root="2.16.840.1.113883.19.1122.1"/>
     <asLocatedEntity classCode="LOCE">
       <location classCode="PLC" determinerCode="INSTANCE">
         <id root="2.16.840.1.113883.19.1122.2" extension="ELAB-3"/>
       </location>
     </asLocatedEntity>
   </device>
</receiver>
<sender typeCode="SND">
   <device classCode="DEV" determinerCode="INSTANCE">
     <id root="2.16.840.1.113883.19.1122.1" extension="GHH OE"/>
     <asLocatedEntity classCode="LOCE">
       <location classCode="PLC" determinerCode="INSTANCE">
         <id root="2.16.840.1.113883.19.1122.2" extension="BLDG24"/>
       </location>
     </asLocatedEntity>
   </device>
</sender>
<!-- Trigger Event Control Act & Domain Content -->
</POLB_IN224200>
Edited by DocBrown

 


 I'd have to pick none of the above. Honestly, that first one just looks like gibberish. Not that XML is any better, but still... Just total gibberish.


 

It's actually rather simplistic and all your information is viewable in a tight, concise manner.

 

Each message that comes across a network interface has an MSH segment, a Message Header.  The message header displays information like who sent the message, who's supposed to receive it, what type of information are you going to find in the message, when was it sent, etc. 

 

MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4
 

 

Each line of the message starts with a three-letter identifier that tells you what information the rest of that line holds, such as MSH (Message Header), PID (Patient Identification), OBR (Observation Request), OBX (Observation Result), etc. These lines are known as Segments.

 

Each segment contains a host of fields, sub fields, sub-sub fields, etc.

 

Fields are separated by the | delimiter, with each field containing a particular set of standardized data.

 

 

MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4

 

Take for instance the MSH segment, field number 9. This field states the Message Type (what type of information you can expect to see in the rest of the message). Note that MSH-1 is the | separator itself, so ORU^R01, the eighth value after "MSH", is MSH-9. In this example, MSH-9 has a field and subfield delimited by the ^ symbol: the | separates fields, and the caret separates the subfields within a field.

 

So MSH-9.1 (ORU) is the Message Type, and MSH-9.2 (R01, the subfield) is the Message Event.

 

MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4

 

The message structure looks like this:

 

[Message (Type & Event)]
    [Segment]
        [Field]
            [Subfield]
        [Field]
            [Subfield]
    [Segment]
        [Field]
            [Subfield]
        [Field]
            [Subfield]
    [Segment]
        [Field]
            [Subfield]
        [Field]
            [Subfield]
...and so on...
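The Message, Segment, Field, Subfield hierarchy above maps directly onto nested splits. Here's a minimal Python sketch, assuming segments are separated by carriage returns (newlines are accepted too), separators are read from the MSH segment, and only the field (|) and component (^) levels are handled; HL7 itself calls subfields "components". `parse_hl7` and `get_field` are hypothetical helper names, not from any HL7 library.

```python
def parse_hl7(message):
    """Split an HL7 v2 message into (segment_id, fields) pairs.

    fields[n] is HL7 field n of that segment (so MSH-9 is fields[9]).
    Repetition (~), escape (\\) and sub-component (&) separators are ignored.
    """
    lines = [s for s in message.replace("\r\n", "\r").replace("\n", "\r").split("\r") if s]
    if not lines or not lines[0].startswith("MSH"):
        raise ValueError("an HL7 v2 message starts with an MSH segment")
    field_sep = lines[0][3]    # MSH-1 is the field separator itself
    comp_sep = lines[0][4]     # first character of MSH-2
    segments = []
    for line in lines:
        parts = line.split(field_sep)
        if parts[0] == "MSH":
            # Re-insert MSH-1 so field numbers line up with the spec.
            parts = [parts[0], field_sep] + parts[1:]
        segments.append((parts[0], parts))
    return segments, comp_sep


def get_field(parsed, seg_id, field, component=None):
    """Look up e.g. MSH-9 or MSH-9.1 (1-based, like HL7) in the first matching segment."""
    segments, comp_sep = parsed
    for sid, fields in segments:
        if sid == seg_id:
            value = fields[field] if field < len(fields) else ""
            if component is None:
                return value
            comps = value.split(comp_sep)
            return comps[component - 1] if component <= len(comps) else ""
    return None


if __name__ == "__main__":
    sample = (r"MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4" "\r"
              r"PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F")
    parsed = parse_hl7(sample)
    print(get_field(parsed, "MSH", 9, 1), get_field(parsed, "MSH", 9, 2))
    print(get_field(parsed, "PID", 5, 1), get_field(parsed, "PID", 5, 2))
```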

 

Anyways, figured I'd share. It's not often I get to speak about my profession as it's so niche. :P

Edited by DocBrown

So you have to memorize a bunch of three-letter acronyms as well as memorizing standard field layouts in order to make sense of it? I can see how that would be somewhat easier for the expert, but not so much for anyone who hasn't spent their career memorizing such things.

 

Not quite. There are loads of tools out there that do this for you. Most of the tools you use to build the interfaces have this built in. Much like figuring out where everything is in the BCL in .NET, IntelliSense works wonders, as does the Library Explorer.

 

That being said, most people refer back to the standard specifications released by HL7.org, as this positional data tends to change slightly depending on the version of the standard you're using (again, much like the different .NET Framework versions).

Edited by DocBrown
I've gotta agree with JT and bact: v2 is complete gibberish without some form of documentation to help you read it. At least the XML version has human-readable tags that can somewhat help in figuring out what's going on.

Also, the tool argument is pretty moot, since the purpose of a tool is to abstract away the underlying serialization format.

Those records reminded me of my days as an Ab Initio developer. In Ab Initio you have record files that define the fields, along with each field's type and whether it is fixed-length or delimited. You could probably do the same thing in C# with attributes and a parser.

 

A record description file might look like this:

 

record monster
{
  int id(3)
  string name("\0")
  int hp(5)
  int attack(3)
  int defense(3)
}("\n")

 

then in the data file you get the following:

001Goblin\000250020005\n
002Spider\000150010007\n
003Rat\000050002002\n
004Goblin King\059000350135\n
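The record description above can be mimicked with a spec table and a small parser. This Python sketch is only an approximation of the idea, not Ab Initio's actual record-format semantics: integers are fixed-width, a string runs until its delimiter, and records end at a newline. `MONSTER`, `parse_record` and `parse_file` are illustrative names; the sample data is the four monsters above, with `\0` written as the NUL character.

```python
# Field specs mirroring the record description: (name, type, width or delimiter).
MONSTER = [
    ("id", int, 3),
    ("name", str, "\0"),
    ("hp", int, 5),
    ("attack", int, 3),
    ("defense", int, 3),
]


def parse_record(line, spec):
    """Parse one record: int widths are fixed, str fields end at their delimiter."""
    out, pos = {}, 0
    for name, kind, size in spec:
        if isinstance(size, int):              # fixed-length field
            raw = line[pos:pos + size]
            if len(raw) != size:
                raise ValueError(f"{name}: record too short")
            pos += size
        else:                                  # delimited field
            end = line.index(size, pos)        # ValueError if delimiter missing
            raw = line[pos:end]
            pos = end + len(size)
        out[name] = kind(raw)
    if pos != len(line):
        raise ValueError("unexpected data after last field")
    return out


def parse_file(text, spec, record_delim="\n"):
    """Split on the record delimiter and parse each non-empty record."""
    return [parse_record(rec, spec) for rec in text.split(record_delim) if rec]


if __name__ == "__main__":
    data = ("001Goblin\x0000250020005\n"
            "002Spider\x0000150010007\n"
            "003Rat\x0000050002002\n"
            "004Goblin King\x0059000350135\n")
    for monster in parse_file(data, MONSTER):
        print(monster)
```

Keeping the layout in a data table rather than in code means a new record type is just a new spec list, which is roughly the appeal of the record-description approach.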

 

Generally I find that the data storage structure you use depends on your needs. XML was popular because it was well defined and well structured. XML readers and writers are common and easy to use, so you can quickly parse out the data you want with LINQ or XPath. You can also use schemas (e.g. XSD) to define the structure of the data and validate documents against it, which makes XML useful for APIs and other B2B applications. It's also useful if you only need a subset of the data in the file.
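As a rough illustration of pulling just a subset out of a file, here's Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset (element paths and `[@attr='value']` predicates, among others); LINQ to XML would be the .NET counterpart. The monster document is made-up sample data.

```python
import xml.etree.ElementTree as ET

DOC = """<monsters>
  <monster id="1" name="Goblin"><stats hp="250" attack="20"/></monster>
  <monster id="2" name="Spider"><stats hp="150" attack="10"/></monster>
  <monster id="4" name="Goblin King"><stats hp="59000" attack="350"/></monster>
</monsters>"""

root = ET.fromstring(DOC)

# Only the names, ignoring everything else in the file.
names = [m.get("name") for m in root.findall("./monster")]

# Attribute predicate: the stats of one specific monster.
king_stats = root.find("./monster[@name='Goblin King']/stats")

# ElementTree's XPath subset has no numeric comparisons, so that filter
# is done in ordinary Python after selecting the candidates.
strong = [m.get("name") for m in root.findall("./monster")
          if int(m.find("stats").get("attack")) >= 20]

print(names, king_stats.get("hp"), strong)
```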

 

But these days JSON is pretty popular and widely used on the web, which means there are plenty of tools to read and write it. The files are also usually smaller than the equivalent XML, which is useful when you have to worry about data size.
