suburbia

Premshree's (品速力) Personal Weblog

etc.

XML, YAML, REBOL here
premshree

Comparing XML, YAML, REBOL file sizes

I wrote a script to generate and analyze XML, YAML, and REBOL file sizes, with different depth values. The structure I used is pretty simple:

<?xml version="1.0"?>
<node>
   <node>
      <node>
         ...
            <node>value</node>
         ...
      </node>
   </node>
</node>

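Apart from being inflexible, indentation (YAML block structure) is bad for a pretty obvious reason: every extra level of nesting adds leading whitespace to each line beneath it, so the file grows faster than linearly with depth.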
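As a rough illustration of the depth effect, here is a minimal Python sketch. It is not the original script (which isn't reproduced here); the function names `nested_xml` and `nested_yaml` are made up for this example, and it assumes whitespace-free XML, one space of YAML indentation per level, and no XML declaration.

```python
def nested_xml(depth, value="value"):
    """Build <node> nested `depth` times with no whitespace between tags."""
    return "<node>" * depth + value + "</node>" * depth


def nested_yaml(depth, value="value", indent=1):
    """Build the same nesting as YAML block mappings, `indent` spaces per level."""
    lines = [" " * (indent * level) + "node:" for level in range(depth - 1)]
    lines.append(" " * (indent * (depth - 1)) + "node: " + value)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the byte size of each format at a few depths.
    for depth in (1, 5, 10, 14, 15, 20, 30):
        xml_size = len(nested_xml(depth).encode("utf-8"))
        yaml_size = len(nested_yaml(depth).encode("utf-8"))
        print(f"depth={depth:2d}  xml={xml_size:4d}  yaml={yaml_size:4d}")
```

Under these particular assumptions the XML size grows linearly with depth while the YAML size grows quadratically, and the YAML version becomes the larger one at depth 15. A different indent width or pretty-printed XML would move that crossover.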

Of course, to get a better idea, I’d have to analyze various sample structures. Nonetheless, what I’m trying to say is that there are definitely better ways to represent semantic data. File size becomes important when it comes to serving feeds (RSS, et al.), where bandwidth is a major concern.

Consider the following RSS 2.0 file:

<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>My Blog</title>
      <link>http://myblog.com/blog</link>
      <description>My life</description>
      <language>en-us</language>
      <pubDate>Tue, 01 Jan 2004 04:00:00 GMT</pubDate>
      <lastBuildDate>Tue, 01 Jan 2004 21:30:00 GMT</lastBuildDate>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs>
      <generator>My Generator</generator>
      <managingEditor>me@myblog.com</managingEditor>
      <webMaster>webmaster@myblog.com</webMaster>
      <item>
         <title>Hotel Foo</title>
         <link>http://myblog.com/archives/2005/hotel-foo.html</link>
         <description>I love hotel foo</description>
         <pubDate>Tue, 01 Feb 2005 09:50:00 GMT</pubDate>
         <guid>http://myblog.com/archives/2005/hotel-foo.html</guid>
      </item>
   </channel>
</rss>

Its YAML equivalent would be:

channel:
 title: My Blog
 link: http://myblog.com/blog
 description: My life
 language: en-us
 pubDate: Tue, 01 Jan 2004 04:00:00 GMT
 lastBuildDate: Tue, 01 Jan 2004 21:30:00 GMT
 generator: My Generator
 -item:
  title: Hotel Foo
  link: http://myblog.com/archives/2005/hotel-foo.html
  description: I love Hotel Foo
  pubDate: Tue, 01 Feb 2005 09:50:00 GMT
  guid: http://myblog.com/archives/2005/hotel-foo.html

In its least verbose form (with unnecessary whitespace eliminated), the XML format takes a little over 70% more space than its YAML equivalent. That’s a big difference.
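A comparison like this can be sketched in a few lines of Python. This is only an illustration of the measurement, not the code behind the figure above: `minify_xml` and `overhead_percent` are hypothetical helpers, the sample strings are shortened fragments rather than the full feed, and the minifier assumes each element sits on its own line with no multi-line text content.

```python
def minify_xml(text):
    """Drop indentation and line breaks from a one-element-per-line XML document."""
    return "".join(line.strip() for line in text.splitlines())


def overhead_percent(larger, smaller):
    """Return how much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    big = len(larger.encode("utf-8"))
    small = len(smaller.encode("utf-8"))
    return (big - small) / small * 100


if __name__ == "__main__":
    # Shortened sample fragments, not the full feed from the post.
    xml_sample = """<channel>
   <title>My Blog</title>
   <link>http://myblog.com/blog</link>
   <description>My life</description>
</channel>"""
    yaml_sample = (
        "channel:\n"
        " title: My Blog\n"
        " link: http://myblog.com/blog\n"
        " description: My life\n"
    )
    extra = overhead_percent(minify_xml(xml_sample), yaml_sample)
    print(f"XML is {extra:.1f}% larger than YAML for this fragment")
```

The result depends heavily on how long the element names are relative to their values, so a fragment like this one won't necessarily reproduce the number quoted for the full feed.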

I have started using YAML for certain work-related activities. If I find time (which is extremely difficult), I’ll try to dig into this some more.


REBOL link

(Anonymous)

2005-02-01 06:56 pm (UTC)

Also points to yaml...
BTW, looks like XML wins when depth is big, right?
JJ

When compared to YAML, yes.

Of course, this is _not_ a thorough analysis—I have considered only one structure.

Re: REBOL link

(Anonymous)

2005-02-01 08:33 pm (UTC)

so, you have to have a structure nested more than 15 deep to have YAML be worse than XML? I've never seen a real-life structure that deep -- have you?

Regardless, as some other poster mentioned, if you are looking for small downloads -- just use gzip.

I was just experimenting with different depth levels. I haven’t seen a real-life structure that deep. :)

File size becomes important when it comes to serving feeds—RSS, et al.

Your RSS feed should be gzip compressed.

Lose Weight, Save Money with Compression!
Leknor.com - Code - gziped?

A gzipped RSS feed would take more than 40% more space than its gzipped YAML equivalent.
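For anyone who wants to check the compressed sizes themselves, here is a small Python sketch using the standard `gzip` module. The helper name `gzip_size` and the two sample fragments are made up for the example; they are not the feed from the post.

```python
import gzip


def gzip_size(text):
    """Return the size in bytes of the gzip-compressed UTF-8 encoding of `text`."""
    return len(gzip.compress(text.encode("utf-8")))


if __name__ == "__main__":
    # Illustrative fragments only; substitute the real feed contents to compare them.
    xml_item = (
        "<item><title>Hotel Foo</title>"
        "<description>I love hotel foo</description></item>"
    )
    yaml_item = "item:\n - title: Hotel Foo\n   description: I love hotel foo\n"

    for label, text in (("XML", xml_item), ("YAML", yaml_item)):
        raw = len(text.encode("utf-8"))
        print(f"{label}: {raw} bytes raw, {gzip_size(text)} bytes gzipped")
```

gzip adds a fixed header and trailer, so very short inputs like these can come out larger than the originals; the comparison only becomes meaningful on a full-size feed.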

illegal yaml

(Anonymous)

2005-02-01 08:42 pm (UTC)

In your example YAML you have some illegal YAML, generator: My Generator -item: title: Hotel Foo perhaps you meant to write, generator: My Generator item: - title: Hotel Foo link: ... - title: Bingles link: ...

illegal yaml

(Anonymous)

2005-02-01 08:44 pm (UTC)

In your example YAML you have some illegal YAML,
generator: My Generator
-item:
 title: Hotel Foo
perhaps you meant to write,
generator: My Generator
item:
- title: Hotel Foo
  link: ...
- title: Bingles
  link: ...

item is meant to be a sequence. No problems parsing it using yaml4r.

I'm not the anonymous poster, but you forgot about the space after the dash.

Ah, my bad. [I managed to use it with yaml4r, though :-)]