suburbia

Premshree's (品速力) Personal Weblog

etc.

XML, YAML, REBOL here
premshree

Comparing XML, YAML, REBOL file sizes

I wrote a script to generate and analyze XML, YAML, and REBOL file sizes at different depth values. The structure I used is pretty simple:

<?xml version="1.0"?>
<node>
   <node>
      <node>
         ...
            <node>value</node>
         ...
      </node>
   </node>
</node>

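Apart from being inflexible, the reason why indentation (YAML block structure) is bad is pretty obvious: every extra level of nesting adds leading whitespace to each line beneath it, so the overhead keeps growing with depth.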
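The original script isn’t reproduced here, so what follows is only a minimal Python sketch of the idea, not the code that was actually used. The function names (nested_xml, nested_yaml, compact_xml, size) are made up for illustration. It assumes three-space indentation for XML as in the sample above and one-space indentation for YAML. REBOL is left out.

```python
def nested_xml(depth, value="value", indent=3):
    """Build <node> elements nested `depth` levels deep, indented like the sample."""
    lines = ['<?xml version="1.0"?>']
    for level in range(depth - 1):
        lines.append(" " * (indent * level) + "<node>")
    # Innermost node carries the value on a single line.
    lines.append(" " * (indent * (depth - 1)) + f"<node>{value}</node>")
    for level in reversed(range(depth - 1)):
        lines.append(" " * (indent * level) + "</node>")
    return "\n".join(lines) + "\n"


def nested_yaml(depth, value="value", indent=1):
    """Build the same nesting as YAML block mappings (assumed one-space indent)."""
    lines = [" " * (indent * level) + "node:" for level in range(depth - 1)]
    lines.append(" " * (indent * (depth - 1)) + f"node: {value}")
    return "\n".join(lines) + "\n"


def compact_xml(xml_text):
    """Drop indentation and line breaks; safe here because values sit on one line."""
    return "".join(line.strip() for line in xml_text.splitlines())


def size(text):
    """File size in bytes when saved as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for depth in (1, 5, 10, 50):
        xml = nested_xml(depth)
        print(depth, size(xml), size(compact_xml(xml)), size(nested_yaml(depth)))
```

Under these assumptions YAML is smaller at shallow depths. But its indentation cost grows with the square of the depth, because every level adds to the leading whitespace of every deeper line. Whitespace-stripped XML grows only linearly, so for deep enough nesting the YAML file ends up the larger of the two.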

Of course, to get a better idea, I’d have to analyze various sample structures. Nonetheless, what I’m trying to say is that there are definitely better ways to represent semantic data. File size matters when it comes to serving feeds (RSS et al.), where bandwidth is a major concern.

Consider the following RSS 2.0 file:

<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>My Blog</title>
      <link>http://myblog.com/blog</link>
      <description>My life</description>
      <language>en-us</language>
      <pubDate>Tue, 01 Jan 2004 04:00:00 GMT</pubDate>
      <lastBuildDate>Tue, 01 Jan 2004 21:30:00 GMT</lastBuildDate>
      <docs>http://blogs.law.harvard.edu/tech/rss</docs>
      <generator>My Generator</generator>
      <managingEditor>me@myblog.com</managingEditor>
      <webMaster>webmaster@myblog.com</webMaster>
      <item>
         <title>Hotel Foo</title>
         <link>http://myblog.com/archives/2005/hotel-foo.html</link>
         <description>I love hotel foo</description>
         <pubDate>Tue, 01 Feb 2005 09:50:00 GMT</pubDate>
         <guid>http://myblog.com/archives/2005/hotel-foo.html</guid>
      </item>
   </channel>
</rss>

Its YAML equivalent would be:

channel:
 title: My Blog
 link: http://myblog.com/blog
 description: My life
 language: en-us
 pubDate: Tue, 01 Jan 2004 04:00:00 GMT
 lastBuildDate: Tue, 01 Jan 2004 21:30:00 GMT
 generator: My Generator
 item:
  title: Hotel Foo
  link: http://myblog.com/archives/2005/hotel-foo.html
  description: I love hotel foo
  pubDate: Tue, 01 Feb 2005 09:50:00 GMT
  guid: http://myblog.com/archives/2005/hotel-foo.html

In its least verbose form (with unnecessary whitespace removed), the XML format takes a little more than 70% more space than its YAML equivalent. That’s a big difference.
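A rough way to check a figure like this is to strip the whitespace between XML tags and compare byte counts. The Python sketch below does that for a trimmed-down fragment of the feed. The helper names (minify_xml, overhead_percent) and the two sample strings are illustrative assumptions, not part of any library. Keep in mind that the YAML listing above leaves out docs, managingEditor and webMaster, plus the rss version attribute, so the exact percentage depends on which fields are included.

```python
import re


def minify_xml(xml_text):
    """Remove whitespace that sits between tags; text inside elements is kept."""
    return re.sub(r">\s+<", "><", xml_text.strip())


def overhead_percent(xml_text, yaml_text):
    """Percent by which the minified XML exceeds the YAML, in UTF-8 bytes."""
    xml_bytes = len(minify_xml(xml_text).encode("utf-8"))
    yaml_bytes = len(yaml_text.encode("utf-8"))
    return 100.0 * (xml_bytes - yaml_bytes) / yaml_bytes


# Trimmed fragments of the feed, for illustration only.
XML_SAMPLE = """<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      <title>My Blog</title>
      <link>http://myblog.com/blog</link>
      <description>My life</description>
   </channel>
</rss>
"""

YAML_SAMPLE = """channel:
 title: My Blog
 link: http://myblog.com/blog
 description: My life
"""

if __name__ == "__main__":
    pct = overhead_percent(XML_SAMPLE, YAML_SAMPLE)
    print(f"Minified XML is {pct:.0f}% larger than the YAML")
```

Passing the two full listings to overhead_percent instead of the trimmed samples gives the comparison for the complete feed.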

I have started using YAML for certain work-related activities. If I find time (which is extremely difficult), I’ll try to dig into this some more.