Premshree Pillai ([info]premshree) wrote,
@ 2005-01-25 22:31:00
Current music:Deep Purple - Black Night

REBOL: A better XML?
Carl has always been critical of XML. That XML is verbose is not news: as a means of information exchange, it carries a lot of redundant information, which is why XML documents are commonly compressed before they are exchanged.

In his recent article, Carl talks about XML, comparing it (inevitably, and for good reason too) with REBOL. Using REBOL blocks to represent semantic information makes the document readable, like XML, and at the same time is less verbose.

As an example, quoting Carl:

XML uses its metadata tags as “grouping” quotes everywhere. This extends to all levels of structure. So, to create a customer record in XML you write:
<customer>
    <name>Bob Smith</name>
    <email>bob@example.com</email>
    <site>http://www.example.com/bob</site>
    <age>27</age>
    <phone>555-1212</phone>
    <city>Ukiah</city>
</customer>
The <customer> tags indicate the bounds of the customer data record.

In REBOL, there is a single mechanism for all groups of data: blocks. The above example would be:
customer: [
    name:  "Bob Smith"
    email: bob@example.com
    site:  http://www.example.com/bob
    age:   27
    phone: #555-1212
    city:  "Ukiah"
]
Here REBOL’s block symbols [ ] are used to indicate the bounds of the customer data record. This method works for all levels of structure.
As Carl mentions, in certain cases semantics may be implied too:
The XML situation becomes even worse when a sequential series of values needs to be expressed. Suppose the customer record above is extended to indicate products of customer interest:
<interests>
    <product>cpu</product>
    <product>memory</product>
    <product>disk</product>
</interests>
Even if this record were 1000 products long, the same redundant tags would be applied. That's because XML uses tags as delimiters (quotes).

In REBOL, you would recognize that the products all come from the same semantic domain, so you can imply the semantics in such cases and just write:
interests: [cpu memory disk]
Now that’s something you couldn’t do with XML.
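To make the size difference concrete, here is a rough Python sketch (mine, not from Carl's article) that reads the XML fragment above and writes it back as a REBOL-style block. The function name to_block is made up for illustration, and the sketch assumes a flat list of children that all share one tag, as in the interests example.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the example above.
xml_doc = """<interests>
    <product>cpu</product>
    <product>memory</product>
    <product>disk</product>
</interests>"""


def to_block(xml_text):
    """Render a flat list of same-named children as a REBOL-style block.

    Hypothetical helper: the child tag name is dropped because every
    child comes from the same semantic domain.
    """
    root = ET.fromstring(xml_text)
    words = [child.text.strip() for child in root]
    return f"{root.tag}: [{' '.join(words)}]"


block = to_block(xml_doc)
print(block)
print(len(xml_doc), "characters as XML,", len(block), "as a block")
```

The first line printed is the same interests block shown above; the second simply compares character counts for the two forms.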

XML attributes don’t need a separate representation in REBOL blocks:
<search type="advanced" query="premshree" results="2">
    <result>foo</result>
    <result>bar</result>
</search>
The REBOL equivalent of the above would be:
search: [
    [type "advanced" query "premshree" results 2]
    ["foo" "bar"]
]
Ah, now comes the issue of parsing. As you can see, the first block within the search block is a hash. To be able to distinguish a hash block from an array block, some sort of convention could be adopted: the first block within every element must be an attribute block (a hash block, that is), left empty when the element has no attributes. Umm, come to think of it, this is not really a good idea. But surely, there are ways.
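Just to see what that convention would look like in practice, here is a small Python sketch. It is only an illustration of the "attribute block always comes first" idea, not anything REBOL provides: to_blocks is a made-up name, it only handles children that are plain text, and attribute values stay strings (so results comes out as "2", not 2).

```python
import xml.etree.ElementTree as ET


def to_blocks(elem):
    """Return [attribute_block, content_block] for one XML element.

    Hypothetical convention: the attribute block is always first and is
    empty when the element has no attributes, so a reader never has to
    guess which block is the hash and which is the array.
    """
    attrs = []
    for key, value in elem.attrib.items():
        attrs += [key, value]  # flat key/value pairs, like a REBOL block
    # Only leaf children are handled; their tag names are implied.
    content = [(child.text or "").strip() for child in elem]
    return [attrs, content]


search = ET.fromstring(
    '<search type="advanced" query="premshree" results="2">'
    "<result>foo</result><result>bar</result>"
    "</search>"
)
blocks = to_blocks(search)
print(blocks)
```

An element without attributes still yields an empty first block, which is exactly the part that feels clumsy: every element pays for the convention whether it needs it or not.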

Heh, I sound like REBOL is replacing XML.



sriramb
2005-01-25 05:00 pm UTC (link)
what about good old CSV files, with CRC checks for the header and the data blocks?


[info]premshree
2005-01-25 05:06 pm UTC (link)
Umm, but how would you represent the information semantics?


sriramb
2005-01-26 05:04 am UTC (link)
nested/tree based schemas are not good fits. CSV files work great when you have to import data into a warehouse and churn cubes. Of course, you have to have a good ETL process defined too. In most projects that i architect, i provide the ability to extract data in XML formats, but stick to CSV for imports. LZW with CSV works like a charm even on dial-up accounts.


[info]premshree
2005-01-26 09:38 am UTC (link)
But nested/tree based schemas are common enough to warrant existence of ways to represent semantic information.

Yes, when the schema allows for it, I would stick to CSV imports/exports. (That’s what I did in one of my earlier projects.)


sriramb
2005-01-26 06:37 pm UTC (link)
We think of nested/tree based schemas and semantic representations because of our need/mindset to model the data to object models. We wish to achieve object<->data mappings as cleanly and efficiently as possible. Nothing wrong with this design approach. However, i come from a school of thought (and generation) that maps the data to the persistence layer first. Once the data is persistent, mixing data and objects becomes easy. I guess it is a way of thinking. I for one will change my ways when i have a viable, production-level object oriented database. Good discussion. We must meet IRL and exchange ideas.
(maybe i'll apply for those yahoo bangalore openings!)


[info]premshree
2005-01-27 06:39 am UTC (link)
Ah, when you think that way, there’s no arguing. People often tell me that Ruby isn’t as intuitive as Python. But, hey, if you can think in closures, what’s the problem!

Umm, but I must hasten to add that it’s different with programming languages. We have the choice (and it certainly is going to be that way) to choose a particular language. But when it comes to representing information—semantic information, to be more specific—we hope for a standard. That is, in a way, we hope not to have to choose; not to have a choice to choose, rather. However, when that standard no longer seems viable, it leads to proliferation—YAML comes to mind.

As you said, it’s about how most people think. Others have to make a few compromises.

If you’re applying (which I know you’re not), send your résumé to me. :D


[info]swaroopch
2005-01-25 05:37 pm UTC (link)
Looks nice :)

Hopefully, Python can catch up in the XML game as well.
A few links that seem to suggest this:

http://www.advogato.org/article/810.html
http://effbot.org/zone/element-index.htm
http://www-106.ibm.com/developerworks/library/x-matters28/

- Swaroop
www.swaroopch.info


[info]premshree
2005-01-26 10:19 am UTC (link)
Umm, this wasn't really about _programming_ languages and XML. But about a better way to represent semantic information.


[info]swaroopch
2005-01-26 11:02 am UTC (link)
Looking at it from a different angle, I was thinking of the same but keeping in mind how to inter-operate in a heterogeneous system...

- Swaroop
www.swaroopch.info


[info]premshree
2005-01-26 11:38 am UTC (link)
Ah, inter-operation. IAC, there has to be an independent entity (way) for information representation.

You’re not at work, are you? :-?


[info]swaroopch
2005-01-26 11:42 am UTC (link)
Nope, I'm at home.

What about you!


[info]premshree
2005-01-26 11:43 am UTC (link)
Office. :) (I thought you could join me for coffee.)


[info]swaroopch
2005-01-26 11:46 am UTC (link)

Dude, don't get used to going to office on holidays! (..speaking from experience ;-)


(Anonymous)
2005-01-26 11:31 am UTC (link)
It would be just great to have REBOL blocks instead of XML all around the internet, BUT that would only be possible if REBOL became a free tool, as XML is, so that more people would not be afraid to use it, IMO. carlos.lorenz at gmail.com, a reboler


[info]premshree
2005-01-26 11:34 am UTC (link)
Yes, REBOL as a replacement for XML is something I cannot imagine easily, for multiple reasons, including the one you mentioned.


[info]stillcarl
2005-01-27 06:15 am UTC (link)
RT are dipping their toes in the water with regard to open-source. From here: http://www.rebol.net/article/0086.html

During 2005 we plan to make the C source code to a variety of REBOL subsystems available to developers. These modules include console handling code, windowing, graphics, fonts, event handling, time/timing, file interfaces, networking, and probably more.

Here at RT we want our primary focus to be on improving the REBOL language itself - adding features and making fixes. Whenever possible we would prefer not to spend our time working on the text console, fixing strange bugs in X Windows, debugging fonts in Mandrake Linux, or diving into the details of porting REBOL to other operating systems (OS X and Win CE immediately come to mind but perhaps you can think of others). Don't get us wrong. We want to see all these projects happen, we just don't have the resources to do all of them at the same time.


and here: http://www.rebol.net/article/0057.html

REBOL/Services will be provided as an open standard with the source code available from the REBOL web site. The license will allow free use for personal, commercial, and educational purposes. We hope developers will contribute their own enhancements, fixes, and independent command-set modules, as well as documentation, tutorials, and examples.

Time will tell how far they intend to take this.


[info]premshree
2005-01-27 08:32 am UTC (link)
Open-source is the first step. REBOL is promoted as a messaging language. However, I’d imagine something more needs to be done to position it as a way to represent information. Since REBOL is also a means to exchange information, they’ll have to do some kinda balancing act.

There I go again... REBOL replaces XML! :-)

Alternate representations to XML
(Anonymous)
2005-01-26 04:54 pm UTC (link)
Have you looked at SLiP? http://www.scottsweeney.com/projects/slip.
It's designed to make information easier to represent if you
are typing it by hand. The parser can then blurt out XML or
(technically, it should be possible) any other format.

Another popular attempt has been YAML (http://yaml.org),
though more to represent portable data across various
dynamic languages.

--Hemanth P.S.

Re: Alternate representations to XML
[info]premshree
2005-01-27 05:57 am UTC (link)
I, somehow, am not comfortable with the idea of indentation to represent data. The advantage of something like REBOL is that it is less verbose, and also flexible.


[info]mannu
2005-01-30 07:21 am UTC (link)
What programs care about is not how the data is stored but how it is made available. If data can be stored in this format without breaking existing programs that are written to read XML, then it makes sense. What you'd need is REBOL versions of some of the popular XML libraries.


[info]premshree
2005-02-01 08:09 am UTC (link)
As of now, REBOL has a “limited” parse-xml.

There’s a SAX XML parser. If it weren’t for archive.org... :)
