Grokking MarkLogic’s Roxy Framework

August 17, 2012
 
Quick Walk Through of the MarkLogic Roxy Framework

Intro

For the past few months, I’ve been heads-down in the exciting world of Big Data software development. I prefer to call it the post-relational document database world. My focus has been the technology around Big Data, but there’s also an amazing social aspect: groups like Code For America, Data without Borders (DataKind), and NYC Open Data are using Big Data to drive social change.

But social aspects are a topic for a later time. For now, it’s all about the code.

MarkLogic

Of course, MarkLogic is my preferred big data platform. I use it to de-normalize data so that the database engine and search engine can be the same thing.

For those unfamiliar with MarkLogic, it’s a document data store that was designed to handle extremely large amounts of unstructured data using the XML technology stack (XML, XQuery, XSLT, etc.).

The term “unstructured data” is often a topic for debate, and one could argue that all data has structure. Here is how I use the terms. Structured data is relational data or schema-validated XML documents. Semi-structured data is partially validated XML or JSON documents. Unstructured data is a set of XML or JSON documents that may share a common header but carry a payload that has no fixed structure and can contain anything (XML, text, binary).
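To make the “common header plus free-form payload” idea concrete, here’s a minimal Python sketch. The element names (envelope, header, id, source, payload) and the wrap_in_envelope helper are made up for illustration; they aren’t a MarkLogic convention.

```python
import xml.etree.ElementTree as ET


def wrap_in_envelope(doc_id, source, payload_text):
    """Build a document with a predictable header and a free-form payload.

    All element names here are hypothetical, chosen only for this example.
    """
    envelope = ET.Element("envelope")

    # The header is the part every document shares, so it is easy to index.
    header = ET.SubElement(envelope, "header")
    ET.SubElement(header, "id").text = doc_id
    ET.SubElement(header, "source").text = source

    # The payload is opaque: nothing is assumed about its content.
    payload = ET.SubElement(envelope, "payload")
    payload.text = payload_text
    return envelope


doc = wrap_in_envelope("post-1", "blog", "Any text, markup, or encoded binary can go here.")
print(ET.tostring(doc, encoding="unicode"))
```

The header fields stay queryable no matter what lands in the payload.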

Document databases like MarkLogic, MongoDB, and Couchbase do away with the need to shred data into rows and columns. Shredding is not only unnecessary; it’s also not feasible when dealing with petabytes of data.

A key capability that MarkLogic provides is agile database development. A MarkLogic developer has the ability to ingest large amounts of data while having very little knowledge of the underlying data structures. Once the data is ingested, indexes can be added and data structures tweaked to provide the desired results. This agile development process ultimately leads to higher developer productivity, higher quality, and quicker time to market.

Semantic Linking

Document databases are also ideal for document linking. We’re just starting to realize the value of linking documents semantically. See Kurt Cagle’s Balisage 2012 paper for an interesting approach to linking documents by appending an “assertion node” to each document. The assertion node contains a “triple store” that’s used to describe the document’s relationship with other documents. These semantic links can then be used for semantic reasoning, which is a topic for another day.
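Here’s a rough Python sketch of the assertion node idea: triples stored inside the document itself and then read back to find linked documents. The element names (assertions, triple, subject, predicate, object), the helper functions, and the document URIs are all my own placeholders; this is not the vocabulary from the Balisage paper.

```python
import xml.etree.ElementTree as ET


def add_assertions(doc, triples):
    """Append a hypothetical <assertions> node holding (subject, predicate, object) triples."""
    assertions = ET.SubElement(doc, "assertions")
    for subject, predicate, obj in triples:
        triple = ET.SubElement(assertions, "triple")
        ET.SubElement(triple, "subject").text = subject
        ET.SubElement(triple, "predicate").text = predicate
        ET.SubElement(triple, "object").text = obj
    return doc


def linked_documents(doc, predicate):
    """Return the object of every triple in the document that uses the given predicate."""
    return [
        triple.findtext("object")
        for triple in doc.iterfind("assertions/triple")
        if triple.findtext("predicate") == predicate
    ]


# Illustrative document and URIs only.
post = ET.fromstring("<post><title>Roxy walkthrough</title></post>")
add_assertions(post, [
    ("/posts/roxy.xml", "cites", "/papers/balisage-2012.xml"),
    ("/posts/roxy.xml", "about", "/topics/marklogic.xml"),
])
print(linked_documents(post, "cites"))
```

Because the links travel with the document, no separate join table is needed to follow them.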

Roxy Framework

Now that I’ve given some background, let’s talk about building MarkLogic apps using Roxy. I build most of my MarkLogic apps using the Roxy framework.

Roxy (RObust XquerY framework) is a well-designed Model-View-Controller framework for XQuery.

For now, MarkLogic’s primary API is XQuery. However, stay tuned: rich Java and C# APIs are coming soon. Of course, there’s also the MarkLogic RESTful API called Corona.

I’m personally a big advocate for XQuery. XQuery is a fully fledged dynamic functional programming language. You can accomplish a lot with a small amount of code. For more info see Nuno Job’s XQuery Presentation.

The three big Roxy features that make developers immediately productive are:

  1. MVC – lets you write code using the Model View Controller (MVC) pattern.
  2. Test Facility – facilitates Test Driven Development (TDD).
  3. Deployer – simplifies the deployment process.

You can get Roxy here. => http://github.com/marklogic/roxy

The Roxy MVC utilizes ideas from:

  1. Ruby on Rails – http://guides.rubyonrails.org/
  2. CakePHP – http://cakephp.org/
  3. DRY (don’t repeat yourself)

I won’t drill down on the MVC mechanics right now, but it’s worth noting the following image. It shows the Ruby on Rails style “convention over configuration” routing from URL to MVC. I’ll discuss it further in a future screencast.

[Image: roxy-url (URL to MVC routing)]
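As a rough illustration of “convention over configuration” routing, here’s a small Python sketch that splits a URL path into a controller, a function, and a response format, assuming a /controller/function.format pattern. It is not Roxy’s actual XQuery rewriter, and the default values are placeholders I picked for the example, not Roxy’s real defaults.

```python
from typing import NamedTuple
from urllib.parse import urlparse

# Hypothetical defaults, chosen only for this sketch.
DEFAULT_CONTROLLER = "home"
DEFAULT_FUNCTION = "main"
DEFAULT_FORMAT = "html"


class Route(NamedTuple):
    controller: str
    function: str
    format: str


def route(url):
    """Map /controller/function.format onto a Route, filling gaps with defaults."""
    segments = [s for s in urlparse(url).path.split("/") if s]

    # An extension on the last path segment selects the response format.
    fmt = ""
    if segments and "." in segments[-1]:
        segments[-1], fmt = segments[-1].rsplit(".", 1)

    controller = segments[0] if len(segments) > 0 else ""
    function = segments[1] if len(segments) > 1 else ""
    return Route(
        controller or DEFAULT_CONTROLLER,
        function or DEFAULT_FUNCTION,
        fmt or DEFAULT_FORMAT,
    )


print(route("http://localhost:8090/posts/search.json"))
print(route("/posts"))
```

The point of the convention is that no routing table has to be written by hand: the URL shape alone decides which controller function runs and which view format is rendered.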

The screencast above shows a simple example of ingesting blog post data from the blog site Boing Boing. You can get a copy of the Boing Boing blog archive here. This archive file contains 63,999 blog posts.

I built two simple MarkLogic search apps using this data set.

  1. App Builder Version – good for a quick demo of the search capability but difficult to extend.
  2. Roxy Version – more flexible, easier to modify.

App Builder Version is here. => http://ps.demo.marklogic.com:8043/

Roxy Version is here. => http://ps.demo.marklogic.com:8090/

The code is zipped as boing-roxy-code.zip and is posted in the following directory.

     => http://sdrv.ms/12HMD6y