While at MarkLogic World, I also had the great pleasure of reconnecting with the wonderful RSuite team. I’m a big advocate of the RSuite content management system.
This past summer, I had an amazing opportunity to work at HarperCollins, where I helped deploy their new content management services using RSuite. My interest in RSuite stemmed from my work as a professional services consultant at MarkLogic. MarkLogic has a loyal base of customers in the media/publishing industry.
Media/publishing customers choose MarkLogic to store their content, which consists of text-based documents and binary assets (photos, audio, video) along with the respective metadata.
MarkLogic lowered its pricing in 2013, which made it more affordable to store binary assets in MarkLogic. Keeping the binary assets together with the text-based content also greatly simplifies the infrastructure and reduces management overhead.
The RSuite secret sauce is the DITA Open Toolkit. The other key component is MarkLogic.
The workflow engine is provided by jBPM, which stores the workflow configurations in MySQL and drives the finite state machine.
The DITA Open Toolkit provides the “multi-channel output” feature needed by most publishers: the ability to render the content to many formats such as PDF, ePub, XHTML, Adobe InDesign, and Word (docx).
DITA stands for Darwin Information Typing Architecture; it is an XML data model for authoring and publishing.
Eliot Kimber is a driving force behind DITA. Here are some useful links.
Publishers should avoid using XHTML as the storage format for book content for many reasons. The industry-standard formats are DITA and DocBook, which provide a higher level of abstraction that makes “multi-channel output” much easier. These standards are also more flexible when providing custom publishing services.
The DITA format is especially interesting because of the specialization feature that makes XML structures polymorphic.
Some key points about DITA:
Topic Oriented
Each Topic is a separate XML file
DocBook is Book Oriented
DITA Initial Spec in 2001
DocBook Initial Spec in 1991
Core DITA Topic Types are Concept, Task, and Reference
Specialization: This is subtyping where new topics are derived from existing topics.
The Darwin term is used because the polymorphic specializations provide an evolution path.
A DITA map XML document is used to stitch the topic XML documents together (a minimal example is shown below).
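For illustration, here is a minimal, hypothetical DITA map that stitches three topic files together (the topic file names are made up):

<!-- Hypothetical DITA map: each topicref points to a separate topic XML file -->
<map>
  <title>Widget User Guide</title>
  <topicref href="widget-overview.dita" type="concept"/>
  <topicref href="installing-widget.dita" type="task"/>
  <topicref href="widget-settings.dita" type="reference"/>
</map>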
I had an opportunity to chat with Norm Walsh about it at this week’s MarkLogic World event. Norm is the author of DocBook: The Definitive Guide. He’s also an active member in a few of the XML/JSON standards committees.
DITA is a competing standard to DocBook. Norm wrote this interesting blog post about DITA back in October 2005.
My question is which one has better support for semantic annotations. Much content these days is semantically enriched using multiple ontologies so that SPARQL queries can be used to provide Dynamic Semantic Publishing services.
In addition to the screencast above, the following screencasts will take a deeper dive into the RSuite software and DITA Open Toolkit.
Please take a look. Hopefully, the screencasts will shed some more light on the need to store content using a higher level of abstraction (DITA or DocBook).
The screencasts will also show the value of RSuite as a full blown Content Management and Digital Asset Management (DAM) Solution.
Be sure to watch the embedded screencast at the top of the page too.
Background
As noted in the readme, this is a simple application that was built using a customized Application Builder app. Unfortunately, it uses the older MarkLogic 5 Application Builder code. The latest Application Builder that comes with MarkLogic 6 (ML6) has undergone a significant architectural change. The ML6 version generates code that is much more declarative as it utilizes XSLT with the ML6 REST API.
A good write-up on how to customize the ML6 App Builder code is posted here.
I’ll have a newer version of the Document Discovery app that is built with the ML6 App Builder in a future blog post. For now, this version is better as a tutorial because it makes it easier to follow the code for:
CPF
Binary File Metadata
Field Value Query
Annotating and adding custom ratings
Content Processing Framework (CPF)
I like to refer to the MarkLogic CPF as a Finite State Machine (FSM) for documents. In this case, a document transitions from one state to another via a triggering event or condition.
A typical FSM is defined by a list of its states and the triggering condition for each transition.
For MarkLogic CPF, this is done with pipelines. A pipeline is a configuration file that contains the triggering conditions and the respective actions for each state transition.
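To make this concrete, here is a minimal sketch of what a pipeline configuration might look like (the pipeline name, on-success/on-failure state URIs, and action module path are placeholders, not the actual pipeline shipped with this app):

<pipeline xmlns="http://marklogic.com/cpf/pipelines">
  <pipeline-name>Example Scan Pipeline</pipeline-name>
  <pipeline-description>Runs a scan action when a document reaches the initial state.</pipeline-description>
  <state-transition>
    <annotation>Scan newly inserted documents.</annotation>
    <state>http://marklogic.com/states/initial</state>
    <on-success>http://marklogic.com/states/processed</on-success>
    <on-failure>http://marklogic.com/states/error</on-failure>
    <default-action>
      <module>/cpf/ingest/scan.xqy</module>
    </default-action>
  </state-transition>
</pipeline>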
This application will use CPF for entity enrichment. In this case the entity will be an arbitrary binary file.
I find that entity enrichment is the most common use case for CPF.
Of course, the online Content Processing Framework Guide is the ultimate source to help you become fluent in CPF.
In the meantime, this post will provide the quick-start steps needed to get this Document Discovery app configured to use a CPF pipeline.
In this case, the CPF pipeline will execute a document “scan” on an arbitrary binary document whenever it is inserted into the database.
The triggering event will be an xdmp:document-insert(), which puts the document into the “initial” state.
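For example, a simple insert from Query Console would look something like this (the file path and document URI are placeholders):

(: Load a binary file from disk and insert it; the insert fires the CPF "initial" state transition :)
let $binary := xdmp:document-get("/tmp/sample-presentation.pptx")
return
  xdmp:document-insert("/docs/sample-presentation.pptx", $binary)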
Binary File Support
The ability to ingest a wide range of binary files was added to MarkLogic in version 5. This feature is referred to as the ISYS Document Filter.
The ISYS (Document) Filter was provided by a company called ISYS Search Software, Inc. The company has since changed its name to Perceptive Search, Inc.
The new ISYS filter extracts metadata from a wide range of binary documents (Word, PowerPoint, JPG, GIF, Excel, etc.). The supported binary file types are listed here.
Please note the attributes used in the meta elements shown above.
The CPF “initial” pipeline code will modify the above meta elements to be as follows.
The 3 elements in the blue circle were added later when the user adds a comment and/or rating to the document.
Field Value Query
In this app, the Author facet was implemented as a field value query. A field value query is used because the facet also needs to include the <Typist> element.
Example:
The following XML shows 3 elements that are valid Authors (Last_Author, Author, and Typist).
Adding a field value index to the database makes it possible to unify these values during a search. It also provides a weighting mechanism, so the <Last_Author> element can be boosted to a higher relevance than the <Typist> element.
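With such a field in place (here assumed to be named “Author”), a single query matches any of the three elements. A minimal sketch:

(: Matches documents where Last_Author, Author, or Typist equals "Jane Doe",
   assuming a field named "Author" that includes those three elements :)
cts:search(
  fn:doc(),
  cts:field-value-query("Author", "Jane Doe")
)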
Here’s a snapshot of the admin page used to configure it.
JavaScript for the annotation and custom ratings is in a file called /src/custom/appjs.js.
Here’s the source code directory structure.
The JavaScript code will display the User Comments dialog box to capture the Commenter’s Name and Comment as shown.
The code that provides the web form post action is in /src/custom/appfuncations.xqy
The post action sends the request to a file called /insertComment.xqy, which then does either an xdmp:node-insert-child() for new comments or an xdmp:node-replace() for updates.
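A minimal sketch of what /insertComment.xqy might do is shown below; the request field names and the <comments>/<comment> element names are assumptions for illustration, not the app’s actual structure:

xquery version "1.0-ml";

(: Sketch only: element and request-field names are assumed :)
let $uri      := xdmp:get-request-field("uri")
let $name     := xdmp:get-request-field("name")
let $text     := xdmp:get-request-field("comment")
let $new      := element comment { attribute by { $name }, $text }
let $parent   := fn:doc($uri)//comments
let $existing := ($parent/comment[@by eq $name])[1]
return
  if ($existing)
  then xdmp:node-replace($existing, $new)      (: update an existing comment :)
  else xdmp:node-insert-child($parent, $new)   (: add a new comment :)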
A similar approach is done for the user ratings capability. Here’s a snapshot of the UI.
Installation
The source code that is currently posted on GitHub did not work initially.
I made the following changes to make it work.
Config File – modify the Author field value index
Config File – make compatible with MarkLogic 6
Config File – include a dedicated modules and triggers DB.
Pipeline – modify pipeline xml to use code in the /cpf/ingest directory.
Deployed the source code to a modules database instead of the file system.
Added additional logging to verify the pipeline code’s processing.
Here are the steps that I used to get the application running properly.
Installation Steps:
Import the configuration package that’s located in the /config directory. This package will create and configure the 3 databases (DocumentDiscovery, doument-discovery-modules, and Document-Discovery-Triggers).
Install the pipeline that is in the /ingest directory.
This is done by copying the 2 files in the /ingest directory to the /Modules/MarkLogic/cpf/ingest directory. You may need to create this directory directly under the modules directory of your MarkLogic installation (see next image). Once copied, then follow the normal pipeline loading process.
Ensure that only the following pipelines are enabled for the default domain in the DocumentDiscovery database (DO NOT ENABLE DOCUMENT CONVERSION).
Install: Document Filtering (XHTML), Status change handling, and the custom Meta Pipeline. You’ll need to load the Meta Pipeline configuration file.
Deploy Source Code – be sure to modify the http server to use the proper modules database.
Load data using the admin console, Information Studio or qconsole.
Observe ability to search and view the binary documents!
Image: Shows CPF Pipeline Code located in the MarkLogic installation directory.
Image: Document Discovery App showing Search, Facets, and User Star Ratings
Conclusion
Hopefully, this post will help you get started using MarkLogic for Document Discovery.
The exciting news is the soon-to-be-released MarkLogic 7 (ML7).
ML7 will have new semantic technology features that will take this simple document repository to a whole new level.
This is because ML7 will have the ability to add richer metadata, in the form of triples, to each document.
A triple is a way to model a fact as a "Subject, Predicate, Object". Many triples can be added to a document as sets of facts. These facts can then be incorporated into queries that can make inferences about the “subjects”.
From these facts, the following inferences can be made:
Henry James and Jane Doe co-authored document ISBN-125.
Sandra Day and Jane Doe were college classmates.
In this example, the facts (triples) are used for knowledge discovery.
In this case, the knowledge discovery, or inference, can be accomplished with minimal coding effort using simple data structures and a rich query API.
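As a hedged preview of how that might look in ML7 (based on the pre-release semantics API, so names may change in the final release; the subject and predicate URIs below are invented for illustration), the co-author facts could be embedded in documents as triples and then queried with SPARQL:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Hypothetical triple embedded in a document's metadata
   (namespace http://marklogic.com/semantics); URIs are invented:

   <sem:triple>
     <sem:subject>http://example.org/person/HenryJames</sem:subject>
     <sem:predicate>http://example.org/authored</sem:predicate>
     <sem:object>http://example.org/doc/ISBN-125</sem:object>
   </sem:triple>
:)

(: SPARQL inference: who co-authored a document with Jane Doe? :)
sem:sparql('
  SELECT ?coauthor
  WHERE {
    <http://example.org/person/JaneDoe> <http://example.org/authored> ?doc .
    ?coauthor <http://example.org/authored> ?doc .
    FILTER (?coauthor != <http://example.org/person/JaneDoe>)
  }
')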
Learn how to add authentication to a MarkLogic Roxy App.
In this blog post, I will show the code needed to add a simple authentication service.
In the previous post, I created the two column layout where the top of the home page had a login form. At the time the login form was not wired up.
For this post, I will wire up the login form. To do this, I’ll show how to build a simple authentication service that searches a user database, verifies the password, and generates an authentication token that will expire after 5 minutes.
This demo application will also show how to provide a simple RESTful API for search. This Search API will utilize the authentication token to restrict access to the search service.
The app will not show any role based restricted views. A more fully featured role based access control will be shown in a future post.
A zip file containing the source code for this demo application is posted here: source code.
Overall Approach
The solution will use the following items to authenticate a user and create an application token that gives the user access to the RESTful API for a 5 minute period.
Registration Form – used to create the user profile documents. This form should only be visible to an admin user but is currently visible to all for demo purposes.
User Directory – Each user will have a dedicated user directory in the MarkLogic database.
User Profile Document – User profile document (/users/janedoe/profile.xml) will reside in the user’s directory. It will contain the username/password. The username must be unique. It can also be used to store the user’s role and organization information. This demo will not utilize a party management solution but it can be extended to do so.
Session Document – A session document will be created when a user successfully logs in. The Session Document will be stored in the User Directory (a hypothetical example of both the profile and session documents is shown after this list).
Authentication Token – the token will be stored in the session document. Each RESTful API request must include the token in its header.
ROXY Router – Will be the checkpoint, or single point of entry, for each request. This is where the user token is verified for each RESTful API request. A key function, auth:findSessionByToken(), verifies the session and dispatches the request if it is valid.
Login – If the username and password are valid, the token is created. If a token already exists and has not yet expired, then the same token will be used.
Session Expiration – Session expiration will be 5 minutes from the initial login. The 5-minute duration is for demo purposes; a typical session expiration duration is 24 hours. Session expiration times are UTC (Coordinated Universal Time) based.
Logout – Terminates the session by deleting the session document that contains the token.
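For reference, here is a hypothetical example of the two documents. The element names are partly inferred from the code later in this post and partly assumed; the exact structure in the demo may differ.

<!-- /users/janedoe/profile.xml : the user profile document -->
<user>
  <username>janedoe</username>
  <password>5f4dcc3b5aa765d61d8327deb882cf99</password>  <!-- MD5 hash, not the plain-text password -->
  <firstName>Jane</firstName>
  <lastName>Doe</lastName>
</user>

<!-- a session document stored in the same user directory -->
<session user-sid="0f8e1c2a-example-token">
  <expiration>2013-06-01T14:05:00Z</expiration>
</session>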
Related Notes:
Passwords are never part of the RESTful API transport except the Login API request.
“Remember Me” cookie – This solution can support a “Remember Me” cookie where the token is stored in the cookie and not the password. Remember Me cookies typically expire after 90 days which is longer than the token expiration period.
Verify API – A good approach for refreshing the token stored in a “90 day login cookie” is described here. A Verify API is typically used to verify the Username and Token. If they match then a new token is generated whenever the existing token has expired. The 90-day cookie web app will need to call the Verify API to refresh the token stored in the cookie.
Passwords are currently stored in the User Profile doc but they are MD5 hashed.
The current solution shows how to use the MarkLogic Search API together with the user profile document, session document, and token to provide an adequate security solution.
OAuth2 – Open Authorization version 2 (OAuth2) is a widely used protocol that provides a federated user profile solution. The key benefit for this example is that user passwords would not need to be stored in MarkLogic. However, this is a topic for a future post. The OAuth2 developer details are here: https://developers.google.com/accounts/docs/OAuth2
1. Registration Form
The registration form above creates the user profile data, which is stored in a User Profile Document in the respective user directory. The Session Document is also stored in the User Directory.
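A minimal sketch of the document insert behind the registration form might look like the following; the request field names and the profile structure are assumptions, and the MD5 hashing follows the related note above:

xquery version "1.0-ml";

(: Sketch: create a user profile document in the user's directory.
   Field names and profile structure are assumed for illustration. :)
let $username := xdmp:get-request-field("username")
let $profile :=
  element user {
    element username  { $username },
    element password  { xdmp:md5(xdmp:get-request-field("password")) },  (: store the hash, not the password :)
    element firstName { xdmp:get-request-field("firstName") },
    element lastName  { xdmp:get-request-field("lastName") }
  }
return
  xdmp:document-insert(fn:concat("/users/", $username, "/profile.xml"), $profile)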
The session document URI has the expiration date/time appended to it. Some JavaScript client code will use this appended expiration date/time to trigger a token refresh.
The Roxy code that creates and deletes the session document is in:
web login controller – /apps/controllers/appbuilder.xqy
This token needs to be added to the request headers of each RESTful API request; if it is missing, the response will be a “401 Unauthorized” error. The token must be sent in the “X-Auth-Token” header as follows.
The source code that extracts the X-Auth-Token value is in the router code. See line 87 of /src/app/lib/router.xqy.
let $token := xdmp:get-request-header("X-Auth-Token")
If using Firefox Poster tool, the header can be added as shown.
6. ROXY Router:
As discussed above, the router is the checkpoint for all HTTP requests. It is the ideal place to apply security policy logic such as:
Token check
Requests per minute
Maximum Requests per day
This post only handles the token check but this code can be extended to support all security policy logic.
The following XQuery code handles the token check. Please note that certain requests (e.g., login, ping) bypass the token check.
let $valid-request :=
  if (fn:not($config:SESSION-AUTHENTICATE)) then fn:true()
  else if (xs:string($controller) = ("ping")) then fn:true()
  else if (xs:string($controller) = ("login")) then fn:true()
  else if (xs:string($controller) = ("logout")) then fn:true()
  else if (xs:string($controller) = ("verify")) then fn:true()
  else
  (
    let $token := xdmp:get-request-header("X-Auth-Token")
    return
      if ($token) then
      (
        let $valid-session := auth:findSessionByToken($token)
        return
          if ($valid-session) then
          (
            fn:true(),
            auth:cacheSession($valid-session)
          )
          else
            fn:false()
      )
      else fn:false()
  )
7. Login Code:
The login code does the following:
Find user profile document – Searches the user profile documents using the username.
Check password – If a document with the username is found, then the password is checked.
Find session document by username – If the password matches, then the code looks for a session document and checks its expiration date.
Session Document – If the session has not expired, the current session document is used. If the session document has expired, it is deleted and a new session document containing a new Authentication Token is created.
The username and password are bundled into the request using the Authorization header.
So the request header will need this:
Authorization: Basic Z3J1c3NvOnBhc3N3b3Jk
The string after the word Basic is the Base64-encoded username and password (encoded, not encrypted).
Here is the code that extracts the username/password; it is in the login controller (/src/app/controllers/login.xqy).
declare function c:main() as item()*
{
let $userPwd :=
xdmp:base64-decode(
fn:string(
fn:tokenize(
xdmp:get-request-header("Authorization"), "Basic ")[2]
)
)
let $username :=
fn:string(
(xdmp:get-request-header("username"),
fn:tokenize($userPwd, ":")[1])[1]
)
let $password :=
fn:string(
(xdmp:get-request-header("password"),
fn:tokenize($userPwd, ":")[2])[1]
)
let $result := auth:login($username, $password)
return
(
ch:add-value("res-code", xs:int($result/json:responseCode) ),
ch:add-value("res-message", xs:string($result/json:message) ),
ch:add-value("result", $result),
ch:add-value(
"res-header",
element header {
element Date {fn:current-dateTime()},
element Content-Type
{
req:get("req-header")/content-type/fn:string()
}
}
)
)
};
The code that searches for a session document by username and its expiration date/time uses the following function. Please note the element range index query.
declare function auth:findSessionByUser($username)
{
let $query :=
cts:and-query((
cts:directory-query(auth:sessionDirectory($username),"infinity"),
cts:element-range-query(
xs:QName("expiration"),">",
auth:getCurrentDateTimeUTC())
))
let $uri := cts:uris("",("document","limit=1"), $query )
return
fn:doc($uri)
};
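The auth:getCurrentDateTimeUTC() helper isn’t shown in this post; one possible implementation is:

declare function auth:getCurrentDateTimeUTC() as xs:dateTime
{
  (: normalize the current dateTime to the UTC (+00:00) timezone :)
  fn:adjust-dateTime-to-timezone(
    fn:current-dateTime(),
    xs:dayTimeDuration("PT0H"))
};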
8. Session Expiration:
The code to check the session expiration is invoked by the router code.
See line 89 in /src/app/lib/router.xqy
auth:findSessionByToken($token)
Here’s the code. Please note the element range query.
declare function auth:findSessionByToken($token as xs:string)
{
let $query :=
cts:and-query((
cts:element-attribute-value-query(
xs:QName("session"),
xs:QName("user-sid"),
$token
),
cts:element-range-query(
xs:QName("expiration"),">",
auth:getCurrentDateTimeUTC()
)
))
let $uri := cts:uris("",("document","limit=1"), $query )
let $doc := fn:doc($uri)
let $current := fn:current-dateTime()
return if ($doc) then
(
let $expiration := xs:dateTime($doc//expiration)
let $diff := ($expiration - $current)
return
(
if($diff < ($auth:SESSION-TIMEOUT div 2) ) then
xdmp:node-replace(
$doc//expiration/text(),
text{fn:current-dateTime()}
)
else (),
$doc/session
)
)
else ()
};
9. Logout Code:
The logout code terminates the session by deleting the session document. Here’s the code.
declare function auth:logout($username as xs:string)
{
let $session := auth:findSessionByUser($username)
let $user := auth:userFind($username)
let $token :=
if($session) then
$session/session/@user-sid/fn:string()
else ()
let $__ := auth:clearSession($username)
return
<json:object type="object">
<json:responseCode>200</json:responseCode>
<json:message>Logout Successful - Token Deleted</json:message>
<json:authToken>{$token}</json:authToken>
<json:username>{$user/username/text()}</json:username>
<json:fullName>
{
fn:string-join
((($user/firstName,$user/firstname)[1],
($user/lastName,$user/lastname)[1]), " ")
}
</json:fullName>
</json:object>
};
Conclusion
Hopefully, the authentication code described in this demo application has been informative. It shows an approach that I have recently used in a ROXY Application.
I will be building on this solution in future posts. The most pressing next step is to add support for OAuth v2 and role based restricted views. So stay tuned.
As always, please let me know in the comments section if any further clarifications or details are needed.
Once a highly searchable MarkLogic data repository has been created, a common request is to provide a Search Widget. A search widget is a simple web search box that can be added to any web page. Users can then use the search box to submit a search request to the MarkLogic database.
The client side widget consists of HTML, CSS and JavaScript. The JavaScript calls a MarkLogic Rest API asynchronously. The MarkLogic Rest API processes the request and then sends the results back to the web page. Client side JavaScript receives the results and renders accordingly.
The search widget test page shown in this screencast is at the following link.
The proper way to write custom snippet code is to override the transform-results function. The screencast shows the search option with the respective code. For more info, see Chapter 2 of the Search Developers Guide.
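For reference, hooking in a custom snippet function is done in the search options; a sketch is shown below (the function name, namespace, and module path are placeholders):

<options xmlns="http://marklogic.com/appservices/search">
  <!-- apply/ns/at point the Search API at a custom XQuery snippeting function -->
  <transform-results apply="my-snippet"
                     ns="http://example.com/snippets"
                     at="/app/lib/my-snippet.xqy"/>
</options>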
4. Add a new Search Controller using Roxy command line
Use the Roxy create command to create a new controller with the respective view code.
See the Roxy command line help for the details.
ml create --help
ml create controller --help
5. Cross Domain AJAX using JSONP
Cross-domain AJAX requests can be a security risk and are restricted in most modern browsers. The usual workaround is to wrap the JSON string in a JavaScript function. This approach is called JSON-with-Padding, or JSONP.
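A minimal server-side sketch of the JSONP wrapping in XQuery (the “callback” request field name and the JSON payload are placeholders):

xquery version "1.0-ml";

(: Wrap a JSON string in the caller-supplied callback function name (JSONP) :)
let $callback := xdmp:get-request-field("callback", "handleResults")
let $json     := '{"total": 2, "results": ["doc1.xml", "doc2.xml"]}'
return (
  xdmp:set-response-content-type("application/javascript"),
  fn:concat($callback, "(", $json, ");")
)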
The over-the-wire transport from MarkLogic to a browser client is most efficiently done using the lightweight JSON format. The JSON structure used in this screencast consists of 3 parts:
The screencast builds from the previous session. The objective is to educate and raise awareness of MarkLogic’s agile database development capabilities.
There are many more topics to cover so please stay tuned.
Quick Walk Through of the MarkLogic Roxy Framework
Intro
For the past few months, I’ve been heads down working in the exciting Big Data software development world. I prefer to call it the post-relational document database world. My focus has been the technology around Big Data but there’s also an amazing social aspect. We see groups like Code For America, Data without Borders (DataKind) and NYC Open Data using Big Data to drive social change.
But social aspects are a topic for a later time. For now, it’s all about the code.
MarkLogic
Of course, MarkLogic is my preferred big data platform. I use it to de-normalize data so that the database engine and search engine can be the same thing.
For those unfamiliar with MarkLogic, it’s a document data store that was designed to handle extremely large amounts of unstructured data using the XML technology stack (XML, XQuery, XSLT, etc.).
The term “unstructured data” is often a topic for debate. We typically see structured data which I consider to be relational data or schema validated XML documents. There’s also semi-structured and unstructured data. One could argue that all data has structure. I typically consider semi-structured data to be partially validated XML or JSON documents. I consider unstructured data to be a set of XML or JSON documents that may have a common header but also have a payload that has no structure and can contain anything (xml, text, binary).
Document databases like MarkLogic, MongoDB, and Couchbase do away with the need to shred data into rows and columns. Aside from being unnecessary, it’s also not feasible when dealing with petabytes of data.
A key capability that MarkLogic provides is agile database development. A MarkLogic developer has the ability to ingest large amounts of data while having very little knowledge of the underlying data structures. Once the data is ingested, indexes can be added and data structures tweaked to provide the desired results. This agile development process ultimately leads to higher developer productivity, higher quality, and quicker time to market.
Semantic Linking
Document databases are also ideal for document linking. We’re just starting to realize the value of linking documents semantically. See Kurt Cagle’s Balisage 2012 paper for an interesting approach to linking documents by appending an “assertion node” to each document. The assertion node contains a “triple store” that’s used to describe the document’s relationship with other documents. These Semantic links can then be used for semantic reasoning which is a topic for another day.
Roxy Framework
Now that I gave some background, let’s talk about building MarkLogic apps using Roxy. I build most of my MarkLogic apps using the Roxy framework.
Roxy (RObust XquerY framework) is a well-designed Model-View-Controller framework for XQuery.
For now, MarkLogic’s primary API is XQuery. However, stay tuned. A rich Java and C# API is coming soon. Of course, there’s also the MarkLogic RESTful API called Corona.
I won’t drill down into the MVC mechanics right now, but it’s worth noting the following image. It shows the Ruby on Rails style “convention over configuration” URL-to-MVC routing. I’ll discuss it further in a future screencast.
The screencast above will show a simple example of ingesting blog post data from the blog site Boing Boing. You can get a copy of the Boing Boing blog archive here. This blog archive file contains 63,999 blog posts.
I built two simple MarkLogic search apps using this data set.
App Builder Version – good for a quick demo of the search capability but difficult to extend.
This is a follow-up to Jesse Liberty’s Answering A C# Question blog post which compares two equivalent code examples to illustrate the value of interfaces:
Both examples use fictitious Notepad functionality with File and Twitter capability.
Example #1 does not use an interface and the line of code (LOC) count is 49.
Example #2 uses a Writer interface with Parameter Dependency Injection. The Notepad’s dependent objects (e.g., FileManager and TwitterManager) are passed as parameters (aka injected) to the worker method. In this case, the LOC count is 57.
It’s interesting to note that the interface example has slightly more code. The big win is looser coupling, which makes the code easier to maintain and more testable. I’ll have more about the testability in a future post.
Example 1 – No Interface
using System.IO;
using System;
namespace Interfaces
{
class Program
{
static void Main( string[] args )
{
var np = new NotePad();
np.NotePadMainMethod();
}
}
class NotePad
{
private string text = "Hello world";
public void NotePadMainMethod()
{
Console.WriteLine("Notepad interacts with user.");
Console.WriteLine("Provides text writing surface.");
Console.WriteLine("User pushes a print button.");
Console.WriteLine("Notepad responds by asking ");
Console.WriteLine("FileManager to print file...");
Console.WriteLine("");
var fm = new FileManager();
fm.Print(text);
var tm = new TwitterManager();
tm.Tweet(text);
}
}
class FileManager
{
public void Print(string text)
{
Console.WriteLine("Pretends to backup old version file." );
Console.WriteLine("Then prints text sent to me." );
Console.WriteLine("printing {0}" , text );
var writer = new StreamWriter( @"HelloWorld.txt", true );
writer.WriteLine( text );
writer.Close();
}
}
class TwitterManager
{
public void Tweet( string text )
{
// write to twitter
Console.WriteLine("TwitterManager: " + text);
}
}
}
Example 2 – Writer Interface with Parameter Dependency Injection
using System.IO;
using System;
namespace Interfaces
{
class Program
{
static void Main( string[] args )
{
var np = new NotePad();
var fm = new FileManager();
var tm = new TwitterManager();
np.NotePadMainMethod(fm); // parameter injection
np.NotePadMainMethod(tm); // parameter injection
}
}
class NotePad
{
private string text = "Hello world";
public void NotePadMainMethod(Writer w)
{
Console.WriteLine("Notepad interacts with user.");
Console.WriteLine("Provides text writing surface.");
Console.WriteLine("User pushes a print button.");
Console.WriteLine("Notepad responds by asking ");
Console.WriteLine("FileManager to print file...");
Console.WriteLine("");
w.Write(text);
}
}
// Writer Interface
interface Writer
{
void Write(string whatToWrite);
}
class FileManager : Writer // Inherits Writer Interface
{
// Implements Write Interface Method
public void Write(string text)
{
// write to a file
Console.WriteLine("FileManager: " + text);
}
public void Print(string text)
{
Console.WriteLine("Pretends to backup old version file." );
Console.WriteLine("Then prints text sent to me." );
Console.WriteLine("printing {0}" , text );
var writer = new StreamWriter(@"HelloWorld.txt", true);
writer.WriteLine(text);
writer.Close();
}
}
class TwitterManager : Writer // Inherits Writer Interface
{
// Implements Write Interface Method
public void Write( string text )
{
// write to Twitter stream
Console.WriteLine("TwitterManager: " + text);
}
}
}
There are many meteorological puns associated with the term “cloud computing”. The term represents a huge paradigm shift in the way backend software services are delivered.
This article covers much of the confusion associated with this ambiguous and overused phrase.
From a software developer perspective, the deployment model and elasticity are the key differentiators for cloud services.
I consider a cloud service to be a system that can host my software and hide the complexity of the server farm (e.g., routers, load balancers, SSL accelerators, etc.).
Amazon popularized the term “Elastic Cloud” when they launched their core cloud component, EC2, back in August 2006. EC2 stands for Elastic Compute Cloud. Elasticity is the infrastructure’s ability to automatically scale up and scale down as needed.
Elasticity is a big deal. It dramatically simplifies the deployment and administration process. It means that software developers don’t need to worry about infrastructure as much and can focus on coding the business process.
I consider Amazon, Google and Microsoft to be the big 3 cloud vendors. They have the elasticity expertise and server farms to support high volume cloud apps.
There are also Oracle, Salesforce.com, Rackspace, and others, but in my opinion they are not generic cloud platforms.
For more about the non-developer cloud computing perspective, this Wikipedia article is a great reference.
Rob Enderle’s blog post has it right. 2010 will be the year and start of the cloud decade.
I’d like to take it a step further. The coming wave of ubiquitous ‘democratized’ data services, with eager clients waiting to consume them, will take the internet to a dramatic new level. Microsoft’s “three screens and a cloud” vision speaks to this, but I believe it’s more about “4 screens with data services”. I consider the data services to be more relevant: the cloud is the engine, but the 24/7 data services it provides will be life changing and business transforming.
Thanks to 3G, pending 4G and whatever comes after, the data services will come from highly reliable mobile data pipes that can be consumed while driving a car, riding a bicycle, at the doctor’s office or exercising at the gym.
The data services are democratized because the data being provided was once only available to a select few. Opening up the data to software developers and entrepreneurs can be a catalyst for positive change. The democratization of data trend is an unstoppable force that has the power to accelerate innovation to help solve some of the world’s problems and improve the quality of life for all.
The US Chief Information Officer, Vivek Kundra, understands the power of democratized data. He spearheaded a new web site for this called Data.gov. Another great example is the City of New York’s recent NYC Big Apps Contest. Microsoft is also getting involved with their new Dallas service.
Regarding the 4 screens, not 3, I expect the data services to be designed to support the following clients.
Listing the Car Dashboard may be a bit premature but I expect to see at least 25 million "connected" cars sold during this coming decade. In less than a year, the Microsoft Ford Sync system has already exceeded 1 million in US only sales. These systems are just starting to go global with Kia’s UVO and Fiat’s Blue&Me systems. I expect “Connected Cars” consuming mission critical data services to become the norm within 5 years.
Examples of mission-critical, revenue-generating data services are real-time, location-aware contextual ads and electronic billboards. Some of this is already available in the Ford Sync system; I consider it the first commercially viable Augmented Reality solution. I expect Car Dashboard solutions to eventually provide windshield “heads-up display” driving directions that can also show the nearest movie listings, nearest Thai restaurants, closest hospitals, etc.
Aside from the Car Dashboard services, data services will come in many flavors. The more popular services will be the entertainment and news services:
Video
Netflix, Hulu, Youtube, Boxee
Music
iTunes, Pandora, Zune
Games
Xbox Live, SONY Playstation Network
Books
Amazon Kindle, Nook, PDF, Audible
News
NY Times, CNN, MSNBC, ABC, CBS and all of the Radio News Feeds
Sports
ESPN
There will be Quality of Life services such as:
Health medical record services – HealthVault
Real-time Traffic – Calculate Quickest Travel Time
Air/Pollen Quality – What will the air/pollen be like on December 31st at 5:30 PM?
Population Growth versus Food Supply – Expected food supply in Somalia over the next 3 years.
Malaria Cases/Birth Rates/Life Expectancies by Region
Violence Levels in Iraq and by Region
Airport Security Wait Times – Security Check Wait Time at Gate #4 in LAX, etc.
Crime Stats by Region
High School Education Quality by Region
The list of potential services is endless.
Much of this data is already available but is not in a format that can be easily used or consumed by the 4 screens mentioned.
I’ll leave it to the developers and entrepreneurs to pioneer.
Ten years from now, I am confident that we’ll all be grateful for this new cloud computing/data services era.
In a few hours the 2009 PDC will officially kick off. For software developers, every PDC keynote has its share of surprise announcements. There was a rumor this morning about some Windows Mobile 7 related announcements coming this week but I doubt it.
I think Microsoft is not ready to discuss Windows Mobile 7 publicly yet. My guess is they’re shooting to make the announcements on January 6, 2010, at the CES 2010 keynote event.
NeoWin sometimes gets the inside scoop and they say Microsoft will discuss plans for IE9 and Silverlight 4 in the morning.
Regardless of any surprises, the big news is the dawn of Microsoft’s cloud era. It also sets a tone for Google and Amazon. Microsoft will disclose the details of their cloud strategy in the morning. Financial analysts will be listening carefully so they can tweak their MSFT revenue forecast models.
I predict/expect Ray Ozzie to rock the house in the morning.
It’ll be a busy 3 days. I’m expecting deep dives in the following areas.