Whatever the eventual outcome of the proposed acquisition of SPSS for US$1.2B (this may be less than 1% of IBM’s current market capitalisation, but it is still a tidy sum for such a niche market player), one thing is clear: it represents a significant punt by one of the leading IT products and services players on the potential value of Predictive Analytics.
An important aspect of the acquisition is that IBM have the potential to bring to bear the kind of service delivery - particularly in consulting and implementation - that we haven’t seen in this area to date. IBM Global Services is a major player in Business/IT professional services, and Big Blue has already announced the creation of a number of “Analytics Solutions Centers”, the latest of which is in China.
This is also part of a broader strategic move into what might be termed “Business Analytics” that started in earnest with the acquisition (for about 4 times the current offer for SPSS) of Cognos in November 2007.
For me there is some déjà vu here. Oracle scanned the market for a Data Mining tool and ended up buying Darwin from Thinking Machines back in 1999. Since then Oracle have integrated Data Mining technology into their database server in what is now known as Oracle Data Mining (ODM). The products and market have matured somewhat since ‘99, and the definition of Predictive Analytics (which has always been a subset of Data Mining) has helped us present a clearer value proposition to that market.
ODM, however, has always been a small planet relative to the considerably larger Oracle Sun (if you can excuse the pun), though the IBM move may prompt them to respond. SAS already responded with a very early shot across the bows (SAS warns SPSS' users...) of their traditional rival (SPSS) and of IBM, who - at least in the past - have provided a significant platform for SAS themselves.
So this really does look like a major step up for the world of Predictive Analytics. We’re waiting eagerly for more clues to the IBM strategy.
Arguably the hottest topic in computing at the moment is Cloud Computing, which typically involves the availability of Software as a Service (SaaS). Like many “new” IT related concepts it has largely been around for a while in different forms – applications have been hosted on the Internet for some time and the idea of Analytical Service Providers (ASPs) was around in the early 2000s – but recently the ideas have crystallised into the notion of “The Cloud” as the platform. One of the numerous benefits to the user is that true SaaS is delivered on demand, which should bring the overall cost down.
Whereas the majority of Web Analytics is delivered via a hosted service from the likes of Omniture, WebTrends and Google, Zementis were probably the first advanced analytics vendor to offer their products in the cloud (they are using the Amazon Elastic Compute Cloud – EC2). This fits with their growing reputation for innovation around predictive analytics software delivery.
Aside from that we haven’t seen much from the more established vendors yet. Not surprisingly, SAS have been the first major player to at least outline their vision (SAS builds own cloud). For reasons of data privacy it sounds like they will end up owning their own cloud infrastructure. Will this be the “SAS SaaS”? I wonder!
SPSS has had a web-based framework - SPSS Web App - available for several years. Until now this seems to have been largely used to build and deliver custom applications for specific clients, but it could conceivably be used to deliver more generic applications in cloud form. It will be interesting to see what they come up with.
At Analytical People we are finalising a Geo-spatial/Statistical modelling application in collaboration with the LAMP team at Liverpool City Council. For the modelling part we are currently using R. We’re still discussing the deployment but it is likely that we’ll also use Amazon EC2.
So for this reason - and as part of our broader interest - we are keenly watching how this type of delivery evolves. The clouds are forming … but we can’t really feel the rain … yet.
As I write this entry I'm keeping an eye on circa 113 million records exporting from a SQL Server database and thinking "why do I have to do this?". The answer is that I probably don't.
My objective is to use a particular analytical tool that needs to have the data in its own file format, so I'm extracting the fields that I think I need to produce some Rule Sets. The chances are, though, that I may need to go back to the database and get some more later as I iterate in the usual style.
There has long been a recognition that this data extraction step - when the data is already inside a database - is, for the most part, a wasted one. Hence the leading database vendors have implemented data mining algorithms inside their database servers. Rather like this project, though, my sense is that the majority of analytical modelling still happens outside the database (even though more data than ever sits inside one). I suspect there are two main reasons for this: the native user interfaces to these database server algorithms are not as mature as they could be, and most analytical tool vendors don't enable us to do the mining in-database. They may have better interfaces but you are usually going to have to extract the data to use them.
There are exceptions to the rule and SPSS Clementine is a notable one. In Clementine you can model using the algorithms inside SQL Server, Oracle or IBM DB2. Moreover, when you manage/transform data - sort it, for example - that will also happen inside the database. However, Clementine is the Rolls-Royce among data mining tools.
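To make the extract-versus-in-database point a little more concrete, here is a minimal Python sketch. It is only an illustration: it uses the standard library's sqlite3 module as a stand-in for a real database server, and the table and column names (transactions, customer_id, amount) are hypothetical. It is not how Clementine or ODM do it, just the same idea in miniature: one function drags every row out and summarises on the client, the other lets the database do the work and only brings back the summary.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
    rows = [(1, 10.0), (1, 15.0), (2, 7.5), (3, 20.0), (3, 5.0), (3, 1.0)]
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    conn.commit()
    return conn


def summarise_after_extract(conn):
    """Pull every row out of the database, then aggregate on the client."""
    totals = {}
    cursor = conn.execute("SELECT customer_id, amount FROM transactions")
    for customer_id, amount in cursor:
        count, total = totals.get(customer_id, (0, 0.0))
        totals[customer_id] = (count + 1, total + amount)
    return dict(sorted(totals.items()))


def summarise_in_database(conn):
    """Let the database aggregate; only one row per customer comes back."""
    query = """
        SELECT customer_id, COUNT(*), SUM(amount)
        FROM transactions
        GROUP BY customer_id
        ORDER BY customer_id
    """
    return {cid: (count, total) for cid, count, total in conn.execute(query)}


if __name__ == "__main__":
    conn = build_demo_db()
    # Both routes give the same answer; they differ in how much data has to move.
    print(summarise_after_extract(conn) == summarise_in_database(conn))
```

With six rows the difference is invisible, but with a hundred million or so the first route means moving every record across the wire while the second moves only the per-customer summary.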
This is very much at the front of my mind because we've just re-connected with Oracle after a number of years and I'm hopeful that we are going to be able to start using the Oracle Data Mining (ODM) algorithms more often in our projects. Many moons ago Oracle acquired a data mining tool called Darwin (it was so long ago that there isn't even a Wikipedia entry on it). Over the years they have integrated (or should I say "evolved" - sorry!) that technology into the Oracle database server and developed some new - and frequently innovative - algorithms into what is now ODM.
So a belated resolution for Analytical People this year is to get stuck into the database resident algorithms more. 30 million and counting ...