The start of the Cloud Race?

May 4, 2009 05:30 by John McConnell

Arguably the hottest topic in computing at the moment is Cloud Computing, which typically involves the availability of Software as a Service (SaaS). Like many “new” IT-related concepts it has been around for a while in different forms – applications have been hosted on the Internet for some time and the idea of Application Service Providers (ASPs) was around in the early 2000s – but recently the ideas have crystallised into the notion of “The Cloud” as the platform. One of the numerous benefits to the user is that true SaaS is delivered on demand and the overall cost diminishes.

Whereas the majority of Web Analytics is delivered via a hosted service from the likes of Omniture, WebTrends and Google, Zementis were probably the first advanced analytics vendor to offer their products in the cloud (they are using the Amazon Elastic Compute Cloud, EC2). This fits with their growing reputation for innovation around predictive analytics software delivery.

Aside from that we haven’t seen much from the more established vendors yet. Not surprisingly SAS have been the first major player to at least outline their vision (see “SAS builds own cloud”). For reasons of data privacy it sounds like they will end up owning their own cloud infrastructure. Will this be the “SAS SaaS”? I wonder!

SPSS has had a web-based framework – SPSS Web App – available for several years. Until now this seems to have been largely used to build and deliver custom applications for specific clients, but it could conceivably be used to deliver more generic applications in cloud form. It will be interesting to see what they come up with.

At Analytical People we are finalising a Geo-spatial/Statistical modelling application in collaboration with the LAMP team at Liverpool City Council. For the modelling part we are currently using R. We’re still discussing the deployment but it is likely that we’ll also use Amazon EC2.

So for this reason - and as part of our broader interest - we are keenly watching how this type of delivery evolves. The clouds are forming … but we can’t really feel the rain … yet.


First thoughts from Predictive Analytics World

March 1, 2009 03:54 by John McConnell

PAW … What is it good for?...

… Well quite a lot actually.

The inaugural Predictive Analytics World (PAW) ran in San Francisco from Feb 18th - 19th. It was originated, chaired and orchestrated by Eric Siegel of Prediction Impact and supported by the ever-efficient Rising Media.

To my mind PAW distinguishes itself among the various conferences on similar topics primarily because of its business-focus and software vendor-neutrality. The vendors were there (more on that below) and there was certainly some theory in the mix but plenty of visionary thought, conceptual thinking and real-life applications.

Over the coming blogs I hope to expand on some of the themes which emerged. But let me start with a twitter-style front-of-mind download of some of the quotes/points I remember. As usual I didn’t see all the presentations because of the streaming. So I am sure I missed a lot and I’m looking forward to seeing the other stuff. In the meantime, and in no particular order …

·         Eric Siegel hopped in (he'd had a skiing accident).

·         Anne Milley of SAS followed him and said “at the end of the day it is about the people”. I couldn’t agree more!

·         David McMichael of MetLife, in his presentation with Kathy Konkel of SPSS on Insurance Claim Fraud Detection, emphasised how important it is that any Predictive Analytics effort is aligned with the priorities and agendas of senior management. He also stressed that there needs to be a commitment to a project team which covers all the disciplines required to make a PA project successful. In particular he identified the need for an IT advocate in the mix.

·         John Elder (Elder Research) gets to ply his trade across an interesting variety of industries and business/research areas and he presented them with great humour. He was the first to make the point – of what became a recurrent theme - that you have to be cynical about any model and try to “break it in the lab”. Linked to this he said he looks to recruit analysts with humility. A most under-rated characteristic in my opinion.

·         Related to Evaluation and Quality Assurance was the importance of “out-of-time” testing.

·         Various definitions of the differences/similarities between Data Mining (DM) and Predictive Analytics (PA). The correct consensus (I believe) is that PA is a subset of DM.

·         The bright young Austrian guys from Commendo (project team "BigChaos") explained how they lead the Netflix Prize.

·         Decision Tree models seemed to be in the majority.

·          There weren’t too many new algorithms/tools in evidence but there were plenty of examples of how familiar algorithms are used collaboratively. Ensemble modelling in particular was a frequent topic.  That said the number of applications with Support Vector Machines in the mix also seems to be growing. Likewise for Survival Analysis.

·         The cluster visualisation work (some of which was also based on the Netflix data) presented by Todd Holloway of Ingenuity was probably the most interesting to me from a technology perspective. It is a minor observation but I felt we could have done with a bit more visualisation throughout.

·         One of the ensemble modelling fans is Dean Abbott of Abbott Analytics who presented his National Rifle Association case study. I found Dean to be the most infectiously enthusiastic speaker of the two days (and I had to follow him!). Like many of the US based practitioners in attendance he is remarkably understated in many ways. You’d trust him with your data.

·         The almost subversive (in a good way) presence of R and many discussions about what it really means for PA and Data Mining in the future – particularly from the perspective of the established software vendors.

The REvolution badges drove the point home.

·         The keynotes were from Usama Fayyad, formerly of Yahoo and now with Open Insights, and Andreas S. Weigend, Ph.D., who used to be the Chief Scientist at Amazon. Both took an appropriately higher level view of where PA could be heading in the near future. On-line behavioural applications seem to dominate their visions. Andreas speculated about the largely untapped potential of mobile devices and the implicit location data they carry. It made me wonder if I really need to work so hard on Google to find the best Dim Sum in Chinatown.

·         As we’ve been doing a lot of on-line work in recent years I found Vincent Granville’s (Click Forensics) presentation on Pay-Per-Click (PPC) optimisation particularly interesting.

The two points that stick in my mind are his scepticism about real-time bid management (because it is moving the goal posts before you have had time to assess them, if I understood correctly) and the way in which he decomposed the textual constructs within ads to build potential predictors of response.

·         Richard G. Vlasimsky's (Valen Technologies) presentation was interesting from a number of angles.

Foremost among them is that it is a true Predictive Analytics application as it focuses on PA for the Insurance industry.

Moreover Valen use a “consortia” approach to model data across a number of their clients’ portfolios. This helps to smooth out some bias arising from the idiosyncrasies of individual customer business models.

Richard was another speaker who emphasised the importance of validating models and data. It makes me think that we are really into an era where the Quality Assurance around models is becoming paramount – for the more valuable ones at least.

Last but not least Valen typically deliver the model and results to their clients using a Software as a Service (SaaS) platform.

If you take all of these points together I feel that they are very much at the front of the PA delivery curve.

·         Conferences of like-minded people are always somewhat reassuring to the attendees but I left PAW thinking that there is much more yet to be done with Predictive Analytics. I think I might have referred to the current situation in my presentation as “the tip of the iceberg”.

·         Eric Siegel hopped off into the distance.
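
For anyone who hasn't met the “out-of-time” testing mentioned in the notes above: the idea is to hold back the most recent period and check whether a model still performs on data from after the period it was built on. Here is a minimal sketch in Python. The records, the cutoff date and the predicted labels are all made up for illustration; none of them come from any of the presentations.

```python
from datetime import date

# Hypothetical scored records: (observation_date, predicted_label, actual_label).
# These values are invented purely to illustrate the split.
records = [
    (date(2008, 1, 15), 1, 1),
    (date(2008, 3, 2), 0, 0),
    (date(2008, 6, 20), 1, 1),
    (date(2008, 9, 5), 0, 0),
    (date(2008, 11, 11), 1, 1),
    (date(2009, 1, 8), 0, 1),
]


def out_of_time_split(rows, cutoff):
    """Rows dated before the cutoff are the build sample; the rest are held out."""
    build = [r for r in rows if r[0] < cutoff]
    holdout = [r for r in rows if r[0] >= cutoff]
    return build, holdout


def accuracy(rows):
    """Share of rows where the predicted label matches the actual label."""
    if not rows:
        raise ValueError("no rows to evaluate")
    return sum(1 for _, pred, actual in rows if pred == actual) / len(rows)


# Assumed cutoff: everything from September 2008 onwards is "out of time".
build, holdout = out_of_time_split(records, cutoff=date(2008, 9, 1))
in_time_acc = accuracy(build)
out_of_time_acc = accuracy(holdout)

print(f"in-time accuracy:     {in_time_acc:.2f}")
print(f"out-of-time accuracy: {out_of_time_acc:.2f}")
```

A noticeable drop between the two figures is the sort of warning sign that a random hold-out sample from the same period can hide.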


The pain of extraction

January 22, 2009 23:36 by John McConnell

As I write this entry I'm keeping an eye on circa 113 million records exporting from a SQL Server database and thinking "why do I have to do this?". The answer is that I probably don't.

My objective is to use a particular analytical tool that needs to have the data in its own file format, so I'm extracting the fields that I think I need to produce some Rule Sets. The chances are that I may need to go back to the database and get some more later as I iterate in the usual style.

There has long been a recognition that this data extraction step - when the data is already inside a database - is, for the most part, a wasted one. Hence the leading database vendors have implemented data mining algorithms inside their database servers. Rather like this project though, my sense is that the majority of analytical modelling still happens outside the database (even though more data than ever sits inside one). I suspect that one of the main reasons for this is that the native user interfaces to these database server algorithms are not as mature as they could be, and that most analytical tool vendors don't typically enable us to do the in-database mining. They may have better interfaces but you are usually going to have to extract the data to use them.

There are exceptions to the rule and SPSS Clementine is a notable one. In Clementine you can model using the algorithms inside SQL Server, Oracle or IBM DB2. Moreover when you manage/transform data - sort it for example - then that will also happen inside the database. However Clementine is the Rolls Royce among data mining tools.
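
To make the extract-versus-in-database contrast concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in for a real database server, and the `transactions` table and its columns are hypothetical. It has nothing to do with how Clementine or any vendor's algorithms actually work; it only shows how much less data leaves the database when the summarising happens inside it.

```python
import sqlite3

# In-memory SQLite database standing in for a real server (an assumption
# for illustration; the table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5), (3, 20.0), (3, 5.0), (3, 1.0)],
)

# Approach 1: extract every row, then summarise on the client.
extracted = conn.execute(
    "SELECT customer_id, amount FROM transactions"
).fetchall()
client_side = {}
for customer_id, amount in extracted:
    count, total = client_side.get(customer_id, (0, 0.0))
    client_side[customer_id] = (count + 1, total + amount)

# Approach 2: let the database do the summarising and return only the result.
in_database = {
    customer_id: (count, total)
    for customer_id, count, total in conn.execute(
        "SELECT customer_id, COUNT(*), SUM(amount) "
        "FROM transactions GROUP BY customer_id ORDER BY customer_id"
    )
}

rows_moved_extract = len(extracted)   # every transaction leaves the database
rows_moved_in_db = len(in_database)   # only one row per customer leaves
conn.close()

print(rows_moved_extract, rows_moved_in_db)
```

Both approaches give the same per-customer summary. The difference is the number of rows that have to travel, which is what hurts when the table holds a hundred million records rather than six.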

This is very much at the front of my mind because we've just re-connected with Oracle after a number of years and I'm hopeful that we are going to be able to start using the Oracle Data Mining (ODM) algorithms more often in our projects. Many moons ago Oracle acquired a data mining tool called Darwin (it was so long ago that there isn't even a Wikipedia entry on it). Over the years they have integrated (or should I say "evolved" - sorry!) that technology into the Oracle database server and developed some new - and frequently innovative - algorithms into what is now ODM.

So a belated resolution for Analytical People this year is to get stuck into the database resident algorithms more. 30 million and counting ...


Conferences, Events, etc.

December 12, 2008 21:14 by John McConnell

These are the recent (and future) events that we've been (or will be) involved with...

 

PPA Customer Direct Conference and Awards 2008 - London, 20th November 2008

We made a joint presentation with Ed Garcia of Reed Business Information at the annual PPA Customer Direct conference at the Grosvenor House in London. Ed and John presented a case study on the modelling, deployment and results of an analysis of subscribers to Reed's Caterer & Hotelkeeper magazine.

Ed and RBI won the "Best use of data" and the "Best customer retention strategy" awards for their work on Caterer & Hotelkeeper.

 

Predictive Analytics World (PAW) - San Francisco, 18th - 19th February 2009

We'll be speaking at the inaugural Predictive Analytics World (PAW) event in "Fog City" in mid Feb.

 

Data Matters - London, 24th February 2009

At the end of February this Market Research Society (MRS) Data Matters event will focus on the confluence of Market Research and Data Mining/Predictive Analytics. Our partners at Cognicient (who specialise in the data fusion process around research data) are chairing the session that we are involved in. John will be talking about Neural Network applications.

Predictive Analytics World (PAW) - Alexandria, VA, 21st October 2009

John McConnell presented an update on the Subscriber Retention (Segmentation) project with RBI.

 

 

 

 


Well … How did we get here?

November 15, 2008 22:40 by John McConnell

Analytical People was born out of a realisation that – in the growing debate about the value of analytics – something was missing. The ingredients for the expansion in the areas of advanced analytics are varied and many: there is an ever-growing amount of unrefined data in the world; there are more technologies – both commercial and open source – which can potentially make sense of that data than ever before; there are more current (or potential) savvy consumers of the analytical interpretation of that data than ever; the vendors and analysts have done a great job of communicating what is possible with analytical approaches; and so on…

The only problem is that there aren’t enough people who can be found and deployed effectively enough to make the whole thing work. Not enough resource to join the dots. And certainly not enough help to find and grow that resource.

For me it all crystallised when – in my last job with Applied Insights – we were engaged with a multinational customer (one of the internet pioneers in fact) in the early stages of a fairly strategic piece of consulting about how they could link up all their disparate data around customers, ascribe value (or more importantly potential value) to them and ultimately build a process to systematically anticipate and deliver the most relevant content to them. We were discussing which analytical tools they should consider and it became apparent that, given the volume of ongoing analysis they expected to perform – and hence the number of analytical people they would need – there really was only one of the leading toolsets where there was enough supply of expertise. In other words, despite the plethora of technologies – with their various strengths and weaknesses – our client really had no choice but to go for the market leader.

So in a nutshell we are aiming to help plug the gap between the demand and supply of professional services and resources. We have started to do this in a number of ways by engaging in projects with our own people and those within our network while at the same time looking to consolidate that network … soon we want to do more to grow and develop the talent pool with more partnerships and training activities. Watch this space …