Who do you think they are? – Segmenting to find out

May 7, 2010 23:00 by jmcconnell

 

2010 has been busy so far (hence the lack of blogging, which we’ll try to remedy), mostly with a type of analysis we didn’t do any of in 2009: segmentation.

 

We’ve been very occupied in recent times with Prediction, as awareness of and demand for Predictive Analytics have grown. Though it is demonstrably powerful, Prediction isn’t the only game in town, analytically speaking.

 

Methodologically speaking, standard segmentations are not predictive but they can be described as “Data Mining”. Typically they involve some type of clustering to identify groups (segments) of something (usually people/customers but it doesn’t have to be) which have the most in common with each other. That commonality can be based on behaviours, attitudes, demography, etc.
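
To make "some type of clustering" a little more concrete, here is a minimal sketch of one common option, k-means, in plain Python. It is not the method used on any of the projects mentioned here; the customer data, the two chosen measures and the function names are all invented for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

# Hypothetical customers: (visits per month, average spend)
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (11, 90)]
centroids, labels = kmeans(customers, k=2)
```

On this toy data the three light users and the three heavy users end up in separate groups. In real work the measures would normally be put on a common scale first, otherwise the one with the largest numbers (spend, here) dominates the distance.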

 

The several segmentations we’ve been working on this year have ranged from a relatively quick, survey-based analysis of a few hundred consumers (of a consumer packaged good I can’t mention) to a behavioural segmentation of millions of visitors to a web site (the name of which I also can’t mention).

 

This type of analysis very much demonstrates one of the key tenets of CRISP-DM in that it takes a joint effort, particularly between the business, research and analysis, to arrive at a successful outcome.

 

The larger the segmentation, the more iterative this becomes, and typically more constituents from the business need to be engaged in it. It really is a mixture of hard work (as you work through dozens of profiles and several versions of the segmentation) and fun, as recognisable, and sometimes surprising, segments emerge. When it works (and going in you are less sure that it will than with your typical predictive model) the final segments are really clearly defined and can be used as the touchstone for how companies/organisations communicate with their customers, citizens, donors, etc.

 

To illustrate the point, we have performed a number of behavioural segmentations over the last few years of visitors/customers to web sites. Typically these have been “Blue Chip” sites with millions of visitors per week. Whenever we perform these on-line segmentations these days we work with Neil Mason’s analytical team at Foviance. Despite the prevalence of Web Analytics tools like Omniture SiteCatalyst, Webtrends, etc. (which are very useful for many types of analysis), web site owners rarely have that higher-level strategic view of who their visitors/customers actually are.

 

Now this is where the trouble starts! To enact the segmentation we typically have to suck out months of data from those web analytics tools, typically at the click level, and then reconstitute it to the visitor level so we can start to segment. Luckily we’ve done this several times now so have some re-usable components and a target database schema which is “segmentation ready”. If possible we also like to directly link attitudinal and demographic data to the visits by surveying enough of them and linking them to specific visits. And we like to add data from customer/registration databases (where they exist) into the pot to give us a rich mix with which we can start to uncover these meaningful segments.
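
As a rough illustration of what "reconstituting" click-level data to the visitor level involves, here is a minimal sketch in plain Python. The field names, the sample rows and the summary measures are assumptions made up for this example; they are not our actual schema or the export format of any particular web analytics tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical click-level export: (visitor_id, ISO timestamp, site section)
clicks = [
    ("v1", "2010-03-01T13:05:00", "news"),
    ("v1", "2010-03-01T13:09:00", "jobs"),
    ("v1", "2010-03-02T13:30:00", "news"),
    ("v2", "2010-03-01T09:00:00", "sport"),
]

def to_visitor_level(rows):
    """Collapse click rows into one summary record per visitor."""
    acc = defaultdict(lambda: {"clicks": 0, "days": set(),
                               "sections": set(), "hours": []})
    for visitor_id, ts, section in rows:
        when = datetime.fromisoformat(ts)
        v = acc[visitor_id]
        v["clicks"] += 1
        v["days"].add(when.date())
        v["sections"].add(section)
        v["hours"].append(when.hour)
    # Turn the accumulators into flat, segmentation-friendly measures.
    return {
        vid: {
            "clicks": v["clicks"],
            "active_days": len(v["days"]),
            "distinct_sections": len(v["sections"]),
            "mean_hour": sum(v["hours"]) / len(v["hours"]),
        }
        for vid, v in acc.items()
    }

visitors = to_visitor_level(clicks)
```

Each visitor ends up as a single row of behavioural measures (volume, frequency, breadth, time of day), which is the shape a clustering step needs. At real volumes this rollup would be done in the database rather than in memory.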

 

The outcomes and applications of the resulting groups are various. For a newspaper publisher we identified a hitherto unknown, and valuable, segment of career break mothers (and sometimes fathers) who had vivid visiting habits early in the afternoon. The publisher went on to run an off-line acquisition campaign to recruit more of them. For The Royal Mail we jointly found a segment of visitors linked to eBay who were running virtual cottage industries. Segments like these need to be able to use the RM site in a very different way to those who are just there to find a postcode, for example. Hence this requires some Information Architecture to tune – and in some cases to completely rebuild – the site.

 

It isn’t all about on-line segmentation of course – though the recency of that channel means that there is a lot to be done there. We’re currently working with a large UK retailer, a research agency and their marketing agency on a multi-category, multi-brand, multi-channel segmentation. When we start a new one it often feels like we’re at the base of a mountain peering upwards. But we usually get to the top. And the view is usually worth the climb.


Mapping the future

January 23, 2010 22:27 by jmcconnell

Chris Villar, who heads the LAMP (Liverpool Asset Management Programme) team at Liverpool City Council, and I presented GENIE to the MapInfo User Group conference in Bristol, UK last Thursday, January 21st.

Chris explained how his research team use Regression and Factor Analysis to reduce 32 social indicators – ranging from the bad stuff, e.g. number of ASBOs, drug offences and abandoned vehicles, to the (hopefully) better stuff like affordability of house prices and proximity to schools – to a single score which encapsulates the ability of a geographic area to stay “healthy” and survive. This score is known as the Sustainability Index.
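
For readers who haven't seen many indicators collapsed into one score before, here is a minimal sketch of the general idea. It is not the LAMP team's method: it swaps in a simpler relative of Factor Analysis (the first principal component of the standardised indicators, found by power iteration), and the areas, indicators and values are all invented.

```python
import math

# Hypothetical data: one row per area, three made-up "bad stuff" indicators
# (drug offences, abandoned vehicles, distance to nearest school in km).
areas = {
    "A": [12.0, 30.0, 2.5],
    "B": [3.0, 8.0, 0.8],
    "C": [7.0, 18.0, 1.5],
    "D": [1.0, 5.0, 0.5],
}

def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_component(z, iterations=200):
    """Leading eigenvector of the correlation matrix, by power iteration."""
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)]
            for i in range(p)]
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w

z = standardise(list(areas.values()))
weights = first_component(z)
# One score per area: the weighted sum of its standardised indicators.
scores = {name: sum(wi * zi for wi, zi in zip(weights, row))
          for name, row in zip(areas, z)}
```

Because every indicator in this toy example is a "bad" one, a higher score means a more troubled area; a real index would have to orient and weight the good and bad indicators deliberately.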

GENIE is an application which provides local planners and researchers with access to their local Sustainability scores. It allows them to geographically map the Index and the underlying attributes thematically and look for areas with high/low scores. They can also perform what-if? simulations to see the effect of planned interventions on an area; for example, what would happen if they were to reduce the level of fly-tipping by 50%? What if they reduced rents in social housing?
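
The what-if idea can be sketched in a few lines of Python. This is only an illustration of the principle, not GENIE's model: the weighted-sum scoring, the weights, the indicator names and the values are all assumptions invented for the example.

```python
# Hypothetical weights: negative for indicators that harm an area,
# positive for those that help. These are invented, not GENIE's.
WEIGHTS = {"fly_tipping": -0.4, "asbos": -0.3, "school_proximity": 0.3}

def index_score(indicators, weights=WEIGHTS):
    """Weighted sum of an area's indicator values."""
    return sum(weights[name] * value for name, value in indicators.items())

def what_if(indicators, name, factor):
    """Return a copy of the indicators with one value scaled by `factor`."""
    changed = dict(indicators)
    changed[name] = changed[name] * factor
    return changed

# Hypothetical area, with indicators in arbitrary made-up units.
area = {"fly_tipping": 2.0, "asbos": 1.0, "school_proximity": 0.5}

baseline = index_score(area)
# "What if we reduced fly-tipping by 50%?"
scenario = index_score(what_if(area, "fly_tipping", 0.5))
improvement = scenario - baseline
```

The scenario is scored on a copy, so the baseline data stays untouched and any number of interventions can be compared against it.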

We’re currently running the latest Beta version on the Amazon Elastic Compute Cloud (EC2) but will be migrating it onto Liverpool’s servers in the coming month.

I usually start these presentations with an intro which tries to distinguish between Software tools and Applications, to start to emphasise how “apps” can spread access to advanced modelling/analytics/insight to a broader community: “Democratising Analysis”, if you like. However, what becomes apparent to me whenever I dip into the GIS world is that there is a very natural marriage between GIS and advanced analytics, that numerous applications already exist within this area and, as a consequence, that GIS practitioners “get” apps more than most. To illustrate this, there were two other presentations on the day which covered apps that span geography and advanced data modelling: one on Flood Risk mapping/modelling by Nathan Muggeridge of Mouchel and one on Land Use modelling by Melanie Bosredon of the David Simmons Consultancy.

Of course that is not to say that there isn’t still a huge potential for applications which combine GIS and analytics, just as there is huge potential for such apps in analytics. I believe that we’re only really scratching the surface of both.


Climate Change? It seems we only had to ask

December 5, 2009 01:58 by jmcconnell

Further to my earlier post the UK Met Office announced – a few hours later - that they are going to publish more raw data on climate change.

I think this should be applauded and let’s hope it leads to more analysis and a clearer view on what is, or isn’t, going on.


Climate Change? Show us the data

December 4, 2009 20:00 by jmcconnell

It is interesting that the Met Office in the UK has decided to go back and re-analyse their temperature data. Apparently it will take 3 years. This follows the recent allegations that scientists at CRU at UEA (UK) may have engaged in some spin to enhance the results of previous analyses in order to strengthen the argument that something needs to be done about the man-made causes of Global Warming. This has become known as Climategate.

At this point in time it isn’t clear whether the allegations about CRU are true or not but Climate Change sceptics have seized on the news to strengthen their argument that the man-made effect is not as great as the likes of CRU would have us believe. So nothing is clear yet (hopefully the investigation into Climategate will reveal more) but as this article says, public opinion will have been affected by the allegations.    

All this comes on the eve of the Copenhagen conference, so the timing is not good if you believe that Global Warming is real and that the causes can be identified and fixed.

The Met Office explain the methodology for calculating global average temperature records here. At first glance - to my naive eye - the argument about missing data seems unusual - why not use imputation? - but I'd need to spend more time looking at the method and the data to really form an opinion. This all makes me believe that they need to make all the data available and unleash the Wisdom of Crowds.
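
By imputation I mean something like the following toy sketch, which fills each gap from its neighbours before averaging. It has nothing to do with the Met Office's actual method or data: the anomaly values, the single row of grid cells and the neighbour-mean rule are all invented for illustration.

```python
# Hypothetical temperature anomalies (deg C) for a row of neighbouring
# grid cells; None marks a cell with no observation.
anomalies = [0.2, 0.4, None, 0.8, None, 1.0]

def mean_observed(values):
    """Average only the cells that have data, ignoring the gaps."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen)

def impute_from_neighbours(values):
    """Fill each gap with the mean of its observed immediate neighbours."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            near = [values[j] for j in (i - 1, i + 1)
                    if 0 <= j < len(values) and values[j] is not None]
            if near:  # a gap with no observed neighbour stays a gap
                filled[i] = sum(near) / len(near)
    return filled

filled = impute_from_neighbours(anomalies)
average_ignoring_gaps = mean_observed(anomalies)
average_after_imputing = mean_observed(filled)
```

Even on six made-up cells the two averages differ, which is the whole point: how missing data is handled changes the headline number, so the choice deserves to be made in the open.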

As we all know, this could be the single most important issue facing the planet (not to mention humanity). Frankly I think we need more brain power - and certainly more openness - which I firmly believe will drive us towards a truthful consensus. And we may even get to more accurate results in less than three years!


Bigger boys came - IBM and SPSS

August 8, 2009 21:09 by jmcconnell

Whatever the eventual outcome of the proposed acquisition of SPSS for US$1.2B (this may be less than 1% of IBM’s current market capitalisation, but it is still a tidy sum for such a niche market player), one thing is clear: it represents a significant punt by one of the leading IT products and services players on the potential value of Predictive Analytics.

An important piece of the acquisition is that IBM have the potential to bring the kind of service delivery – particularly in consulting and implementation – to bear that we haven’t seen in this area to date. IBM Global Services is a major player in Business/IT professional services and Big Blue has already announced the creation of a number of “Analytics Solutions Centers”, the latest of which is in China.

This is also part of a broader strategic move into what might be termed “Business Analytics” that started in earnest with the acquisition (for about 4 times the current offer for SPSS) of Cognos in November 2007.

For me there is some déjà vu here. Oracle scanned the market for a Data Mining tool and ended up buying Darwin from Thinking Machines back in 1999. Since then Oracle have integrated Data Mining technology into their database server in what is now known as Oracle Data Mining (ODM). The products and market have matured somewhat since ‘99 and the definition of Predictive Analytics (which is/was always a subset of Data Mining) has helped us present a clearer value proposition to that market.

ODM, however, has always been a small planet relative to the considerably larger Oracle Sun (if you can excuse the pun), though the IBM move may prompt them to respond. SAS already responded with a very early shot across the bows (SAS warns SPSS' users... ) of their traditional rival (SPSS) and IBM, who - at least in the past - have provided a significant platform for SAS themselves.

So this really does look like a major step up for the world of Predictive Analytics. We’re waiting eagerly for more clues to the IBM strategy.