Lies, damned lies, and statistics

First, some insight into how my twisted mind works. I rarely believe any bar chart, pie chart or percentage I'm shown unless I can access the raw data myself and draw my own conclusions. I'm not a statistician by trade or education, but I've spent a lot of time running surveys and analyzing large data sets, so I have the benefit of some experience.

Replay Solutions just published a survey about Java Platform usage. The questioning, the subsequent analysis and the presentation of the results were poor, IMO. But they did one thing right: they provided the raw data. Thanks for that.

There's a post on TheServerSide (TSS) entitled "Why is JBoss at the bottom of this list ???". In typical TSS style, few people actually bothered to read the survey results or question them, and a long and rather pointless thread ensued. This post is an expansion of my comment at the end of that thread.

The original report has this chart:

[Chart: results as published in the original report, screenshot taken 2010-04-09]

Update – Sunday 4/11/10

My initial (very quick) analysis was wrong. The formula I used to search for the different categories had a basic reg-ex flaw, so I was over-counting JBoss by a fair bit. I've fixed that mistake (spreadsheet here) and also removed duplicates, i.e. responses containing both "JB + TC" and "Tomcat" or "JBoss". I'd already admitted to this minor double counting in the comments; the original author's analysis still includes this error.
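The spreadsheet formulas aren't reproduced here and the exact reg-ex flaw isn't described, so what follows is only a minimal Python sketch of the two fixes under assumed inputs: each respondent's answer is one free-text string, and the category patterns (`PATTERNS`), the helper names and the sample responses are all hypothetical. It contrasts counting every match (mentions) with counting each respondent at most once per category; the `\b` word boundaries keep short abbreviations such as "TC" from matching inside longer words like "etc".

```python
import re

# Hypothetical category patterns; the real survey's answer strings may differ.
# \b word boundaries stop "tc" or "jb" from matching inside longer words.
PATTERNS = {
    "Tomcat": re.compile(r"\b(?:tomcat|tc)\b", re.IGNORECASE),
    "JBoss": re.compile(r"\b(?:jboss|jb)\b", re.IGNORECASE),
}

def count_mentions(responses):
    """Naive tally: every match counts, so one respondent can be counted twice."""
    return {name: sum(len(p.findall(r)) for r in responses)
            for name, p in PATTERNS.items()}

def count_respondents(responses):
    """De-duplicated tally: each respondent counts at most once per category."""
    return {name: sum(1 for r in responses if p.search(r))
            for name, p in PATTERNS.items()}

# Hypothetical sample answers, one string per respondent.
responses = ["JB + TC, Tomcat", "Tomcat", "JBoss", "WebSphere"]
print(count_mentions(responses))     # {'Tomcat': 3, 'JBoss': 2}
print(count_respondents(responses))  # {'Tomcat': 2, 'JBoss': 2}
```

On this sample, "JB + TC, Tomcat" contributes two Tomcat mentions but only one Tomcat respondent, which is the kind of double counting described above.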

So the ranking is now the same as the original author's, but the percentages are different. My apologies to IBM for originally stealing their #2 spot 😉. By the way, Red Hat fully supports both Tomcat and JBoss AS, so between them our #1 and #3 products can satisfy 87% of the market (Tomcat's 59% plus JBoss's 28%), which isn't such a bad result for us.

[Chart: corrected results after fixing the reg-ex and removing duplicates, screenshot taken 2010-04-11]

[As percentages of respondents, that's: Tomcat = 59%, WebSphere = 39%, JBoss = 28%, WebLogic = 23%, Other = 19%]
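Because respondents could name more than one server, these percentages aren't slices of a single pie and don't add up to 100%. Here is a small Python sketch that uses only the five rounded figures quoted above (the names `shares`, `total` and `mention_share` are illustrative). Assuming each respondent is counted at most once per category, the sum of the shares divided by 100 is roughly the average number of categories each respondent named, and dividing each share by that sum gives the share of mentions that a pie chart of the same numbers would show instead.

```python
# Percentages of respondents quoted above (rounded figures from the corrected chart).
shares = {"Tomcat": 59, "WebSphere": 39, "JBoss": 28, "WebLogic": 23, "Other": 19}

total = sum(shares.values())
print(total)  # 168, well over 100%

# If each respondent counts at most once per category, total / 100 is
# (roughly, given rounding) the average number of categories per respondent.
print(f"{total / 100:.2f} categories per respondent")  # 1.68 categories per respondent

# Share of *mentions*: what a pie chart of these numbers would show instead.
mention_share = {name: round(100 * pct / total, 1) for name, pct in shares.items()}
print(mention_share)
```

Tomcat's 59% of respondents becomes only about 35% of mentions, which is why it matters which of the two a chart is actually showing.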

These rankings (WebSphere Application Server, or WAS, above JBoss) are more representative of larger organizations, where both WAS and WebLogic Server (WLS) have traditionally been stronger; the latest Eclipse survey shows a similar breakdown. Unlike the Eclipse survey, however, this survey has no information on the respondents, who seem to be largely self-selected.

My original points still stand: poor questioning, poor analysis and poor presentation are common in these kinds of surveys. Always ask for the source data!

