<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Lies, damned lies, and statistics</title>
	<atom:link href="http://blog.softwhere.org/archives/1016/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.softwhere.org/archives/1016</link>
	<description>Musings on the world of software from the sharp end of the long tail</description>
	<lastBuildDate>Mon, 19 Dec 2011 10:45:07 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: sharps</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6259</link>
		<dc:creator>sharps</dc:creator>
		<pubDate>Sun, 11 Apr 2010 12:53:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6259</guid>
		<description>@Savio, @Kevin :
a) you&#039;re both wrong; as is the report&#039;s author - but so was I. I&#039;ve now corrected my post. 
b) thanks for the peer-review !
c) we&#039;ve already spent way more time on this than the original author !

- Rich</description>
		<content:encoded><![CDATA[<p>@Savio, @Kevin :<br />
a) you&#8217;re both wrong; as is the report&#8217;s author &#8211; but so was I. I&#8217;ve now corrected my post.<br />
b) thanks for the peer-review !<br />
c) we&#8217;ve already spent way more time on this than the original author !</p>
<p>- Rich</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eduardo Pelegri-Llopart</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6257</link>
		<dc:creator>Eduardo Pelegri-Llopart</dc:creator>
		<pubDate>Sat, 10 Apr 2010 03:13:41 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6257</guid>
		<description>&gt; my point still stands – the Questioning, analysis and presentation of the data is poor.

I agree 100% w/ you.</description>
		<content:encoded><![CDATA[<p>&gt; my point still stands – the Questioning, analysis and presentation of the data is poor.</p>
<p>I agree 100% w/ you.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin Schmidt</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6256</link>
		<dc:creator>Kevin Schmidt</dc:creator>
		<pubDate>Sat, 10 Apr 2010 02:01:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6256</guid>
		<description>The debate over how to properly count these responses just supports the notion that you can make the data say whatever you want, but I believe the proper counts are:

Tomcat - 625
WebSphere - 410
JBoss - 287
WebLogic - 239
Other - 197
(I&#039;ve ignored the others as their counts are low)

Like Savio suggests, I&#039;ve ignored &quot;multiple answers&quot; and I&#039;ve also avoided double counting for JBoss and Tomcat by making sure that when JBoss or Tomcat is specified in the same response as JBoss+Tomcat I only count that response once.  My results are more consistent with Savio&#039;s so I suspect Rich&#039;s minor double counting may not be so minor.

See http://ktschmidt.blogspot.com/2010/04/further-analysis-of-java-platform.html for some charts from my analysis.</description>
		<content:encoded><![CDATA[<p>The debate over how to properly count these responses just supports the notion that you can make the data say whatever you want, but I believe the proper counts are:</p>
<p>Tomcat &#8211; 625<br />
WebSphere &#8211; 410<br />
JBoss &#8211; 287<br />
WebLogic &#8211; 239<br />
Other &#8211; 197<br />
(I&#8217;ve ignored the others as their counts are low)</p>
<p>Like Savio suggests, I&#8217;ve ignored &#8220;multiple answers&#8221; and I&#8217;ve also avoided double counting for JBoss and Tomcat by making sure that when JBoss or Tomcat is specified in the same response as JBoss+Tomcat I only count that response once.  My results are more consistent with Savio&#8217;s so I suspect Rich&#8217;s minor double counting may not be so minor.</p>
<p>See <a href="http://ktschmidt.blogspot.com/2010/04/further-analysis-of-java-platform.html" rel="nofollow">http://ktschmidt.blogspot.com/2010/04/further-analysis-of-java-platform.html</a> for some charts from my analysis.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sharps</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6254</link>
		<dc:creator>sharps</dc:creator>
		<pubDate>Fri, 09 Apr 2010 20:37:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6254</guid>
		<description>@Savio - scenario 1 is pretty close to what I got - the differences could be due to the double counting on my side.

@Eduardo - I agree there are much bigger questions about the source of the data than I&#039;ve posed. The data is tangible so an easier target for me. 

My own analysis isn&#039;t perfect but my point still stands - the Questioning, analysis and presentation of the data is poor.</description>
		<content:encoded><![CDATA[<p>@Savio &#8211; scenario 1 is pretty close to what I got &#8211; the differences could be due to the double counting on my side.</p>
<p>@Eduardo &#8211; I agree there are much bigger questions about the source of the data than I&#8217;ve posed. The data is tangible so an easier target for me. </p>
<p>My own analysis isn&#8217;t perfect but my point still stands &#8211; the Questioning, analysis and presentation of the data is poor.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Savio Rodrigues</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6253</link>
		<dc:creator>Savio Rodrigues</dc:creator>
		<pubDate>Fri, 09 Apr 2010 20:13:26 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6253</guid>
		<description>Ok, thanks for the clarification - I guess I don&#039;t really know what you mean by &#039;reassigned &quot;Multiple Answers&quot;&#039;:
&gt; and re-assigned “JBoss + Tomcat” and “Multiple Answers”.

In any case - I looked at the data in two ways.  

Scenario 1: Follow the Replay Analysis Methodology &amp; allow a user to vote for multiple servers (i.e. Total results &gt; Total valid respondents for the question)
- For instance, if respondent A selected WebSphere and respondent B selected JBoss and WebSphere, the total results would be 2 for WebSphere and 1 for JBoss, for a sum of 3 across the total 2 respondents.
- In keeping with the point above (and Replay Solutions&#039; analysis methodology) I added all of the &quot;TC+JB&quot; results (161 by my count) to the &quot;Tomcat&quot; results and also added 161 to the &quot;JBoss&quot; results.
- Total count = 2036 split as follows:

35% Tomcat &amp; JB+TC
17% JBoss &amp; JB+TC
12% WebLogic
20% WebSphere
 5%  Jetty	
 1%  Jonas, Jrun, Orion
 1%  Resin
10% Other
100% Total

Not sure why the results don&#039;t exactly match yours, but I could very easily have made an error (I did the spreadsheet work quickly while on a call!)

Scenario 2: Each valid respondent&#039;s response is spread across the servers s/he selects (i.e. Total results == Total valid respondents for the question) 
- For instance, if respondent A selected WebSphere and respondent B selected JBoss and WebSphere, the total results would be 1.5 for WebSphere and 0.5 for JBoss across the total 2 respondents.
- Total count = 1054 split as follows:

34% Tomcat &amp; TC portion of &quot;JB+TC&quot;
12% JBoss &amp; JB portion of &quot;JB+TC&quot;
11% WebLogic
24% WebSphere
 4%  Jetty	
 1%  Jonas, Jrun, Orion
 1%  Resin
14% Other
100% Total

I am in no way saying that Scenario 2 is a valid representation of &quot;actual usage&quot; or anything like that.  But interesting nonetheless...

I&#039;ve uploaded the file here if you&#039;re interested: http://drop.io/6sotiyj# (tried to get it up on gDocs but it kept crashing)

Savio
IBM WebSphere</description>
		<content:encoded><![CDATA[<p>Ok, thanks for the clarification &#8211; I guess I don&#8217;t really know what you mean by &#8216;reassigned &#8220;Multiple Answers&#8221;&#8217;:<br />
&gt; and re-assigned “JBoss + Tomcat” and “Multiple Answers”.</p>
<p>In any case &#8211; I looked at the data in two ways.  </p>
<p>Scenario 1: Follow the Replay Analysis Methodology &amp; allow a user to vote for multiple servers (i.e. Total results &gt; Total valid respondents for the question)<br />
- For instance, if respondent A selected WebSphere and respondent B selected JBoss and WebSphere, the total results would be 2 for WebSphere and 1 for JBoss, for a sum of 3 across the total 2 respondents.<br />
- In keeping with the point above (and Replay Solutions&#8217; analysis methodology) I added all of the &#8220;TC+JB&#8221; results (161 by my count) to the &#8220;Tomcat&#8221; results and also added 161 to the &#8220;JBoss&#8221; results.<br />
- Total count = 2036 split as follows:</p>
<p>35% Tomcat &amp; JB+TC<br />
17% JBoss &amp; JB+TC<br />
12% WebLogic<br />
20% WebSphere<br />
 5%  Jetty<br />
 1%  Jonas, Jrun, Orion<br />
 1%  Resin<br />
10% Other<br />
100% Total</p>
<p>Not sure why the results don&#8217;t exactly match yours, but I could very easily have made an error (I did the spreadsheet work quickly while on a call!)</p>
<p>Scenario 2: Each valid respondent&#8217;s response is spread across the servers s/he selects (i.e. Total results == Total valid respondents for the question)<br />
- For instance, if respondent A selected WebSphere and respondent B selected JBoss and WebSphere, the total results would be 1.5 for WebSphere and 0.5 for JBoss across the total 2 respondents.<br />
- Total count = 1054 split as follows:</p>
<p>34% Tomcat &amp; TC portion of &#8220;JB+TC&#8221;<br />
12% JBoss &amp; JB portion of &#8220;JB+TC&#8221;<br />
11% WebLogic<br />
24% WebSphere<br />
 4%  Jetty<br />
 1%  Jonas, Jrun, Orion<br />
 1%  Resin<br />
14% Other<br />
100% Total</p>
<p>I am in no way saying that Scenario 2 is a valid representation of &#8220;actual usage&#8221; or anything like that.  But interesting nonetheless&#8230;</p>
<p>I&#8217;ve uploaded the file here if you&#8217;re interested: <a href="http://drop.io/6sotiyj#" rel="nofollow">http://drop.io/6sotiyj#</a> (tried to get it up on gDocs but it kept crashing)</p>
<p>Savio<br />
IBM WebSphere</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eduardo Pelegri-Llopart</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6252</link>
		<dc:creator>Eduardo Pelegri-Llopart</dc:creator>
		<pubDate>Fri, 09 Apr 2010 19:14:43 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6252</guid>
		<description>Looked at the raw data briefly.  Honestly, given the flaws in the data and that we do not even know the sample, it does not seem worth spending much time on this but....

* I don&#039;t think you can reduce multiple-choice answers to flat market shares.

* The 18% &quot;Other&quot; and 4% Blank is a big chunk of the data.  I looked and most of the &quot;Others&quot; are by themselves, not &quot;WLS, WAS, Other&quot;, but just &quot;Other&quot;.

* There is no data on deployment vs development.  Does using Jetty in Maven count the same as using JBoss in production?

*  Given that GlassFish is the largest major player omitted, I could argue a big chunk of that 18% is GF :-).  Maybe even some of the +4%.  Or not.  Who knows.

The survey also has weird sets of questions/answers.  The selection for &quot;Key JavaEE Services/Frameworks&quot; is really strange.

Anyhow, unless I hear something more interesting about the survey, I&#039;m going to ignore it...</description>
		<content:encoded><![CDATA[<p>Looked at the raw data briefly.  Honestly, given the flaws in the data and that we do not even know the sample, it does not seem worth spending much time on this but&#8230;.</p>
<p>* I don&#8217;t think you can reduce multiple-choice answers to flat market shares.</p>
<p>* The 18% &#8220;Other&#8221; and 4% Blank is a big chunk of the data.  I looked and most of the &#8220;Others&#8221; are by themselves, not &#8220;WLS, WAS, Other&#8221;, but just &#8220;Other&#8221;.</p>
<p>* There is no data on deployment vs development.  Does using Jetty in Maven count the same as using JBoss in production?</p>
<p>*  Given that GlassFish is the largest major player omitted, I could argue a big chunk of that 18% is GF <img src='http://blog.softwhere.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .  Maybe even some of the +4%.  Or not.  Who knows.</p>
<p>The survey also has weird sets of questions/answers.  The selection for &#8220;Key JavaEE Services/Frameworks&#8221; is really strange.</p>
<p>Anyhow, unless I hear something more interesting about the survey, I&#8217;m going to ignore it&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eduardo Pelegri-Llopart</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6251</link>
		<dc:creator>Eduardo Pelegri-Llopart</dc:creator>
		<pubDate>Fri, 09 Apr 2010 18:19:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6251</guid>
		<description>Besides some of the problems in the survey itself (like not listing GlassFish!), the survey says nothing about the population being sampled, and, w/o that, the value of the survey is limited.  It says something, sure, but, what does it say?... 

Surveys!  Even the EDC&#039;s surveys had significant methodological problems, but this one...   BTW, did EDC stop doing AppServer surveys?</description>
		<content:encoded><![CDATA[<p>Besides some of the problems in the survey itself (like not listing GlassFish!), the survey says nothing about the population being sampled, and, w/o that, the value of the survey is limited.  It says something, sure, but, what does it say?&#8230; </p>
<p>Surveys!  Even the EDC&#8217;s surveys had significant methodological problems, but this one&#8230;   BTW, did EDC stop doing AppServer surveys?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Cameron McKenzie</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6250</link>
		<dc:creator>Cameron McKenzie</dc:creator>
		<pubDate>Fri, 09 Apr 2010 18:09:17 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6250</guid>
		<description>Great analysis Rich. It&#039;s a great followup to the material.

This is always the problem we have with these types of surveys.</description>
		<content:encoded><![CDATA[<p>Great analysis Rich. It&#8217;s a great followup to the material.</p>
<p>This is always the problem we have with these types of surveys.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: sharps</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6249</link>
		<dc:creator>sharps</dc:creator>
		<pubDate>Fri, 09 Apr 2010 17:45:25 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6249</guid>
		<description>You&#039;re correct about &quot;Multi Answers&quot;: they are included in the server counts (except &quot;JB+TC&quot; and &quot;Jonas...&quot;) - including it on the bar chart was bad form and threw me.

My numbers simply count the occurrences of the &quot;server&quot; - this gives the correct count for everything but the &quot;Jonas...&quot; category - but those numbers are small enough to not make a difference. The only other error is that if &quot;JB+TC&quot; and &quot;Tomcat&quot; were specified I double count - but that only accounts for a small error.

Again - this is a flaw in the questions and presentation - that&#039;s my main point.

My %&#039;s are based on the total &quot;server&quot; responses so add up to 100% and I ignore &quot;no answer&quot;.

- Rich</description>
		<content:encoded><![CDATA[<p>You&#8217;re correct about &#8220;Multi Answers&#8221;: they are included in the server counts (except &#8220;JB+TC&#8221; and &#8220;Jonas&#8230;&#8221;) &#8211; including it on the bar chart was bad form and threw me.</p>
<p>My numbers simply count the occurrences of the &#8220;server&#8221; &#8211; this gives the correct count for everything but the &#8220;Jonas&#8230;&#8221; category &#8211; but those numbers are small enough to not make a difference. The only other error is that if &#8220;JB+TC&#8221; and &#8220;Tomcat&#8221; were specified I double count &#8211; but that only accounts for a small error.</p>
<p>Again &#8211; this is a flaw in the questions and presentation &#8211; that&#8217;s my main point.</p>
<p>My %&#8217;s are based on the total &#8220;server&#8221; responses so add up to 100% and I ignore &#8220;no answer&#8221;.</p>
<p>- Rich</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Savio Rodrigues</title>
		<link>http://blog.softwhere.org/archives/1016/comment-page-1#comment-6248</link>
		<dc:creator>Savio Rodrigues</dc:creator>
		<pubDate>Fri, 09 Apr 2010 16:40:01 +0000</pubDate>
		<guid isPermaLink="false">http://blog.softwhere.org/archives/1016#comment-6248</guid>
		<description>Hi Rich, agree - I am uneasy with any survey data without understanding the methodology and actual questions asked.

I will look at the data in detail later - but I think you&#039;ve made a mistake by counting &quot;Multi Answers&quot; back into the individual server choices.  ***Doing this double counts the results.***

For example, if a respondent selected &quot;JBoss, WebLogic&quot; for Question JS1 &quot;Which App Servers will you use in 2010?&quot;, the data set puts a &quot;1&quot; under &quot;JBoss&quot;, a &quot;1&quot; under &quot;WebLogic&quot; and a &quot;1&quot; under &quot;Multi Answers&quot;.  

So, the simplest thing to do is ignore &quot;Multiple Answers&quot; when looking at the individual server usage.  

Which should get you back to data as reported by Replay Solutions (except with the &quot;Multiple Answers&quot; row removed).

Savio
IBM WebSphere (but a data geek at times)</description>
		<content:encoded><![CDATA[<p>Hi Rich, agree &#8211; I am uneasy with any survey data without understanding the methodology and actual questions asked.</p>
<p>I will look at the data in detail later &#8211; but I think you&#8217;ve made a mistake by counting &#8220;Multi Answers&#8221; back into the individual server choices.  ***Doing this double counts the results.***</p>
<p>For example, if a respondent selected &#8220;JBoss, WebLogic&#8221; for Question JS1 &#8220;Which App Servers will you use in 2010?&#8221;, the data set puts a &#8220;1&#8221; under &#8220;JBoss&#8221;, a &#8220;1&#8221; under &#8220;WebLogic&#8221; and a &#8220;1&#8221; under &#8220;Multi Answers&#8221;.  </p>
<p>So, the simplest thing to do is ignore &#8220;Multiple Answers&#8221; when looking at the individual server usage.  </p>
<p>Which should get you back to data as reported by Replay Solutions (except with the &#8220;Multiple Answers&#8221; row removed).</p>
<p>Savio<br />
IBM WebSphere (but a data geek at times)</p>
]]></content:encoded>
	</item>
</channel>
</rss>

