SPSS and Big Data, really?

Posted on 24 Jul 2013 in Big Data, SPSS

SPSS (originally Statistical Package for the Social Sciences, later Statistical Product and Service Solutions) is still deeply entrenched in financial and market research companies. The social science research tool is still used by 95% of health, government, education and market researchers. The question nowadays is whether it can be used on big data problems.

IBM, obviously, will always say that SPSS has big data capabilities. This is from their Redbooks:

“Internet scale analytics have driven the development of new analytic platform architectures. Hadoop and MapReduce provide a simple, but powerful framework, for parallel analytics. Complex, high computation per record analytics, such as the IBM SPSS® software predictive modeling, can take advantage of the inherent scalability of Hadoop. Examples of SPSS predictive modeling include clustering, latent factor analysis, decision tree, neural nets, and linear regression. In addition, topic analysis, video scene analysis, and semantic analysis of text can use the scalability that Hadoop provides.”
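To make the "parallel analytics" idea concrete, here is a minimal sketch of how a simple linear regression can be split into a map step and a reduce step. This is plain Python, not IBM's implementation and not how SPSS talks to Hadoop; the function names (`map_partition`, `combine`, `fit_line`) and the tiny in-memory "partitions" are made up purely for illustration.

```python
from functools import reduce

def map_partition(rows):
    """Map step: summarize one partition of (x, y) rows as running sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: the sums from two partitions simply add up."""
    return tuple(u + v for u, v in zip(a, b))

def fit_line(stats):
    """Closed-form least-squares slope and intercept from the combined sums."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values have no variance; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical data: three "partitions" standing in for blocks of a large file.
partitions = [
    [(0, 1), (1, 3)],
    [(2, 5), (3, 7)],
    [(4, 9)],
]

stats = reduce(combine, map(map_partition, partitions))
slope, intercept = fit_line(stats)
print(slope, intercept)
```

Each partition can be summarized independently, and only five numbers per partition have to be shipped to the reducer, which is why this kind of model fits the MapReduce style well. Models that need many passes over the data are a harder fit.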

From a theoretical standpoint, since statistics emphasizes inference, SPSS was originally used primarily as an inference-aiding tool. However, with the recent emphasis on prediction, my opinion is that we have to understand what other tools are in the same space. IMHO, these tools are best classified as Small Data tools: SAS, SPSS, Weka and the R language all allow deep analysis of smaller data sets.

In a future blog post, we will do a side-by-side comparison of SPSS and R.

But even when the data is not ‘relatively’ big, one of the key pain points for research organizations and even individual analysts is presenting the data back to the customer or their manager. The most common choice is usually PowerPoint, where the data is static and not easily shareable. We at Secondprism have spent some time trying to solve the SPSS publishing problem. Give us a try!


Start publishing Web dashboards from SPSS