Statistics

Statistical Programs:

  • SPSS (now owned by IBM. According to our IT person, its licensing terms for Citrix environments seem to rule out installing a single copy of SPSS, so you essentially need to buy a server license.)
  • SUDAAN (survey statistical software)
  • SAS (used a lot in industry)
  • Stata (used a lot in graduate education)

My work with statistics is mainly in the R programming language.

When the question of why someone should use R comes up, a lot of people seem to answer that it’s “free” or “open source”.  My position is that R does some things that other statistical programs cannot.  I first started using R because I wanted to tie statistical analysis into some Java software that we were developing.  R provided the ability to interface with Java (through the package Rserve).

Things that I have used R for that I probably couldn’t do with other statistical software:

  • Accessing PubMed references
  • Accessing Weka (Java data mining software)
  • Integrating R with Java programs

Programming the R Language

The following are good references about programming with the R language.  They differ from other books on R, which may focus on the use of statistics rather than on the language itself.

Evaluating the Design of the R Language
http://www.cs.purdue.edu/homes/jv/pubs/ecoop12.pdf

Software for Data Analysis: Programming with R (Statistics and Computing)
http://www.amazon.com/Software-Data-Analysis-Programming-Statistics/dp/0387759352

Some topics to be expanded on later:

  • Profiling (TraceR, ProfileR, ParseR)
  • S3 Object Model
  • S4 Object Model
  • Package Creation
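
Until those topics are written up, here is a minimal sketch of how the S3 and S4 object models listed above differ. It uses only base R and the methods package. The names new_sample, sample_s3, SampleS4, describe and describe4 are made up for illustration and do not come from either reference.

```r
library(methods)

# S3: a class is just a "class" attribute on an ordinary object.
# new_sample() is a hypothetical constructor for this sketch.
new_sample <- function(values) {
  stopifnot(is.numeric(values))
  structure(list(values = values), class = "sample_s3")
}

# An S3 generic dispatches on the class attribute by naming convention:
# describe() looks for a function called describe.<class>.
describe <- function(x, ...) UseMethod("describe")
describe.sample_s3 <- function(x, ...) {
  sprintf("n = %d, mean = %.2f", length(x$values), mean(x$values))
}

# S4: a class is declared formally, with typed slots and an optional
# validity check that runs when an object is created with slot values.
setClass("SampleS4",
         representation(values = "numeric"),
         validity = function(object) {
           if (length(object@values) == 0) "values must not be empty" else TRUE
         })

# S4 generics and methods are registered explicitly rather than by name.
setGeneric("describe4", function(x, ...) standardGeneric("describe4"))
setMethod("describe4", "SampleS4", function(x, ...) {
  sprintf("n = %d, mean = %.2f", length(x@values), mean(x@values))
})

s3 <- new_sample(c(1, 2, 3, 6))
s4 <- new("SampleS4", values = c(1, 2, 3, 6))
print(describe(s3))
print(describe4(s4))
```

Both calls produce the same summary string.  The practical difference is that S3 trusts you to build the object correctly, while S4 checks slot types and the validity function, so new("SampleS4", values = numeric(0)) fails with an error.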