There are several different functions and methods available in R to
compute statistics or, more generally, apply functions to groups of
observations within a data set. The tapply() function is one such
function, but there are others that are more flexible and powerful.
These approaches produce results similar to those obtained in SAS when
using a BY statement to generate results in each of several "by-groups".
In this series of three videos we talk about three approaches for such
"by-processing" in R.
In this "Part 2" video, we introduce the summaryBy() function from the doBy package. This function is similar to aggregate(), but it allows more than one function to be applied to each block of a data frame. It also has an optional id= argument (similar to the ID statement in SAS's PROC MEANS) that copies variables to the resulting data frame unchanged.
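
As a rough illustration of the two features just described, here is a minimal sketch. It is not taken from the video: it assumes the doBy package is installed and uses R's built-in CO2 data set, chosen here only because Type and Treatment are constant within each Plant, which makes them natural id= variables.

```r
# Minimal sketch (assumes install.packages("doBy") has been run).
library(doBy)

# Built-in CO2 data: 84 rows, 12 plants, with Type and Treatment
# constant within each Plant.
data(CO2)

# Apply two functions (mean and sd) to two variables within each Plant,
# and carry Type and Treatment along unchanged via id=.
res <- summaryBy(conc + uptake ~ Plant,
                 data = CO2,
                 FUN  = c(mean, sd),
                 id   = ~ Type + Treatment)

# One row per plant; the summary columns sit alongside the id variables.
print(res)

# For comparison, a basic aggregate() call applying one function:
print(aggregate(uptake ~ Plant, data = CO2, FUN = mean))
```

The data set and variable choices are only for illustration; any data frame with a grouping variable and one or more numeric columns could be used in the same way.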
The summaryBy() function is more flexible than aggregate(), but
perhaps not as elegant or easy to use as the method covered in Part 3
of this series. I recommend watching this video to learn about it, but
it is not necessary to know all three methods of by-processing covered
in the series. If you are short on time, you could skip this video and
concentrate on "By-Processing - Part 3".