By Pawel Rzeszucinski, Codewise.com
Descriptive Statistics can provide a great deal of insight into data, but it also lays interesting pitfalls in front of us that can lead to misinterpreted results. One way of mitigating such risks is to combine more than one technique to reach an unambiguous conclusion. Today we will see how Kurtosis can be supplemented by Skewness in tackling a very interesting challenge.
In one of my previous posts we saw that Kurtosis is a robust metric for detecting impulsive content within data. However, “an impulse” can have many faces, and Kurtosis does not always paint the full picture. In the case study described below I will show how to add one additional ‘brush’, which goes by the name of Skewness, to the painting process.
The scenario is as follows - a shop recording the number of sold goods as a function of time tries to automatically detect the presence of any abnormal demand.
The amplitude distributions of the signals shown in Figures 1 and 2 can be seen in Figures 3 and 4 respectively. The change is quite noticeable. Figure 3 shows an almost perfectly symmetrical distribution, whereas Figure 4 shows a shape whose bulk leans towards the left-hand side of the plot. Side note: despite the left-side leaning, such a distribution is referred to as right-skewed, because what matters is the movement of the mean value relative to the bulk of the data. In our case the mean definitely shifted towards the right, due to the presence of the prominent impulse.
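The shift of the mean described above can be reproduced on a toy signal. The sketch below uses made-up numbers (a steady demand of about 10 units and a single impulse of 40 extra units), not the data behind the figures, and simply compares the mean with the median before and after the impulse is added.

```python
import numpy as np

# Hypothetical sales signal: steady demand of 10 units with a small symmetric ripple.
ripple = np.array([-1.0, 0.0, 1.0, 0.0] * 25)  # 100 samples, averaging to zero
baseline = 10.0 + ripple

# The same signal with one prominent upward impulse (abnormal demand) at sample 50.
with_impulse = baseline.copy()
with_impulse[50] += 40.0

for name, signal in [("baseline", baseline), ("with impulse", with_impulse)]:
    print(f"{name}: mean={np.mean(signal):.2f}, median={np.median(signal):.2f}")
```

The median stays with the bulk of the samples, while the mean is pulled towards the impulse, which is the sense in which the distribution is called right-skewed.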
At first shocked, the Business Analyst quickly discovered what was going on. He referred to the formula of Kurtosis (to be seen in Eq. 1) and noted that all the powers in the equation are even numbers, so Kurtosis is blind to the difference between ‘above mean’ and ‘below mean’ values. A sufficiently large impulse in just one direction (as in Figure 2) may happily produce the same result as a symmetrical impulse (Figure 1).
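A minimal sketch of this blindness, assuming the common population form of Kurtosis (fourth central moment divided by the squared variance): an upward impulse and its mirror image give the same value. The signal and the helper name `kurtosis` are illustrative assumptions, not the data from the figures.

```python
import numpy as np

def kurtosis(x):
    """Population kurtosis: fourth central moment over the squared variance."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**4) / np.mean(d**2) ** 2

# Hypothetical signal: a gentle ripple with one strong upward impulse.
t = np.arange(200)
signal = 0.5 * np.sin(2 * np.pi * t / 20)
signal[100] += 5.0

# Mirror image: the same impulse pointing downward.
flipped = -signal

k_up = kurtosis(signal)
k_down = kurtosis(flipped)
print(k_up, k_down)  # the two values coincide
```

Because every deviation from the mean is raised to an even power, its sign is lost before the values are averaged.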
This is where the helping hand of Skewness comes into play. Its formula can be seen in Eq. 1:
where n is the total number of samples in the data, x_i is the i-th sample within the data and x̄ is the sample mean of the data.
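For reference, a common form of the Skewness that matches these symbols is sketched below. The 1/n (population) normalization is an assumption; other variants apply a small-sample correction.

```latex
\mathrm{Skewness} =
  \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{3}}
       {\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}\right)^{3/2}}
```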
The Skewness formula is virtually identical to the formula of Kurtosis apart from the powers in the numerator and denominator. Because the power in the numerator is now odd, the sign of each deviation from the mean is preserved, and the distinction between the ‘above mean’ and ‘below mean’ values becomes possible. Skewness outputs values close to 0 for symmetrically distributed signals, positive values for right-skewed (aka positively skewed) signals, and negative values for left-skewed (aka negatively skewed) signals. The shapes of such distributions, together with the corresponding relations between mean, median and mode, are shown in Figure 5. When applied to the signals from Figures 1 and 2, Skewness values are 0.06 and 0.58 respectively. From this point on, the Business Analyst will always use Kurtosis in conjunction with Skewness, to be able not only to detect the presence of impulses but also to determine the direction of their attack.
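The combined use of the two metrics can be sketched as follows. The signals are synthetic stand-ins (a sine ripple with hand-placed impulses), the helper names are hypothetical, and both metrics use the population (1/n) normalization as an assumption.

```python
import numpy as np

def central_moment(x, k):
    """k-th central moment with 1/n normalization."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**k)

def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def kurtosis(x):
    return central_moment(x, 4) / central_moment(x, 2) ** 2

t = np.arange(400)
ripple = np.sin(2 * np.pi * t / 20)  # symmetric background, 20 full periods

# Impulses in both directions: impulsive but symmetric.
symmetric = ripple.copy()
symmetric[100] += 6.0
symmetric[300] -= 6.0

# A single upward impulse, and its mirror image pointing downward.
upward = ripple.copy()
upward[100] += 8.0
downward = -upward

for name, s in [("symmetric", symmetric), ("upward", upward), ("downward", downward)]:
    print(f"{name}: kurtosis={kurtosis(s):.2f}, skewness={skewness(s):.2f}")
```

Kurtosis flags all three signals as impulsive, while the sign of Skewness tells the upward impulse from the downward one and stays near zero in the symmetric case.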
Figure 5 taken from 
 Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2008
 Ken Black, Business Statistics: Contemporary Decision Making, John Wiley & Sons, 2009
Bio: Pawel Rzeszucinski received an MSc in Computer Science from Cranfield University and an MSc in Electronics from Wroclaw University of Technology. He subsequently moved to The University of Manchester, where he obtained a PhD on a project sponsored by QinetiQ related to data analytics for helicopter gearbox diagnostics. Upon returning to Poland he worked as a Senior Scientist at ABB’s Corporate Research Center and a Senior Risk Modeler in Strategic Analytics at HSBC. Currently he is a Data Scientist at Codewise.
- Descriptive Statistics: The Mighty Dwarf of Data Science
- Descriptive Statistics: The Mighty Dwarf of Data Science – Crest Factor
- Descriptive Statistics Key Terms, Explained