Step by Step Walkthrough with S&P 500 Index
Disclaimer: This is a deep-dive tutorial on the Hampel Filter. The S&P 500 Index is just an example to demonstrate how to implement the Hampel Filter. Nothing here should be considered financial advice.
Outliers can significantly impact the results of data analysis, causing incorrect conclusions and decisions. The Hampel filter is a powerful tool for detecting and handling outliers in data. It is widely used in many fields, including finance, engineering, and statistics.
This tutorial will cover all the details about the Hampel Filter to get you started using it. Hopefully, it will also help you understand when and why the Hampel Filter does or doesn’t work. We will also include a step-by-step walkthrough of applying some basic ML models to the Hampel Filter smoothed S&P 500 Index, so make sure you read it through to see a real-life application.
Key Topics Discussed (TL;DR):
- Step-by-step dissection of Hampel Filter
- Key assumptions of Hampel Filter
- Apply Normal and Laplacian Hampel Filter on S&P 500 Index
- Compare ML results on raw S&P 500 Index and Hampel Filter smoothed S&P 500 Index
- Reflection on the walkthrough
The Hampel filter is an algorithm that tries to smooth out a data series by replacing outliers with an appropriate value.
It calculates the “appropriate value” using the following parameters:
- Parameter 1: the half-size of the sliding window, m. The window is symmetrical, i.e. if m = 2 (illustrated in the diagram above), the actual sliding window size is 2 + 2 + 1 = 5.
- Parameter 2: the outlier threshold n, expressed in multiples of the rolling standard deviation.
- Parameter 3: the scaling constant k for estimating the rolling standard deviation from the median absolute deviation. As this is assumed to be 1.4826 most of the time, the constant is often overlooked when applying the Hampel Filter.
Note: The need for a scaling constant is such that E(k MAD) ≈ E(S) ≈ 𝜎 where 𝜎 is the population standard deviation, S is sample standard deviation and MAD is median absolute deviation.
- Step 1: Find the rolling median of the sliding window. Repeat that for every timestamp t in the data series.
- Step 2: Calculate the sliding window’s rolling Median Absolute Deviation (MAD), i.e. the median of the absolute deviations from the rolling median: MAD_t = median(|x_i − median_t|) over every x_i in the window. Repeat that for every timestamp t in the data series.
- Step 3: Estimate the rolling standard deviation as the rolling MAD multiplied by the scaling constant k. Repeat that for every timestamp t in the data series.
- Step 4: Calculate the difference between the data point at the middle of the sliding window (i.e. x_t) and the rolling median.
- Step 5: Whenever the difference calculated in Step 4 is greater than n times the rolling standard deviation, replace x_t with the rolling median.
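The steps above can be sketched with pandas rolling aggregations. This is a minimal single-pass sketch, not the author’s original code; the function name and defaults are illustrative:

```python
import numpy as np
import pandas as pd

def hampel_filter(series: pd.Series, m: int = 2, n: float = 3.0, k: float = 1.4826):
    """Single-pass Hampel Filter.

    m: half-size of the sliding window (full window = 2*m + 1)
    n: outlier threshold, in multiples of the estimated rolling std
    k: scaling constant turning the rolling MAD into a std estimate
    """
    window = 2 * m + 1
    # Step 1: rolling median of each centred window
    rolling_median = series.rolling(window, center=True).median()
    # Step 2: rolling MAD = median absolute deviation within each window
    rolling_mad = series.rolling(window, center=True).apply(
        lambda x: np.median(np.abs(x - np.median(x))), raw=True
    )
    # Step 3: estimated rolling std = k * MAD
    rolling_std = k * rolling_mad
    # Steps 4-5: flag points too far from the rolling median and replace them
    is_outlier = (series - rolling_median).abs() > n * rolling_std
    return series.where(~is_outlier, rolling_median), is_outlier
```

For example, a lone spike in an otherwise flat series gets flagged and pulled back to the local median.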
While the algorithm sounds simple enough, there are a few critical points that can easily go unnoticed:
- It uses the Median Absolute Deviation and median instead of the standard deviation and mean. This makes the Hampel filter more robust to outliers than a filter based on the mean, since the standard deviation (like the mean) is easily skewed by outliers. When our sample data contains outliers, using the mean and standard deviation for denoising could mean that we are implicitly accepting some of the outliers as normal data points.
Mean is more susceptible to outliers than Median.
- Hampel Filter works better with symmetrical datasets. Like any denoising algorithm that defines outliers as data points deviating too far from the “middle” of the dataset, it will not perform as well when the data is heavily skewed. This may not always be an issue, as the volatility of data can be naturally higher at certain periods. But the patterns in the rolling standard deviation and in skewness/kurtosis are definitely metrics we should keep an eye on when applying the Hampel Filter.
- The scaling constant of 1.4826 assumes data to be Gaussian-like. This is arguably one of the most forgotten assumptions of the Hampel Filter. As mentioned before, the scaling constant exists such that the MAD can be scaled to approximate the population standard deviation (E(k MAD) ≈ E(S) ≈ 𝜎). For a normal distribution, k is the reciprocal of the 75th percentile of the standard normal distribution (i.e., 1/0.67449 ≈ 1.4826). But that may not be the same for other distributions. Simulating 1,000 samples of 1,000 observations each from the Uniform, Laplace, and Exponential distributions, below is the code and the results of the estimated scaling constants:
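A sketch of such a simulation (the function and variable names, and the exact estimator, are mine): for each distribution we draw 1,000 samples of 1,000 observations and average the ratio of sample standard deviation to sample MAD, which estimates the k that makes k · MAD ≈ 𝜎:

```python
import numpy as np

rng = np.random.default_rng(42)

def estimated_k(sampler, n_sims=1000, n_obs=1000):
    """Average of (sample std / sample MAD) over many simulated samples."""
    ratios = []
    for _ in range(n_sims):
        x = sampler(n_obs)
        mad = np.median(np.abs(x - np.median(x)))
        ratios.append(np.std(x, ddof=1) / mad)
    return float(np.mean(ratios))

samplers = {
    "normal": lambda size: rng.normal(size=size),  # sanity check, ~1.4826
    "uniform": lambda size: rng.uniform(-1, 1, size=size),
    "laplace": lambda size: rng.laplace(size=size),
    "exponential": lambda size: rng.exponential(size=size),
}

for name, sampler in samplers.items():
    print(f"{name}: k ≈ {estimated_k(sampler):.4f}")
```

The Normal estimate should land near 1.4826, and the Laplace estimate near the 2.04 used later in this walkthrough.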
- Hampel Filter defaults to 3 standard deviations because that covers roughly 99.7% of the data under a Normal distribution. Following up on the previous point on the scaling constant, this is another parameter that we should consider adjusting according to the underlying distribution. For instance: 2.3 standard deviations for the Laplace distribution, 3.4 standard deviations for the Cauchy distribution, etc.
Enough theoretical talking. Let’s try to apply it to some real data!
This walkthrough will use S&P 500 daily prices from 2nd Jan 2020 until 7th Mar 2023, made available on Investing.com.
Once you have downloaded the data, the CSV file should be named S&P 500 Historical Data.csv.
Load & Preprocess Data
Once we have loaded the data, let’s handle the date column appropriately. We should end up with a dataframe of 800 rows from 2nd Jan 2020 to 7th Mar 2023.
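A sketch of that loading step. The column names, date format, and thousands-separator handling below are assumptions about the Investing.com export, so adjust them to your file:

```python
import pandas as pd

def load_prices(csv_source) -> pd.DataFrame:
    """Load the S&P 500 CSV and index it by date, oldest first."""
    df = pd.read_csv(csv_source)
    # Assumed date format, e.g. "03/07/2023"
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    # Assumed price format with thousands separators, e.g. "3,986.37"
    df["Price"] = df["Price"].astype(str).str.replace(",", "").astype(float)
    return df.sort_values("Date").set_index("Date")

# prices = load_prices("S&P 500 Historical Data.csv")["Price"]
```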
Implement Hampel Filter
We will apply Laplacian Hampel Filter and Normal Hampel Filter in this walkthrough. To start with, we will need to define a couple of parameters:
- Window size: 5 days
- MAD scaling constant: 2.04 for Laplace, 1.4826 for Normal
- Threshold of deviation: 2.3x standard deviation for Laplace, 3x standard deviation for Normal
Note: For simplicity’s sake, window size is defined arbitrarily. More detailed time-series analysis is recommended to come up with a more justified window size.
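Wiring the two filters up could look like this self-contained sketch, where `prices` is an assumed name for the Price series loaded earlier:

```python
import numpy as np
import pandas as pd

def hampel(series: pd.Series, m: int, n: float, k: float):
    """Hampel Filter with all rolling aggregations centred on the window."""
    window = 2 * m + 1
    med = series.rolling(window, center=True).median()
    mad = series.rolling(window, center=True).apply(
        lambda x: np.median(np.abs(x - np.median(x))), raw=True
    )
    # Outlier when the deviation from the rolling median exceeds n estimated stds
    outlier = (series - med).abs() > n * (k * mad)
    return series.where(~outlier, med), outlier

# Window size 5 days -> m = 2 points on either side of the centre
# smoothed_normal, out_normal = hampel(prices, m=2, n=3.0, k=1.4826)   # Normal
# smoothed_laplace, out_laplace = hampel(prices, m=2, n=2.3, k=2.04)   # Laplacian
```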
In the code above, all rolling aggregations have been applied with the argument center=True. This ensures that the Hampel Filter is applied to the center of the sliding window.
We should end up with 86 outliers for the Normal Hampel Filter and 213 for Laplacian Hampel Filter.
As Laplacian Hampel Filter has tighter parameters than the Normal counterpart, more outliers have been flagged, and the resultant data series is much smoother.
Upon further investigation, you will notice that the set of outliers flagged by the Normal Hampel Filter is, in fact, a subset of that flagged by the Laplacian Hampel Filter.
Apply ML Model
To keep the walkthrough simple, we have chosen XGBRegressor as the machine learning model for predicting movement in the S&P 500 Index.
The model will take 11 days of historical prices to predict the next day’s percentage change in S&P 500 Index.
When creating our training & testing sets, we rebased each window of 11 historical prices to the first day of that window to avoid data leakage.
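The feature construction might be sketched as follows, with each 11-day window divided by its first price so the model only sees relative moves (function and variable names are mine):

```python
import numpy as np
import pandas as pd

def make_features(prices: pd.Series, lookback: int = 11):
    """X: lookback days of prices rebased to the first day of each window.
    y: the following day's percentage change."""
    vals = prices.to_numpy()
    X, y = [], []
    for i in range(len(vals) - lookback):
        window = vals[i:i + lookback]
        X.append(window / window[0])  # rebase: remove the absolute price level
        y.append(vals[i + lookback] / vals[i + lookback - 1] - 1)  # next-day % change
    return np.array(X), np.array(y)
```

The resulting X and y can then be split chronologically and passed to XGBRegressor through its usual fit/predict interface.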
We have obtained the following results from our models:
- RMSE on Raw S&P 500 Index: 1.5850%
- RMSE on Normal Hampel Filter smoothed S&P 500 Index: 1.5177% (4% better than raw)
- RMSE on Laplacian Hampel Filter smoothed S&P 500 Index: 1.5717% (0.8% better than raw)
Bonus: Is Dynamic Hampel Filter A Thing?
We know that stock markets do not necessarily fit well with Normal distributions. Could there be any chance that we can have Hampel Filter adjusted according to the detected distribution, i.e., dynamic Hampel Filter?
Let’s add two more functions to the mix: one fits the Normal distribution, and another fits the Laplace distribution. Both functions would return the fit’s negative log-likelihood value for determining whether Laplace or Normal is a better fit.
Note: Negative log-likelihood measures how likely the observed data are, assuming the events follow some pre-defined distribution. It is a quantity that we want to minimise, i.e. the lower the value, the better the fit.
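Those two fitting functions might look like this sketch built on scipy’s MLE fitters (the function names are mine):

```python
import numpy as np
from scipy import stats

def normal_nll(x) -> float:
    """Fit a Normal distribution by MLE and return the negative log-likelihood."""
    loc, scale = stats.norm.fit(x)
    return float(-np.sum(stats.norm.logpdf(x, loc, scale)))

def laplace_nll(x) -> float:
    """Fit a Laplace distribution by MLE and return the negative log-likelihood."""
    loc, scale = stats.laplace.fit(x)
    return float(-np.sum(stats.laplace.logpdf(x, loc, scale)))

# Per window: use the Laplacian parameters (k=2.04, n=2.3) when
# laplace_nll(window) < normal_nll(window), otherwise the Normal defaults.
```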
- RMSE on Dynamic Hampel Filter smoothed S&P 500 Index: 1.5928% (0.4% worse than raw)
Surprise, surprise! The result seems to be worse than using the raw S&P 500 Index! In the next section, we will touch on some observed flaws in our application of the Hampel Filter, so let’s reflect on the results we have observed so far!
Reflection on Prediction Results
Note: As this is not a blog post about algorithmic trading, the reflection will be focused on Hampel Filter and its application instead.
Obviously, the models aren’t superb at predicting the S&P 500 Index. But how can we make them better?
- Data Symmetry: As mentioned earlier, the Hampel Filter works better on symmetrical datasets, which the S&P 500 Index certainly is not. Techniques like the Box-Cox transformation could be applied to improve data symmetry.
- Cyclical Patterns: When defining the window size, we have ignored any possible temporal patterns in the dataset. Analysis of autocorrelation, partial autocorrelation, or even Fourier-transformed series could be considered for a more appropriate window size.
- Value in Outliers: By removing outliers, we assume that outlier data would bring more harm than good to our final model. However, statistics such as the magnitude of deviation, frequency of outliers, and temporal pattern of outliers could all be valuable to the ultimate problem we want to solve. Before we try to replace the outliers, looking at just the outliers is worth some effort.
- Over-engineering: It is one thing to understand how the default parameters of the Hampel Filter were derived, but a completely different thing to tune those parameters to our needs. In the bonus section, we put forward the hypothesis that the Hampel Filter should be adjusted based on the behavior of the underlying data. While that may sound reasonable on the surface, we might have added unnecessary complexity to our model.
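To illustrate the Box-Cox idea from the first point, here is a small example on synthetic right-skewed data (hypothetical data, not the S&P 500):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=5000)  # right-skewed positive data

# stats.boxcox picks the lambda that maximises the log-likelihood of normality
transformed, lam = stats.boxcox(skewed)

print(f"skewness before: {stats.skew(skewed):.2f}")
print(f"skewness after:  {stats.skew(transformed):.2f} (lambda = {lam:.2f})")
```

Note that Box-Cox requires strictly positive inputs, which holds for price levels but not for returns.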
Simple models and a lot of data trump more elaborate models based on less data — Peter Norvig