Online anomaly detection in correlated data streams using robust Kalman filtering
Abstract
The demand for online data analysis opens new challenges and research opportunities. The growing wave of IoT devices, low-cost sensors, and robotic systems generates vast amounts of high-frequency streaming data. Efficient online analysis of such data requires algorithms that operate under memory and latency constraints, often within a sliding-window framework. However, the reliability of these data streams critically affects the accuracy of the inference results. This study considers one of the tasks in streaming data analysis: anomaly detection in smartphone sensor data streams. Our goal was to improve the quality of geolocation by filtering out anomalies in the signal and then to measure the accuracy of trajectory estimation for pedestrian navigation. Pedestrian navigation in an urban environment is non-trivial because of distortions in the global navigation satellite system (GNSS) signal. These distortions can be caused by factors such as multipath effects, signal blockage by tall buildings, and interference, all of which are common in dense urban areas. The full data pipeline requires robust techniques for smartphone sensor data processing, which include low-pass or high-pass filtering of the acceleration signal, synchronizing several streams by their timestamps, converting measurements from the device frame of reference to the global coordinate system, and feature enrichment. When multiple data streams from device sensors are available, their fusion can be used to mitigate the limitations of individual sources. One of the methods adopted for this is the so-called robust Kalman filter. We compared this method with Isolation Forest (iForest), an ensemble anomaly detection method, applied to the geolocation data stream in the pedestrian navigation setup. We used an orthogonal distance metric to compare predicted trajectories with ground truth coordinates and showed that the robust Kalman filter achieves superior performance in the streaming setting.
A mean deviation of 1.83 m from the ground truth trajectory was achieved on the test dataset, over a total route length of 184 m.
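
The orthogonal distance metric mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that both trajectories are already expressed in a local planar frame in metres, that the ground truth route is a polyline of waypoints, and that the deviation of each predicted point is its shortest distance to that polyline. The function names point_to_segment_distance and mean_orthogonal_deviation are hypothetical.

```python
import math


def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2D, same units as input)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection of p onto the line through a and b,
    # clamped to [0, 1] so that the foot point stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def mean_orthogonal_deviation(predicted, ground_truth):
    """Mean over predicted points of the distance to the nearest route segment.

    Assumption: ground_truth is an ordered list of (x, y) waypoints in metres.
    """
    if len(ground_truth) < 2:
        raise ValueError("ground_truth needs at least two waypoints")
    if not predicted:
        raise ValueError("predicted must contain at least one point")
    segments = list(zip(ground_truth[:-1], ground_truth[1:]))
    distances = [
        min(point_to_segment_distance(p, a, b) for a, b in segments)
        for p in predicted
    ]
    return sum(distances) / len(distances)
```

Under these assumptions, calling mean_orthogonal_deviation(filtered_points, route_waypoints) returns a single value in metres that can be compared across filtering methods. Latitude and longitude would first need to be projected to a planar frame, a step this sketch leaves out.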

