
A Deep Learning Approach for Recovering Missing Time Series Sensor Data
  • Yifan Zhang,
  • Peter Fitch,
  • Peter Thorburn
Yifan Zhang

Corresponding Author: yi-fan.zhang@csiro.au

Peter Thorburn
CSIRO Ecosystem Sciences Precinct


Wireless sensor networks are increasingly important for monitoring changes in water quality. High-frequency monitoring can be used to gather water quality information, identify existing problems and improve water quality management activities. However, missing data are unavoidable because of network communication issues, sensor maintenance or sensor failure. Data interpolation is the process of estimating missing values from known data points. Although traditional methods such as polynomial or linear interpolation are widely used in sensor data pre-processing, they have important limitations. Firstly, current interpolation methods give poor estimates when several consecutive data points are missing. Secondly, many interpolation methods reconstruct missing data from other parameters available at the same time step; when all parameters are missing at that time step, these methods cannot be used. In our work, we are developing a sequence-to-sequence interpolation model (SIM) for recovering missing data sequences in wireless sensor networks. SIM uses a state-of-the-art sequence-to-sequence architecture. It consists of two parts: an encoder that reads the source water quality time series data and a decoder that generates the missing data sequences. In our design, a bidirectional Long Short-Term Memory (LSTM) network is used as the encoder and decoder because it can use both past and future information at a given time step. An attention mechanism is applied so that SIM focuses on different parts of the input time series when interpolating missing values at different time steps. We evaluated SIM using time series data from the Queensland government’s water quality monitoring network. Compared to Seasonal-ARIMA, SIM reduced MAE by 23.2% and RMSE by 40.3% when recovering missing data in 2 adjacent time steps. The reason for the superior performance is that SIM interpolates missing data based on both the inner relationships between water quality parameters and the information accumulated through time.
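To make the evaluation setting concrete, the following is a minimal sketch of the traditional linear-interpolation baseline mentioned above, scored with MAE and RMSE on a gap of two adjacent time steps. It is not the SIM model and not the authors' code. The function names (linear_fill, mae, rmse) and the readings are hypothetical, chosen only for illustration.

```python
import math


def linear_fill(series):
    """Fill None gaps by linear interpolation between the nearest known values.

    Gaps at the very start or end (no known value on one side) stay None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def mae(truth, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


# Hypothetical readings (made-up values, not taken from the paper's dataset).
truth = [7.0, 7.5, 9.0, 8.5, 8.0]
observed = [7.0, None, None, 8.5, 8.0]  # two adjacent time steps missing

recovered = linear_fill(observed)

# Score only the positions that were actually missing.
gap = [i for i, v in enumerate(observed) if v is None]
gap_truth = [truth[i] for i in gap]
gap_pred = [recovered[i] for i in gap]

print("recovered:", recovered)
print("MAE:", mae(gap_truth, gap_pred))
print("RMSE:", rmse(gap_truth, gap_pred))
```

In this toy series the true values peak inside the gap, so a straight line between the neighbouring readings misses the peak. This is the kind of case where a model that draws on longer temporal context could plausibly do better.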
Aug 2019. Published in IEEE Internet of Things Journal, volume 6, issue 4, pages 6618-6628. DOI: 10.1109/JIOT.2019.2909038