Missing Values


Working with Missing Values

When working with time-series data, you may have gaps in the target interval where no data was collected. These gaps can cause problems for our time-series prediction algorithms: it is not clear how the missing data should be interpreted, and most algorithms perform poorly when gaps are present.

If you are cleaning your data before submitting it to Nexosis, it’s best to decide what a missing value means in your domain of expertise and fill in the missing values at the interval you intend to predict.

For example, given a daily dataset:

Timestamp            Value
2017-08-13 08:12:00  685.22
2017-08-14 09:10:00  871.29
2017-08-15 08:12:00  358.11
2017-08-17 08:12:00  62.58
...

We’re missing the data for the 16th. This missing data could be an important indicator of the trend from that point. Of course, the more values that are missing, the greater the effect.
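If you want to find gaps like this yourself before submitting data, a minimal sketch in Python might look like the following. It assumes a daily interval and timestamps in the format shown above; the helper name `find_missing_days` is hypothetical and is not part of the Nexosis API.

```python
from datetime import datetime, timedelta

# Sample rows copied from the daily dataset above.
rows = [
    ("2017-08-13 08:12:00", 685.22),
    ("2017-08-14 09:10:00", 871.29),
    ("2017-08-15 08:12:00", 358.11),
    ("2017-08-17 08:12:00", 62.58),
]


def find_missing_days(rows):
    """Return calendar days between the first and last observation that have no row.

    Hypothetical helper for illustration; assumes a daily interval.
    """
    # Reduce each timestamp to its calendar day, since the interval is daily.
    observed = {datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date() for ts, _ in rows}
    if not observed:
        return []
    day, last = min(observed), max(observed)
    missing = []
    # Walk every expected day and collect the ones with no observation.
    while day <= last:
        if day not in observed:
            missing.append(day)
        day += timedelta(days=1)
    return missing


missing_days = find_missing_days(rows)
print([d.isoformat() for d in missing_days])
```

For the sample rows, the only day reported is 2017-08-16. For other intervals (hourly, weekly) the truncation and step size would need to change accordingly.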

The Nexosis API makes an inference when there is a gap between observed values: it imputes (creates) zero values for the gap at the appropriate interval. In other words, the modified dataset on which we would run predictions would be:

Timestamp            Value
2017-08-13 08:12:00  685.22
2017-08-14 09:10:00  871.29
2017-08-15 08:12:00  358.11
2017-08-16 00:00:00  0.00
2017-08-17 08:12:00  62.58
...
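To see the effect of this kind of fill on your own data before submitting it, the behavior can be approximated locally. The sketch below is not the Nexosis implementation; it only mimics the result shown above for a daily interval. The function name `fill_daily_gaps` and its `fill_value` parameter are assumptions made for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def fill_daily_gaps(rows, fill_value=0.0):
    """Return rows with one (midnight timestamp, fill_value) row added per missing day.

    Hypothetical helper: approximates zero-filling for a daily interval only.
    """
    parsed = sorted((datetime.strptime(ts, FMT), value) for ts, value in rows)
    if not parsed:
        return []
    observed_days = {ts.date() for ts, _ in parsed}
    filled = list(parsed)
    day = parsed[0][0].date()
    last = parsed[-1][0].date()
    while day <= last:
        if day not in observed_days:
            # Created rows are stamped at midnight of the missing day.
            filled.append((datetime.combine(day, datetime.min.time()), fill_value))
        day += timedelta(days=1)
    filled.sort()
    return [(ts.strftime(FMT), value) for ts, value in filled]


rows = [
    ("2017-08-13 08:12:00", 685.22),
    ("2017-08-14 09:10:00", 871.29),
    ("2017-08-15 08:12:00", 358.11),
    ("2017-08-17 08:12:00", 62.58),
]

filled_rows = fill_daily_gaps(rows)
for ts, value in filled_rows:
    print(ts, f"{value:.2f}")
```

If zero is not what a missing day means in your domain, passing a different `fill_value` (or replacing the fill with your own rule) before submitting the data lets you make that choice yourself.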

Imputation