Last post I got the depth data streaming from the Kinect to a connected computer. Since then I have noticed that the depth data coming back from the Kinect is extremely noisy (I will endeavour to upload a video demonstrating this in the near future). Not only are the edges of objects ‘lumpy’ rather than smooth (a result of the Kinect’s sensing method), but depth errors also constantly appear and disappear from frame to frame. These errors are all returned by the camera as a depth of 0 (the black areas in my images).
In this old image, which I have reused, you can see several black regions. Some of these – the larger groups – are stable from frame to frame; for example, the fireplace under the mirror constantly causes errors in the depth measurements. I have yet to find an explanation for this.
There is also a large amount of noise: the smaller, patchier black regions randomly come and go from frame to frame. This is very annoying and far from ideal from an image-processing point of view, but it is also something that should be easily fixed in software.
I tried several methods to remedy the problem. The first and simplest method was to not update a pixel if its new value was 0. This was exceptionally cheap and worked surprisingly well, although it did produce some tearing on moving objects: their ‘shadow’ would be incorrectly filled in with their depth when they moved in certain directions. I then tried a weighted average method, in the hope of removing the high-frequency noise (which usually lasts no more than a frame or two) while keeping the shadows cast by objects. This worked fairly well, but it was far less effective at removing the noise than the previous method – some still came through, and a flickering effect could still be seen, just subdued. A noticeable lag could also be observed on moving objects, leaving a ‘ghost trail’ behind them. Finally I tried a neighbourhood analysis method: replacing each zero-value pixel with the median of the non-zero pixels in a neighbourhood around it (or leaving it unchanged if the neighbourhood contained only zero-value pixels). This was exceptionally expensive (reducing the frame rate to just 1 or 2 fps) and, while it did better than the weighted average approach without producing lag, it also left a halo surrounding the shadows.
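To make the three methods concrete, here is a minimal NumPy sketch of each, assuming 16-bit single-channel depth frames where 0 marks an error pixel. The function names and the `alpha`/`radius` parameters are my own illustrative choices, not taken from the original code.

```python
import numpy as np

def zero_hold(prev, new):
    """Method 1: keep the previous depth wherever the new frame reads 0."""
    return np.where(new == 0, prev, new)

def weighted_average(prev, new, alpha=0.7):
    """Method 2: exponential moving average, blending the new frame with
    the running result to suppress single-frame dropouts."""
    return (alpha * new + (1 - alpha) * prev).astype(new.dtype)

def median_fill(frame, radius=2):
    """Method 3: replace each zero pixel with the median of the non-zero
    pixels in its (2*radius+1)^2 neighbourhood, if any exist."""
    out = frame.copy()
    h, w = frame.shape
    for y, x in zip(*np.nonzero(frame == 0)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch = frame[y0:y1, x0:x1]
        valid = patch[patch != 0]          # ignore other error pixels
        if valid.size:                     # leave the pixel at 0 otherwise
            out[y, x] = np.median(valid)
    return out
```

The Python loop in `median_fill` mirrors why that method was so expensive: it visits every zero pixel individually, whereas the first two methods are single vectorised operations over the whole frame.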
For the time being I will use the first and simplest method: not updating pixels whose new value is 0. While this has significant problems and introduces real artefacts into the stream (in place of the shadows), which none of the other methods did, it is extremely effective at removing the noise and is by far the cheapest method. I may look into improving the weighted average approach at a later date, as I still believe it has potential.