Senior Data Science: Safe Aging with SPHERE Hosted By AARP Foundation



Missing Data

For some sequences the metadata says the record is 10 seconds long, but there is only accelerometer data for less than 4 seconds. What is going on here?


This is expected behavior.

Data was collected for this experiment in a context where missing data is possible. This is particularly the case for the wearable accelerometers, which transmit data in connectionless mode. This transmission protocol was selected to optimize battery life, and unfortunately it can result in missing data.
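One simple way to locate these dropouts is to look at the spacing between consecutive accelerometer timestamps. The sketch below uses illustrative timestamps and an assumed nominal sampling interval, not values from the actual dataset:

```python
# Sketch: detect gaps in a connectionless accelerometer stream by checking
# the spacing between consecutive timestamps (values here are illustrative).
import numpy as np

timestamps = np.array([0.00, 0.05, 0.10, 0.15, 3.20, 3.25, 3.30])  # seconds
expected_dt = 0.05                       # assumed nominal sampling interval
gaps = np.diff(timestamps) > 2 * expected_dt
print(int(gaps.sum()))  # 1 — a single dropout between 0.15 s and 3.20 s
```

Any gap flagged this way can then be handled by your pipeline, for example by imputation or by masking those windows.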

Raw Video Streams

We see that you provide coarse video data (2D/3D bounding boxes). Will we have access to the raw video data?


Unfortunately not.

In the SPHERE project we concern ourselves with ethical data collection, and in particular with preserving the anonymity of the volunteers who participate in our data collection campaigns. In order to guarantee that no participant can be identified, we provide only the features extracted from the video streams.

The bounding boxes provide valuable information for the challenge. The aspect ratio should indicate the pose of the person (are they standing or sitting?), the centre of both the 2D and 3D bounding boxes should indicate where the person is (on the sofa, for example), and the trajectory of the bounding box should indicate whether the recorded person is walking or stationary.
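These bounding box features are straightforward to compute. The sketch below assumes a 2D box given as pixel coordinates (left, top, right, bottom); the function name and example coordinates are illustrative, not part of the dataset:

```python
# Sketch: simple pose/location features from a 2D bounding box
# (left, top, right, bottom in pixels). Coordinates are illustrative.

def bbox_features(left, top, right, bottom):
    width, height = right - left, bottom - top
    aspect_ratio = height / width             # tall boxes suggest standing
    centre = ((left + right) / 2, (top + bottom) / 2)
    return aspect_ratio, centre

ar_stand, _ = bbox_features(100, 50, 160, 290)   # tall, narrow box
ar_sit, _ = bbox_features(100, 150, 220, 290)    # shorter, wider box
print(ar_stand > ar_sit)  # True: the standing box has a larger ratio
```

The trajectory feature mentioned above would follow the same pattern, differencing the centre coordinates across successive frames.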

It is worth noting, though, that you should not be over-reliant on the bounding box data, as the RGB-D cameras are located only in the living room (lounge), the downstairs hallway, and the kitchen.

Skeleton Data

Why do you not provide skeleton data from the video streams? The RGB-D camera that you use can extract skeletons. Why are these not given in the dataset?


In short, given the location of the cameras, the extraction of skeleton data is unreliable.

The RGB-D camera that we use can indeed provide skeletons, and skeletons are most reliable when a person faces the camera at a distance of approximately 2.5 metres, with the camera at approximately mid-height. In our deployment context, however, the cameras are over 2 metres off the ground, are downward facing, and the participants will rarely face the camera directly. Consequently, the extracted skeletons are unreliable, so they are not provided. The extracted bounding boxes, however, are reliable and were validated manually.

It is worth saying that the cameras were placed carefully in order to maximize coverage in all of the rooms.

Interpretation of targets

I can see from the metadata that two annotators annotated sequence 00007, so I would have expected that all of the probabilistic labels in targets.csv would come from the set |$\{0.0, 0.25, 0.5, 0.75, 1.0\}$|. However, when I look in targets.csv, I see that most of the labels do not. Why is this?


The easiest way to understand the numbers in targets.csv is first to assume there is only one annotator, and to look at one activity. As an example, imagine that for an arbitrary row (which defines a one-second time window) the target value given for walking is 5%, i.e. |$P(\texttt{a_walk})=0.05$|. This value is interpreted as follows: 5% of the window in question has been labelled as a_walk by the annotator.

With multiple annotators, the interpretation changes only slightly. Using the same values as before, we would interpret |$P(\texttt{a_walk})=0.05$| as meaning that, when averaged over all annotators, 5% of the window was labelled a_walk.
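This averaging also explains why most values fall outside the small set above. A minimal sketch, using hypothetical annotation intervals (start, end, activity) rather than real data:

```python
# Sketch: deriving a probabilistic target for one 1-second window from
# per-annotator interval annotations. All intervals here are hypothetical.

def window_label_fraction(annotations, window_start, window_end, activity):
    """Fraction of [window_start, window_end) labelled as `activity`."""
    covered = 0.0
    for start, end, label in annotations:
        if label == activity:
            overlap = min(end, window_end) - max(start, window_start)
            covered += max(0.0, overlap)
    return covered / (window_end - window_start)

# Two annotators labelling the same one-second window:
annotator_1 = [(0.00, 0.05, "a_walk"), (0.05, 1.00, "p_stand")]
annotator_2 = [(0.00, 0.15, "a_walk"), (0.15, 1.00, "p_stand")]

fractions = [window_label_fraction(a, 0.0, 1.0, "a_walk")
             for a in (annotator_1, annotator_2)]
p_walk = sum(fractions) / len(fractions)
print(round(p_walk, 2))  # 0.1 — not in {0.0, 0.25, 0.5, 0.75, 1.0}
```

Because each annotator contributes a continuous fraction of the window, the average can take almost any value in [0, 1].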

Feature Extraction Window Length

I see in targets.csv that the targets span windows of one second. Must we extract features over the same time windows, or are we allowed to extract features from sensor data over windows longer than one second?



We do not put any restrictions on the pipelines that you can use, and this includes the window lengths for feature extraction. We are happy for you to treat the data in the manner that feels most natural.

It is worth being slightly careful about choosing the window length, however, since some of the annotated activities are quite short in duration. Our evaluation criteria reward the classification of rare events, so choosing a window length that best balances long- and short-duration activities is important. Some competitors are likely to consider multi-scale windows when extracting features.
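A multi-scale approach can be sketched as follows: for each one-second target window, compute statistics over several window lengths centred on it. The window lengths, statistics, and sampling rate below are illustrative assumptions, not recommendations:

```python
# Sketch of multi-scale feature extraction: for each 1 s target window,
# compute simple statistics over several window lengths centred on it.
import numpy as np

def multiscale_features(signal, fs, t_start, scales=(1.0, 3.0, 5.0)):
    """Features for the 1 s target window starting at t_start (seconds).

    signal: 1-D array of sensor samples; fs: sampling rate in Hz.
    """
    centre = t_start + 0.5
    feats = []
    for scale in scales:
        lo = int(max(0.0, centre - scale / 2) * fs)
        hi = int(min(len(signal) / fs, centre + scale / 2) * fs)
        window = signal[lo:hi]
        feats.extend([window.mean(), window.std(), window.min(), window.max()])
    return np.array(feats)

fs = 20  # assumed 20 Hz accelerometer
signal = np.sin(np.linspace(0, 10 * np.pi, 10 * fs))  # 10 s dummy signal
f = multiscale_features(signal, fs, t_start=4.0)
print(f.shape)  # (12,) — 4 statistics at each of 3 scales
```

The short scales preserve brief activities such as transitions, while the longer scales give more stable statistics for sustained activities.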

Accelerometer Data when Stationary

I'm visualizing the accelerometer data for stationary activities (e.g. p_lie). I would have expected all axes to report values close to 0 g during these activities, but they do not. Why is this?


What you are seeing in the data is expected, and it is a result of the accelerometer measuring the force of Earth's gravity. This means that, even when stationary, accelerometers will measure approximately 1 g, with this measurement distributed across the three axes as a function of the static orientation of the accelerometer.
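You can check this in your own plots by computing the vector magnitude of the three axes, which should stay close to 1 g regardless of orientation when the device is still. The sample values below are made up for illustration:

```python
# Sketch: for a stationary device, the vector magnitude across the three
# axes is close to 1 g whatever the orientation. Samples are illustrative.
import numpy as np

acc = np.array([[0.02, -0.70, 0.71],    # hypothetical tilted, stationary
                [0.01, -0.71, 0.70],    # device: gravity split between
                [0.03, -0.69, 0.72]])   # the y and z axes (units of g)

magnitude = np.linalg.norm(acc, axis=1)
print(np.round(magnitude, 2))  # each value approximately 1.0
```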

If you wish to have data which is zero-centred when stationary, you could consider taking the difference between successive samples of acceleration. This approximates the jerk, the third derivative of position, which is a common preprocessing technique in accelerometry. It should give you values very close to zero when stationary.
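The differencing step above is a one-liner with NumPy; the sampling rate and simulated signal here are assumptions for illustration:

```python
# Sketch: differencing successive acceleration samples approximates jerk,
# yielding a zero-centred signal when the device is stationary.
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the example is reproducible
fs = 20                          # assumed sampling rate in Hz
t = np.arange(0, 2, 1 / fs)
# Stationary device: one axis carries part of gravity, plus sensor noise
acc_x = 0.71 + 0.001 * rng.standard_normal(len(t))

jerk_x = np.diff(acc_x) * fs     # finite difference, scaled to g/s
print(abs(jerk_x.mean()) < abs(acc_x.mean()))  # True: jerk is zero-centred
```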

Note, however, that there is certainly value in knowing the static acceleration on each axis, as this should assist in distinguishing between stationary postures such as p_lie and p_stand.