Senior Data Science: Safe Aging with SPHERE Hosted By AARP Foundation



The task for this challenge is to recognize activities from the sensor data collected from participants. Here, "activity recognition" is the task of recognizing the posture and movements of the participants whose data was recorded, and our definition most closely aligns with the definition given by the accelerometer community. Three sensor modalities are provided for the prediction task:

  1. Accelerometer - Sampled at 20 Hz;
  2. RGB-D - Bounding box information is given to preserve the anonymity of participants; and
  3. Environmental - The values of passive infrared (PIR) sensors are given.

All sensors have been synchronized with Network Time Protocol during the recording period. We will discuss each modality and the data formats in greater detail in later sections.

Activity labels

We provide the following 20 activity labels in our dataset:

  1. a_ascend - ascend stairs;
  2. a_descend - descend stairs;
  3. a_jump - jump;
  4. a_loadwalk - walk with load;
  5. a_walk - walk;
  6. p_bent - bending;
  7. p_kneel - kneeling;
  8. p_lie - lying;
  9. p_sit - sitting;
  10. p_squat - squatting;
  11. p_stand - standing;
  12. t_bend - stand-to-bend;
  13. t_kneel_stand - kneel-to-stand;
  14. t_lie_sit - lie-to-sit;
  15. t_sit_lie - sit-to-lie;
  16. t_sit_stand - sit-to-stand;
  17. t_stand_kneel - stand-to-kneel;
  18. t_stand_sit - stand-to-sit;
  19. t_straighten - bend-to-stand; and
  20. t_turn - turn.

The prefix a_ on a label indicates an ambulation activity (i.e. an activity requiring continuing movement), the prefix p_ indicates a static posture (i.e. times when the participant is stationary), and the prefix t_ indicates a posture-to-posture transition. These labels are the target variables to be predicted in the challenge.

We provide targets that are probabilistic in nature in order to account for inter-annotator disagreement. Each target is a vector of length 20 that aggregates the annotations of all annotators over non-overlapping one-second windows. To interpret the elements of the target vector: if, for a particular target vector, the activity a_walk is given a value of 0.05, this means that on average the annotators marked 5% of the window as arising from walking.
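As a minimal sketch (the helper name and the times are hypothetical), the fraction of a one-second window covered by a single annotation can be computed as:

```python
def window_fraction(ann_start, ann_end, win_start, win_end):
    """Fraction of the window [win_start, win_end) covered by an
    annotation spanning [ann_start, ann_end)."""
    overlap = max(0.0, min(ann_end, win_end) - max(ann_start, win_start))
    return overlap / (win_end - win_start)

# An annotator marks a_walk from t=1.95 s to t=3.00 s: the window
# [2.0, 3.0) is fully covered, while only 5% of [1.0, 2.0) is.
print(window_fraction(1.95, 3.0, 2.0, 3.0))  # 1.0
print(window_fraction(1.95, 3.0, 1.0, 2.0))  # ~0.05
```

Averaging these fractions across all annotators of a sequence yields the target values.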


A team of 12 annotators was recruited and trained to annotate the data sequences. To support the annotation process, a head-mounted camera (Panasonic HX-A500E-K 4K Wearable Action Camera Camcorder) recorded 4K video at 25 frames per second to an SD card. This video is not shared in the dataset and was used only to assist the annotators. Synchronisation between the sensors and the head-mounted camera was achieved by focusing the camera on an NTP-synchronized digital clock at the beginning and end of each recording sequence.

An annotation tool called ELAN was used for annotation. ELAN is a tool for the creation of complex annotations on video and audio resources, developed by the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands.

Together with the activity labels, the annotators also labelled room occupancy within the house. While room occupancy does not itself feature in the performance evaluation, participants may find that conditioning activity predictions on room occupancy is useful.



Participants wore a device equipped with a tri-axial accelerometer on the dominant wrist, attached using a strap. The device wirelessly transmits data to several access points (receivers) positioned within the house. The outputs of these sensors are a continuous numerical stream of accelerometer readings (in units of g). Accompanying the accelerometer readings are the received signal strength indicator (RSSI) values recorded by each access point (in units of dBm), and these data will be informative for indoor localization. The accelerometers record data at 20 Hz, and the maximum accelerometer range is 8 g. Note: if the signal path between the accelerometer and an access point is shielded, the corresponding data will be lost.


Video recordings were taken using ASUS Xtion PRO cameras. Automatic detection of humans was performed using the OpenNI library, and false positive detections were manually removed by the organizers by visual inspection. Three cameras are installed in the house, and these are located in the living room, hallway, and the kitchen. No cameras are located elsewhere in the residence.

In order to preserve the anonymity of the participants the raw video data are not shared. Instead, the coordinates of the 2D bounding box, 2D centre of mass, 3D bounding box and 3D centre of mass are provided.

The units of 2D coordinates are in pixels (i.e. number of pixels down and right from the upper left hand corner) from an image of size 640x480 pixels. The coordinate system of the 3D data is axis aligned with the 2D bounding box, with a supplementary dimension that projects from the central position of the video frames. The first two dimensions specify the vertical and horizontal displacement of a point from the central vector (in millimetres), and the final dimension specifies the projection of the object along the central vector (again, in millimetres).

Environmental Sensors

The environmental sensing nodes are built on development platforms (Libelium, with CE marking) and are powered by batteries and/or mains power. Passive infrared (PIR) sensors are employed to detect presence. Values of 1 indicate that motion was detected, whereas values of 0 mean that no motion was detected.

House Layout

The two images below show the floorplans of the ground and first floors of the house.

The house contains the following nine rooms:

  1. bath;
  2. bed1;
  3. bed2;
  4. hall;
  5. kitchen;
  6. living;
  7. stairs;
  8. study; and
  9. toilet.

RGB-D cameras are located in the living room, downstairs hallway, and kitchen. PIR sensors are located in all rooms. The access points are located in the living room, kitchen, staircase, and bedroom.

Data description

Training data and testing data can be found in the ‘train’ and ‘test’ subdirectories respectively. The recorded data are collected under unique codes (each recording will be referred to as a ‘data sequence’). Timestamps are rebased to be relative to the start of the sequences, i.e. for a sequence of length 10 seconds, all timestamps will be within the range 0-10 seconds.

Each data sequence contains the following files:

  • targets.csv (available only with training data)
  • pir.csv
  • acceleration.csv
  • video_hallway.csv
  • video_living_room.csv
  • video_kitchen.csv

The following files are also available within the training sequences:

  • annotations_*.csv
  • locations_*.csv

The data from annotations_*.csv is used to create the targets.csv file, and locations_*.csv files are available for participants that want to model indoor localization. These are only available for the training set.
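A sequence directory could, for instance, be loaded into pandas DataFrames as follows (a sketch: `load_sequence` is a hypothetical helper, and only the file names documented above are assumed):

```python
import os

import pandas as pd

def load_sequence(seq_dir, train=True):
    """Load the CSV files of one data sequence into a dict of DataFrames."""
    names = ['pir', 'acceleration', 'video_hallway',
             'video_living_room', 'video_kitchen']
    if train:
        names.append('targets')  # targets.csv exists only for training data
    return {name: pd.read_csv(os.path.join(seq_dir, name + '.csv'))
            for name in names}
```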

We have created a number of useful scripts for loading, iterating over, and visualising the sensor data. These scripts, together with other baseline scripts, can be found in our GitHub repository. All visualizations were created with this Python file.

Main Data Files

targets.csv (available in train only)

This file contains the probabilistic targets for classification. Multiple annotators may have annotated each sequence, and this file aggregates all of the annotations over one second windows. The mean duration of each label within this window is used as the target variable.

The following 20 activities are labelled:

annotation_names = ('a_ascend', 'a_descend', 'a_jump', 'a_loadwalk', 'a_walk', 'p_bent', 'p_kneel', 'p_lie', 'p_sit', 'p_squat', 'p_stand', 't_bend', 't_kneel_stand', 't_lie_sit', 't_sit_lie', 't_sit_stand', 't_stand_kneel', 't_stand_sit', 't_straighten', 't_turn')

The prefix a_ on a label indicates an ambulation activity (i.e. an activity requiring continuing movement), the prefix p_ indicates a static posture (i.e. times when the participant is stationary), and the prefix t_ indicates a posture-to-posture transition.

This file contains 22 columns:

  • start - The starting time of the window
  • end - The ending time of the window
  • targets - Columns 3-22: the 20 probabilistic targets.

The target files are generated with this Python script.
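A sketch of separating the window boundaries from the 20 target columns (`split_targets` is a hypothetical helper; column positions are taken from the description above):

```python
import numpy as np
import pandas as pd

def split_targets(targets_df):
    """Return the (start, end) window boundaries and the N x 20 matrix of
    probabilistic targets (columns 3-22 of targets.csv)."""
    windows = targets_df.iloc[:, :2].to_numpy()
    y = targets_df.iloc[:, 2:22].to_numpy()
    return windows, y
```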

pir.csv (available for train and test)

This file contains the start time and duration for all PIR sensors in the smart environment. A PIR sensor is located in every room:

pir_locations = ('bath', 'bed1', 'bed2', 'hall', 'kitchen', 'living', 'stairs', 'study', 'toilet')

The columns of this CSV file are:

  • start - the time at which the PIR sensor was activated (relative to the start of the sequence)
  • end - the time at which the PIR activation ended (relative to the start of the sequence)
  • name - the name of the activated PIR sensor (from the above list)
  • index - the index of the activated sensor in the pir_locations list, starting at 0

Below, we show an example of the PIR signals for one training record. The axis on the left lists the rooms in the house, and the black lines indicate the times during which the PIR sensors were activated. The blue and green horizontal lines indicate the room occupancy labels as given by the two annotators that labelled this sequence.
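As a sketch, the PIR activation intervals can be rasterised onto the same one-second grid as the targets (`pir_to_windows` is a hypothetical helper; a window is marked if the activation overlaps it at all):

```python
import numpy as np

pir_locations = ('bath', 'bed1', 'bed2', 'hall', 'kitchen',
                 'living', 'stairs', 'study', 'toilet')

def pir_to_windows(rows, n_windows):
    """Convert (start, end, index) activations into a binary matrix with
    one row per one-second window and one column per PIR sensor."""
    m = np.zeros((n_windows, len(pir_locations)), dtype=int)
    for start, end, index in rows:
        lo = max(0, int(np.floor(start)))
        hi = min(n_windows, int(np.ceil(end)))
        m[lo:hi, index] = 1
    return m
```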

acceleration.csv (available for train and test)

The acceleration file consists of eight columns:

  • t: this is the time of the recording (relative to the start of the sequence)
  • x/y/z: these are the acceleration values recorded on the x/y/z axes of the accelerometer.
  • Kitchen_AP/Lounge_AP/Upstairs_AP/Study_AP: these specify the received signal strength indicator (RSSI) of the acceleration signal as received by the kitchen/lounge (i.e. living room)/upstairs/study access points. Empty values indicate that the access point did not receive the packet.

Below, we show an example of the acceleration and RSSI signals for one training record. The continuous blue, green and red line traces indicate the accelerometer readings. The horizontal lines indicate the ground truth as provided by the annotators (two annotators annotated this record, and their annotations are depicted by the green and blue traces respectively).
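Since dropped packets leave empty RSSI values, a simple sketch is to impute them with a floor value before computing features (the -120 dBm floor and the `rssi_features` helper are assumptions, not part of the dataset):

```python
import numpy as np
import pandas as pd

AP_COLUMNS = ['Kitchen_AP', 'Lounge_AP', 'Upstairs_AP', 'Study_AP']

def rssi_features(accel_df, floor=-120.0):
    """Impute missing RSSI readings with a floor value and return the
    mean RSSI per access point as a length-4 feature vector."""
    return accel_df[AP_COLUMNS].fillna(floor).mean().to_numpy()
```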

video_*.csv (available for train and test)

The following columns are found in the video_hallway.csv, video_kitchen.csv and video_living_room.csv files:

  • t: The current time (relative to the start of the sequence)
  • centre_2d_x/centre_2d_y: The x- and y-coordinates of the center of the 2D bounding box.
  • bb_2d_br_x/bb_2d_br_y: The x and y coordinates of the bottom right (br) corner of the 2D bounding box
  • bb_2d_tl_x/bb_2d_tl_y: The x and y coordinates of the top left (tl) corner of the 2D bounding box
  • centre_3d_x/centre_3d_y/centre_3d_z: the x, y and z coordinates for the center of the 3D bounding box
  • bb_3d_brb_x/bb_3d_brb_y/bb_3d_brb_z: the x, y, and z coordinates for the bottom right back corner of the 3D bounding box
  • bb_3d_flt_x/bb_3d_flt_y/bb_3d_flt_z: the x, y, and z coordinates of the front left top corner of the 3D bounding box.

Below, example 3D centre of mass data are plotted for the hallway, living room and kitchen cameras. Room occupancy labels are overlaid on these, and we can see a strong correspondence between the detected persons and room occupancy, i.e. when the participant is in the kitchen, bounding box data appear in the kitchen trace.
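As one example feature, the vertical extent of the 3D bounding box approximates the detected person's height. This is a sketch; it assumes, per the coordinate description above, that the first 3D dimension is the vertical one:

```python
def bb_height_m(row):
    """Vertical extent of the 3D bounding box in metres (coordinates are
    given in millimetres; the first 3D dimension is assumed vertical)."""
    return abs(row['bb_3d_flt_x'] - row['bb_3d_brb_x']) / 1000.0
```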

Supplementary files

The following two sets of files need not be used for the challenge, but are included to facilitate users that wish to perform additional modelling of the sensor environment.

locations_*.csv (available in train only)

This labels the room that is currently occupied by the recruited participant. The following rooms are labelled:

location_names = ('bath', 'bed1', 'bed2', 'hall', 'kitchen', 'living', 'stairs', 'study', 'toilet')

locations.csv contains the following four columns:

  • start - the time a participant entered a room (relative to the start of the sequence)
  • end - the time the participant left the room (relative to the start of the sequence)
  • name - the name of the room (from the above list)
  • index - the index of the room name starting at 0
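A sketch of looking up the occupied room at a given time (`room_at` is a hypothetical helper over (start, end, index) rows from locations.csv):

```python
def room_at(t, rows):
    """Return the index of the room occupied at time t, given
    (start, end, index) rows, or None if no interval covers t."""
    for start, end, index in rows:
        if start <= t < end:
            return index
    return None
```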

annotations_*.csv (available in train only)

annotations.csv contains the annotations that were provided by the annotators. These files are used to create targets.csv.

Each file contains the following:

  • start - the start time of the activity (relative to the start of the sequence)
  • end - the end time of the activity (relative to the start of the sequence)
  • name - the name of the label (from the list of annotation_names)
  • index - the index of the label name starting at 0

Performance metric

Performance is evaluated with the Brier score, i.e.

$$BS = \frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C} w_c (p_{n,c}-y_{n,c})^2$$

where $N$ is the number of test sequences, $C$ is the number of classes, $p_{n,c}$ is the predicted probability of instance $n$ being from class $c$, $y_{n,c}$ is the proportion of annotators that labelled instance $n$ as arising from class $c$, and $w_c$ is the weight for each class.

We have specified the class weights to place more weight on the classes that are less frequent. The weights can be loaded from the file class_weights.json, and are listed below:

class_weights = [1.35298455691, 1.38684574053, 1.59587388404, 1.35318713948, 0.347783666015, 0.661081706198, 1.04723628621, 0.398865222651, 0.207586320237, 1.50578335208, 0.110181365961, 1.07803284435, 1.36560417316, 1.17024113802, 1.1933637414, 1.1803704493, 1.34414875433, 1.11683830693, 1.08083910312, 0.503152249073]

Lower Brier scores indicate better performance, and optimal performance is achieved with a Brier score of 0. In Python, we can compute the Brier score with:

import numpy as np

def brier_score(target, predicted, class_weights):
    return np.power(target - predicted, 2.0).dot(class_weights).mean()

where target is the $N \times C$ target matrix, predicted is the $N \times C$ matrix of predictions, and class_weights is the vector given above.
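As a usage sketch with synthetic data (the random targets and uniform baseline are for illustration only; the definition of brier_score is repeated so the snippet is self-contained):

```python
import numpy as np

def brier_score(target, predicted, class_weights):
    return np.power(target - predicted, 2.0).dot(class_weights).mean()

rng = np.random.default_rng(0)
N, C = 100, 20
class_weights = np.ones(C)  # substitute the values from class_weights.json

# Synthetic probabilistic targets, normalised per row.
target = rng.random((N, C))
target /= target.sum(axis=1, keepdims=True)

# A uniform baseline predicts 1/C for every class.
uniform = np.full((N, C), 1.0 / C)

print(brier_score(target, uniform, class_weights))  # > 0
print(brier_score(target, target, class_weights))   # 0.0 (perfect prediction)
```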

The relationship between the prior class distribution and the class weights is shown below: