King's College London

File(s) under embargo

Reason: Restricted to UK-based academic institutions. Interested UK-based researchers can apply for access. Please email

Motion Analysis in Simulated Clinical Environments (MA-SCE)

posted on 2024-04-10, 14:40 authored by Yee Mah, Aryan Esfandiari, Jorge Cardoso, Parashkev Nachev


The improved accessibility of high-resolution cameras and publicly available video footage has helped advance the fields of pose estimation and action characterisation. Although there is a wealth of colour video depicting a wide range of everyday activities, the availability of high-quality data within clinical settings is far more limited. Importantly, these technologies are sensitive to prior knowledge of the physical environment and the activities that take place within it. Clinical settings are unusual enough that seemingly simple activities, such as measuring blood pressure, present complex challenges, especially given the occlusions, wide variation in illumination, and multi-agent interactions characteristic of the clinical domain. Crucially, video capture of real-world clinical scenarios could never be widely available, even at modest scale, owing to insurmountable privacy concerns, except after modification, such as expression-preserving face modification, that presupposes possession of a good model in the first place. Moreover, the minimal performance standard in healthcare is very high, not just in terms of individual-level fidelity but also in the invariance across the population that equity of delivered care demands. An individual patient is not rendered less important by being rare.

These constraints leave data from realistically simulated clinical environments, such as those used to train clinical staff, as the only plausible solution. The approach further requires appropriately skilled staff to ensure that the actions performed are representative of routine clinical care. Currently, few datasets are representative of clinical activities, and fewer still provide multiple perspectives or multispectral data.

Our objective here is to create a rich dataset of video samples depicting common ward-based activities within a simulated hospital setting, performed by trained clinical staff. Each task was recorded simultaneously from two different perspectives, with colour, depth and active infra-red data collected.



Two Microsoft Azure Kinect cameras were connected to two independent laptops: one powered by an Intel Xeon E3-1505 v5 processor with 32 GB of RAM and a Samsung NVMe PM951 drive, running Microsoft Windows 10; the other by an Intel i5-8265U processor with 8 GB of RAM and a Kingston NVMe RBU-SNS8154P3 drive, running Windows 11.

The two cameras were used to record video footage at 30 frames per second. The colour, depth and active infra-red (IR) streams were recorded at resolutions of 1920x1080, 640x576 and 1024x1024 respectively. Using the Microsoft Azure Kinect Software Development Kit, the cameras were linked using the external synchronisation method, with the high-angle-view camera acting as master and the hip-level-view (side-view) camera as subordinate. The pulse width of the infra-red laser is 125 microseconds. A 200-microsecond offset was applied to the subordinate to minimise interference between the two depth cameras.
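The synchronisation timing above can be sanity-checked with a short sketch. The only rule encoded, that the subordinate's offset must exceed the master's laser pulse width while remaining well inside the frame period at 30 fps, is our reading of the setup rather than part of the released configuration:

```python
# Sanity-check the external-sync timing described in the text.
# Pulse width (125 us) and subordinate offset (200 us) are the values
# reported above; the interference rule itself is an illustrative
# assumption: the subordinate's depth capture should begin only after
# the master's laser pulse has ended.

FRAME_PERIOD_US = 1_000_000 // 30   # ~33,333 us between frames at 30 fps
LASER_PULSE_US = 125                # master IR laser pulse width
SUBORDINATE_OFFSET_US = 200         # delay applied to the subordinate camera

def offsets_avoid_interference(pulse_us: int, offset_us: int, period_us: int) -> bool:
    """True if the subordinate fires after the master's pulse has ended,
    while still capturing within the same frame period."""
    return pulse_us < offset_us < period_us

print(offsets_avoid_interference(LASER_PULSE_US, SUBORDINATE_OFFSET_US, FRAME_PERIOD_US))
```

With the reported values, the 200-microsecond offset comfortably clears the 125-microsecond pulse while consuming well under one percent of the 33-millisecond frame period.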


Filming was conducted at King’s College Hospital NHS Foundation Trust, at their simulation room and simulation ward environments. The simulation room is arranged to resemble a single occupancy room, while the simulation ward is consistent with a bay located on the ward with multiple beds. In both cases, the cameras were arranged to have only one bed in the centre of the field of view.

Camera setup

Each task was filmed using a two-camera setup, providing high-angle and hip-level perspectives, with the bed in the centre of the field of view. The two cameras were arranged to encompass the entire bed and a 0.5 m surrounding margin.
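The coverage requirement above reduces to simple geometry. In the sketch below, the bed width and the 90-degree horizontal field of view (typical of the Azure Kinect colour camera in 16:9 modes) are illustrative assumptions, not measurements from the recording sessions:

```python
import math

# Minimum camera distance at which a target of a given width just fits
# the horizontal field of view. Bed width (0.9 m) and the 90 deg HFOV
# are assumed values for illustration only.

def min_distance_m(target_width_m: float, hfov_deg: float) -> float:
    """Distance at which the target exactly spans the horizontal FOV."""
    return (target_width_m / 2) / math.tan(math.radians(hfov_deg) / 2)

bed_width_m = 0.9                    # assumed hospital bed width
coverage_m = bed_width_m + 2 * 0.5   # bed plus the 0.5 m margin on each side

print(round(min_distance_m(coverage_m, 90.0), 2))
```

Under these assumptions a camera less than a metre from the bedside already spans the bed and margin across its width; the bed's length (roughly 3 m with margins) is the stricter constraint and drives the actual placement.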


A total of 21 participants were involved across all the clips, with 14 participants assuming the patient role. The mean age of the patients was 41.4 years (SD 17.2 years, range 23-72 years), while the mean age for all participants was 38.0 years (SD 14.8 years, range 26-43 years) with 13 females and 8 males. Based on the UK 2021 census groupings, the participants encompassed Asian or Asian British; Black, Black British, Caribbean or African; White; and Other ethnic group.


We performed a programme of activities under varying levels of illumination, to replicate conditions typically found within a hospital environment. Participants were given an overview of the tasks required for each clip, but afforded freedom regarding how they were performed to encourage more diversity and natural movements. The full programme of activities is listed in the associated clip description file.

Data files

The video data were stored using the Matroska format (.mkv), with the corresponding camera calibration file contained within each recording. No audio data was recorded.


Clinical outcome modelling of rapid dynamics in acute stroke with joint-detail, continuous, remote, body motion analysis

Medical Research Council





Collection method

All participants consented to involvement with this project, and ethical approval was granted by King's College London (MRPP-22/23-34811).



Copyright owner

Yee Mah