Quantitative Influence and Performance Analysis of Virtual Reality Laparoscopic Surgery Training System | BMC medical training
Fifty-one medical students aged 17 to 27 years (mean ± SD, 22.30 ± 2.79 years) were recruited for this study, which was approved by the hospital ethics committee. There were 20 male and 31 female participants; two were left-handed and the rest were right-handed. Forty-one participants had never experienced virtual reality (VR), and 10 had played VR or augmented reality (AR) games once or twice before the study. Fifteen participants (29.41%) played games on a PC or mobile phone every day, while only 8 (15.69%) rarely played games in their daily life. None of the participants had used a VR-based laparoscopic surgery simulator before this study, and not all of them had prior experience with laparoscopic surgery. Participants were divided into two groups: a control group of 10 participants (5 men and 5 women) and an experimental group of 41 participants (15 men and 26 women).
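As a quick consistency check, the group sizes and percentages above can be re-derived from the reported counts. The short Python snippet below uses only numbers stated in this paragraph and adds no data of its own.

```python
TOTAL = 51  # participants recruited

# Group composition as reported: (men, women).
GROUPS = {"control": (5, 5), "experimental": (15, 26)}


def percent(count, total=TOTAL):
    """Share of all participants, rounded to two decimals as in the text."""
    return round(100 * count / total, 2)


men = sum(m for m, _ in GROUPS.values())
women = sum(w for _, w in GROUPS.values())
group_sizes = {name: m + w for name, (m, w) in GROUPS.items()}

print(men, women, group_sizes)  # prints: 20 31 {'control': 10, 'experimental': 41}
print(percent(15), percent(8))  # prints: 29.41 15.69 (daily vs. rare gamers)
```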
The pre-test and post-test were carried out on a commercial laparoscopic physical training box (38 × 27 × 27 cm), shown in Fig. 1, with a camera placed inside the box. As shown in Fig. 2, participants performed four tasks: three fundamental laparoscopic skill tasks (peg transfer, bean picking, and threading) and one colon resection task. As also shown in Fig. 1, electroencephalography (EEG) data were collected with a four-channel dry-electrode headset (Muse 2, InteraXon Inc.), and heart rate was recorded with a Polar H10 chest strap.
The VR laparoscopic simulator (Fig. 3) was developed by the State Key Laboratory of Virtual Reality Technology and Systems at Beihang University. It consists of two main components. The first is the computing module, a high-performance PC connected to a touch screen. The second is the simulation module, a box that houses a navigation camera and two surgical manipulators connected to haptic devices. Two foot pedals are used to activate electrosurgical coagulation during training.
The experiment consisted of three main steps. First, all participants performed a pre-test on the training box, comprising three fundamental surgical skill tasks (pre-fundamental tasks, pre-FT) and one colon resection task (pre-colon resection task, pre-CRT). EEG, heart rate, and video of the whole procedure were recorded. Second, the experimental group practiced the same types of tasks on the VR laparoscopic simulator (VRLS), completing four training sessions within one week, each lasting about 30 minutes. The control group received no surgery-related training. Finally, all participants performed a post-test (post-FT and post-CRT) identical to the pre-test.
After completing all experiments, participants filled in four questionnaires on cognitive load and flow experience. To explore the influence of the VRLS on participants, three classic scales (the Paas scale, NASA-TLX, and the WP scale) were used to measure cognitive load. To measure flow experience during the experiments, the EGame scale and the Cheng scale were combined and adapted based on our investigations. After designing this questionnaire, we carried out a pilot test followed by reliability and validity analyses. The questionnaire's Cronbach's alpha was 0.804 (acceptance threshold: 0.6), the Kaiser-Meyer-Olkin (KMO) measure was 0.729, and Bartlett's test of sphericity was significant (p < 0.001). These results indicate satisfactory reliability and validity, supporting the soundness and adequacy of the instrument.
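Cronbach's alpha, the reliability coefficient reported above, is alpha = k / (k - 1) × (1 - (sum of item variances) / (variance of total scores)), where k is the number of items. A minimal Python sketch of that formula follows; the input layout (one list of item scores per respondent) is an assumption for illustration, and the snippet does not reproduce the study's data or its SPSS output.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha; responses holds one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)  # sample variances
    total_variance = variance([sum(row) for row in responses])  # variance of total scores
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Perfectly consistent items (every respondent gives the same score on each item) yield alpha = 1; the value is then compared against the 0.6 threshold cited in the text.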
Three types of data were obtained in this study. The first was performance scores, calculated from the recorded videos according to the Global Operative Assessment of Laparoscopic Skills (GOALS) for the colon resection task and according to our own measurement rules (e.g., execution time and number of errors) for the fundamental skill tasks. Five medical experts rated the videos of all participants anonymously, and each participant's final performance score was the mean of the five experts' scores. The second was self-reported scores, namely the cognitive load and flow experience scores calculated from the questionnaires. The third was physiological data extracted from the heart rate and EEG recordings. In educational psychology, EEG can measure the neural response to changing levels of cognitive stimuli, which makes it well suited to assessing cognitive load; it has been used for cognitive task load measurement and analysis for over a decade.
Performance scores and physiological data had to be processed before meaningful information could be obtained. Performance on the fundamental skill tasks and on the colon resection task was scored on different dimensions. The fundamental skill tasks were scored on seven dimensions: execution time, number of failures in peg transfer, number of failures in bean picking, number of thread drops, smoothness of movement, depth perception, and bimanual dexterity. GOALS assesses laparoscopic skill in five domains: depth perception, bimanual dexterity, efficiency, tissue manipulation, and autonomy.
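In the published GOALS instrument, each of the five domains is rated on a 1 to 5 scale, so one rater's total ranges from 5 to 25. The sketch below sums one rater's domain ratings and averages the totals across raters, mirroring the five-expert averaging described earlier. Summing the domains into a total is an assumption about how the study aggregated GOALS, and the dictionary keys are hypothetical names; the scoring rules for the fundamental tasks are not specified in the text and are not modeled here.

```python
from statistics import mean

# The five GOALS domains listed in the text (hypothetical key names).
GOALS_DOMAINS = ("depth_perception", "bimanual_dexterity", "efficiency",
                 "tissue_manipulation", "autonomy")


def goals_total(ratings):
    """Sum one rater's 1-5 domain ratings; a missing domain raises KeyError."""
    for domain in GOALS_DOMAINS:
        if not 1 <= ratings[domain] <= 5:
            raise ValueError(f"{domain} rating {ratings[domain]} is outside 1-5")
    return sum(ratings[domain] for domain in GOALS_DOMAINS)


def final_score(rater_sheets):
    """Mean of the GOALS totals given by independent raters (five experts here)."""
    return mean([goals_total(sheet) for sheet in rater_sheets])
```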
After filtering the EEG data according to the data quality indicators of the four channels (1 = good, 2 = fair, ≥3 = bad), the physiological data could be analyzed in the time, frequency, and nonlinear domains. For the heart rate data, we calculated the average, minimum, and maximum heart rate of each participant. To study the change in cognitive load between the pre-test and post-test, each participant's cognitive load score was calculated from the EEG data following a previously published method. In this method, the EEG was first segmented into baseline and stimulus epochs, and each epoch was processed with the S-transform for each sensor. From the resulting time-frequency planes, the gravity frequency and energy density of the theta and alpha bands were extracted for each epoch. These values were combined in the cognitive load analysis, yielding a single cognitive load time series per sensor, and the per-sensor time series were then combined by spatially aware averaging into the overall cognitive load for the trial.
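The S-transform step can be illustrated with a minimal standard-library Python sketch of the discrete Stockwell transform in its frequency-domain form: shift the spectrum by each frequency bin f, multiply by a Gaussian whose width grows with f, and inverse-transform. It uses a naive O(N²) DFT to stay self-contained, so it only suits short epochs. The `band_features` helper is a hypothetical illustration of extracting a gravity (power-weighted centroid) frequency and an energy density per band. The band edges (theta 4 to 8 Hz, alpha 8 to 13 Hz), the time-averaging of |S|², and the definition of energy density as mean power per frequency bin are assumptions rather than the cited method, and the subsequent combination into a cognitive load time series is not reproduced.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) unnormalized discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT with 1/N scaling, so idft(dft(x)) recovers x."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def s_transform(x):
    """Discrete Stockwell (S-) transform.

    Returns one row per frequency bin f = 0 .. N//2; each row holds the complex
    S-transform at all N samples. Bin f corresponds to f * fs / N Hz.
    """
    n = len(x)
    spectrum = dft(x)
    rows = [[complex(sum(x) / n)] * n]  # zero-frequency voice is defined as the signal mean
    for f in range(1, n // 2 + 1):
        voice = []
        for m in range(n):
            offset = m if m <= n // 2 else m - n  # signed frequency offset from bin f
            window = math.exp(-2.0 * math.pi ** 2 * offset ** 2 / f ** 2)  # widens with f
            voice.append(spectrum[(m + f) % n] * window)
        rows.append(idft(voice))
    return rows


def band_features(rows, fs, low, high):
    """Gravity (power-weighted centroid) frequency and mean power density in [low, high) Hz.

    ASSUMPTION: power is |S|^2 averaged over the epoch, and "energy density" is the
    mean of that power across the frequency bins inside the band.
    """
    n = len(rows[0])
    freqs, powers = [], []
    for f_bin, row in enumerate(rows):
        freq = f_bin * fs / n
        if low <= freq < high:
            freqs.append(freq)
            powers.append(sum(abs(v) ** 2 for v in row) / n)
    if not powers or sum(powers) == 0:
        raise ValueError("no signal power inside the requested band")
    total = sum(powers)
    gravity = sum(f * p for f, p in zip(freqs, powers)) / total
    return gravity, total / len(powers)


if __name__ == "__main__":
    fs = 64  # hypothetical sampling rate chosen so that one bin equals 1 Hz
    epoch = [math.cos(2 * math.pi * 6 * t / fs) for t in range(fs)]
    rows = s_transform(epoch)
    print(band_features(rows, fs, 4, 8))   # theta: gravity frequency near 6 Hz
    print(band_features(rows, fs, 8, 13))  # alpha: much lower energy density
```

For a 6 Hz cosine, the theta-band gravity frequency lands close to 6 Hz and the theta energy density exceeds the alpha one. Summing any S-transform row over time returns the corresponding DFT coefficient, a useful sanity check; a real pipeline would replace the naive DFT with an FFT library.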
After pre-processing, all calculated scores were analyzed in SPSS Statistics (version 25). Paired-samples t-tests were used to test for differences between the pre-test and post-test. A p-value
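The paired-samples t-test works on the per-participant differences d = post - pre: t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom, and the two-sided p-value P(|T| >= |t|) equals the regularized incomplete beta function I_x(df/2, 1/2) at x = df / (df + t²). The standard-library Python sketch below implements that calculation as a stand-in for the SPSS procedure (SciPy users would normally call `scipy.stats.ttest_rel`). The function names are hypothetical, and the continued fraction follows the classic modified Lentz scheme.

```python
import math
from statistics import mean, stdev


def _nonzero(value, tiny=1e-300):
    """Guard against division by (near) zero inside the continued fraction."""
    return value if abs(value) > tiny else tiny


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    # Use the directly convergent form; otherwise apply I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _beta_cf(a, b, x) / a
    return 1.0 - bt * _beta_cf(b, a, 1.0 - x) / b


def paired_t_test(pre, post):
    """Two-sided paired-samples t-test on post - pre differences; returns (t, df, p)."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))  # fails if all differences are equal
    df = n - 1
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(|T_df| >= |t|)
    return t, df, p
```

Pairing each participant's pre-test and post-test scores removes between-participant variability, which is why the test is run on the differences rather than on two independent samples.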