Improving RealEye Accuracy on Mobile: A Study on Calibration Point Count

Martyna Pietrzak
September 29, 2024

Eye tracking on mobile devices is becoming increasingly important as these devices play a central role in our daily lives, from communication to entertainment and productivity. With the rise of mobile applications and user interface design, understanding user behavior through eye tracking offers valuable insights for enhancing the mobile user experience. Accurate eye tracking on mobile platforms can reveal how users interact with their devices, which is essential for developing intuitive and effective applications.

How to accurately calibrate participants on Mobile Devices?

To answer this question, we at RealEye conducted a study in which we tested four calibration schemas with 13, 22, 27, and 39 points. The system’s performance was evaluated across several screen regions and across participant data-quality groups. The goal was to find the optimal balance between calibration complexity and accuracy, offering valuable insights for researchers and practitioners conducting eye tracking on mobile devices.

Accuracy by Calibration Schemas

Accuracy and Precision are key metrics used to evaluate how well the eye-tracking system estimates participants' gaze points:

  1. Accuracy [px] – the Euclidean distance between the center of the target and the selected fixation.
  2. Precision [px] – the standard deviation of the Euclidean distances from the target center to the selected fixations.
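The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not RealEye's implementation: the function names are hypothetical, the sample coordinates are made up, and it assumes that per-target distances are averaged into one accuracy value and that precision uses the sample standard deviation (the article does not say which form is used).

```python
import math
import statistics


def euclidean_px(target, fixation):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(fixation[0] - target[0], fixation[1] - target[1])


def accuracy_and_precision(pairs):
    """pairs: list of (target_center, fixation_center) tuples in pixels.

    Returns (mean distance, standard deviation of the distances).
    """
    distances = [euclidean_px(target, fixation) for target, fixation in pairs]
    accuracy = statistics.mean(distances)
    # Assumption: sample standard deviation; needs at least two distances.
    precision = statistics.stdev(distances) if len(distances) > 1 else 0.0
    return accuracy, precision


# Made-up target and fixation centers, for illustration only.
pairs = [
    ((100, 100), (103, 104)),  # distance 5 px
    ((300, 500), (306, 508)),  # distance 10 px
    ((200, 800), (200, 815)),  # distance 15 px
]
acc, prec = accuracy_and_precision(pairs)
print(f"accuracy = {acc:.2f} px, precision = {prec:.2f} px")
```

Lower values are better for both metrics: a smaller accuracy value means fixations land closer to the target, and a smaller precision value means the distances are less scattered.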

In the case of mobile devices, where the accuracy-measuring task relied on fixations alone rather than on clicking the target, RealEye calculated the distance between the center of each reference target and the center of the longest fixation on that target. This approach was chosen because longer fixations are typically more stable and indicative of focused attention, which compensates for the lack of a precise reference point such as a click.
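Selecting the longest fixation on a target could look roughly like the sketch below. The `Fixation` class, the `radius_px` threshold that decides whether a fixation counts as being "on the target", and the sample values are all assumptions made for illustration; the article does not describe how RealEye assigns fixations to targets.

```python
import math
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # fixation center, px
    y: float            # fixation center, px
    duration_ms: float  # fixation duration


def longest_fixation_on_target(fixations, target, radius_px):
    """Return the longest fixation within radius_px of target, or None."""
    on_target = [
        f for f in fixations
        if math.hypot(f.x - target[0], f.y - target[1]) <= radius_px
    ]
    return max(on_target, key=lambda f: f.duration_ms, default=None)


# Made-up data: two fixations near the target, one far away but longer.
target = (200, 400)
fixations = [
    Fixation(210, 390, 180),
    Fixation(230, 440, 420),
    Fixation(600, 100, 900),  # longest overall, but not on the target
]
chosen = longest_fixation_on_target(fixations, target, radius_px=120)
error_px = math.hypot(chosen.x - target[0], chosen.y - target[1])
print(chosen, error_px)
```

The distance from the chosen fixation to the target center is the per-target error that feeds into the accuracy metric.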

Pixels were chosen as the unit of accuracy, instead of the more commonly used visual angle, because we had no control over participants' distance from the screen or their screen sizes, both of which are needed to calculate visual angle. Using pixels therefore allowed us to keep standard metrics that do not depend on these variables, ensuring more consistent and reliable data analysis.
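To see why visual angle needs both quantities, consider the standard conversion from an on-screen size to degrees of visual angle. The sketch below is illustrative only: the function name, the pixel density, and the viewing distances are assumed values, not measurements from the study.

```python
import math


def px_to_visual_angle_deg(error_px, px_per_mm, distance_mm):
    """Convert an on-screen error in pixels to degrees of visual angle.

    Requires the screen's pixel density and the viewing distance, which
    are exactly the quantities that were not controlled in the study.
    """
    size_mm = error_px / px_per_mm
    return math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))


# Assumed values: 16 px/mm (roughly a 400 ppi phone screen).
near = px_to_visual_angle_deg(60, px_per_mm=16, distance_mm=300)
far = px_to_visual_angle_deg(60, px_per_mm=16, distance_mm=400)
print(f"60 px at 300 mm: {near:.3f} deg, at 400 mm: {far:.3f} deg")
```

The same 60 px error corresponds to a different visual angle at each viewing distance, so without knowing the distance and pixel density the conversion cannot be done reliably.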

To learn more about RealEye's accuracy methodologies and findings on Computers, we invite you to read our Technology White Paper.

The table below summarizes the accuracy results for each calibration schema for Mobile Devices.

13-point Calibration:

13-point calibration order
  • Accuracy by Screen Area: accuracy varies across screen areas, with the central-right diagonal regions achieving the best accuracy (approximately 44-53 px), while peripheral areas perform worse, particularly the corners, where the error rises to 89 px.
  • Accuracy by Quality Groups: across all quality grades, accuracy remains fairly consistent, ranging from 67.06 px for the highest-quality participants to 71.40 px for lower quality grades, indicating little improvement with increased participant quality.
  • Accuracy for at least Good data = 67.24 px (n=30)
  • Implications: This schema is moderately effective, offering reasonable accuracy in central areas but struggling in peripheral regions. It may be suitable for tasks that do not require high precision or focus primarily on central screen areas.

22-point Calibration:

22-point calibration order
  • Accuracy by Screen Area: accuracy improves, with values ranging from 46 px in the right corner of the screen to 91 px in the center-left area, which also appears to be the area with the poorest accuracy in the 13-point calibration.
  • Accuracy by Quality Groups: accuracy improves across all participant groups, ranging from 59.58 px to 63.02 px. The small differences between quality groups again suggest that the calibration performs consistently regardless of participant quality.
  • Accuracy for at least Good data = 62.84 px (n=36)
  • Implications: This calibration strikes a balance between the number of calibration points and accuracy achieved. It is a good option for tasks that require moderate accuracy across the screen, with slight compromises in peripheral regions.

27-point Calibration:

27-point calibration order
  • Accuracy by Screen Area: further improvements in accuracy are observed, with central areas reaching 37 px. However, peripheral regions remain less accurate, and the left-center area is again the least accurate of all areas, at 86 px.
  • Accuracy by Quality Groups: accuracy remains consistent across participant groups, ranging from 58.89 px to 61.82 px, with some improvement seen for higher-quality participants.
  • Accuracy for at least Good data = 61.82 px (n=43)
  • Implications: This schema offers a better balance of accuracy across the screen compared to the 13- and 22-point calibrations. While peripheral accuracy is still a challenge, it is suitable for tasks where higher precision is needed, especially in the central regions.

39-point Calibration [default]:

39-point calibration order
  • Accuracy by Screen Area: further improvements, with accuracy values ranging from 44 px to at most 74 px. The central-right diagonal regions achieve the best accuracy, at 44-48 px.
  • Accuracy by Quality Groups: the accuracy is consistent across participant groups, with values ranging from 52.54 px to 56.60 px, indicating that this method is robust.
  • Accuracy for at least Good data = 55.17 px (n=39)
  • Implications: This schema provides the highest accuracy across the screen, with consistent performance across different participant groups. However, the diminishing returns suggest that increasing the number of calibration points beyond 27 may not yield significant improvements.

Comparative Analysis

Accuracy by Position on the Screen

As the number of calibration points increases, accuracy improves across screen areas. With the 13-point calibration, inaccuracies are prevalent, particularly in the peripheral areas and notably in the lower-right region. Moving to the 22-point, 27-point, and 39-point calibrations, accuracy generally improves across both central and peripheral regions.
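A per-region breakdown of this kind can be sketched by assigning each target to a cell of a grid and averaging the errors per cell. The 3x3 grid, the function names, the screen size, and the sample errors below are all hypothetical; the article does not state how RealEye divides the screen into areas.

```python
from collections import defaultdict
from statistics import mean


def region_of(x, y, width, height, cols=3, rows=3):
    """Map a screen position to a (row, col) cell of a rows x cols grid."""
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row, col


def accuracy_by_region(samples, width, height):
    """samples: list of (target_x, target_y, error_px) tuples.

    Returns the mean error in pixels for every grid cell that has data.
    """
    grouped = defaultdict(list)
    for x, y, error_px in samples:
        grouped[region_of(x, y, width, height)].append(error_px)
    return {region: mean(errors) for region, errors in grouped.items()}


# Made-up samples on an assumed 390 x 844 px portrait screen.
samples = [
    (195, 422, 40),  # center
    (200, 400, 50),  # center
    (10, 10, 90),    # top-left corner
    (380, 830, 80),  # bottom-right corner
]
per_region = accuracy_by_region(samples, width=390, height=844)
print(per_region)
```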

Accuracy by Data Quality Groups

Accuracy also improves across participant quality groups as the number of calibration points increases.
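The trend can be read directly from the "at least Good" figures reported for each schema above. The short sketch below only tabulates those reported values and the difference between successive schemas; it says nothing about whether a difference is statistically meaningful, which is what the post-hoc test in the next section addresses.

```python
# Reported accuracy for at least Good data: points -> (accuracy_px, n).
reported = {13: (67.24, 30), 22: (62.84, 36), 27: (61.82, 43), 39: (55.17, 39)}

points = sorted(reported)
steps = []
for prev, cur in zip(points, points[1:]):
    # Positive gain means the error shrank when moving to more points.
    gain_px = round(reported[prev][0] - reported[cur][0], 2)
    steps.append((prev, cur, gain_px))
    print(f"{prev} -> {cur} points: error reduced by {gain_px:.2f} px")
```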

Post-hoc Dunn's Test on at Least Good Data Quality

The Kolmogorov-Smirnov test for normality indicated that none of the datasets followed a normal distribution, so we used the Kruskal-Wallis H-test, a non-parametric method that does not assume normality and can be applied to groups of unequal size. Dunn's test with Bonferroni-adjusted p-values was then used for post-hoc pairwise comparisons.
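For readers who want to see what these tests compute, here is a minimal standard-library sketch of the Kruskal-Wallis H statistic and Dunn's pairwise test with Bonferroni adjustment. It is not the analysis code used in the study: the function names are hypothetical, the data are synthetic, and the normality check and the H-test p-value are left out. In practice a statistics library would normally be used (SciPy, for example, provides `scipy.stats.kruskal` for the omnibus test).

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def pooled_ranks(groups):
    """Rank all observations together, giving tied values their average rank.

    Returns (ranks per group, sizes of each group of tied values).
    """
    pooled = sorted(v for g in groups for v in g)
    positions = {}
    for rank, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(rank)
    rank_of = {v: sum(p) / len(p) for v, p in positions.items()}
    tie_sizes = [len(p) for p in positions.values()]
    return [[rank_of[v] for v in g] for g in groups], tie_sizes


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    ranks, ties = pooled_ranks(groups)
    n = sum(len(g) for g in groups)
    h = 12 / (n * (n + 1)) * sum(sum(r) ** 2 / len(r) for r in ranks) - 3 * (n + 1)
    return h / (1 - sum(t**3 - t for t in ties) / (n**3 - n))


def dunn_bonferroni(groups):
    """Dunn's test: Bonferroni-adjusted two-sided p-value per group pair."""
    ranks, ties = pooled_ranks(groups)
    n = sum(len(g) for g in groups)
    mean_rank = [sum(r) / len(r) for r in ranks]
    variance = n * (n + 1) / 12 - sum(t**3 - t for t in ties) / (12 * (n - 1))
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {}
    for i, j in pairs:
        se = sqrt(variance * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_rank[i] - mean_rank[j]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        adjusted[(i, j)] = min(1.0, p * len(pairs))  # Bonferroni
    return adjusted


# Synthetic error values (px) for three made-up groups, NOT study data.
groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
h_stat = kruskal_h(groups)
p_adj = dunn_bonferroni(groups)
print(h_stat, p_adj)
```

With several pairwise comparisons, the Bonferroni adjustment multiplies each raw p-value by the number of pairs, which makes it harder for small differences between neighbouring groups to reach significance.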

The post-hoc Dunn's test for at least Good data quality revealed a statistically significant difference in accuracy only between the 13-point and 39-point calibration schemas. No statistically significant differences were found between any other pair of schemas.

The statistically significant difference observed between the 13-point and 39-point calibrations is likely due primarily to the substantial increase in the number of reference points, particularly in the more challenging peripheral areas of the screen. Conversely, the lack of significant differences between the 13-point and 27-point calibrations, as well as between the 27-point and 39-point calibrations, can be explained by diminishing returns in accuracy improvements. Both the 27-point and 39-point methods demonstrate satisfactory performance across a wide range of tasks.

Although the 27-point calibration does not show a statistically significant advantage over the 22-point calibration, it is preferred because its measured accuracy is somewhat better (61.82 px versus 62.84 px for at least Good data). The 27-point method provides more reference points than the 22-point calibration, which helps capture gaze more effectively, especially toward the periphery of the screen.

Given these findings, the 27-point calibration stands out as a balanced solution. It provides high accuracy while avoiding the added complexity associated with the 39-point method. Therefore, the 27-point calibration is an optimal choice for various applications in mobile eye tracking, effectively striking a balance between performance and usability.

Recommendations and Conclusions

Based on the results of this study, the number of calibration points affects the accuracy of eye tracking on mobile devices, although only the difference between the 13-point and 39-point schemas reached statistical significance. We recommend the 27-point calibration as the optimal solution because it strikes a balance between accuracy and ease of calibration. It offers:

  • High central accuracy while improving peripheral accuracy compared to lower-point schemas.
  • Consistency across different participant groups, ensuring reliable data collection.
  • Efficient use of calibration points: Beyond 27 points, the marginal gains in accuracy are minimal, making the additional effort of the 39-point calibration unnecessary for most applications.
