Optimizing RealEye Calibration on Computers: Point Count and Accuracy

Martyna Pietrzak
September 24, 2024

How many calibration points are needed for accurate eye-tracking on computers?

In webcam eye-tracking, accurate calibration is essential for reliable data in research and usability studies. The number of calibration points can influence gaze estimation precision, particularly across different screen areas. To explore this, we analyzed five calibration schemes in RealEye (5-, 13-, 21-, 39-, and 78-point), evaluating their performance across screen regions and participant data quality. The goal is to find the optimal balance between calibration complexity and accuracy, offering valuable insights for researchers and practitioners.

Validation task description

During the validation task, 13 white targets appeared in a random order against a mid-grey background, starting from the center. Participants’ task was to click the center of each target while fixating on it, within a 10-second period.

Evaluation metrics

Accuracy and Precision are key metrics used to evaluate how well the eye-tracking system estimates participants' gaze points:

  1. Accuracy [px] – the Euclidean distance between the center of the target and the selected fixation.
  2. Precision [px] – the standard deviation of the Euclidean distances from the target center to the selected fixations.
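The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not RealEye's implementation: the function name, the (target, fixation) pair layout, the use of the mean to aggregate accuracy over targets, and the choice of the sample standard deviation are all assumptions made for the example.

```python
import math
from statistics import mean, stdev


def accuracy_and_precision(pairs):
    """Return (accuracy_px, precision_px) for a list of (target, fixation) pairs.

    Each target and fixation is an (x, y) position in screen pixels.
    Assumption: accuracy is averaged over targets, and precision uses the
    sample standard deviation; the article does not specify either detail.
    """
    # Euclidean distance from each target center to its selected fixation.
    distances = [math.dist(target, fixation) for target, fixation in pairs]
    return mean(distances), stdev(distances)


# Hypothetical coordinates, for illustration only.
example_pairs = [
    ((0, 0), (3, 4)),
    ((10, 10), (10, 20)),
    ((0, 0), (0, 15)),
]
accuracy_px, precision_px = accuracy_and_precision(example_pairs)
```

Lower values are better for both metrics: a smaller accuracy value means fixations land closer to the targets, and a smaller precision value means the errors are more uniform.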


Pixels were chosen as the unit of accuracy, instead of the more commonly used visual angle, because we had no control over participants' distance from the screen or their screen sizes, both of which are crucial when calculating visual angle. Using pixels therefore allowed us to maintain standard metrics that are independent of these variables, ensuring more consistent and reliable data analysis.
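To illustrate why visual angle depends on those unknowns, here is a small sketch of the standard conversion from an on-screen distance to degrees of visual angle. The function and parameter names are hypothetical, and the screen dimensions and viewing distances are made-up values, not measurements from this study.

```python
import math


def px_to_visual_angle_deg(error_px, screen_width_px, screen_width_cm, viewing_distance_cm):
    """Convert an on-screen error in pixels to degrees of visual angle.

    Uses the standard relation: angle = 2 * atan(size / (2 * distance)).
    """
    # Physical size of the error on this particular screen.
    error_cm = error_px * screen_width_cm / screen_width_px
    return math.degrees(2 * math.atan(error_cm / (2 * viewing_distance_cm)))


# The same 100 px error on a hypothetical 50 cm wide, 1000 px wide screen,
# seen from two different (assumed) viewing distances.
angle_at_60cm = px_to_visual_angle_deg(100, 1000, 50, 60)
angle_at_40cm = px_to_visual_angle_deg(100, 1000, 50, 40)
```

The same pixel error maps to a larger angle when the participant sits closer or uses a physically larger screen, which is why the conversion cannot be done reliably without knowing both.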

Data processing pipeline

The data was processed following the steps outlined in the diagram below.

During a viewing task, slight drifts from the intended target can occur in fixations due to factors such as eye fatigue or minor shifts in attention. By selecting only the closest fixation, the impact of this drift is minimized, allowing the focus to remain on the fixation that most accurately corresponds to the target location. The analyses were then conducted using the "closest_fixation" methodology to highlight RealEye's optimal performance.
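The selection step described above can be sketched as follows. Only the "closest_fixation" name comes from the article; the function signature and the (x, y) tuple layout are assumptions made for this example, and the actual RealEye pipeline may differ.

```python
import math


def closest_fixation(target, fixations):
    """Return the fixation with the smallest Euclidean distance to the target.

    target is an (x, y) position in pixels; fixations is a non-empty list of
    (x, y) positions recorded while that target was shown.
    """
    return min(fixations, key=lambda fixation: math.dist(target, fixation))


# Hypothetical fixations recorded for one target at (100, 100).
selected = closest_fixation((100, 100), [(130, 140), (103, 104), (90, 50)])
error_px = math.dist((100, 100), selected)
```

Because only the best fixation per target is kept, this approach reports the system's best-case error rather than its typical error over all fixations.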

Outliers were kept in the data analysis, which may have inflated the reported errors. Consequently, RealEye's true accuracy is likely better (that is, the error in pixels is likely lower) than the values reported below.

To learn more about RealEye's accuracy methodologies and findings, we invite you to read our Technology White Paper.

Accuracy by calibration schema

The table below summarizes the accuracy results for each calibration schema.

5-point Calibration:

5-point calibration order
  • Accuracy = 137.10 px (n=47)
  • Accuracy by Screen Area: the accuracy varies across different screen areas, with left-hand side regions tending to have better accuracy (of up to 96 px), and peripheral areas having poorer accuracy (especially bottom corners = 223 and 168 px).
  • Implications: this method may lack consistency, making it unsuitable for tasks requiring high precision.

13-point Calibration:

13-point calibration order
  • Accuracy = 127.14 px (n=46)
  • Accuracy by Screen Area: improved accuracy in most areas, with deterioration in 4 areas compared to the 5-point calibration. The corners, however, still show inaccuracies, with the bottom corners reaching 190 and 202 px.
  • Implications: this schema may be suitable for tasks where high precision is not critical.

21-point Calibration:

  • Accuracy = 112.90 px (n=53)
  • Accuracy by Screen Area: further improvements in accuracy, with values ranging from 77-82 px in the top corners of the screen to 135-151 px in the bottom corners.
  • Implications: this schema strikes a reasonable balance between the number of calibration points and the accuracy achieved. It may be suitable for tasks that focus on central screen areas and require only average accuracy.

39-point Calibration [default]:

  • Accuracy = 105.76 px (n=102)
  • Accuracy by Screen Area: this calibration shows substantial improvements, with similar accuracy across all screen areas, averaging below 100 px, except for the bottom corners at 150 and 151 px.
  • Implications: this schema provides very good accuracy across most screen areas, making it highly suitable for tasks requiring both central and peripheral accuracy.

78-point Calibration:

  • Accuracy = 92.23 px (n=48)
  • Accuracy by Screen Area: this calibration offers the highest accuracy among all methods, with values from 71 px to the maximum of 116 px. This method ensures high accuracy across nearly all screen areas, with particularly strong performance in both central and peripheral regions.
  • Implications: this schema is the most accurate and reliable method, making it ideal for tasks that require precise eye-tracking across the entire screen. This performance comes at the cost of increased calibration complexity and time.

Comparative Analysis

Accuracy by Position on the Screen

As the number of calibration points increases, accuracy across the screen areas improves, as shown in the table above. The 5-point and 13-point calibrations show significant inaccuracies, particularly in peripheral areas. In contrast, the 21-, 39-, and 78-point calibrations achieve much better accuracy across both central and peripheral regions. This trend indicates that increasing the number of calibration points leads to better spatial accuracy.
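The trend can be quantified with simple arithmetic on the mean accuracy values reported above. The sketch below is only an illustration: the function names are hypothetical, and because each schema was tested on a different group of participants, the numbers are descriptive rather than a formal comparison.

```python
# Mean accuracy in px per number of calibration points, as reported above.
reported_accuracy_px = {5: 137.10, 13: 127.14, 21: 112.90, 39: 105.76, 78: 92.23}


def improvement_over_baseline(accuracy_px, baseline_points=5):
    """Percentage reduction in error relative to the baseline schema."""
    baseline = accuracy_px[baseline_points]
    return {
        points: (baseline - error) / baseline * 100
        for points, error in accuracy_px.items()
    }


def gain_per_added_point(accuracy_px):
    """Error reduction in px per extra calibration point between consecutive schemas."""
    schemas = sorted(accuracy_px)
    return {
        (fewer, more): (accuracy_px[fewer] - accuracy_px[more]) / (more - fewer)
        for fewer, more in zip(schemas, schemas[1:])
    }


improvement = improvement_over_baseline(reported_accuracy_px)
gains = gain_per_added_point(reported_accuracy_px)
```

The per-point gain is useful for weighing accuracy against calibration time: it shows how much error each additional point removes when moving from one schema to the next.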

Pair-wise comparisons of calibration groups

The Kolmogorov-Smirnov test for normality was performed on all datasets and indicated that none followed a normal distribution. Therefore, the Kruskal-Wallis H-test was used to compare all groups, as it is non-parametric and can handle groups of different sizes without sensitivity to imbalances. Following this, Dunn's test (with Bonferroni adjusted p-values) was conducted for post-hoc pairwise comparisons.

  • 5_CalibPoints stands out as significantly different from 21_CalibPoints, 39_CalibPoints, and 78_CalibPoints.
  • 13_CalibPoints is mostly similar to the other groups, differing significantly only from 78_CalibPoints.
  • 78_CalibPoints differs significantly from the groups with fewer calibration points (5, 13, and 21).
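For readers who want to reproduce this kind of comparison, here is a minimal, dependency-free sketch of the Kruskal-Wallis H statistic and Dunn's test with Bonferroni adjustment. It is not the analysis code used for this article: the function names and the sample values are hypothetical, the normality check is omitted, and converting H to a p-value (via the chi-square distribution with k − 1 degrees of freedom) is left out.

```python
import math
from itertools import combinations
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks.

    Returns the ranks (in input order) and the size of each group of ties.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def kruskal_wallis_with_dunn(groups):
    """groups maps a label to a list of per-participant errors in px.

    Returns the tie-corrected H statistic and a dict of Bonferroni-adjusted
    p-values from Dunn's test, keyed by (label_a, label_b).
    """
    labels = list(groups)
    pooled = [value for label in labels for value in groups[label]]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    tie_sum = sum(t ** 3 - t for t in tie_sizes)

    # Mean rank of each group, read back in the order the data was pooled.
    mean_rank, offset = {}, 0
    for label in labels:
        size = len(groups[label])
        mean_rank[label] = sum(ranks[offset:offset + size]) / size
        offset += size

    # Kruskal-Wallis H with the standard correction for ties.
    centre = (n_total + 1) / 2
    h = 12 / (n_total * (n_total + 1)) * sum(
        len(groups[label]) * (mean_rank[label] - centre) ** 2 for label in labels
    )
    h /= 1 - tie_sum / (n_total ** 3 - n_total)

    # Dunn's z statistic for every pair, two-sided p, Bonferroni adjustment.
    pairs = list(combinations(labels, 2))
    base_var = n_total * (n_total + 1) / 12 - tie_sum / (12 * (n_total - 1))
    adjusted = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        adjusted[(a, b)] = min(1.0, p * len(pairs))
    return h, adjusted


# Made-up errors in px, for illustration only (not RealEye data).
demo_groups = {
    "5_CalibPoints": [150.2, 131.8, 142.5, 160.1],
    "39_CalibPoints": [101.3, 110.7, 98.4, 115.9],
    "78_CalibPoints": [90.6, 85.2, 97.1, 93.8],
}
h_statistic, adjusted_p = kruskal_wallis_with_dunn(demo_groups)
```

A pair is usually treated as significantly different when its adjusted p-value falls below the chosen threshold (commonly 0.05). Multiplying each raw p-value by the number of pairs is the Bonferroni step, which makes the test more conservative as more groups are compared.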

39_CalibPoints is significantly different from 5_CalibPoints but not from 13_CalibPoints or 21_CalibPoints, while still offering the best accuracy of these three. This suggests that 39_CalibPoints represents an optimal middle ground, offering strong performance without the potential overfitting or instability associated with very low or very high numbers of calibration points.

Conclusions

The analysis shows that increasing the number of calibration points leads to better accuracy in eye-tracking data. However, the choice of calibration method should be based on the specific requirements of the task at hand. The 39-point calibration offers a balanced approach, suitable for a wide range of applications, while the 21-point calibration might be a more practical option for less demanding tasks. The 78-point calibration provides the highest accuracy and consistency, making it the best choice for tasks requiring precise and reliable data across the entire screen.

The 39-point calibration therefore stands out as the best overall choice, offering an optimal balance between accuracy, complexity, and time efficiency. In a variety of eye-tracking tasks, this method consistently delivers strong performance, particularly in user experience research and behavioral studies, where both precision and practicality are crucial.
