Predicting Eye Gaze Location on Websites

This thesis focuses on an effective transfer learning method for eye gaze estimation during website interaction, using website screenshots as input.

Completed Master's Thesis

The analysis of eye gaze data, which is acquired from the human pupil's location and movement relative to the object of interest, allows the determination of users' attention during their interactions with user interfaces. This analysis has been shown to be useful in several fields, such as virtual reality, the health domain, and online teaching. However, gaze data is currently often very small in quantity, due to the complexity of the experiments required for its acquisition, and is highly platform dependent. This limits the potential of current gaze analysis, which depends heavily on the scale of the gaze data used.

One solution is visual saliency detection, which attempts to predict human attention to a given image. By performing a proper cross-task model transfer, the automatic generation of users' gaze (as heatmaps, for instance) from a website screenshot (as an input substitute) can be realized. An open question is whether existing approaches to gaze heatmap prediction, which have so far been developed for natural images, carry over to deliver plausible predictions for user interfaces as well. The main challenge is that the content (including its inherent structure) of natural images differs from that of website screenshots.

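For concreteness, a ground-truth gaze heatmap of the kind discussed here is commonly built by placing an isotropic Gaussian at each recorded fixation point. The following is a minimal sketch; the fixation coordinates, image size, and blur width `sigma` are illustrative assumptions, not values taken from the datasets used in the thesis:

```python
import numpy as np

def fixations_to_heatmap(fixations, height, width, sigma=20.0):
    """Convert (x, y) fixation points into a continuous gaze heatmap
    by summing a Gaussian blob per fixation, then scaling to [0, 1]."""
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float64)
    for fx, fy in fixations:
        heatmap += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2.0 * sigma ** 2))
    if heatmap.max() > 0:
        heatmap /= heatmap.max()  # normalize for visualisation or training targets
    return heatmap

# Example: two fixations on a 200x300 "screenshot"
hm = fixations_to_heatmap([(50, 60), (250, 150)], height=200, width=300)
```

Such a map, rendered per screenshot, serves both as a training target and as a human-readable visualisation of attention.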
The objective of this thesis is to perform transfer learning so that basic saliency detection models can serve, given website screenshot input, as a tool to simulate potential eye gaze locations (heatmaps). Two main datasets will be used to implement the proposed approach: SALICON and GazeMining. Subsequently, the base saliency detection model will be extended with multi-modal inputs, incorporating the website layout alongside the website screenshot. A multi-task setting will then be integrated to improve model learning by combining gaze coordinate prediction with the base eye-gaze heatmap prediction task. Lastly, using both datasets, the model will be evaluated and analyzed in terms of its accuracy (using several quantitative metrics) to measure its direct application potential.
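Two quantitative metrics widely used to score predicted saliency maps against ground truth are the linear correlation coefficient (CC) and Normalized Scanpath Saliency (NSS). A minimal NumPy sketch of both follows; the function names and the stabilising epsilon are our own choices, not tied to any particular benchmark implementation:

```python
import numpy as np

def cc(pred, gt):
    """Pearson linear correlation coefficient between two saliency maps."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    g = (gt - gt.mean()) / (gt.std() + 1e-8)
    return float((p * g).mean())

def nss(pred, fixation_mask):
    """Normalized Scanpath Saliency: mean of the standardized prediction
    at ground-truth fixation locations (binary mask)."""
    p = (pred - pred.mean()) / (pred.std() + 1e-8)
    return float(p[fixation_mask.astype(bool)].mean())
```

CC compares two continuous maps, while NSS scores the prediction only at discrete fixation points, so the two metrics capture complementary notions of accuracy.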

Further information:

Eye gaze data, which is acquired from the human pupil's location and movement relative to the object of interest [1], is a crucial source of information for many interactive applications, including business, multimedia, and health [2]. This is due to its inherent property of describing the location and intensity of a person's attention during each respective task [3].

One recent example of the use of eye gaze data is in website usability studies [4], where this information is an essential constituent of the analysis. To date, however, obtaining eye gaze data for such studies is still rather expensive and laborious, given its manual acquisition. As a consequence, the existing data are limited in quantity, which substantially hampers the possibility of performing more thorough analyses.

This master's thesis will propose a deep learning based model [5] that predicts eye-tracking metrics, such as the eye-gaze heatmap, given screenshots of a website. Transfer learning will be used to capitalise on parameters learned from other machine vision tasks [6], mainly gaze saliency prediction [7], adapting them to the specific characteristics of website usability studies [8,9]. The expected result of this thesis is a principally working model for eye-gaze prediction that can potentially facilitate the simulation of missing eye-gaze data.
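The transfer learning idea can be illustrated with a toy sketch: a frozen "backbone" (standing in for a saliency network pretrained on natural images) produces features, and only a small task-specific head is trained on data from the new domain. Everything here, including the shapes, the synthetic data, and the learning rate, is an illustrative assumption rather than the thesis's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor: a fixed random projection standing in
# for a backbone trained on a source task; its weights stay frozen.
W_backbone = rng.normal(size=(8, 4))

def features(x):
    return np.tanh(x @ W_backbone)  # frozen backbone forward pass

# New task-specific head, trained from scratch on target-domain data.
W_head = np.zeros((4, 1))

# Toy target-domain samples (stand-in for website-screenshot data).
X = rng.normal(size=(64, 8))
y = features(X) @ rng.normal(size=(4, 1)) + 0.01 * rng.normal(size=(64, 1))

# Gradient descent on the head only: the backbone is never updated.
lr = 0.1
for _ in range(500):
    F = features(X)
    err = F @ W_head - y
    W_head -= lr * (F.T @ err) / len(X)  # update head weights only

mse = float((err ** 2).mean())
```

Freezing the backbone lets the small target dataset train far fewer parameters, which is exactly what makes transfer learning attractive when gaze data is scarce.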


Prerequisites:

1. Knowledge in Machine Learning.

2. Knowledge in Deep Learning.

3. Knowledge in Image Processing.

Datasets and Tools:

1. SALICON
2. GazeMining
3. Eye gaze extractions


[1] Cognolato, Matteo, Manfredo Atzori, and Henning Müller. "Head-mounted eye gaze tracking devices: An overview of modern devices and recent advances." Journal of rehabilitation and assistive technologies engineering 5 (2018): 2055668318773991.

[2] Jyotsna, C., and J. Amudha. "Eye gaze as an indicator for stress level analysis in students." 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, 2018.

[3] Morimoto, Carlos H., and Marcio RM Mimica. "Eye gaze tracking techniques for interactive applications." Computer vision and image understanding 98.1 (2005): 4-24.

[4] Ehmke, Claudia, and Stephanie Wilson. "Identifying web usability problems from eyetracking data." (2007): 119-128.

[5] Goodfellow, Ian, et al. Deep learning. Vol. 1. No. 2. Cambridge: MIT press, 2016.

[6] Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

[7] Cornia, Marcella, et al. "Predicting human eye fixations via an lstm-based saliency attentive model." IEEE Transactions on Image Processing 27.10 (2018): 5142-5154.

[8] Menges, Raphael. (2020). GazeMining: A Dataset of Video and Interaction Recordings on Dynamic Web Pages. Labels of Visual Change, Segmentation of Videos into Stimulus Shots, and Discovery of Visual Stimuli. (Version 1.0.0) [Data set]. Zenodo.

[9] Ghose, U., Srinivasan, A.A., Boyce, W.P. et al. PyTrack: An end-to-end analysis toolkit for eye tracking. Behav Res 52, 2588–2603 (2020).

