
Dr. Ramin Hedeshy

Researcher, AI, Analytic Computing
Germany

Ramin completed his PhD at the University of Stuttgart’s Institute for Artificial Intelligence, where he conducted research under the supervision of Prof. Dr. Steffen Staab. His work focuses on human–computer interaction and AI, particularly multimodal interaction techniques that combine eye tracking with touch or non‑lexical voice input.

During his time at the university, he contributed to the EXIST‑funded project Semanux, which aims to make digital interaction more inclusive by enabling people with disabilities to control computers using their individual capabilities.

His research has been published at leading conferences such as ACM CHI, ACM ETRA, and INTERSPEECH, spanning novel methods for eye typing as well as machine‑learning approaches to classifying non‑verbal voice expressions, including a 2023 INTERSPEECH paper on deep‑learning methods for recognizing humming and other non‑lexical vocal inputs. He also taught courses in Human–Computer Interaction, Information Retrieval, and Machine Learning, and supervised student theses.

Before and after his doctoral studies, he worked in industry, including roles at Bliksund (Norway) and Union Betriebs‑GmbH (Bonn), contributing to a range of IT projects such as a rules repository system for the CDU and the personal homepage of Angela Merkel. He continues to apply his expertise in multimodal interaction and accessible computing in his current industry position, including ongoing work on Tiltility, a research‑driven system for camera‑based interaction.

His dissertation is available through the University of Stuttgart library:
Spatiotemporal fusion of nonverbal voice & eye gaze for human-computer interactions

  1. Hedeshy, R., Menges, R., & Staab, S. (2023). CNVVE: Dataset and Benchmark for Classifying Non-verbal Voice Expressions. Interspeech 2023, August 20–24, 2023, Dublin, Ireland.
  2. Hedeshy, R., Kumar, C., Lauer, M., & Staab, S. (2022). All Birds Must Fly: The Experience of Multimodal Hands-free Gaming with Gaze and Nonverbal Voice Synchronization. International Conference on Multimodal Interaction (ICMI ’22), November 7–11, 2022, Bengaluru, India. https://doi.org/10.1145/3536221.3556593
  3. Hedeshy, R., Kumar, C., Menges, R., & Staab, S. (2021). Hummer: Text Entry by Gaze and Hum. CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan. https://doi.org/10.1145/3411764.3445501
  4. Hedeshy, R., Kumar, C., Menges, R., & Staab, S. (2020). GIUPlayer: A Gaze Immersive YouTube Player Enabling Eye Control and Attention Analysis. ETRA ’20 Adjunct: 2020 Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, June 2-5, 2020, Adjunct Volume, 1:1–1:3. https://doi.org/10.1145/3379157.3391984
  5. Kumar, C., Hedeshy, R., MacKenzie, I. S., & Staab, S. (2020). TAGSwipe: Touch Assisted Gaze Swipe for Text Entry. CHI ’20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, April 25-30, 2020, 1–12. https://doi.org/10.1145/3313831.3376317
  • HCIIR SS2021
  • Machine Learning Tutorial SS2020
  • Semanux
    Semanux develops technologies that allow a computer to be operated through a combination of different input modalities, largely eliminating the need for a mouse and keyboard. More info at www.semanux.com

  • MICME
    The MICME project aims to combine gesture recognition, eye tracking, voice control, and AR/VR technology into a single system that can be used in the operating room.