Ethan Sheng Lyu
PhD Candidate · University of Hong Kong

I am Ethan Sheng Lyu, a PhD Candidate in the Department of Computer Science, School of Computing and Data Science, at the University of Hong Kong (HKU), supervised by Prof. Chenshu Wu and Prof. Chuan Wu. Previously, I received my BSc in Electrical Engineering from Nanjing University.

Research Interests
Physical AI · Spatial Intelligence · Multi-Modal Learning · Mobile Computing · Acoustics

We are witnessing a rapid paradigm shift in how we interact with the physical world. To achieve true spatial intelligence, computing systems must be able to perceive, interpret, reason about, and act within physical spaces. My research aims to democratize such Physical-AI services, where ubiquitous everyday devices serve as pervasive sensing interfaces that seamlessly power human-centric applications.

To realize this vision, my work operates at the intersection of Embodied AI, HCI, and Cyber-Physical Systems (CPS). During my PhD, I have primarily leveraged acoustics as a powerful medium to communicate with and sense the physical world, interpreting acoustic signals along multiple dimensions to demonstrate their transformative power.

News

Publications

ASE: Practical Acoustic Speed Estimation via Sound Diffusion Field
Sheng Lyu, Chenshu Wu
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (UbiComp/IMWUT'25)

Passive human speed estimation plays a critical role in acoustic sensing. Despite extensive study, existing systems suffer from various limitations: First, the channel measurement rate proves inadequate for estimating high moving speeds. Second, previous acoustic speed estimation exploits the Doppler Frequency Shifts (DFS) created by moving targets and relies on microphone arrays, making it capable of sensing only the radial speed within a constrained distance. To overcome these issues, we present ASE, an accurate and robust Acoustic Speed Estimation system on a single commodity microphone. We propose a novel Orthogonal Time-Delayed Multiplexing (OTDM) scheme for acoustic channel estimation at a high rate that was previously infeasible, making it possible to estimate high speeds. We then model sound propagation from the unique perspective of the acoustic diffusion field and infer speed from the acoustic spatial distribution, a fundamentally different way of thinking about speed estimation beyond prior DFS-based approaches. We further develop novel techniques for motion detection and signal enhancement to deliver a robust and practical system. We implement and evaluate ASE through extensive real-world experiments. Our results show that ASE reliably tracks walking speed, independent of target location and direction, with a mean error of 0.13 m/s (a 2.5x reduction over DFS) and a detection rate of 97.4% over large coverage, e.g., free walking in a 4 m × 4 m room. We believe ASE pushes acoustic speed estimation beyond the conventional DFS-based paradigm and will inspire exciting research in acoustic sensing. Code is available at https://github.com/aiot-lab/ASE.
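
To give intuition for the diffusion-field idea, here is an illustrative sketch, not ASE's actual pipeline: under an idealized diffuse-field coherence model, the temporal autocorrelation of channel power decorrelates as the target moves, with its first zero crossing occurring once the target has traveled roughly half a wavelength, so speed falls out of the zero-crossing lag. The function name and model below are assumptions for illustration only.

```python
import numpy as np

def diffusion_speed(power, fs, freq_hz, c=343.0):
    """Hypothetical sketch of diffusion-field speed estimation.

    Under an idealized diffuse-field model, the ACF of channel power
    first crosses zero when the target has moved half a wavelength,
    so speed = (lambda / 2) / (zero-crossing lag).
    """
    x = power - power.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0] + 1e-12                # normalize so acf[0] == 1
    crossings = np.where(np.diff(np.sign(acf)) < 0)[0]  # + to - transitions
    if len(crossings) == 0:
        return None                      # no decorrelation observed
    tau0 = (crossings[0] + 1) / fs       # first zero-crossing lag (s)
    return (c / freq_hz) / (2.0 * tau0)  # v = (lambda / 2) / tau0
```

This sketch also illustrates why the channel measurement rate matters: with an inaudible 18-20 kHz probe (wavelength roughly 1.7-1.9 cm), a 1.5 m/s walk covers half a wavelength in under 7 ms, so the channel must be estimated at several hundred Hz to resolve the lag, which is the regime the OTDM scheme targets.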

CardioLive: Empowering Video Streaming with Online Cardiac Monitoring via Audio-Visual Learning
Sheng Lyu, Ruiming Huang, Sijie Ji, Yasar Abbas Ur Rehman, Lan Ma, Chenshu Wu
33rd ACM International Conference on Multimedia (MM'25)

Online Cardiac Monitoring (OCM) emerges as a compelling enhancement for next-generation video streaming platforms, enabling applications including remote health, affective computing, and deepfake detection. Yet the physiological information encapsulated in video streams has long been neglected. In this paper, we present the design and implementation of CardioLive, the first online cardiac monitoring system for video streaming platforms. We leverage the naturally co-existing video and audio streams and devise CardioNet, the first audio-visual network to learn the cardiac series. It incorporates multiple unique designs to extract temporal and spectral features, ensuring robust performance under realistic streaming conditions. To enable Service-On-Demand OCM, we implement CardioLive as a plug-and-play middleware service and develop systematic solutions to practical issues including changing FPS and unsynchronized streams. Extensive evaluations demonstrate the effectiveness of our system: we achieve a Mean Squared Error of 1.79 BPM, outperforming video-only and audio-only solutions by 69.2% and 81.2%, respectively, and CardioLive achieves average throughputs of 115.97 and 98.16 FPS on Zoom and YouTube. We believe our work opens up new applications for video streaming systems. Code is available at https://github.com/aiot-lab/CardioLive.
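
CardioNet's actual architecture is not reproduced here, but the following minimal PyTorch sketch illustrates the general audio-visual late-fusion pattern the abstract describes: encode each stream separately, fuse the embeddings, and regress a heart rate. All module names and sizes are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class AVFusion(nn.Module):
    """Illustrative audio-visual late-fusion regressor (not CardioNet)."""

    def __init__(self, a_dim=64, v_dim=128, hidden=128):
        super().__init__()
        self.audio_enc = nn.GRU(a_dim, hidden, batch_first=True)
        self.video_enc = nn.GRU(v_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, audio, video):
        # audio: (B, T_a, a_dim), video: (B, T_v, v_dim); the two streams
        # may have different lengths (e.g., different FPS after resampling)
        _, h_a = self.audio_enc(audio)             # final hidden state
        _, h_v = self.video_enc(video)
        fused = torch.cat([h_a[-1], h_v[-1]], -1)  # (B, 2*hidden)
        return self.head(fused).squeeze(-1)        # predicted heart rate (BPM)
```

A late-fusion design like this tolerates the unsynchronized, variable-rate streams the abstract highlights, since each encoder summarizes its own stream before fusion.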

Temporal Modeling of Room Impulse Response Generation via Multi-Scale Autoregressive Learning
Sheng Lyu, Yuemin Yu, Chenshu Wu
26th Interspeech Conference (Interspeech'25)

The rise of AI-generated content (AIGC) has revolutionized multimedia processing, including audio applications. The Room Impulse Response (RIR), which models sound propagation in an acoustic environment, plays a critical role in downstream tasks such as speech synthesis. Existing RIR generation methods, whether based on ray tracing or neural representations, fail to fully exploit the temporal dynamics inherent in RIRs. In this work, we propose a novel method for temporal modeling of RIRs through autoregressive learning. Our approach captures the sequential evolution of sound propagation by introducing a multi-scale generation mechanism that adaptively scales across varying temporal resolutions. Extensive evaluations demonstrate that our approach achieves T60 error rates of 4.1% and 5.3% on two real-world datasets, respectively, outperforming existing RIR generation methods. We believe our work opens up new directions for future research.
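
For context on the T60 metric used in the evaluation: T60 (reverberation time) is conventionally estimated from an RIR via Schroeder backward integration, fitting a line to the energy decay curve and extrapolating to -60 dB. Below is a minimal sketch of that standard computation, independent of the paper's generation method; the function name and fit range are assumptions.

```python
import numpy as np

def t60_schroeder(rir, fs, fit_db=(-5.0, -25.0)):
    """Estimate T60 from an RIR via Schroeder backward integration.

    Fits a line to the energy decay curve between -5 and -25 dB and
    extrapolates to -60 dB (a standard T20-style estimate).
    """
    edc = np.cumsum(rir[::-1] ** 2)[::-1]            # backward integration
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)   # decay curve in dB
    t = np.arange(len(rir)) / fs
    hi, lo = fit_db
    mask = (edc_db <= hi) & (edc_db >= lo)           # fit only the -5..-25 dB span
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # decay rate, dB per second
    return -60.0 / slope                             # seconds to decay by 60 dB
```

The "T60 error rate" reported above is then the relative deviation between the T60 of a generated RIR and that of the measured ground truth.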

Statistical Acoustic Sensing For Real-Time Respiration Monitoring and Presence Detection
Sheng Lyu, Ruiming Huang, Yuemin Yu, Chenshu Wu
22nd ACM International Conference on Mobile Systems, Applications, and Services (MobiSys'24)

In this demo, we present an all-in-one real-time system for breathing monitoring and presence detection using statistical acoustic sensing. By applying the Auto-Correlation Function (ACF) to the Channel Frequency Response (CFR), our system captures both motion statistics and breathing rates. We devise novel weighted combining schemes to enhance the SNR of the weak sensing signals, and we enable human presence detection by integrating both motion statistics and breathing rate as vital indicators. Our system operates on a single microphone without relying on a bulky microphone array, runs in real time, and supports any device equipped with a commodity microphone and speaker. Our demo can be accessed at https://youtu.be/1bxpXNwHGv0.
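
A minimal sketch of the ACF idea behind the demo (illustrative only; the system's actual weighting and detection logic differ): periodic chest motion imprints a periodic component on the channel, so the ACF of CFR measurements peaks at the breathing period, while the first-lag ACF value serves as a motion statistic. The function name and BPM range below are assumptions.

```python
import numpy as np

def acf_vitals(cfr_mag, fs, min_bpm=8, max_bpm=30):
    """Hypothetical ACF-based vitals sketch (not the demo's exact code).

    cfr_mag: 1-D series of CFR magnitudes on one frequency bin.
    fs: channel measurement rate in Hz.
    """
    x = cfr_mag - cfr_mag.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0] + 1e-12                 # normalize so acf[0] == 1
    motion_stat = acf[1]                  # first-lag ACF: motion indicator
    lo = int(fs * 60.0 / max_bpm)         # shortest plausible period (samples)
    hi = int(fs * 60.0 / min_bpm)         # longest plausible period (samples)
    peak = lo + int(np.argmax(acf[lo:hi]))
    bpm = 60.0 * fs / peak                # ACF peak lag -> breaths per minute
    return motion_stat, bpm
```

Thresholding the motion statistic and checking for a plausible breathing peak together give a simple presence test, mirroring how the demo combines the two indicators.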

Awards & Honors
Services & Teaching

Artifact Evaluation Committee

Invited Reviewer

Shadow Program Committee

Teaching Assistant