About
This semester, the seminar has taken on a smaller, more concentrated scale as we discuss and write up
our case study on classifying phonetic data using topological methods in time series analysis.
By default, we meet Mondays at 2 pm in M5024.
Talks
Apr 3, '23,
Siheng Yi,
Interleaving distances between persistence modules
May 15, '23,
Zhiwang Yu,
Realizing topological convolutional neural networks for image data
Jul 18, '23 (Tuesday, 5–6 pm, M5024 and Tencent Meeting 291-909-800),
Meng Yu (Tencent AI Lab),
Speech signal processing in multi-speaker environments: problems, modeling, and assessment
Deep learning for speech enhancement has dramatically accelerated progress on the cocktail party problem, a major unsolved challenge: tracking, enhancing, and recognizing each individual speaker when multiple speakers talk simultaneously in a noisy and reverberant environment. In this presentation, I will begin with solutions to the keyword spotting problem in a multi-speaker environment. A hybrid of full-band and narrow-band modeling is then introduced to address speech enhancement problems, including acoustic echo cancellation, noise suppression, dereverberation, and automatic gain control, in both single-channel and multi-channel setups. Finally, MetricNet, a non-intrusive speech quality assessment model, is developed for evaluating speech enhancement in real scenarios.
Nov 26, '23 (Yanqi Lake Beijing Institute of Mathematical Sciences and Applications),
Yifei Zhu,
Topological combined machine learning for consonant recognition
Slides based on recent talks at the
Hangzhou Workshop on Topological Data Analysis and at the
FST & FBM Series Lecture of Beijing Normal University – Hong Kong Baptist University United International College
(with support from the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science)
Preprint with Pingyao Feng, Qingrui Qu, Siheng Yi, and Zhiwang Yu
Dec 26, '23 (Tuesday, 10:20–11:10 am, M5024),
Meng Yu (Tencent AI Lab),
Generative models for speech processing
Enhancing speech quality in challenging acoustic settings remains an open problem in speech processing. Current deep learning solutions often cannot fully remove background noise or reverberation, degrading the listening experience. Our research introduces a method that utilizes pre-trained generative techniques to recreate clear speech from inferior inputs. By capitalizing on pre-trained vocoder and codec models, our approach ensures superior speech quality and resilience in demanding scenarios. Generative techniques skillfully address information loss in the speech signal, leading to enhanced audio clarity and minimized distortions. Our experiments, spanning both simulated and real-world datasets, demonstrate the method’s effectiveness; notably, using codec models yielded the best audio ratings. This work highlights the immense potential of leveraging pre-trained generative tools, especially where conventional methods fall short.
Jan 12, '24 (Friday, 2 pm, Lecture Hall 3-115 and Tencent Meeting 580-751-380, postponed from Dec 26, '23),
Zhiwang Yu,
Topological inputs for deep learning and topology of neural network architectures