GitHub Homepage

Audio samples for our paper: “Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem”

This webpage provides representative audio samples for clean speech data in WAV format. Each row is one randomly chosen fragment from the LibriSpeech test-clean split. Each column corresponds to a method used to generate the WAV; apart from the ground truth, every method works directly from the STFT magnitude spectrogram:

Ground truth: A perfect reconstruction via inverse STFT using ground truth magnitudes and phases
Proposed: Our proposed method with efficient first and second stages
Prev. + Thomas: The result of applying our proposed second stage to the previously proposed CNN
Prev. + direct: The result of applying a direct solver to the previously proposed CNN
VOCOS: “Copy-synthesis” function using the VOCOS API and pretrained model, as prescribed in the official repository
RTISI (50 iter.): 50-iteration RTISI (implementation)
RTISI (5 iter.): 5-iteration RTISI (implementation)
Strided + LA: The strided variation of our proposed method, with one frame of lookahead
Strided: The strided variation of our proposed method, without lookahead
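
To make the "Ground truth" column concrete, the following is a minimal sketch of reconstructing a waveform by inverse STFT from true magnitudes and phases. It is an illustration only, not the code used for these samples: the helper names (`stft`, `istft`, `reconstruct_from_mag_phase`), the Hann window, and the values `n_fft=512` and `hop=128` are all assumptions and may differ from the settings in the paper.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a periodic Hann window (assumed settings)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(spec) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # The clamp avoids division by zero at the edges, where the window vanishes.
    return out / np.maximum(norm, 1e-8)

def reconstruct_from_mag_phase(mag, phase, n_fft=512, hop=128):
    """Recombine magnitude and phase into a complex spectrogram and invert it."""
    return istft(mag * np.exp(1j * phase), n_fft=n_fft, hop=hop)

# Usage: a random signal stands in for a speech fragment.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
spec = stft(x)
y = reconstruct_from_mag_phase(np.abs(spec), np.angle(spec))
```

In this sketch the reconstruction matches the input away from the signal edges, where fewer frames overlap. The other columns have access only to `np.abs(spec)` and must estimate the phase.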

See our paper for more details.

Audio Table

Ground Truth	Proposed	Prev. + Thomas	Prev. + direct	VOCOS	RTISI (50 iter.)	RTISI (5 iter.)	Strided + LA	Strided