nupic.research.frameworks.pytorch.audio_transforms
Audio transforms for the Google Speech Commands dataset, adapted from https://github.com/tugstugi/pytorch-speech-commands.
-
class AddBackgroundNoise(bg_dataset, max_percentage=0.45)
Bases: torch.utils.data.Dataset
Adds random background noise to the audio.
-
class AddBackgroundNoiseOnSTFT(bg_dataset, max_percentage=0.45)
Bases: torch.utils.data.Dataset
Adds random background noise in the frequency domain.
-
class AddNoise(alpha=0.0, max_val=1.0)
Bases: object
Blends random noise into the sample:
A' = A * (1 - alpha) + alpha * noise
where noise is drawn uniformly at random from the range [-max_val, max_val].
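The blending formula can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the class's actual code: the function name `add_noise` and the `rng` argument are hypothetical, and it works on a bare array rather than whatever data structure the transform receives.

```python
import numpy as np


def add_noise(samples, alpha=0.0, max_val=1.0, rng=None):
    """Hypothetical helper: A' = A * (1 - alpha) + alpha * noise."""
    rng = np.random.default_rng() if rng is None else rng
    # Noise is uniform in [-max_val, max_val), one value per sample.
    noise = rng.uniform(-max_val, max_val, size=samples.shape)
    return samples * (1 - alpha) + alpha * noise


clean = np.ones(8)
unchanged = add_noise(clean, alpha=0.0)  # alpha = 0 leaves the input as is
blended = add_noise(clean, alpha=0.25, rng=np.random.default_rng(0))
```

With alpha = 0 the sample is returned unchanged, and with alpha = 1 it would be replaced entirely by noise.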
-
class ChangeAmplitude(amplitude_range=(0.7, 1.1))
Bases: object
Randomly scales the amplitude of the audio by a factor drawn from amplitude_range.
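A minimal sketch of random amplitude scaling, assuming the factor is drawn uniformly from `amplitude_range` and applied to every sample. The helper name `change_amplitude` and the `rng` argument are hypothetical.

```python
import numpy as np


def change_amplitude(samples, amplitude_range=(0.7, 1.1), rng=None):
    """Hypothetical helper: multiply all samples by one random factor."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: a single factor per call, uniform over the given range.
    factor = rng.uniform(amplitude_range[0], amplitude_range[1])
    return samples * factor


scaled = change_amplitude(np.ones(1000))
```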
-
class ChangeSpeedAndPitchAudio(max_scale=0.2)
Bases: object
Changes the speed of the audio.
This transform also changes the pitch of the audio.
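One way to see why speed and pitch change together is to resample the waveform by linear interpolation. The sketch below is an assumption about the general technique, not the class's code: `change_speed` is a hypothetical helper, and the sign convention (positive scale means faster) may differ from the actual transform.

```python
import numpy as np


def change_speed(samples, scale):
    """Hypothetical helper: resample so playback is (1 + scale) times faster."""
    step = 1.0 + scale
    old_positions = np.arange(len(samples))
    # Reading the signal at a coarser (or finer) step shortens (or lengthens)
    # it, which shifts every frequency, and therefore the pitch, by the same ratio.
    new_positions = np.arange(0, len(samples), step)
    return np.interp(new_positions, old_positions, samples)


rng = np.random.default_rng(0)
max_scale = 0.2
random_scale = rng.uniform(-max_scale, max_scale)  # assumed sampling of the scale
tone = np.sin(np.linspace(0, 20 * np.pi, 100))
faster = change_speed(tone, 0.25)
slower = change_speed(tone, -0.5)
randomly_scaled = change_speed(tone, random_scale)
```

Because the output length changes, a transform like FixAudioLength is typically needed afterwards to restore a fixed size.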
-
class DeleteSTFT
Bases: object
PyTorch does not handle complex numbers well; use this transform to remove the STFT after computing the mel spectrogram.
-
class FixAudioLength(time=1)
Bases: object
Pads or truncates the audio to a fixed length.
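A minimal sketch of pad-or-truncate, assuming `time` is a duration in seconds and that short clips are zero-padded at the end. Both are assumptions, and `fix_audio_length` is a hypothetical helper that takes the sample rate explicitly.

```python
import numpy as np


def fix_audio_length(samples, sample_rate, time=1):
    """Hypothetical helper: return exactly time * sample_rate samples."""
    length = int(time * sample_rate)
    if len(samples) >= length:
        return samples[:length]  # truncate long clips
    # Zero-pad short clips on the right (assumed padding side and value).
    return np.pad(samples, (0, length - len(samples)), mode="constant")


long_clip = fix_audio_length(np.ones(20000), sample_rate=16000)
short_clip = fix_audio_length(np.ones(10), sample_rate=16)
```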
-
class FixSTFTDimension
Bases: object
Pads or truncates the STFT along the time axis. Apply it after stretching, time shifting, and similar transforms.
-
class StretchAudioOnSTFT(max_scale=0.2)
Bases: object
Stretches the audio in the frequency domain.
-
class TimeshiftAudioOnSTFT(max_shift=8)
Bases: object
A simple time shift in the frequency domain, done without multiplying by a complex exponential.
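A time shift of this kind can be sketched as moving the STFT columns (time frames) sideways and zero-filling the gap, with no phase correction. The helper `timeshift_stft`, its sign convention (positive shift means later in time), and the zero fill are assumptions made for illustration.

```python
import numpy as np


def timeshift_stft(stft, shift):
    """Hypothetical helper: shift frames along axis 1, zero-filling the gap.

    Assumes abs(shift) is smaller than the number of frames.
    """
    shifted = np.zeros_like(stft)
    n_frames = stft.shape[1]
    if shift >= 0:
        shifted[:, shift:] = stft[:, :n_frames - shift]
    else:
        shifted[:, :shift] = stft[:, -shift:]
    return shifted


frames = np.arange(8).reshape(2, 4)
later = timeshift_stft(frames, 1)
earlier = timeshift_stft(frames, -1)
```

In the transform itself, the shift would presumably be a random integer between -max_shift and max_shift.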
-
class ToMelSpectrogram(n_mels=32)
Bases: object
Creates the mel spectrogram from the audio.
With the default n_mels, the result is a 32x32 matrix.
-
class ToMelSpectrogramFromSTFT(n_mels=32)
Bases: object
Creates the mel spectrogram from the short-time Fourier transform (STFT) of an audio file.
With the default n_mels, the result is a 32x32 matrix.
-
class ToSTFT(n_fft=2048, hop_length=512)
Bases: object
Applies the short-time Fourier transform (STFT) to the audio.
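To show what the n_fft and hop_length parameters control, here is a rough NumPy-only STFT sketch. It is not the class's implementation, which may delegate to an audio library. The helper `stft_sketch`, the reflect padding, and the symmetric Hann window are all assumptions.

```python
import numpy as np


def stft_sketch(samples, n_fft=2048, hop_length=512):
    """Hypothetical helper: windowed FFT of overlapping, centred frames."""
    # Reflect-pad so that frame i is centred on sample i * hop_length.
    padded = np.pad(samples, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    window = np.hanning(n_fft)  # symmetric Hann window (assumed choice)
    frames = np.stack(
        [padded[i * hop_length:i * hop_length + n_fft] * window
         for i in range(n_frames)],
        axis=1,
    )
    # One column per frame, one row per non-negative frequency bin.
    return np.fft.rfft(frames, axis=0)


one_second = np.zeros(16000)  # assumes 1 s of audio at 16 kHz
spectrum = stft_sketch(one_second)
```

Under these assumptions, one second of 16 kHz audio yields 1 + n_fft / 2 = 1025 frequency bins and 32 time frames, which would be consistent with the 32-column mel spectrograms described above.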