nupic.research.frameworks.pytorch.audio_transforms
Adapted from https://github.com/tugstugi/pytorch-speech-commands, which targets the Google Speech Commands dataset.
-
class AddBackgroundNoise(bg_dataset, max_percentage=0.45)
Bases: torch.utils.data.Dataset
Adds random background noise.
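A minimal sketch of background-noise mixing, assuming 1-D NumPy sample arrays, a plain list of background clips in place of `bg_dataset`, and a mix weight drawn uniformly from [0, max_percentage]. The class `BackgroundNoiseMixer` is hypothetical and is not the library's implementation.

```python
import random

import numpy as np


class BackgroundNoiseMixer:
    """Hypothetical stand-in for AddBackgroundNoise (not the library code)."""

    def __init__(self, bg_clips, max_percentage=0.45, seed=None):
        # Assumption: bg_clips is a list of 1-D arrays, each at least as long
        # as the samples it will be mixed into.
        self.bg_clips = bg_clips
        self.max_percentage = max_percentage
        self.rng = random.Random(seed)

    def __call__(self, samples):
        # Pick a background clip and cut it to the sample length.
        noise = self.rng.choice(self.bg_clips)[: len(samples)]
        # Assumed mixing rule: blend with a random weight in [0, max_percentage].
        percentage = self.rng.uniform(0.0, self.max_percentage)
        return samples * (1.0 - percentage) + noise * percentage


mixer = BackgroundNoiseMixer([np.full(16000, 0.5)], seed=0)
clean = np.zeros(16000)
noisy = mixer(clean)
```

With a silent input the output is just the scaled background clip, which makes the random weight easy to inspect.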
-
class AddBackgroundNoiseOnSTFT(bg_dataset, max_percentage=0.45)
Bases: torch.utils.data.Dataset
Adds random background noise in the frequency domain.
-
class AddNoise(alpha=0.0, max_val=1.0)
Bases: object
Blends random noise into the sample:
A' = A * (1 - alpha) + alpha * noise
where noise is drawn uniformly from the range [-max_val, max_val].
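The blend formula above maps directly onto NumPy. This is a minimal sketch assuming the sample is a NumPy float array; the function name `add_noise` is hypothetical, and the code illustrates the formula rather than reproducing the library class.

```python
import numpy as np


def add_noise(samples, alpha=0.0, max_val=1.0, rng=None):
    """Blend uniform noise into samples: A' = A * (1 - alpha) + alpha * noise."""
    rng = np.random.default_rng() if rng is None else rng
    # noise is drawn uniformly from [-max_val, max_val), one value per sample
    noise = rng.uniform(-max_val, max_val, size=samples.shape)
    return samples * (1.0 - alpha) + alpha * noise


x = np.ones(1000)
y = add_noise(x, alpha=0.1, max_val=1.0, rng=np.random.default_rng(0))
```

With alpha=0.1 and max_val=1.0, a unit input stays within [0.8, 1.0], and alpha=0.0 returns the input unchanged.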
-
class ChangeAmplitude(amplitude_range=(0.7, 1.1))
Bases: object
Randomly changes the amplitude of an audio sample.
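A minimal sketch of random amplitude scaling. It assumes the transform multiplies the whole sample by a single gain drawn uniformly from amplitude_range; that sampling rule and the function name `change_amplitude` are assumptions, not taken from the library.

```python
import random

import numpy as np


def change_amplitude(samples, amplitude_range=(0.7, 1.1), rng=random):
    """Hypothetical sketch: scale samples by one random gain from amplitude_range."""
    low, high = amplitude_range
    gain = rng.uniform(low, high)  # assumption: gain is drawn uniformly
    return samples * gain, gain


scaled, gain = change_amplitude(np.ones(4), rng=random.Random(1))
```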
-
class ChangeSpeedAndPitchAudio(max_scale=0.2)
Bases: object
Changes the speed of an audio sample.
This transform also changes its pitch.
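Changing speed by resampling is what couples speed and pitch: reading the waveform with a step larger than one sample shortens it and raises every frequency by the same factor. The sketch below uses linear interpolation via `np.interp`; the function names and the speed factor drawn as 1 + U(-max_scale, max_scale) are assumptions for illustration.

```python
import random

import numpy as np


def change_speed(samples, speed):
    """Play `samples` `speed` times faster by linear interpolation.

    speed > 1 shortens the signal and raises its pitch; speed < 1 does the opposite.
    """
    positions = np.arange(0, len(samples), speed)
    return np.interp(positions, np.arange(len(samples)), samples)


def random_speed_and_pitch(samples, max_scale=0.2, rng=random):
    # Assumption: speed factor sampled symmetrically around 1.
    speed = 1.0 + rng.uniform(-max_scale, max_scale)
    return change_speed(samples, speed)


# A 100 Hz tone sampled at 1 kHz, played twice as fast.
x = np.sin(2 * np.pi * 100 * np.arange(1000) / 1000)
y = change_speed(x, 2.0)
```

Playing the 100 Hz tone at speed 2.0 halves its length and yields a 200 Hz tone.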
-
class DeleteSTFT
Bases: object
PyTorch does not handle complex numbers well, so use this transform to remove the STFT after computing the mel spectrogram.
-
class FixAudioLength(time=1)
Bases: object
Pads or truncates an audio sample to a fixed length.
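A minimal sketch of fixed-length padding and truncation, assuming `time` is a duration in seconds and a 16 kHz sampling rate (the rate of the Speech Commands recordings). The function name and the choice to zero-pad at the end are assumptions of this sketch.

```python
import numpy as np


def fix_audio_length(samples, sample_rate=16000, time=1):
    """Hypothetical sketch: zero-pad or truncate to time * sample_rate samples."""
    # Assumption: `time` is measured in seconds.
    length = int(time * sample_rate)
    if len(samples) < length:
        # Pad with zeros at the end only.
        return np.pad(samples, (0, length - len(samples)), mode="constant")
    return samples[:length]


padded = fix_audio_length(np.ones(12000))
```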
-
class FixSTFTDimension
Bases: object
Pads or truncates the STFT along the time axis. Apply it after stretching, time shifting, and similar transforms.
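The same pad-or-truncate idea applied to the time axis of an STFT, assuming an array shaped (frequency bins, frames). Passing the target frame count explicitly is an assumption of this hypothetical helper; it matters because stretching can change the number of frames.

```python
import numpy as np


def fix_stft_dimension(stft, n_frames):
    """Hypothetical sketch: zero-pad or truncate the time axis (axis 1) of an STFT."""
    current = stft.shape[1]
    if current < n_frames:
        # Append silent (all-zero) frames on the right.
        return np.pad(stft, ((0, 0), (0, n_frames - current)), mode="constant")
    return stft[:, :n_frames]


short = np.ones((1025, 28), dtype=complex)
fixed = fix_stft_dimension(short, 32)
```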
-
class StretchAudioOnSTFT(max_scale=0.2)
Bases: object
Stretches an audio sample in the frequency domain.
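One standard way to stretch in the STFT domain is a phase vocoder: resample frames at a fractional step, interpolate magnitudes, and accumulate phase so each bin keeps advancing at its measured rate. The sketch below follows that classic algorithm (librosa offers a similar `librosa.phase_vocoder`); whether this class uses exactly this method, and the rate drawn as 1 + U(-max_scale, max_scale), are assumptions.

```python
import random

import numpy as np


def phase_vocoder(stft, rate, hop_length=512):
    """Time-stretch a complex STFT (bins x frames) by `rate` (> 1 is faster)."""
    n_bins, n_frames = stft.shape
    time_steps = np.arange(0, n_frames, rate, dtype=float)
    out = np.zeros((n_bins, len(time_steps)), dtype=complex)
    # Expected phase advance per hop for bin k: 2*pi*k*hop/n_fft, k = 0..n_fft/2.
    phi_advance = np.linspace(0, np.pi * hop_length, n_bins)
    phase_acc = np.angle(stft[:, 0])
    # Two zero frames on the right so the last steps have a right neighbour.
    padded = np.pad(stft, ((0, 0), (0, 2)), mode="constant")
    for t, step in enumerate(time_steps):
        left = padded[:, int(step)]
        right = padded[:, int(step) + 1]
        frac = step - int(step)
        # Interpolate magnitudes between neighbouring frames; reuse accumulated phase.
        mag = (1.0 - frac) * np.abs(left) + frac * np.abs(right)
        out[:, t] = mag * np.exp(1j * phase_acc)
        # Deviation from the expected advance, wrapped to [-pi, pi].
        dphase = np.angle(right) - np.angle(left) - phi_advance
        dphase -= 2.0 * np.pi * np.round(dphase / (2.0 * np.pi))
        phase_acc += phi_advance + dphase
    return out


def random_stretch(stft, max_scale=0.2, hop_length=512, rng=random):
    rate = 1.0 + rng.uniform(-max_scale, max_scale)  # assumed sampling rule
    return phase_vocoder(stft, rate, hop_length)


flat = np.ones((1025, 32), dtype=complex)
fast = phase_vocoder(flat, 2.0)
```

A rate of 2.0 halves the number of frames and 0.5 doubles it, which is why a pad-or-truncate step such as FixSTFTDimension is applied afterwards.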
-
class TimeshiftAudioOnSTFT(max_shift=8)
Bases: object
Applies a simple time shift in the frequency domain, without multiplying by a complex exponential.
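A minimal sketch of shifting whole STFT frames along the time axis with zero fill, leaving each bin's phase untouched. The function names and the uniform integer shift in [-max_shift, max_shift] are assumptions; the array is assumed to be shaped (frequency bins, frames).

```python
import random

import numpy as np


def shift_frames(stft, shift):
    """Shift STFT frames along the time axis by `shift`, filling with zeros.

    Positive shifts move content later. No exp(-1j * omega * t0) phase factor
    is applied; frames simply move as whole units.
    """
    out = np.zeros_like(stft)
    if shift > 0:
        out[:, shift:] = stft[:, :-shift]
    elif shift < 0:
        out[:, :shift] = stft[:, -shift:]
    else:
        out[:] = stft
    return out


def random_timeshift(stft, max_shift=8, rng=random):
    # Assumption: integer shift drawn uniformly, inclusive on both ends.
    return shift_frames(stft, rng.randint(-max_shift, max_shift))


row = np.arange(1, 11).reshape(1, 10).astype(complex)
later = shift_frames(row, 3)
```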
-
class ToMelSpectrogram(n_mels=32)
Bases: object
Creates the mel spectrogram from an audio sample.
The result is a 32x32 matrix.
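The 32x32 size follows from frame arithmetic. Assuming 1-second clips sampled at 16 kHz (the Speech Commands sampling rate), a hop length of 512, and centered framing as in librosa's default, the time axis has 1 + 16000 // 512 = 32 frames, and n_mels=32 gives the other axis. The helper `n_stft_frames` is hypothetical.

```python
def n_stft_frames(n_samples, hop_length=512, center=True, n_fft=2048):
    """Number of STFT frames for a signal of n_samples samples."""
    if center:
        # Centered framing pads n_fft // 2 on each side, giving 1 + n // hop frames.
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length


# Assumptions: 1-second clips at 16 kHz, hop_length=512, centered frames.
frames = n_stft_frames(16000)
n_mels = 32
shape = (n_mels, frames)
```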
-
class ToMelSpectrogramFromSTFT(n_mels=32)
Bases: object
Creates the mel spectrogram from the short-time Fourier transform (STFT) of a file.
The result is a 32x32 matrix.
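A minimal sketch of the STFT-to-mel step: build triangular filters evenly spaced on the mel scale and project the power spectrum |STFT|^2 onto them. It uses the HTK mel formula, which is an assumption (librosa's `librosa.filters.mel` defaults to the Slaney variant); the 16 kHz sampling rate and the function names are likewise assumptions.

```python
import numpy as np


def hz_to_mel(freq):
    # HTK-style mel scale (an assumption for this sketch).
    return 2595.0 * np.log10(1.0 + freq / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, evenly spaced in mel between 0 and Nyquist.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_from_stft(stft, sample_rate=16000, n_mels=32):
    """Project the power spectrum |STFT|^2 onto the mel filters."""
    n_fft = 2 * (stft.shape[0] - 1)
    return mel_filterbank(sample_rate, n_fft, n_mels) @ (np.abs(stft) ** 2)


# A (1025 bins x 32 frames) STFT maps to a 32 x 32 mel spectrogram.
mel = mel_from_stft(np.ones((1025, 32), dtype=complex))
```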
-
class ToSTFT(n_fft=2048, hop_length=512)
Bases: object
Applies the short-time Fourier transform (STFT) to an audio sample.
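A NumPy sketch of a centered STFT with the documented defaults n_fft=2048 and hop_length=512. The periodic Hann window and reflect padding by n_fft // 2 are common choices but assumptions about this class, and the function name `stft` is hypothetical.

```python
import numpy as np


def stft(samples, n_fft=2048, hop_length=512):
    """Complex STFT of shape (n_fft // 2 + 1, n_frames) with centered frames."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window of length n_fft
    padded = np.pad(samples, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    frames = np.stack(
        [padded[i * hop_length : i * hop_length + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # One-sided spectrum of each windowed frame (columns are frames).
    return np.fft.rfft(frames, axis=0)


sr = 16000
t = np.arange(sr) / sr
spec = stft(np.sin(2 * np.pi * 1000 * t))
```

For one second of a 1 kHz tone at 16 kHz, the result has 1025 frequency bins and 32 frames, and the peak sits in bin 1000 * 2048 / 16000 = 128.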