nupic.research.frameworks.pytorch.audio_transforms

Adapted from https://github.com/tugstugi/pytorch-speech-commands, which targets the Google Speech Commands dataset.

class AddBackgroundNoise(bg_dataset, max_percentage=0.45)

Bases: torch.utils.data.Dataset

Adds random background noise to the audio.

class AddBackgroundNoiseOnSTFT(bg_dataset, max_percentage=0.45)

Bases: torch.utils.data.Dataset

Adds random background noise in the frequency domain.

class AddNoise(alpha=0.0, max_val=1.0)

Bases: object

Blend random noise into the sample.

A’ = A * (1 - alpha) + alpha * noise

where noise is drawn uniformly at random from the range [-max_val, max_val].
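The blending formula can be sketched in NumPy as follows. This is a minimal illustration rather than the module's implementation: the function name `blend_noise` and the `rng` parameter are hypothetical, and the sample is assumed to be a floating-point array.

```python
import numpy as np


def blend_noise(sample, alpha=0.0, max_val=1.0, rng=None):
    """Return sample * (1 - alpha) + alpha * noise (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform noise in [-max_val, max_val), same shape as the sample.
    noise = rng.uniform(-max_val, max_val, size=sample.shape)
    return sample * (1.0 - alpha) + alpha * noise


sample = np.linspace(-1.0, 1.0, 8)
clean = blend_noise(sample, alpha=0.0)  # alpha=0 leaves the sample unchanged
noisy = blend_noise(sample, alpha=0.25, rng=np.random.default_rng(0))
```

With alpha=0.25 and max_val=1.0, each output value stays within 0.25 of 0.75 times the original value.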

class AudioFromSTFT

Bases: object

Computes the inverse short-time Fourier transform.

class ChangeAmplitude(amplitude_range=(0.7, 1.1))

Bases: object

Randomly changes the amplitude of an audio clip.
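A random amplitude change can be sketched as multiplying the whole clip by a single gain. The source does not show how the gain is sampled, so the code below assumes a uniform draw from `amplitude_range`. The helper name `change_amplitude` and the `rng` parameter are hypothetical.

```python
import numpy as np


def change_amplitude(samples, amplitude_range=(0.7, 1.1), rng=None):
    """Scale samples by one random gain (hypothetical helper).

    Assumption: the gain is drawn uniformly from amplitude_range.
    """
    rng = np.random.default_rng() if rng is None else rng
    gain = rng.uniform(*amplitude_range)
    return samples * gain, gain


samples = np.ones(4, dtype=np.float32)
scaled, gain = change_amplitude(samples, rng=np.random.default_rng(0))
```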

class ChangeSpeedAndPitchAudio(max_scale=0.2)

Bases: object

Changes the speed of an audio clip.

This transform also changes the pitch of the audio.
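One way to see why speed and pitch change together is to resample the waveform by linear interpolation and play it back at the original sample rate. The sketch below does this under stated assumptions: the helper name `change_speed` is hypothetical, and `scale` is passed explicitly because the source does not show how the transform draws it from `max_scale`.

```python
import numpy as np


def change_speed(samples, scale):
    """Resample by linear interpolation (hypothetical helper).

    scale > 0 speeds the clip up; scale < 0 slows it down.
    """
    speed_factor = 1.0 + scale
    # Read the original signal at positions spaced speed_factor apart.
    positions = np.arange(0, len(samples), speed_factor)
    return np.interp(positions, np.arange(len(samples)), samples)


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)     # one second of a 440 Hz tone
faster = change_speed(tone, scale=0.25)  # shorter clip, higher pitch on playback
```

Because the same waveform is squeezed into fewer samples, every frequency in it rises by the same factor as the speed.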

class DeleteSTFT

Bases: object

PyTorch does not handle complex numbers well, so use this transform to remove the STFT after computing the mel spectrogram.

class FixAudioLength(time=1)

Bases: object

Pads or truncates an audio clip to a fixed length.
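A minimal sketch of the pad-or-truncate behaviour follows. It rests on several assumptions that the source does not confirm: `time` is measured in seconds, the target length is `sample_rate * time` samples, and padding consists of trailing zeros. The helper name `fix_length` is hypothetical.

```python
import numpy as np


def fix_length(samples, sample_rate=16000, time=1):
    """Pad with trailing zeros or truncate (hypothetical helper).

    Assumption: the target length is sample_rate * time samples.
    """
    length = int(sample_rate * time)
    if len(samples) >= length:
        return samples[:length]
    return np.pad(samples, (0, length - len(samples)), mode="constant")


short = fix_length(np.ones(10000))  # padded up to 16000 samples
long = fix_length(np.ones(20000))   # truncated down to 16000 samples
```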

class FixSTFTDimension

Bases: object

Pads or truncates along the time axis in the frequency domain. Apply it after stretching, time shifting, and similar transforms.

class LoadAudio(sample_rate=16000)

Bases: object

Loads an audio file into a NumPy array.

class StretchAudio(max_scale=0.2)

Bases: object

Randomly stretches an audio clip.

class StretchAudioOnSTFT(max_scale=0.2)

Bases: object

Stretches an audio clip in the frequency domain.

class TimeshiftAudio(max_shift_seconds=0.2)

Bases: object

Randomly shifts an audio clip in time.
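A random time shift can be sketched as moving the waveform by a number of samples and zero-filling the gap. The source does not show how the shift is sampled or how the edges are handled. The code below assumes a uniform integer shift within `max_shift_seconds` in either direction, with zero padding. The helper name `timeshift` is hypothetical.

```python
import numpy as np


def timeshift(samples, shift):
    """Shift by `shift` samples, zero-filling the gap (hypothetical helper).

    A positive shift delays the signal; a negative shift advances it.
    """
    if shift == 0:
        return samples.copy()
    out = np.zeros_like(samples)
    if shift > 0:
        out[shift:] = samples[:-shift]
    else:
        out[:shift] = samples[-shift:]
    return out


sample_rate = 16000
max_shift = int(0.2 * sample_rate)  # max_shift_seconds=0.2 at 16 kHz
rng = np.random.default_rng(0)
shift = int(rng.integers(-max_shift, max_shift + 1))
shifted = timeshift(np.arange(1.0, sample_rate + 1.0), shift)
```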

class TimeshiftAudioOnSTFT(max_shift=8)

Bases: object

A simple time shift in the frequency domain, without multiplying by the complex exponential (phase) term.

class ToMelSpectrogram(n_mels=32)

Bases: object

Creates the mel spectrogram of an audio clip.

The result is a 32x32 matrix.

class ToMelSpectrogramFromSTFT(n_mels=32)

Bases: object

Creates the mel spectrogram from the short-time Fourier transform of a file.

The result is a 32x32 matrix.

class ToSTFT(n_fft=2048, hop_length=512)

Bases: object

Applies the short-time Fourier transform to an audio clip.
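To illustrate what `n_fft` and `hop_length` control, here is a bare-bones short-time Fourier transform in NumPy. It is a sketch under stated assumptions, not the module's implementation, which is not shown in the source and may use a library routine with different windowing and padding. The function name `simple_stft`, the Hann window, and the absence of edge padding are all choices made for this example. The input must contain at least `n_fft` samples.

```python
import numpy as np


def simple_stft(samples, n_fft=2048, hop_length=512):
    """Windowed FFT over overlapping frames (hypothetical helper).

    Returns a complex array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop_length
    # Each column is one windowed frame of n_fft samples.
    frames = np.stack(
        [samples[i * hop_length:i * hop_length + n_fft] * window
         for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)


sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)  # one second of a 1 kHz tone
spec = simple_stft(tone)
```

Each row of `spec` corresponds to a frequency bin of width `sample_rate / n_fft`, and each column to a frame that starts `hop_length` samples after the previous one.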

class ToTensor(np_name, tensor_name, normalize=None)

Bases: object

Converts a NumPy array into a tensor.

class Unsqueeze(tensor_name, model_type)

Bases: object

Unsqueezes audio data into a single tensor.

should_apply_transform(prob=0.5)

Transforms are applied at random, with the given probability.
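A probability gate of this kind is commonly written as a comparison against a uniform random draw. The sketch below is a stand-in using only the standard library. The name `should_apply` and the `rng` parameter are hypothetical, and the module's actual implementation is not shown in the source.

```python
import random


def should_apply(prob=0.5, rng=random):
    """Return True with probability prob (hypothetical stand-in)."""
    return rng.random() < prob


rng = random.Random(0)
applied = sum(should_apply(0.5, rng) for _ in range(10000))
```

Over many calls, the fraction of `True` results approaches `prob`. A transform would typically return its input unchanged whenever the gate returns `False`.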