Musical Data Augmentation¶
The muda package implements annotation-aware musical data augmentation, as described in the muda paper [1]. The goal of this package is to make it easy for practitioners to consistently apply perturbations to annotated music data for the purpose of fitting statistical models.
[1] McFee, B., Humphrey, E.J., and Bello, J.P. “A software framework for Musical Data Augmentation.” 16th International Society for Music Information Retrieval Conference (ISMIR). 2015.
Introduction¶
Note
Before reading ahead, it is recommended to familiarize yourself with the JAMS documentation.
The design of muda is patterned loosely after the Transformer abstraction in scikit-learn. At a high level, each input example consists of an audio clip (with sampling rate) as a numpy.ndarray and its annotations stored in JAMS format. To streamline the deformation process, audio data is first stored within the JAMS object so that only a single payload needs to be transferred throughout the system.
Deformation objects (muda.base.BaseTransformer) have a single user-facing method, transform(), which accepts an input JAMS object and generates a sequence of deformations of that object. By operating on JAMS objects, the deformation object can simultaneously modify both the audio and all of its corresponding annotations.
After applying deformations, the modified audio and annotations can be stored to disk by calling muda.save(). Alternatively, because transformations are generators, results can be processed online by a stochastic learning algorithm.
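The online-processing pattern can be illustrated without muda itself. The following is a minimal sketch in which `fake_transform` and `online_mean` are hypothetical stand-ins (not part of muda): a generator yields modified copies lazily, and the consumer handles each one as it arrives instead of collecting them first.

```python
def fake_transform(example, shifts=(-1.0, 0.0, 1.0)):
    """Hypothetical stand-in for a deformer's transform(): lazily yield copies."""
    for shift in shifts:
        yield {"name": example["name"], "pitch": example["pitch"] + shift}


def online_mean(stream):
    """Consume a stream one item at a time, keeping only a running mean."""
    count, mean = 0, 0.0
    for item in stream:
        count += 1
        mean += (item["pitch"] - mean) / count
    return count, mean


count, mean = online_mean(fake_transform({"name": "clip", "pitch": 60.0}))
print(count, mean)
```

Only one deformed example is held in memory at a time, which is the property that makes generator-based deformers convenient for stochastic training loops.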
Requirements¶
Installing muda via pip install muda should automatically satisfy the Python dependencies:
- JAMS 0.2
- librosa 0.4
- pyrubberband 0.1
- pysoundfile 0.8
- jsonpickle
However, certain deformers require external applications that must be installed separately.
- sox
- rubberband-cli
Examples¶
Example usage¶
This section gives a quick introduction to using muda through example applications.
Loading data¶
In muda, all data pertaining to a track is contained within a jams object. Before processing any tracks with muda, the jams object must be prepared using one of muda.load_jam_audio or muda.jam_pack. These functions prepare the jams object to contain (deformed) audio and store the deformation history objects.
>>> # Loading data from disk
>>> j_orig = muda.load_jam_audio('orig.jams', 'orig.ogg')
>>> # Ready to go!
>>> # Loading audio from disk with an existing jams
>>> existing_jams = jams.load('existing_jams_file.jams')
>>> j_orig = muda.load_jam_audio(existing_jams, 'orig.ogg')
>>> # Ready to go!
>>> # Loading in-memory audio (y, sr) with an existing jams
>>> existing_jams = jams.load('existing_jams_file.jams')
>>> j_orig = muda.jam_pack(existing_jams, _audio=dict(y=y, sr=sr))
>>> # Ready to go!
Applying a deformation¶
Once the data has been prepared, we are ready to start applying deformations. This example uses a simple linear pitch shift deformer to generate five perturbations of an input. Each deformed example is then saved to disk.
>>> pitch = muda.deformers.LinearPitchShift(n_samples=5, lower=-1, upper=1)
>>> for i, jam_out in enumerate(pitch.transform(j_orig)):
...     muda.save('output_{:02d}.ogg'.format(i),
...               'output_{:02d}.jams'.format(i),
...               jam_out)
The deformed audio data can be accessed directly in the dictionary jam_out.sandbox.muda._audio. Note that a full history of applied transformations is recorded within jam_out.sandbox.muda under the state and history objects.
Pipelines¶
The following example constructs a two-stage deformation pipeline. The first stage applies random pitch shifts, while the second stage applies random time stretches. The pipeline therefore generates 25 examples from the input j_orig.
>>> # Load an example audio file with annotation
>>> j_orig = muda.load_jam_audio('orig.jams', 'orig.ogg')
>>> # Construct a deformation pipeline
>>> pitch_shift = muda.deformers.RandomPitchShift(n_samples=5)
>>> time_stretch = muda.deformers.RandomTimeStretch(n_samples=5)
>>> pipeline = muda.Pipeline(steps=[('pitch_shift', pitch_shift),
...                                 ('time_stretch', time_stretch)])
>>> for j_new in pipeline.transform(j_orig):
...     process(j_new)
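The count of 25 follows from composition: each of the 5 first-stage outputs is fed to the second stage, which yields 5 outputs of its own. The sketch below reproduces that counting with toy stages that only record which steps were applied. `make_stage` and `run_pipeline` are hypothetical helpers for illustration and are not muda's implementation.

```python
def make_stage(label, n_samples):
    """Return a toy transform that yields n_samples tagged copies of its input."""
    def transform(history):
        for i in range(n_samples):
            yield history + ["{}[{}]".format(label, i)]
    return transform


def run_pipeline(stages, history):
    """Feed every output of one stage into the next stage, depth-first."""
    if not stages:
        yield history
        return
    first, rest = stages[0], stages[1:]
    for intermediate in first(history):
        yield from run_pipeline(rest, intermediate)


stages = [make_stage("pitch_shift", 5), make_stage("time_stretch", 5)]
outputs = list(run_pipeline(stages, []))
print(len(outputs))  # 25
print(outputs[0])    # ['pitch_shift[0]', 'time_stretch[0]']
```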
Unions¶
Union operators are similar to Pipelines, in that they allow multiple deformers to be combined as a single object that generates a sequence of deformations. The difference between Union and Pipeline is that a pipeline composes deformations together, so that a single output is the result of multiple stages of processing; a union only applies one deformation at a time to produce a single output.
The following example is similar to the pipeline example above:
>>> # Load an example audio file with annotation
>>> j_orig = muda.load_jam_audio('orig.jams', 'orig.ogg')
>>> # Construct a deformation union
>>> pitch_shift = muda.deformers.RandomPitchShift(n_samples=5)
>>> time_stretch = muda.deformers.RandomTimeStretch(n_samples=5)
>>> union = muda.Union(steps=[('pitch_shift', pitch_shift),
...                           ('time_stretch', time_stretch)])
>>> for j_new in union.transform(j_orig):
...     process(j_new)
Each of the resulting j_new objects produced by the union has had either its pitch shifted by the pitch_shift object or its time stretched by the time_stretch object, but not both.
Unions apply deformations in a round-robin schedule, so that the first output is produced by the first deformer, the second output is produced by the second deformer, and so on, until the list of deformers is exhausted and the first deformer produces its second output.
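The round-robin schedule can be sketched with plain Python generators. The helper `round_robin` below is a hypothetical illustration of the ordering described above, not muda's own code; the two generators stand in for the outputs of two deformers.

```python
def round_robin(*generators):
    """Yield one item from each generator in turn until all are exhausted."""
    active = [iter(g) for g in generators]
    while active:
        still_active = []
        for it in active:
            try:
                item = next(it)
            except StopIteration:
                continue  # this source is finished; drop it from the rotation
            yield item
            still_active.append(it)
        active = still_active


pitch = ("pitch[{}]".format(i) for i in range(3))
stretch = ("stretch[{}]".format(i) for i in range(3))
order = list(round_robin(pitch, stretch))
print(order)
# ['pitch[0]', 'stretch[0]', 'pitch[1]', 'stretch[1]', 'pitch[2]', 'stretch[2]']
```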
Bypass operators¶
When using pipelines, it is sometimes beneficial to allow a stage to be skipped, so that the input to one stage can be fed through to the next stage without intermediate processing. This is easily accomplished with Bypass objects, which first emit the input unchanged, and then apply the contained deformation as usual. This is demonstrated in the following example, which is similar to the pipeline example, except that it guarantees that each stage is applied to j_orig in isolation, as well as in composition. It therefore generates 36 examples (including j_orig itself as the first output).
>>> # Load an example audio file with annotation
>>> j_orig = muda.load_jam_audio('orig.jams', 'orig.ogg')
>>> # Construct a deformation pipeline
>>> pitch_shift = muda.deformers.RandomPitchShift(n_samples=5)
>>> time_stretch = muda.deformers.RandomTimeStretch(n_samples=5)
>>> pipeline = muda.Pipeline(steps=[('pitch_shift', muda.deformers.Bypass(pitch_shift)),
...                                 ('time_stretch', muda.deformers.Bypass(time_stretch))])
>>> for j_new in pipeline.transform(j_orig):
...     process(j_new)
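The figure of 36 outputs can be checked with a toy model: a bypassed stage with 5 samples yields 6 items (the input plus 5 deformations), and two such stages composed give 6 × 6 = 36. In the sketch below, `make_stage`, `bypass`, and `run_pipeline` are hypothetical helpers that only track which steps were applied; they are not muda's implementation.

```python
def make_stage(label, n_samples):
    """Return a toy transform that yields n_samples tagged copies of its input."""
    def transform(history):
        for i in range(n_samples):
            yield history + ["{}[{}]".format(label, i)]
    return transform


def bypass(transform):
    """Wrap a transform so that it first yields its input unchanged."""
    def wrapped(history):
        yield history
        yield from transform(history)
    return wrapped


def run_pipeline(stages, history):
    """Feed every output of one stage into the next stage, depth-first."""
    if not stages:
        yield history
        return
    for intermediate in stages[0](history):
        yield from run_pipeline(stages[1:], intermediate)


stages = [bypass(make_stage("pitch_shift", 5)),
          bypass(make_stage("time_stretch", 5))]
outputs = list(run_pipeline(stages, []))
print(len(outputs))                             # 36
print(outputs[0])                               # [] (the untouched input)
print(sum(1 for o in outputs if len(o) == 1))   # 10 single-stage outputs
```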
Saving deformations¶
All deformation objects, including bypasses and pipelines, can be serialized to plain-text (JSON) format, saved to disk, and reconstructed later. This is demonstrated in the following example.
>>> # Encode an existing pitch shift deformation object
>>> pitch_shift = muda.deformers.RandomPitchShift(n_samples=5)
>>> ps_str = muda.serialize(pitch_shift)
>>> print(ps_str)
{"params": {"n_samples": 5, "mean": 0.0, "sigma": 1.0},
"__class__": {"py/type": "muda.deformers.pitch.RandomPitchShift"}}
>>> # Reconstruct the pitch shifter from its string encoding
>>> ps2 = muda.deserialize(ps_str)
>>> # Encode a full pipeline as a string
>>> pipe_str = muda.serialize(pipeline)
>>> # Decode the string to reconstruct a new pipeline object
>>> new_pipe = muda.deserialize(pipe_str)
>>> # Process jams with the new pipeline
>>> for j_new in new_pipe.transform(j_orig):
...     process(j_new)
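The round trip above relies on jsonpickle inside muda. As a simplified analogue using only the standard library, the sketch below stores a class name and its constructor parameters as JSON and rebuilds the object from them. `ToyPitchShift`, `REGISTRY`, `toy_serialize`, and `toy_deserialize` are hypothetical names, and the encoding differs from the one muda actually produces.

```python
import json


class ToyPitchShift(object):
    """Hypothetical deformer that only stores its constructor parameters."""

    def __init__(self, n_samples=3, mean=0.0, sigma=1.0):
        self.n_samples = n_samples
        self.mean = mean
        self.sigma = sigma

    def get_params(self):
        return {"n_samples": self.n_samples,
                "mean": self.mean,
                "sigma": self.sigma}


# Map of class names that the decoder is allowed to reconstruct.
REGISTRY = {"ToyPitchShift": ToyPitchShift}


def toy_serialize(obj):
    """Encode the class name and parameters as a JSON string."""
    return json.dumps({"class": type(obj).__name__,
                       "params": obj.get_params()}, sort_keys=True)


def toy_deserialize(encoded):
    """Rebuild an object by calling its constructor with the stored parameters."""
    spec = json.loads(encoded)
    return REGISTRY[spec["class"]](**spec["params"])


original = ToyPitchShift(n_samples=5)
encoded = toy_serialize(original)
restored = toy_deserialize(encoded)
print(encoded)
print(restored.get_params() == original.get_params())  # True
```

Storing only constructor parameters keeps the encoding human-readable and means a reconstructed object starts from the same configuration as the original.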
API Reference¶
MUDA Core¶
Functions¶
Core functionality for muda
muda.core.load_jam_audio(jam_in, audio_file, **kwargs)
Load a jam and pack it with audio.
Parameters:
jam_in : str, file descriptor, or jams.JAMS
    JAMS filename, open file-descriptor, or object to load. See jams.load for acceptable formats.
audio_file : str
    Audio filename to load
kwargs : additional keyword arguments
    See librosa.load
Returns:
jam : jams.JAMS
    A jams object with audio data in the top-level sandbox
See also
jams.load, librosa.core.load
Notes
This operation can modify the file_metadata.duration field of jam_in: If it is not currently set, it will be populated with the duration of the audio file.
muda.core.save(filename_audio, filename_jam, jam, strict=True, **kwargs)
Save a muda jam to disk
Parameters:
filename_audio : str
    The path to store the audio file
filename_jam : str
    The path to store the jams object
strict : bool
    Strict safety checking for jams output
kwargs
    Additional parameters to soundfile.write
muda.core.jam_pack(jam, **kwargs)
Pack data into a jams sandbox.
If not already present, this creates a muda field within jam.sandbox, along with history, state, and version arrays which are populated by deformation objects.
Any additional fields can be added to the muda sandbox by supplying keyword arguments.
Parameters:
jam : jams.JAMS
    A JAMS object
Returns:
jam : jams.JAMS
    The updated JAMS object
Examples
>>> jam = jams.JAMS()
>>> muda.jam_pack(jam, my_data=dict(foo=5, bar=None))
>>> jam.sandbox
<Sandbox: muda>
>>> jam.sandbox.muda
<Sandbox: state, version, my_data, history>
>>> jam.sandbox.muda.my_data
{'foo': 5, 'bar': None}
muda.core.serialize(transform, **kwargs)
Serialize a transformation object or pipeline.
Parameters:
transform : BaseTransformer or Pipeline
    The transformation object to be serialized
kwargs
    Additional keyword arguments to jsonpickle.encode()
Returns:
json_str : str
    A JSON encoding of the transformation
Examples
>>> D = muda.deformers.TimeStretch(rate=1.5)
>>> muda.serialize(D)
'{"params": {"rate": 1.5}, "__class__": {"py/type": "muda.deformers.time.TimeStretch"}}'
muda.core.deserialize(encoded, **kwargs)
Construct a muda transformation from a JSON encoded string.
Parameters:
encoded : str
    JSON encoding of the transformation or pipeline
kwargs
    Additional keyword arguments to jsonpickle.decode()
Returns:
obj
    The transformation
Examples
>>> D = muda.deformers.TimeStretch(rate=1.5)
>>> D_serial = muda.serialize(D)
>>> D2 = muda.deserialize(D_serial)
>>> D2
TimeStretch(rate=1.5)
Classes¶
class muda.base.BaseTransformer
The base class for all transformation objects. This class implements a single transformation (history) and various niceties.
Methods
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Iterative transformation generator
class muda.base.Pipeline(steps=None)
Wrapper which allows multiple BaseTransformer objects to be chained together.
A given JAMS object will be transformed sequentially by each stage of the pipeline.
The pipeline induces a graph over transformers.
Examples
>>> P = muda.deformers.PitchShift(n_semitones=5)
>>> T = muda.deformers.TimeStretch(rate=1.25)
>>> Pipe = muda.Pipeline(steps=[('Pitch:maj3', P), ('Speed:1.25x', T)])
>>> output_jams = list(Pipe.transform(jam_in))
Attributes
steps : argument array
    steps[i] is a tuple of (name, Transformer)
Methods
get_params(): Get the parameters for this object.
transform(jam): Apply the sequence of transformations to a single jam object.
class muda.base.Union(steps=None)
Wrapper which allows multiple BaseTransformer objects to be combined for round-robin sampling.
A given JAMS object will be transformed sequentially by each element of the union, in round-robin fashion. This is similar to Pipeline, except the deformers are independent of one another in a Union, rather than applied sequentially.
Examples
>>> P = muda.deformers.PitchShift(n_semitones=5)
>>> T = muda.deformers.TimeStretch(rate=1.25)
>>> union = muda.Union(steps=[('Pitch:maj3', P), ('Speed:1.25x', T)])
>>> output_jams = list(union.transform(jam_in))
Attributes
steps : argument array
    steps[i] is a tuple of (name, Transformer)
Methods
get_params(): Get the parameters for this object.
transform(jam): Apply the sequence of transformations to a single jam object.
Deformation reference¶
Utilities¶
class muda.deformers.Bypass(transformer=None)
Bypass transformer. Wraps an existing transformer object.
This allows pipeline stages to become optional.
The first example generated by a Bypass’s transform method is the input, followed by all examples generated by the contained transformer object.
Examples
>>> # Generate examples with and without a pitch-shift
>>> D = muda.deformers.PitchShift(n_semitones=2.0)
>>> B = muda.deformers.Bypass(transformer=D)
>>> out_jams = list(B.transform(input_jam))
Attributes
transformer : muda.BaseTransformer
    The transformer object to bypass
Methods
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Bypass transformations.
get_params(deep=True)
Get the parameters for this object. Returns as a dict.
Parameters:
deep : bool
    Recurse on nested objects
Returns:
params : dict
    A dictionary containing all parameters for this object
Audio deformers¶
class muda.deformers.BackgroundNoise(n_samples=1, files=None, weight_min=0.1, weight_max=0.5)
Additive background noise deformations.
From each background noise signal, n_samples clips are randomly extracted and mixed with the input audio with a random mixing coefficient sampled uniformly between weight_min and weight_max.
This transformation affects the following attributes:
- Audio
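The mixing rule given in this class's attribute description, y_out = (1 - weight) * y + weight * y_noise, can be written out directly. The sketch below is an illustration on plain Python lists rather than muda's implementation; `mix_with_noise` is a hypothetical helper, and the weight is drawn uniformly between the documented default bounds.

```python
import random


def mix_with_noise(y, y_noise, weight):
    """Convex combination of a signal and a noise clip of equal length."""
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between 0 and 1")
    if len(y) != len(y_noise):
        raise ValueError("signal and noise must have the same length")
    return [(1.0 - weight) * a + weight * b for a, b in zip(y, y_noise)]


rng = random.Random(0)
weight = rng.uniform(0.1, 0.5)   # weight_min=0.1, weight_max=0.5
y = [0.0, 0.5, -0.5, 1.0]
y_noise = [0.2, 0.2, 0.2, 0.2]
y_out = mix_with_noise(y, y_noise, weight)
print(weight, y_out)
```

Because the two coefficients sum to one, the mixed signal stays within the range spanned by the input and the noise, so the mix cannot clip if neither source does.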
Attributes
n_samples : int > 0
    The number of samples to generate with each noise source
files : str or list of str
    Path to audio file(s) on disk containing background signals
weight_min : float in (0.0, 1.0)
weight_max : float in (0.0, 1.0)
    The minimum and maximum weight to combine input signals:
    y_out = (1 - weight) * y + weight * y_noise
Methods
audio(mudabox, state)
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Iterative transformation generator
transform(jam)
Iterative transformation generator
Applies the deformation to an input jams object.
This generates a sequence of deformed output JAMS.
Parameters:
jam : jams.JAMS
    The jam to transform
Examples
>>> for jam_out in deformer.transform(jam_in):
...     process(jam_out)
class muda.deformers.DynamicRangeCompression(preset=None)
Dynamic range compression.
For each DRC preset configuration, one deformation is generated.
This transformation affects the following attributes:
- Audio
Examples
>>> # A single preset
>>> drc = muda.deformers.DynamicRangeCompression(preset='radio')
>>> # Multiple presets
>>> drc = muda.deformers.DynamicRangeCompression(preset=['film standard',
...                                                      'film light'])
>>> # All presets
>>> drc = muda.deformers.DynamicRangeCompression(preset=muda.deformers.PRESETS.keys())
Attributes
preset : str or list of str
    One or more supported preset values:
    - radio
    - film standard
    - film light
    - music standard
    - music light
    - speech
Methods
audio(mudabox, state)
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Iterative transformation generator
Time-stretch deformers¶
class muda.deformers.TimeStretch(rate=1.2)
Static time stretching by a fixed rate
This transformation affects the following attributes:
- Annotations
  - all: time, duration
  - tempo: values
- metadata
  - duration
- Audio
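To make the listed annotation changes concrete, the sketch below shows one plausible way the affected fields scale, assuming (as the attribute description states) that rate > 1 speeds the audio up. Under that assumption, times and durations shrink by a factor of 1/rate while tempo values grow by a factor of rate. The helper names are hypothetical and this is not muda's code.

```python
def stretch_observation(time, duration, rate):
    """Rescale an annotation's onset time and duration for a given rate."""
    return time / rate, duration / rate


def stretch_tempo(bpm, rate):
    """Rescale a tempo value (beats per minute) for a given rate."""
    return bpm * rate


rate = 2.0  # twice as fast
print(stretch_observation(10.0, 4.0, rate))  # (5.0, 2.0)
print(stretch_tempo(120.0, rate))            # 240.0
```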
Examples
>>> D = muda.deformers.TimeStretch(rate=2.0)
>>> out_jams = list(D.transform(jam_in))
Attributes
rate : float or list of floats, strictly positive
    The rate at which to speed up the audio.
    - rate > 1 speeds up,
    - rate < 1 slows down.
Methods
audio(mudabox, state)
deform_tempo(annotation, state)
deform_times(ann, state)
get_params([deep]): Get the parameters for this object.
metadata(metadata, state)
states(jam)
transform(jam): Iterative transformation generator
class muda.deformers.RandomTimeStretch(n_samples=3, location=0.0, scale=0.1)
Random time stretching
For each deformation, the rate parameter is drawn from a log-normal distribution with parameters (location, scale).
This transformation affects the following attributes:
- Annotations
  - all: time, duration
  - tempo: values
- metadata
  - duration
- Audio
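Drawing rates from a log-normal distribution guarantees that every rate is strictly positive and that speed-ups and slow-downs are symmetric on a log scale. The sketch below uses the standard library's `random.lognormvariate`, assuming that (location, scale) correspond to the mean and standard deviation of the underlying normal distribution; that mapping is an assumption, and `sample_rates` is a hypothetical helper.

```python
import random


def sample_rates(n_samples=3, location=0.0, scale=0.1, seed=None):
    """Draw n_samples strictly positive stretch rates from a log-normal law."""
    rng = random.Random(seed)
    return [rng.lognormvariate(location, scale) for _ in range(n_samples)]


rates = sample_rates(n_samples=5, seed=0)
print(rates)
```

With location=0.0 the median rate is 1.0, so the samples cluster around "no stretch" and spread further as scale grows.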
Attributes
n_samples : int > 0
    The number of samples to generate
location : float
scale : float > 0
    Parameters of a log-normal distribution from which rate parameters are sampled.
Methods
audio(mudabox, state)
deform_tempo(annotation, state)
deform_times(ann, state)
get_params([deep]): Get the parameters for this object.
metadata(metadata, state)
states(jam)
transform(jam): Iterative transformation generator
class muda.deformers.LogspaceTimeStretch(n_samples=3, lower=-0.3, upper=0.3)
Logarithmically spaced time stretching.
n_samples are generated with stretching spaced logarithmically between 2.0**lower and 2.0**upper.
This transformation affects the following attributes:
- Annotations
  - all: time, duration
  - tempo: values
- metadata
  - duration
- Audio
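Logarithmic spacing between 2.0**lower and 2.0**upper amounts to spacing the exponents evenly and then exponentiating. The sketch below assumes that both endpoints are included; `logspace_rates` is a hypothetical helper, not the library's code.

```python
def logspace_rates(n_samples=3, lower=-0.3, upper=0.3):
    """Return n_samples rates with exponents evenly spaced in [lower, upper]."""
    if n_samples == 1:
        return [2.0 ** lower]
    step = (upper - lower) / (n_samples - 1)
    return [2.0 ** (lower + i * step) for i in range(n_samples)]


rates = logspace_rates(n_samples=3, lower=-1.0, upper=1.0)
print(rates)  # [0.5, 1.0, 2.0]
```

Symmetric bounds produce rates that are reciprocal pairs around 1.0, so a half-speed stretch is balanced by a double-speed one.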
Attributes
n_samples : int > 0
    Number of deformations to generate
lower : float
upper : float > lower
    Minimum and maximum bounds on the stretch parameters
Methods
audio(mudabox, state)
deform_tempo(annotation, state)
deform_times(ann, state)
get_params([deep]): Get the parameters for this object.
metadata(metadata, state)
states(jam)
transform(jam): Iterative transformation generator
Pitch-shift deformers¶
class muda.deformers.PitchShift(n_semitones=1)
Static pitch shifting by (fractional) semitones
This transformation affects the following attributes:
- Annotations
  - key_mode
  - chord, chord_harte, chord_roman
  - pitch_hz, pitch_midi, pitch_class
- Audio
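For the pitch-valued annotations listed above, a semitone shift has a simple arithmetic form under twelve-tone equal temperament: frequencies are multiplied by 2 ** (n_semitones / 12) and MIDI note numbers are offset by n_semitones. The sketch below illustrates that arithmetic only; the helper names are hypothetical, and how the library rewrites key and chord labels is not covered.

```python
def shift_frequency(freq_hz, n_semitones):
    """Transpose a frequency in Hz by a (possibly fractional) semitone count."""
    return freq_hz * 2.0 ** (n_semitones / 12.0)


def shift_midi(midi_note, n_semitones):
    """Transpose a MIDI note number; fractional values remain fractional."""
    return midi_note + n_semitones


print(shift_frequency(440.0, 12))    # 880.0, one octave up
print(shift_frequency(440.0, -0.5))  # a quarter-tone below A440
print(shift_midi(69, 2))             # 71
```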
Examples
>>> # Shift down by a quarter-tone
>>> D = muda.deformers.PitchShift(n_semitones=-0.5)
Attributes
n_semitones : float or list of float
    The number of semitones to transpose the signal. Can be positive, negative, integral, or fractional.
Methods
audio(mudabox, state)
deform_frequency(annotation, state)
deform_midi(annotation, state)
deform_note(annotation, state)
deform_tonic(annotation, state)
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Iterative transformation generator
class muda.deformers.RandomPitchShift(n_samples=3, mean=0.0, sigma=1.0)
Randomized pitch shifter
Pitch is transposed by a normally distributed random variable.
This transformation affects the following attributes:
- Annotations
  - key_mode
  - chord, chord_harte, chord_roman
  - pitch_hz, pitch_midi, pitch_class
- Audio
Examples
>>> # 5 random shifts with unit variance and mean of 1 semitone
>>> D = muda.deformers.RandomPitchShift(n_samples=5, mean=1.0, sigma=1)
Attributes
n_samples : int > 0
    The number of samples to generate per input
mean : float
sigma : float > 0
    The parameters of the normal distribution for sampling pitch shifts
Methods
audio(mudabox, state)
deform_frequency(annotation, state)
deform_midi(annotation, state)
deform_note(annotation, state)
deform_tonic(annotation, state)
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Iterative transformation generator
class muda.deformers.LinearPitchShift(n_samples=3, lower=-1, upper=1)
Linearly spaced pitch shift generator
This transformation affects the following attributes:
- Annotations
  - key_mode
  - chord, chord_harte, chord_roman
  - pitch_hz, pitch_midi, pitch_class
- Audio
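Linearly spaced shifts can be sketched as evenly spaced values between the two bounds. The helper below assumes that both lower and upper are included in the sequence, which is an assumption about the spacing rather than a statement of the library's behavior; `linear_shifts` is a hypothetical name.

```python
def linear_shifts(n_samples=3, lower=-1.0, upper=1.0):
    """Return n_samples semitone shifts evenly spaced in [lower, upper]."""
    if n_samples == 1:
        return [float(lower)]
    step = (upper - lower) / float(n_samples - 1)
    return [lower + i * step for i in range(n_samples)]


print(linear_shifts(5, -1.0, 1.0))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

With an odd n_samples and symmetric bounds, the middle value is a shift of zero, so the unmodified pitch is included among the outputs.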
Examples
>>> # 5 shifts spaced between -2 and +2 semitones
>>> D = muda.deformers.LinearPitchShift(n_samples=5, lower=-2, upper=2)
Attributes
n_samples : int > 0
    The number of samples to generate per input
lower : float
upper : float
    The lower and upper bounds for the shift sampling
Methods
audio(mudabox, state)
deform_frequency(annotation, state)
deform_midi(annotation, state)
deform_note(annotation, state)
deform_tonic(annotation, state)
get_params([deep]): Get the parameters for this object.
states(jam)
transform(jam): Iterative transformation generator
Release notes¶
v0.1.4
- #58 Updated for compatibility with JAMS 0.3.0
v0.1.3¶
- #40 BackgroundNoise now stores sample positions in its output history
- #44 fixed a bug in reconstructing muda-output jams files
- #47 removed dependency on scikit-learn
- #48, #54 converted unit tests from nose to py.test
- #49 TimeStretch and PitchShift deformers now support multiple values
- #52 added the Union class
v0.1.2¶
This is a minor bug-fix revision.
- The defaults for LogspaceTimeStretch have been changed to a more reasonable setting.
- Track duration is now overridden when loading audio into a jams object.
v0.1.0¶
Initial public release.