MusicLM uses an AI model trained on what Google calls "a large dataset of unlabeled music," along with captions from MusicCaps, a new dataset composed of 5,521 music-text pairs. MusicCaps gets its text descriptions from human experts and its matching audio clips from Google's AudioSet, a collection of over 2 million labeled 10-second sound clips pulled from YouTube videos.

Generally speaking, MusicLM works in two main parts: first, it takes a sequence of audio tokens (pieces of sound) and maps them to semantic tokens (words that represent meaning) in captions for training. The second part receives user captions and/or input audio and generates acoustic tokens (pieces of sound that make up the resulting song output); a toy sketch of this token flow appears at the end of this article. The system relies on an earlier AI model called AudioLM (introduced by Google in September) along with other components such as SoundStream and MuLan.

Google claims that MusicLM outperforms previous AI music generators in both audio quality and adherence to text descriptions.

On the MusicLM demonstration page, Google provides numerous examples of the model in action, creating audio from "rich captions" that describe the feel of the music, and even vocals (which so far are gibberish). Here is an example of a rich caption they provide: "Slow tempo, bass-and-drums-led reggae song. Vocals are relaxed with a laid-back feel, very expressive."

Google also shows off MusicLM's "long generation" (creating five-minute music clips from a simple prompt), "story mode" (which takes a sequence of text prompts and turns it into a morphing series of musical tunes), "text and melody conditioning" (which takes humming or whistling as audio input and changes it to match the style laid out in a prompt), and generating music that matches the mood of image captions.

In the MusicLM paper, its creators outline the model's potential impacts, including "potential misappropriation of creative content" (i.e., copyright issues), potential biases against cultures underrepresented in the training data, and potential cultural appropriation issues. As a result, Google emphasizes the need for more work on tackling these risks and is holding back the code: "We have no plans to release models at this point."
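Since Google has not released MusicLM's code, the following is a minimal, self-contained sketch of the two-stage token pipeline described above, with toy stand-ins for MuLan, the semantic and acoustic stages, and SoundStream. Every function name, token shape, and number below is an illustrative assumption, not the real (unreleased) API; only the flow of tokens between stages reflects the paper's description.

```python
import numpy as np

# Toy stand-ins for MusicLM's components. The real models (MuLan, the
# AudioLM-style semantic/acoustic stages, SoundStream) are not public;
# these stubs only illustrate how tokens flow between the two stages.
rng = np.random.default_rng(0)

def mulan_embed(text: str) -> np.ndarray:
    """Stand-in for MuLan: map a caption into a joint music-text
    embedding space (here, a fake 128-dim vector)."""
    return rng.standard_normal(128)

def semantic_stage(conditioning: np.ndarray, length: int = 50) -> np.ndarray:
    """Stage 1: predict semantic tokens capturing melody and long-term
    structure. Stubbed as random token IDs."""
    return rng.integers(0, 1024, size=length)

def acoustic_stage(conditioning: np.ndarray,
                   semantic_tokens: np.ndarray) -> np.ndarray:
    """Stage 2: predict fine-grained acoustic (audio codec) tokens --
    the pieces of sound that make up the output -- from the semantic
    tokens plus the caption conditioning."""
    return rng.integers(0, 1024, size=(len(semantic_tokens), 8))

def soundstream_decode(acoustic_tokens: np.ndarray) -> np.ndarray:
    """Stand-in for SoundStream's decoder: codec tokens -> waveform."""
    return rng.standard_normal(acoustic_tokens.size * 320)  # fake samples

def generate_music(prompt: str) -> np.ndarray:
    cond = mulan_embed(prompt)
    semantic = semantic_stage(cond)
    acoustic = acoustic_stage(cond, semantic)
    return soundstream_decode(acoustic)

audio = generate_music("Slow tempo, bass-and-drums-led reggae song.")
print(audio.shape)  # fake waveform samples
```

The split mirrors AudioLM's hierarchy: coarse semantic tokens pin down structure first, so the acoustic stage only has to fill in sonic detail.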
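For readers who want to inspect MusicCaps itself, here is a short loading sketch. It assumes the dataset is mirrored on the Hugging Face Hub under the ID google/MusicCaps and that rows carry the field names from the published CSV (ytid, start_s, end_s, caption); both are assumptions worth verifying, and the audio clips themselves must be fetched separately from the referenced YouTube videos.

```python
from datasets import load_dataset  # pip install datasets

# Sketch: load the MusicCaps caption table. The Hub ID and field names
# are assumptions based on the published CSV release; the dataset ships
# captions and YouTube clip references, not the audio itself.
ds = load_dataset("google/MusicCaps", split="train")
print(len(ds))  # expected: 5,521 music-text pairs

row = ds[0]
print(row["ytid"])                   # source YouTube video ID
print(row["start_s"], row["end_s"])  # 10-second clip boundaries
print(row["caption"])                # free-text description by a human expert
```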