Music: something I like listening to, but not composing. I know nothing about music theory, and I have no idea how people come up with original melodies. So hey, sounds like the perfect job for AI. When I started this project, my goal was to generate at least one song that would honestly get stuck in my head. That's the ultimate test for good music. I told myself I would not make a video until it happened, and it finally did. If you haven't already seen my video about generating faces, you should definitely watch that first, because I use the exact same technique and I'm not explaining it again here.

So first, I needed a really good dataset. To simplify, I wanted a single instrument, and I decided on piano, since piano MIDIs are so common. Almost everyone I've seen generating music like this uses either classical or jazz, and there's a reason for that: classical and jazz are both more freeform, without definite song structure like verse, chorus, and bridge. I felt that if my network didn't learn these patterns, it would be hard to make something catchy and memorable like in modern songs. So I decided on video game music, since there's a lot of it online and it's usually catchy and repetitive with strong structure. I ended up with about 4,000 songs.

Next, I converted the music into a usable format. To simplify, I use the piano roll format and treat all notes as single strikes with no holds. I chose 96 notes, with a time step, or tick, of 1/96th of a measure. 96 is the optimal number in my opinion, because it evenly divides all the most common time signatures exactly. But there's another symmetry to exploit: since measures often repeat or use similar motifs, I add a third dimension for the measure itself. I decided to produce 16-measure songs, so in total my song vectors are 16 measures by 96 ticks by 96 notes.
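The 16×96×96 encoding described above can be sketched like this — a minimal example, not the project's actual preprocessing code, assuming note events have already been extracted from MIDI as hypothetical `(measure, tick, pitch)` tuples:

```python
import numpy as np

# Dimensions from the video: 16 measures x 96 ticks x 96 notes.
MEASURES, TICKS, NOTES = 16, 96, 96

def notes_to_roll(note_events):
    """Build a binary piano-roll tensor from (measure, tick, pitch) strikes.

    Every note is a single strike with no hold, so a cell is simply 1.0
    where a note starts and 0.0 everywhere else.
    """
    roll = np.zeros((MEASURES, TICKS, NOTES), dtype=np.float32)
    for measure, tick, pitch in note_events:
        if 0 <= measure < MEASURES and 0 <= tick < TICKS and 0 <= pitch < NOTES:
            roll[measure, tick, pitch] = 1.0
    return roll

# Example: four quarter-note strikes in the first measure.
# With 96 ticks per measure, a quarter note in 4/4 lands every 24 ticks.
song = notes_to_roll([(0, 0, 48), (0, 24, 52), (0, 48, 55), (0, 72, 60)])
```

Because 96 = 2^5 × 3, both 4/4 (ticks divisible by 24) and 3/4 or 6/8 (ticks divisible by 32) land on exact integer positions, which is the symmetry argument above.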
Okay, now to the neural network. The obvious two techniques to use here are convolutions and LSTMs, since we're dealing with spatially structured data and time-structured data, but both of these methods produce bad results, and I'll explain why. First of all, convolutions assume that there's a close relationship between pixels and their immediate neighbors. This is very true in images, but if you think about it, it really isn't true with piano rolls. The notes most correlated with a single note are generally much farther away in the same measure, or on different measures, so locality isn't actually as meaningful. Then there are LSTMs. They actually do work great for freeform music, since you really only need to know the recent context to generate the next notes. But since I'm trying to produce very structured songs, the network needs to know about the entire context of the song at the same time; when one chorus is different from another, you really notice.

What I ended up doing was creating a dense network to encode each measure into a feature vector, feeding those into a dense autoencoder, which then outputs another feature vector that finally gets converted back to a measure. It's kind of like two autoencoders in the same network. For encoding and decoding the measures I picked 200 dimensions, and for encoding and decoding the song I chose 120 dimensions. These values seemed to be a good balance.

All right, it's time to train. One thing I decided to watch for was how the principal components evolved during the training. As you can see, the largest components are not very big compared to the smallest ones, which means we won't get a lot of feature decorrelation like with the faces. In fact, only the first component was obvious to me, and I challenge you to guess what it was before I reveal it at the end. Let's also listen to how generated songs sound throughout training. [Music] After just one epoch:
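The nested autoencoder idea above can be sketched as a plain forward pass — a minimal NumPy sketch with random, untrained weights, just to show how the shapes flow (the real model learns these weights end to end, and the layer count and activations here are my assumptions):

```python
import numpy as np

MEASURES, TICKS, NOTES = 16, 96, 96
MEASURE_DIM, SONG_DIM = 200, 120  # feature sizes from the video

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical single-layer weights for each stage (learned in practice).
W_enc_m = rng.normal(0, 0.01, (TICKS * NOTES, MEASURE_DIM))
W_enc_s = rng.normal(0, 0.01, (MEASURES * MEASURE_DIM, SONG_DIM))
W_dec_s = rng.normal(0, 0.01, (SONG_DIM, MEASURES * MEASURE_DIM))
W_dec_m = rng.normal(0, 0.01, (MEASURE_DIM, TICKS * NOTES))

def forward(song):
    # 1) Encode every measure with the same shared dense layer.
    m = sigmoid(song.reshape(MEASURES, -1) @ W_enc_m)           # (16, 200)
    # 2) Encode the stack of measure vectors into one song vector.
    z = sigmoid(m.reshape(1, -1) @ W_enc_s)                     # (1, 120)
    # 3) Decode back: song vector -> measure vectors -> note probabilities.
    m_hat = sigmoid(z @ W_dec_s).reshape(MEASURES, MEASURE_DIM)
    out = sigmoid(m_hat @ W_dec_m).reshape(MEASURES, TICKS, NOTES)
    return z, out

z, out = forward(np.zeros((MEASURES, TICKS, NOTES), dtype=np.float32))
```

The key design point is weight sharing across measures in step 1: because the same measure encoder and decoder are reused 16 times, repeated measures naturally map to similar feature vectors, which is exactly the repetition the song structure needs.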
It's pretty dumb, just repeating a single note, but I did notice the note was C, the most common key. Probably not a coincidence. [Music] After 10 epochs, I can see a little bit of rhythm, but it's quite boring and stuck to one key. After 50 epochs, there's definitely a beat now and a bit of key change. The melody is lacking, though, but that's to be expected, since the more repetitive part is a lot easier to learn. [Music] Finally, some melody is developing. It's not great yet, but at least it's moving in the right direction. [Music] After more training, it's starting to sound good. It sounds much more like a real song now.

So finally, I'll skip to the results after 2000 epochs. There are a lot of controls here, so let me explain what's going on. The sliders on top are the top 40 principal components, same as the faces. Underneath, you see the notes in the piano roll format, divided into measures. But just showing the image would be boring, so I built a synth, and that's what the controls on the bottom are for. Blue is volume, green is speed, and red is how certain the generator needs to be to play the note. Notes in white are played, and notes in red are just shy of being played. I can generate random songs, but what's especially cool is I can adjust the sliders while it plays and hear the changes in real time. [Music] Just imagine a DJ that doesn't just mix live music but actually composes the entire song. Unfortunately, these principal components weren't very human-understandable, but with some future research or automated feature labeling, it could be a reality soon. And now, for your amusement, here are some random songs I liked. [Music] Oh, and let's reveal what that first principal component was. Turns out it controls whether the time signature is a multiple of three or four.
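The slider interface above boils down to two pieces: PCA over the encoded song vectors, and a confidence threshold on the decoder output. A minimal sketch, assuming we already have a matrix of song latents (random stand-ins here) — the function names are mine, not the project's:

```python
import numpy as np

SONG_DIM = 120
rng = np.random.default_rng(1)

# Stand-in latents; in the real project these come from encoding
# every training song with the song encoder.
latents = rng.normal(size=(4000, SONG_DIM))

# PCA via SVD on the centered latents.
mean = latents.mean(axis=0)
_, _, components = np.linalg.svd(latents - mean, full_matrices=False)
top40 = components[:40]  # the directions bound to the top 40 sliders

def latent_from_sliders(slider_values):
    """Map 40 slider positions back to a full 120-dim song vector."""
    return mean + np.asarray(slider_values) @ top40

def notes_from_output(probabilities, threshold=0.5):
    """The red control: a note plays only where the decoder is confident
    enough; cells just below the threshold are the 'just shy' red notes."""
    return probabilities > threshold

z = latent_from_sliders(np.zeros(40))  # all sliders centered -> mean song
```

Dragging a slider while the synth plays just re-runs `latent_from_sliders` and the decoder each tick, which is why the changes are audible in real time.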
It makes sense if you think about it: the only overlap between them is the very first beat of the measure. If you liked this video, also check out carykh's video, where he trains different networks to produce jazz music; link in the description. And finally, I want to end the video with the very first song that got stuck in my head. Enjoy the relaxing music, and thank you for watching. [Music]