The thing is, there's no narrative or storytelling element to image style transfer. Think of Beethoven's 9th Symphony: it develops a narrative, it's not entirely abstract; the theme of the Song of Brotherhood gets subtly developed over the course of the composition. That is storytelling. It's also why it's much easier to use ML to generate credible poetry than a real novel: generating a novel faces the same challenges as generating music, since even instrumental music has a narrative, a story it's telling. Storytelling is a far more difficult problem than style transfer.
There are some fun examples of this sort of stuff on http://dadabots.com/ , which includes attempts to synthesize music in the style of The Beatles, Meshuggah and The Dillinger Escape Plan.
We're quite lucky that neural nets are overkill for procedural music generation.
By that I mean: we have a huge body of music theory, applied across genres and attentive to the differences between them, expressed as heuristics for both analysis & composition. And because there's a tradition of procedural composition going back a couple of centuries, a lot of it is fairly easy to translate into computer programs. (For instance, end-to-end serialist composition is easier for computers than it is for people, while canons and other mechanisms for creating permutations of melodies are equally straightforward.)
This doesn't translate into a straightforward method for feeding in two wav files and producing a third with the style transferred, but it does mean that a sufficiently motivated person can write something that translates notes between two known genres far more easily than they could with images.
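To make the canon/serialism point concrete, here's a toy sketch. The note representation and function names are invented for illustration, not from any real system, but they show why these techniques are trivially mechanical for a computer:

```python
import random

# Melodies as lists of (midi_pitch, start_beat) pairs -- a toy
# representation; a real system would also track duration, dynamics, etc.
SUBJECT = [(60, 0), (62, 1), (64, 2), (65, 3), (67, 4)]  # C D E F G

def canon(melody, delay_beats, transpose_semitones):
    """Derive a canonic voice: the same melody, shifted in time and pitch."""
    return [(p + transpose_semitones, t + delay_beats) for p, t in melody]

def twelve_tone_row(seed=None):
    """Serialism's starting point: a random ordering of the 12 pitch classes."""
    row = list(range(12))
    random.Random(seed).shuffle(row)
    return row

# A canon at the fifth, entering two beats later:
answer = canon(SUBJECT, delay_beats=2, transpose_semitones=7)
```

Each of these is a one-line transformation of symbolic data, which is exactly why centuries-old compositional mechanisms port so cleanly to code, while raw-audio style transfer doesn't.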
(Text is somewhere in the middle. I've worked on a couple of attempts at 'style transfer' for text -- mostly using word2vec.)
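One common word2vec-flavored approach (a guess at the general shape, not the parent's actual method): learn a "style axis" as the average offset between paired words, push each word along it, and snap to the nearest neighbour. The vectors and words below are hand-made toys; a real attempt would load pretrained embeddings, e.g. via gensim:

```python
import numpy as np

# Invented toy "embeddings" standing in for real word2vec vectors.
VECS = {
    "hi":        np.array([1.0, 0.0, 0.9]),
    "greetings": np.array([1.0, 0.0, -0.9]),
    "thanks":    np.array([0.0, 1.0, 0.8]),
    "gratitude": np.array([0.0, 1.0, -0.8]),
}

def nearest(vec, exclude):
    """Nearest neighbour by cosine similarity, skipping excluded words."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in VECS if w not in exclude),
               key=lambda w: cos(VECS[w], vec))

# "Style axis": mean offset from casual words to their formal counterparts.
style = ((VECS["greetings"] - VECS["hi"]) +
         (VECS["gratitude"] - VECS["thanks"])) / 2

def formalise(word):
    """Shift a word along the style axis, then snap to a real vocabulary word."""
    return nearest(VECS[word] + style, exclude={word})
```

The same trick run per-word over a sentence gives a crude register shift, which hints at why text sits between images and music: words are discrete symbols with usable geometry, but there's still no model of the narrative connecting them.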
If you like the idea of taking a song and changing its genre, check out Postmodern Jukebox. They’re pretty brilliant.
I find this surprising, from the simplistic (and probably naive) view that images are 2D signals while music is 1D.