There are several ways to do this; the following is just one example.
Take the image, convert it to byte values with ImageData[im, "Byte"], write each value in base 12 so that every digit corresponds to one of the 12 notes of the chromatic scale, and replace the digits with note names (in this case spread over the 4th and 5th octaves).
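To see what the base-12 step does before applying it to a whole image, here is a minimal sketch on a handful of byte values. The values in `bytes` and the name `noteNames` are made up for illustration; the replacement rules are the same ones used for the image, only built with Thread instead of being typed out.

```wolfram
(* hypothetical byte values, chosen only for illustration *)
bytes = {0, 11, 12, 255};

(* each byte (0 to 255) becomes one to three base-12 digits *)
digits = IntegerDigits[bytes, 12]
(* {{0}, {11}, {1, 0}, {1, 9, 3}} *)

(* flattening gives one digit per note *)
flat = Flatten[digits]
(* {0, 11, 1, 0, 1, 9, 3} *)

(* digit -> note name, same mapping as in the answer *)
noteNames = {"C4", "Db4", "D4", "Eb4", "E4", "F4", "Gb4", "G4",
   "Ab5", "A5", "Bb5", "B5"};
flat /. Thread[Range[0, 11] -> noteNames]
(* {"C4", "B5", "Db4", "C4", "Db4", "A5", "Eb4"} *)
```

Note that one pixel value can turn into several notes, which is why the flattened list is much longer than the number of pixels.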
im = Import["ExampleData/rose.gif"]

Length@Flatten@IntegerDigits[ImageData[im, "Byte"], 12]

This produces a lot of data (around 300000 notes), so I selected a slice from roughly the middle of the data, where the rose itself is.
(* map each base-12 digit to a note name *)
data = Flatten@IntegerDigits[ImageData[im, "Byte"], 12] /. {0 -> "C4",
    1 -> "Db4", 2 -> "D4", 3 -> "Eb4", 4 -> "E4", 5 -> "F4",
    6 -> "Gb4", 7 -> "G4", 8 -> "Ab5", 9 -> "A5", 10 -> "Bb5",
    11 -> "B5"};
(* play only notes 130800 to 131700, 0.1 s each *)
Sound[SoundNote[#, 0.1] & /@ Take[data, {130800, 131700}]]

Many things can be varied: the instrument, the note duration, and so on (here I used the default instrument and a duration of 0.1 seconds per note). It is also possible to combine several instruments in the melody, overlap parts of the sound, or play simultaneous notes as chords. What I did here is deliberately raw and simple.
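As a rough sketch of the chord idea: SoundNote accepts a list of pitches and plays them together, so grouping the note list with Partition is one way to get chords. The list `sample` below is a made-up stand-in for a slice of the data, and the group size of 3 and the "Piano" instrument are arbitrary choices.

```wolfram
(* hypothetical short note sequence standing in for a slice of the data *)
sample = {"C4", "E4", "G4", "D4", "F4", "A5", "E4", "G4", "B5"};

(* group the notes into triples; each triple is played as one chord *)
chords = Partition[sample, 3];

(* one SoundNote per chord, 0.5 s each *)
Sound[SoundNote[#, 0.5, "Piano"] & /@ chords]
```

With real image data the same pattern would be applied to something like Take[data, {130800, 131700}]; Partition silently drops any notes left over at the end that do not fill a complete group.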
Another example (with 3 instruments, 3 different parts of data, overlapping, major pentatonic scale):
im = Import["ExampleData/turtle.jpg"]

data = Flatten@IntegerDigits[ImageData[im, "Byte"], 5] /. {0 -> "C4",
    1 -> "D4", 2 -> "E4", 3 -> "G4", 4 -> "A5"};
d1 = Sound[
   Table[SoundNote[data[[i]], 0.3, "PanFlute",
     SoundVolume -> 1/2], {i, 40000, 40500}], {0, 200}];
d2 = Sound[
   Table[SoundNote[data[[i]], 0.3, "Harp"], {i, 41000, 41500}], {0,
    300}];
d3 = Sound[
   Table[SoundNote[data[[i]], 0.3, "Percussion"], {i, 42000,
     42500}], {0, 200}];
Sound[{d1, d2, d3}]

There are also other ways to get at the data, for example PixelValue or ImageValue. You can also separate the image into color channels and play each channel with a different instrument, use a different scale per channel, play some channels as chords, etc.
im = Import["ExampleData/girlcloseup.jpg", ImageSize -> 100]

Eight-tone Spanish scale; violin, guitar, sitar, voice oohs, bass:
(* cs is used below but was never defined; ColorSeparate is assumed here *)
cs = ColorSeparate[im];
(* one flat note list per color channel, each in a different octave range *)
data1 = Flatten@
   IntegerDigits[PixelValue[cs[[1]], {All, All}, "Byte"], 8] /. {0 ->
     "C3", 1 -> "Db3", 2 -> "Eb3", 3 -> "E3", 4 -> "F3", 5 -> "Gb3",
    6 -> "Ab4", 7 -> "Bb4"};
data2 = Flatten@
   IntegerDigits[PixelValue[cs[[2]], {All, All}, "Byte"], 8] /. {0 ->
     "C4", 1 -> "Db4", 2 -> "Eb4", 3 -> "E4", 4 -> "F4", 5 -> "Gb4",
    6 -> "Ab5", 7 -> "Bb5"};
data3 = Flatten@
   IntegerDigits[PixelValue[cs[[3]], {All, All}, "Byte"], 8] /. {0 ->
     "C2", 1 -> "Db2", 2 -> "Eb2", 3 -> "E2", 4 -> "F2", 5 -> "Gb2",
    6 -> "Ab3", 7 -> "Bb3"};
d1 = Sound[
Table[SoundNote[data1[[i]], 0.3,
RandomChoice[{"Guitar", "VoiceOohs", "Sitar"}],
SoundVolume -> 3/4], {i, 1000, 1200}], {0, 100}];
d2 = Sound[
Table[SoundNote[data2[[i]], 0.3, "Violin", SoundVolume -> 2/3], {i,
1000, 1200}], {0, 150}];
d3 = Sound[
Table[SoundNote[data3[[i]], 0.3, "Bass"], {i, 1000, 1200}], {0,
150}];
Sound[{d1, d2, d3}]

There are many other ways to do this: other scales, other ways of converting the data, and so on. Once you understand the process you can get creative; what I showed here is only a starting point. For more information about Sound, SoundNote, instruments, and other options, check the documentation.
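To make it easier to try other scales, the conversion can be wrapped in a small function. This is only a sketch: `imageToNotes` and `wholeTone` are hypothetical names, not part of the examples above, and the whole-tone scale is just one arbitrary choice. The number of notes in the scale is used as the base, so it must contain at least two notes.

```wolfram
(* hypothetical helper: turn the bytes of an image into notes of a scale;
   the number base is the number of notes in the scale *)
imageToNotes[img_Image, scale_List] :=
  Flatten@IntegerDigits[ImageData[img, "Byte"], Length[scale]] /.
   Thread[Range[0, Length[scale] - 1] -> scale]

(* whole-tone scale, chosen here only as an example *)
wholeTone = {"C4", "D4", "E4", "Gb4", "Ab4", "Bb4"};

notes = imageToNotes[Import["ExampleData/rose.gif"], wholeTone];

(* play the first 200 notes, 0.2 s each, with the default instrument *)
Sound[SoundNote[#, 0.2] & /@ Take[notes, 200]]
```

Any list of note names works as the second argument, so the chromatic, pentatonic, and Spanish scales from the examples above can be passed in the same way.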