Message Boards Message Boards

Let's all combine forces to improve this routine

Posted 12 years ago
The routine is a working program that helps optimize disk usage by choosing which files to select for burning.
Suppose you have a bunch of images that you want to burn to a disc with given capacity, for example a 4.3 Gb DVD.
The total size of your files however exceeds the given capacity. So you need to choose which files to burn so that the wasted space is minimized.

The program has a LOT of room for improvement. It can't handle more than ~20 items, so to work around this limitation I manually group items in folders. This could and should be done by the program automatically.
Also:
* There's a bunch of potentially unnecessary variables that should be eliminated
* There's a significant delay upon first run, possibly due to Manipulate, but I can't figure it out

For now I'd welcome suggestions on how to optimize/shorten the code, so that it is more readable and more robust.
I'd like to postpone the implementation of new functionality until after we optimize the existing code.

I'm sure by the end this code would probably look completely different and we all can learn a lot in the process.
Looking forward to all comments and suggestions!

Here's my original v 1.0 of the code:
 
 Manipulate[Column[{folder, capacity, result}],
   {folder, Directory[], FileNameSetter[#, "Directory"] &},
   {{capacity, 703.}, {703. -> "CD 700Mb", 4479. -> "DVD 4.7Gb",8106. -> "DVD DL 8.5Gb", 23098. -> "BluRay 25Gb"}},
   {{result, Null}, Button["Compute", result = (
 
 SetDirectory[folder];
 files =With[{fn =If[(FileNames[] // Length) <= 16, FileNames[],FileNames[][[1 ;; 16]]]},
   Thread[List[fn, (FileByteCount[fn] + dirByteCount[fn])/1024./1024.]]];
list = Sort[With[{p = Subsets[files, {2, Infinity}]},Thread[List[(#[[All, 2]] // Total ) & /@ p, p]]], #1[[1]] > #2[[1]] &];
plot = ListPlot[list[[All, 1]]];
list = Select[list, #[[1]] <= capacity &][[1]];

Grid[{{GraphicsRow[{Show[plot, Plot[capacity, {x, 0, 30000}]],PieChart3D[{list[[1]], capacity - list[[1]]},ChartLegends -> {"Used", "Wasted"}]}, ImageSize -> 800]},
{With[{d = list[[2, All, 1]], s = list[[2, All, 2]],t = list[[2, All, 2]] // Total,pf := ToString[PaddedForm[#, {10, 2}]] &},

Grid[Join[Transpose[{Range[d // Length], d, pf /@ s,pf /@ (100 s/capacity)}], {{, , pf[t],pf[(100 t/capacity)] ~~ "%"}}], Frame -> All,
Alignment -> {Left, Right, Right}]]}}]

)] &},
SynchronousInitialization -> False, SynchronousUpdating -> False,
Initialization :> (dirByteCount[dir_] := (FileByteCount /@ FileNames["*", {dir}, Infinity] //Total);
SetAttributes[{dirByteCount, FileByteCount}, Listable];)
]

Here's a screenshot:
POSTED BY: Todor Latev
5 Replies
Todor,

It is not clear from your post what you are trying to optimize... rather than jumping on the code, can you try to explain your problem further?
Are you trying to figureout how to move all the files on a minimal number of DVDs?

Luc
POSTED BY: Luc Barthelet
Posted 12 years ago
Hi Luc,
It is not clear from your post what you are trying to optimize
Everything - the entire program! I'm sure there's many better ways to rewrite the code, eliminating unnecessary variables and reducing the number of chars
I'm looking for intelligent/elegant/professional code
rather than jumping on the code, can you try to explain your problem further?
Are you trying to figureout how to move all teh files on a minimal number of DVDs?
Not quite. Here's the problem this routine is trying to address:
You have 20 files/folders that you like to copy to a disk. The total size of all the files is 4.5Gb, the capacity of your disk is 4.3Gb. So you can't fit them all.
You need to decide which ones to leave for a different disk. Ideally you'd like a subset of those files which size is exactly the capacity of your disk. Since this subset is very unlikely to exist the second best option is the subset which is slightly less the capacity, such that the wasted space is minimal. Hopefully you get the idea now.

I ran into this problem 15+ years ago - I was trying to copy an audio CD to a cassette. Since the cassette has two sides x 30 mins each I had to figure out how to rearrange the songs so that I can fit them all on the cassette without cutting off a song. I wrote a simple program in pascal that would randomly rearrange the songs into two groups and give me the groups that were as close to 30mins each as possible. After running the program for several minutes it would almost always come up with a solution. Now in Mathematica you can actually find the best possible solution /for ~20 items or less, depending on your hardware/, since it can work out all possible subsets.

If you take a look at my screenshot you'll see that in that particular case MA found a subset that is 0.02 Mb less than the capacity of a 703Mb CD - not bad at all!
POSTED BY: Todor Latev
Posted 12 years ago
Depending on how you look at this problem, it is either a knapsack ( http://en.wikipedia.org/wiki/Knapsack_problem ) or bin packing ( http://en.wikipedia.org/wiki/Bin_packing_problem ) problem. There are well-established algorithms for solving these problems.
POSTED BY: Jari Kirma
Posted 12 years ago
Thanks Daniel and Jari for the great links - looks like this is indeed a well defined mathematics problem which I was unaware of.
POSTED BY: Todor Latev
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract