How does Mathematica use (or not use) Hyper-Threading on Intel i7 chips?

First, the apparent good news. My new MacBook Pro, running Mathematica 9.0.1.0 under OS X 10.8.4 on a 2.3 GHz Intel Core i7, tops the BenchmarkReport list with a score of 1.09! (My machine has 8 GB of memory and a conventional 750 GB hard drive, not a solid-state drive.) See the report appended at the end of this post.
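For anyone who wants to reproduce this kind of comparison, the report comes from the Benchmarking package that ships with Mathematica 8 and 9. The following is a minimal sketch; it assumes the package context is "Benchmarking`" and that a notebook front end is available, because BenchmarkReport[] opens its report in a new notebook.

```mathematica
(* Load the benchmarking package bundled with Mathematica *)
Needs["Benchmarking`"]

(* Run the MathematicaMark tests and return the results as an
   expression instead of opening a report notebook *)
results = Benchmark[];

(* Run the tests and open a notebook with the formatted report,
   including the comparison against the bundled reference systems *)
BenchmarkReport[]
```

Each call runs the full test suite, so use whichever one you need rather than both.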

According to the Mac Activity Monitor application, the benchmark tests do not max out the 8 cores (4 physical cores and 4 virtual cores created by Intel's Hyper-Threading). On the other hand, the Parallel Kernel Status report (accessible through the Mathematica Preferences) shows all 8 "cores" running at about 90% of their maximum.

I'm told it's possible that the next major OS X release ("Mavericks") will handle all this differently (better).

Second, a very large Notebook of mine doesn't run much faster than it did on an earlier (non-hyper-threaded) chip. I think this is because the Notebook writes to disk a lot, since it contains many, many short commands, functions, and programs. Thus, on my new, more powerful laptop, the Notebook is now I/O-bound.

I'm not going to worry about performance for the moment, but eventually I will want to see what I can get Mathematica to do on this machine when undertaking a CPU-bound computation. That is why I'm posting this question.

Thank you,
-- Mark


=== System Information ===

Machine Name: markt2013macbookpro
System: Mac OS X x86 (64-bit)
Date: September 15, 2013
Mathematica Version: 9.0.1
Benchmark Result: 1.09


=== MathematicaMark9 System Comparison ===

markt2013macbookpro 1.09
Mac OS X x86 (64-bit)

3.07 GHz Core i7-950 (8 Cores) 1.00
Windows 7 Pro (64-bit) Desktop

2.93 GHz Core i7-940 (8 Cores) 0.89
Linux Ubuntu (64-bit) Desktop

2.67 GHz Core 2 Quad Q9450 (4 Cores) 0.80
Linux Debian (64-bit) Desktop

3.00 GHz Core 2 Duo E8400 (2 Cores) 0.78
Linux Ubuntu (64-bit) Desktop

3.06 GHz Core 2 Duo E8435 (2 Cores) 0.73
iMac OS X Snow Leopard (64-bit) Desktop

1.73 GHz Core i7-820QM (8 Cores) 0.73
Windows 7 Ultimate (64-bit) Laptop

2 * 2.26 GHz Quad Core Xeon E5520 (8 Cores) 0.69
Mac XServe OS X (64-bit) Server

2.80 GHz Core 2 Duo Mobile T9600 (2 Cores) 0.67
Windows 7 Pro (64-bit) Laptop

2 * 2.66 GHz Dual Core Xeon 5150 (4 Cores) 0.56
MacPro OS X Snow Leopard (64-bit) Server

2.4 GHz Core 2 Duo Mobile T8300 (2 Cores) 0.47
MacBook OS X Snow Leopard (64-bit) Laptop

2.60 GHz Core 2 Duo Mobile T7800 (2 Cores) 0.44
Windows XP Pro (32-bit) Laptop

2 * 2.80 GHz Opteron 254 (2 Cores) 0.38
Windows XP Pro (64-bit) Server

2.13 GHz Core 2 Duo E6400 (2 Cores) 0.36
Windows Vista (32-bit) Server

1.6 GHz Core 2 Duo Mobile L7500 (2 Cores) 0.32
Windows 7 Pro (32-bit) Laptop

2 * 2.00 GHz G5 PowerPC (2 Cores) 0.14
Mac OS X (32-bit) Desktop

(Faster systems give larger numbers)


=== MathematicaMark9 Detailed Timings ===

                                             Total  Test 1  Test 2  Test 3  Test 4  Test 5  Test 6  Test 7  Test 8  Test 9  Test 10  Test 11  Test 12  Test 13  Test 14  Test 15

markt2013macbookpro
Mac OS X x86 (64-bit)                        12.7   0.79    0.78    1.08    0.83    1.04    0.56    0.53    1.14    0.81    0.84     1.17     0.11     1.53     0.75     0.76

3.07 GHz Core i7-950 (8 Cores)
Windows 7 Pro (64-bit) Desktop               13.8   0.80    0.98    1.00    0.80    0.84    1.00    0.98    1.00    0.78    1.06     0.95     0.89     0.97     0.92     0.86

2.93 GHz Core i7-940 (8 Cores)
Linux Ubuntu (64-bit) Desktop                15.6   0.94    0.99    1.14    0.92    0.80    0.81    0.88    1.51    0.89    1.31     1.16     1.14     1.41     0.89     0.86

2.67 GHz Core 2 Quad Q9450 (4 Cores)
Linux Debian (64-bit) Desktop                17.3   1.11    0.96    1.60    1.14    1.17    0.83    0.89    1.77    0.92    1.25     1.21     0.98     1.44     1.04     1.06

3.00 GHz Core 2 Duo E8400 (2 Cores)
Linux Ubuntu (64-bit) Desktop                17.8   1.05    0.87    1.88    1.18    1.45    0.76    0.79    1.73    1.43    1.13     1.03     0.84     1.21     1.15     1.33

3.06 GHz Core 2 Duo E8435 (2 Cores)
iMac OS X Snow Leopard (64-bit) Desktop      18.9   1.06    1.07    1.65    0.93    1.94    0.85    0.88    1.65    1.42    1.32     1.11     1.01     1.47     1.16     1.39

1.73 GHz Core i7-820QM (8 Cores)
Windows 7 Ultimate (64-bit) Laptop           18.9   1.26    1.19    1.14    1.26    1.33    1.16    1.12    1.17    1.59    1.25     1.63     1.08     1.09     1.44     1.23

2 * 2.26 GHz Quad Core Xeon E5520 (8 Cores)
Mac XServe OS X (64-bit) Server              20.1   1.06    1.22    1.44    1.00    1.67    1.05    1.12    1.71    0.67    2.79     1.32     1.22     1.76     0.94     1.08

2.80 GHz Core 2 Duo Mobile T9600 (2 Cores)
Windows 7 Pro (64-bit) Laptop                20.7   1.12    1.20    1.90    1.06    2.04    1.17    1.15    1.61    1.53    1.61     1.19     1.06     1.22     1.25     1.56

2 * 2.66 GHz Dual Core Xeon 5150 (4 Cores)
MacPro OS X Snow Leopard (64-bit) Server     24.8   1.56    1.26    2.20    1.34    2.44    1.02    1.12    2.03    1.20    2.97     1.55     1.19     1.77     1.46     1.64

2.4 GHz Core 2 Duo Mobile T8300 (2 Cores)
MacBook OS X Snow Leopard (64-bit) Laptop    29.8   1.78    1.40    2.44    1.40    3.03    1.13    1.22    2.20    2.35    2.31     1.63     1.57     2.06     2.86     2.38

2.60 GHz Core 2 Duo Mobile T7800 (2 Cores)
Windows XP Pro (32-bit) Laptop               31.8   1.27    1.80    2.59    1.44    3.69    1.92    2.06    2.41    3.25    1.91     1.45     1.41     1.34     2.52     2.72

2 * 2.80 GHz Opteron 254 (2 Cores)
Windows XP Pro (64-bit) Server               36.1   2.30    1.09    2.70    2.38    4.83    0.78    0.91    2.06    4.14    2.31     1.70     1.94     1.58     4.03     3.31

2.13 GHz Core 2 Duo E6400 (2 Cores)
Windows Vista (32-bit) Server                38.1   2.49    2.18    3.39    1.75    3.18    2.35    2.54    2.92    3.28    2.52     2.33     1.65     1.78     3.12     2.60

1.6 GHz Core 2 Duo Mobile L7500 (2 Cores)
Windows 7 Pro (32-bit) Laptop                43.8   2.17    2.75    3.74    2.26    4.23    2.90    3.14    3.46    3.42    2.73     2.48     1.92     2.11     3.39     3.12

2 * 2.00 GHz G5 PowerPC (2 Cores)
Mac OS X (32-bit) Desktop                    100.9  4.61    4.64    10.90   4.94    19.50   5.16    5.01    5.70    5.21    7.17     3.33     4.87     4.28     9.74     5.84

(Timings are CPU time in seconds) 
POSTED BY: Mark Tuttle
8 Replies
Just for the record, after a little "hammering", and a restart of Mathematica, I managed to get the Options Inspector and the command ...

Options[EvaluationNotebook[], NotebookAutoSave]

to agree on the status of the NotebookAutoSave option. In the unlikely event that this disagreement turns out to be reproducible, I will report it.

As expected, once this option was set to False, my very long Notebook executed noticeably faster.

Thanks for everyone's help.

I think the definitive Hyper-Threading story has yet to be written (assuming it ever matters), but I'll be on the lookout for anomalies and will report them here as well.

Thanks again,
-- Mark
POSTED BY: Mark Tuttle
A note about the benchmarking package included in version 9:

It has not been updated since version 8! All the included results for the various machines were obtained with version 8 (they must have been, because they're the same as in version 8). On my machine I get a better score under version 9 than under version 8, so comparisons against the included results aren't valid.

Regarding the number of parallel kernels to use: versions 8 and 9 launch 8 on this machine. In my experience, using 8 instead of 4 improves performance in some cases, but not always.

There is a SystemOption called MKLThreads, which has a default value of 4. You could try changing it to see whether it makes a difference. I am not sure which computations it affects (it's in the LinearProgramming subgroup).
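A sketch of how one might locate this option and measure its effect. Because the exact name and location of the option may differ between versions, the code searches every system option for names containing "MKL" instead of hard-coding a path. The 3000-by-3000 matrix size is an arbitrary choice, and the timing assumes that dense matrix products are handed to Intel MKL, as is usual for Mathematica on Intel hardware.

```mathematica
(* Search all system options, including nested groups, for names that
   mention MKL, rather than assuming where MKLThreads lives *)
mklOptions = Cases[SystemOptions[],
   HoldPattern[name_String -> _] /; StringMatchQ[name, "*MKL*"], Infinity]

(* Time a large dense matrix product; watch Activity Monitor while it
   runs to see how many logical processors are busy *)
m = RandomReal[1, {3000, 3000}];
First[AbsoluteTiming[m.m;]]
```

If the search returns an entry such as "MKLThreads" -> 4, SetSystemOptions with the same option path changes it, and rerunning the timing shows whether the setting matters for this kind of computation.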
POSTED BY: Szabolcs Horvát
From the perspective of a process running on a CPU with a modern Intel microarchitecture under a modern OS, all hyperthreads are mostly symmetric in their performance characteristics, and the application needs no special handling to take advantage of them. (This wasn't the case in the early days, but it has been so for quite a while now.) So there is no need to worry in this regard, though there are various complications to this answer.

I think most Mathematica licenses cap the number of (compute) subkernels at four, which usually maps to four physical threads of parallelizable general-purpose Mathematica computation, or around five once various inter-thread overheads are taken into account. I also think I've seen all eight hardware threads maxing out when some very specific algorithms are used; these are probably(?) implemented with well-multithreaded code inside one or more subprocesses. Numerical matrix and image computations with sufficiently large inputs are good candidates for this.

To some extent, this is an "academic" question on the current generation of desktop systems, which usually have four physical cores. Although half of the logical processors might appear to be idle, usually much less spare capacity than that is left unused. This is because two hardware threads (seen as logical processors) share a single physical core. This arrangement is something of a trick, mostly used to squeeze more performance out of tasks that spend a lot of time waiting for data to arrive at the CPU from the memory hierarchy (RAM and caches). Depending on the workload, going from four to eight busy threads can give anything from near-perfect scaling to almost none.

On top of all this, the scalability of multithreaded algorithms is often not as good as one would wish. Although multi-core hardware seems to be the way forward on the hardware front, software technologies have yet to catch up properly, apart from a relatively narrow set of specialized fields and algorithms.

Conclusion: if your workload appears to run only single-threaded, there *may* be room for improvement. If it automatically scales to four threads on a system with four cores (eight hyperthreads), worry about "lost" performance only if you have a compelling reason to. The loss is typically less than the naively assumed 50%, probably closer to 10-20%. Optimizations such as running numerical algorithms as compiled C usually give a completely different level of improvement.
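To illustrate the last point, here is a minimal sketch comparing a plain top-level loop with the same loop compiled to C through Compile with CompilationTarget -> "C". The function names and the sum of 1/k^2 are arbitrary illustrative choices, and compiling to C assumes a working C compiler is installed (on OS X, typically the Xcode command-line tools).

```mathematica
(* Plain top-level loop: sum of 1/k^2 for k = 1..n *)
sumTopLevel[n_Integer] := Module[{s = 0.}, Do[s += 1./(k*k), {k, n}]; s]

(* The same loop compiled to C; needs a C compiler on the machine *)
sumCompiledC = Compile[{{n, _Integer}},
   Module[{s = 0.}, Do[s += 1./(k*k), {k, n}]; s],
   CompilationTarget -> "C"];

(* Compare wall-clock times for the same n *)
{First[AbsoluteTiming[sumTopLevel[10^6]]],
 First[AbsoluteTiming[sumCompiledC[10^6]]]}
```

Both versions run on a single thread, so any speedup here comes from avoiding interpreter overhead, not from using more cores; its size will depend on the workload.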
POSTED BY: Jari Kirma
Here is the BenchmarkReport[] for my system:
  === System Information ===
  
  Machine Name:          thedoctor
  System:                Mac OS X x86 (64-bit)
  Date:                  September 16, 2013
  Mathematica Version:   9.0.1
  Benchmark Result:      1.32
  
  
 === MathematicaMark9 System Comparison ===
 
 thedoctor                                     1.32
 Mac OS X x86 (64-bit)
 
 3.07 GHz Core i7-950 (8 Cores)                1.00
 Windows 7 Pro (64-bit) Desktop
 
 2.93 GHz Core i7-940 (8 Cores)                0.89
 Linux Ubuntu (64-bit) Desktop
 
 2.67 GHz Core 2 Quad Q9450 (4 Cores)          0.80
 Linux Debian (64-bit) Desktop
 
 3.00 GHz Core 2 Duo E8400 (2 Cores)           0.78
 Linux Ubuntu (64-bit) Desktop
 
 3.06 GHz Core 2 Duo E8435 (2 Cores)           0.73
 iMac OS X Snow Leopard (64-bit) Desktop
 
 1.73 GHz Core i7-820QM (8 Cores)              0.73
 Windows 7 Ultimate (64-bit) Laptop
 
 2 * 2.26 GHz Quad Core Xeon E5520 (8 Cores)   0.69
 Mac XServe OS X (64-bit) Server
 
 2.80 GHz Core 2 Duo Mobile T9600 (2 Cores)    0.67
 Windows 7 Pro (64-bit) Laptop
 
 2 * 2.66 GHz Dual Core Xeon 5150 (4 Cores)    0.56
 MacPro OS X Snow Leopard (64-bit) Server
 
 2.4 Ghz Core 2 Duo Mobile T8300 (2 Cores)     0.47
 MacBook OS X Snow Leopard (64-bit) Laptop
 
 2.60 GHz Core 2 Duo Mobile T7800 (2 Cores)    0.44
 Windows XP Pro (32-bit) Laptop
 
 2 * 2.80 GHz Opteron 254 (2 Cores)            0.38
 Windows XP Pro (64-bit) Server
 
 2.13 GHz Core 2 Duo E6400 (2 Cores)           0.36
 Windows Vista (32-bit) Server
 
 1.6 GHz Core 2 Duo Mobile L7500 (2 Cores)     0.32
 Windows 7 Pro (32-bit) Laptop
 
 2 * 2.00 GHz G5 PowerPC (2 Cores)             0.14
 Mac OS X (32-bit) Desktop
 
 (Faster systems give larger numbers)
 
 
 === MathematicaMark9 Detailed Timings ===
 
                                              Total  Test 1  Test 2  Test 3  Test 4  Test 5  Test 6  Test 7  Test 8  Test 9  Test 10  Test 11  Test 12  Test 13  Test 14  Test 15
 
 thedoctor
 Mac OS X x86 (64-bit)                        10.5   0.47    0.64    0.86    0.56    0.94    0.51    0.47    0.92    0.72    0.81     0.83     0.11     1.33     0.66     0.70
 
(Timings are CPU time in seconds)
POSTED BY: David Reiss
Thank you for the info.  Among other things I'm interested in your experience going forward with your SSD - both in general and in support of Mathematica virtual memory.

In response to your first question/observation: I am not using explicit I/O in my Notebook. Instead, I'm doing algorithm development using Knuth's Literate Programming approach adapted for Mathematica. This means I'm telling a story (to myself for the moment) interspersed with function development and execution, with output. Often this output is saved to a variable, and that variable becomes an argument to the next function, etc. In my execution window there's lots of writing to disk, as evidenced by the "Checking then Saving" cycle in the lower left-hand corner of the window. Consistent with this, unless the Mac Activity Monitor is lying, I have lots of disk I/O even though none of it is "explicit" at present. I assume, but haven't verified, that if I combined things into a single program, reducing or eliminating intermediate output, the amount of "Saving" would be greatly reduced.

Because I'm focused on my productivity (and not Mathematica's), I let Mathematica decide how to parallelize things, if at all. Thus, in my current Notebook I don't launch any kernels explicitly. Instead, as I described above, some combination of Mathematica and Mac OS X uses(?) all 8 cores (4 real and 4 virtual), according to the Mathematica Parallel Kernel Status display. However, the Mac CPU meter shows that I'm using only 4 of the 8 possible cores. Your observation that launching more kernels than there are physical cores is counter-productive makes sense. Somewhere down the road, understanding these tradeoffs will be very important to me.

Thank you, again.
POSTED BY: Mark Tuttle
That your notebook is showing a "Checking then Saving" message indicates that the notebook itself is being saved. This is not normal for an executing notebook unless one of two things is going on. One is that there are explicit commands in the notebook that cause this to happen, such as an evaluation of NotebookSave.
Another possibility is that the notebook has the option NotebookAutoSave set to True. In that case the notebook saves itself after each cell is evaluated. But NotebookAutoSave is generally not set to True by default: one has to set it intentionally, either through the Options Inspector or by evaluating SetOptions on the notebook object. (I can give you the details on this if you wish.)

A way to check to see if NotebookAutoSave is set to True for the notebook (perhaps inadvertently if that was not your intention) is to execute the following in the notebook:
Options[EvaluationNotebook[], NotebookAutoSave]
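The check above can be combined with the SetOptions route just mentioned. This is a minimal sketch meant to be evaluated inside the affected notebook. Comparing against the $FrontEnd value is an assumption about why the Options Inspector and Options might disagree: the Inspector can display either the global or the notebook-level setting, depending on the scope chosen in its menu.

```mathematica
(* Setting on the notebook that contains this cell *)
Options[EvaluationNotebook[], NotebookAutoSave]

(* Global (front end) setting, for comparison with what the
   Options Inspector shows at "Global Preferences" scope *)
Options[$FrontEnd, NotebookAutoSave]

(* Turn autosaving off for this notebook only, then check again *)
SetOptions[EvaluationNotebook[], NotebookAutoSave -> False];
Options[EvaluationNotebook[], NotebookAutoSave]
```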

As for the SSD, it is very fast for general computer operations: e.g., startup from shutdown of the computer takes perhaps 10 seconds.  I haven't done any benchmarks on Mathematica though.
POSTED BY: David Reiss
Ah, the Options Inspector - which I had viewed earlier - says that NotebookAutoSave is False, whereas executing ...

Options[EvaluationNotebook[], NotebookAutoSave]


says that it's True.  That explains the observed Checking and Saving behavior, but not how it got that way.  And, at present, I can't change the value.

I need to figure this out.

Thanks.  And your higher performance is some combination of ...

1) Not having NotebookAutoSave On,

2) More memory, and

3) Your SSD.

-- Mark
POSTED BY: Mark Tuttle
I have a similar machine (but with 16 GB of RAM and an SSD). When you say that your notebook example writes to disk a lot, is that by design? That is, do you have commands that explicitly read or save data via functions like Import, Export, Read, etc.? If so, yes, things will become I/O-bound, as you mention.

One thing worth remembering with regard to parallel capabilities is that even though $ProcessorCount has a value of 8, when doing parallel computations it is generally best to launch only 4 kernels (the other 4 "cores" being, as you note, "virtual"). In my experience, there is generally little or no parallel advantage in launching more kernels than there are physical cores.
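A rough way to test this on a particular machine is to time the same fixed batch of work with 4 and then 8 subkernels. This is a sketch under stated assumptions: work and timeWithKernels are hypothetical names, counting primes is only a stand-in CPU-bound task, and LaunchKernels[8] may fail if the license allows fewer subkernels.

```mathematica
(* Logical processors seen by the OS: 8 on a quad-core i7 with Hyper-Threading *)
$ProcessorCount

(* Hypothetical CPU-bound task: count the primes in block i of
   one million consecutive integers *)
work[i_Integer] := Count[PrimeQ[Range[i*10^6 + 1, (i + 1)*10^6]], True]

(* Hypothetical helper: restart the pool with n subkernels and
   time a fixed batch of 32 blocks *)
timeWithKernels[n_Integer] := (
   CloseKernels[];
   LaunchKernels[n];
   DistributeDefinitions[work];
   First[AbsoluteTiming[ParallelTable[work[i], {i, 0, 31}]]])

(* One kernel per physical core versus one per logical processor *)
{timeWithKernels[4], timeWithKernels[8]}
```

If the 8-kernel time is only modestly lower than the 4-kernel time, or not lower at all, that is consistent with the advice above.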
POSTED BY: David Reiss