What's faster/better Apply or Map?

Posted 10 years ago

Hi,

I have a list of strings of the form data={{{str1,str2},{str3,str4}},...}. The list is very large, about 68 million elements. I want to compare elements str2 and str4 in each of the sublists. I wonder which of the following would be faster/better (or whether you can suggest another way of doing it):

code 1:

{#[[1]],#[[2]],EditDistance[#[[1]]//Last,#[[2]]//Last]}&/@data

or code 2:

{#1,#2,EditDistance[#1//Last,#2//Last]}&@@@data

I think code 2 is more elegant, but I am interested in performance. In any case, to (try to) conserve memory I am overwriting the contents of data, as in data=OperationOn[data]. Is this good practice, or is it better to use something like data2=OperationOn[data] and then, possibly, Remove[data]?

POSTED BY: Miguel Olivo-V
3 Replies
Posted 10 years ago

They are just strings, probably no more than 15 characters long. How can I use Compile or ParallelTable in this case? I am not very familiar with those functions. What if I wrap everything inside Parallelize?

POSTED BY: Miguel Olivo-V

You can simply use ParallelMap:

ParallelMap[{#[[1]], #[[2]], EditDistance[#[[1]] // Last, #[[2]] // Last]} &, data]

Here is my sample test:

names = {"Sophia", "Emma", "Olivia", "Noah", "Liam", "Jacob", "Mason", 
"Isabella", "William", "Ethan", "Michael", "Ava", "Alexander", 
"Jayden", "Daniel", "Elijah", "Aiden", "James", "Benjamin", "Matthew"}
(*5 million names*)
realtest = RandomChoice[names, 5000000];
(*formatting function*)
f = Partition[#, 2, 2, 1] &; 
data = f@f@realtest;
(*timing done on my 4 core i7-3770 + 4 core Xeon 3.0 GHz = 8 core environment*)
(*data is the nested list built above*)
AbsoluteTiming[
 plotData = 
   ParallelMap[{#[[1]], #[[2]], 
      EditDistance[#[[1]] // Last, #[[2]] // Last]} &, data];]
(* {70.584037, Null} *)

Total time elapsed including I/O is 70 seconds.
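
The same kind of timing can be used to compare code 1 and code 2 from the question directly. The following is only a rough sketch on a small made-up sample (the name sample and its size are my assumptions, not part of the original data); it checks that both forms return the same result and reports the two timings, which will vary by machine and version:

```wolfram
(* Hypothetical small sample in the same {{str1,str2},{str3,str4}} shape *)
sample = Partition[
   Partition[RandomChoice[{"Sophia", "Emma", "Olivia", "Noah"}, 40000], 2], 2];

(* code 1: Map with explicit Part extraction *)
{t1, r1} = AbsoluteTiming[
   {#[[1]], #[[2]], EditDistance[Last[#[[1]]], Last[#[[2]]]]} & /@ sample];

(* code 2: Apply at level 1, so #1 and #2 are the two inner pairs *)
{t2, r2} = AbsoluteTiming[
   {#1, #2, EditDistance[Last[#1], Last[#2]]} & @@@ sample];

(* True if both forms agree; then the two timings in seconds *)
r1 === r2
{t1, t2}
```

If the two timings turn out close, the choice between Map and Apply is probably a matter of readability, and the cost of EditDistance itself dominates.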

POSTED BY: Shenghui Yang

Can you provide part of the data? Maybe Compile and ParallelTable can save your life.
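
As a rough idea of what the ParallelTable version might look like, here is a sketch on a small made-up sample (sample is a hypothetical stand-in for the real data). Compile is left out because, as far as I know, it targets numeric code rather than strings, so it may not apply to EditDistance on strings:

```wolfram
(* Hypothetical small sample; the real data would take its place *)
sample = Partition[
   Partition[RandomChoice[{"Sophia", "Emma", "Olivia", "Noah"}, 40000], 2], 2];

(* Iterate over positions; each entry keeps both pairs and adds the
   edit distance between their second elements *)
res = ParallelTable[
   {sample[[i, 1]], sample[[i, 2]],
    EditDistance[sample[[i, 1, 2]], sample[[i, 2, 2]]]},
   {i, Length[sample]}];

(* Check against the serial Apply form from the question *)
res === ({#1, #2, EditDistance[Last[#1], Last[#2]]} & @@@ sample)
```

Whether this beats ParallelMap depends on how the work is split across kernels, so it is worth timing both on a slice of the real data first.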

POSTED BY: Shenghui Yang