Message Boards Message Boards

[✓] Get the number of specified elements in a sparse array?

GROUPS:

If myArray is a sparse array with some elements already explicitly defined. How can I find that number of specified elements? Those that do not have the default value. I.e. the number, which is displayed when I just say Print[myArray] ?

I know that

Length[ArrayRules[myArray]] - 1

works. But this seems pretty strange for me. Is there no standard function similar to Length?

POSTED BY: Werner Geiger
Answer
3 months ago

There are two separate issues here:

  • How to get the number of explicitly stored elements efficiently? Use Length[myArray["NonzeroValues"]]. This is undocumented syntax, but it is widely used publicly and—in my opinion—very unlikely to go away in the future. See here. I expect this method to be the most efficient way (in speed and memory usage) to accomplish your task.

    Caveat: Despite its name, "NonzeroValues" returns non-default values even in the case when the default value is something else than zero.

  • How to get the number of elements that are different from the default value? This is a separate issue because some of the explicitly stored elements may be equal to the default value. To get rid of these, we must re-build the sparse array: Length[ SparseArray[myArray]["NonzeroValues"] ]

If you are curious to see a sparse array that has explicitly stored elements that are equal to the background value, here is an example:

In[25]:= sa = SparseArray[{0, 1, 2, 3}];

In[26]:= sa2 = If[# > 2, 0, #] & /@ sa;

In[27]:= sa2["NonzeroValues"]

Out[27]= {1, 2, 0}

In[28]:= ArrayRules[sa2]
Out[28]= {{2} -> 1, {3} -> 2, {4} -> 0, {_} -> 0}

In[29]:= sa3 = SparseArray[sa2]; (* rebuild the array *)

In[30]:= ArrayRules[sa3]
Out[30]= {{2} -> 1, {3} -> 2, {_} -> 0}
POSTED BY: Szabolcs Horvát
Answer
3 months ago

Hmmh.

This helps. But still sounds strange. I do not like to use undocumented functions like your

Length[myArray["NonzeroValues"]].

And this looks as expensive as my primitive solution from above

Length[ArrayRules[myArray]] - 1

In both cases some huge list of things has to be established. For nothing, just to count its members.

And of course I am not interested in values which differ from the default, but only in values which had been explicitly assigned to my sparse array.

I think, the internal structure of a SparseArray-Object knows exactly, which and how many different objects had ever been explicitly assigned. That number should be accessible at no time. I assume, that this is the number that a simple display of a sparse array shows as "Specified elements".

But I still have no idea, how to get that number directly without counting

POSTED BY: Werner Geiger
Answer
3 months ago

I understand your concerns about undocumented functions, and I usually advise people to be very cautious. However, not all undocumented functions are made equal, and I think this one is as safe as it gets. If you search this site, Mathematica.SE, or the old MathGroup, you will find countless uses of this functions, sometimes by WRI employees. If you search the accessible source code of Mathematica, you will also find several uses.

When I face using an undocumented function, I usually consider the risks:

  • Does it really do what I think it does? Will I get a wrong result from it? I think there is no issue here.

  • Will it keep working in the future? This usually matters when you are using it in a package that you intend to publish, and which will be used by many people. This is where the real risk lies. But in this case, based on the experience of the past decade, I'd say the risk is as small as it gets.

(Still, future versions of Mathematica may break any current package, and the best way to protect against this is to have unit tests, run them when a new version comes out, and update the package as needed.)

In both cases some huge list of things has to be established.

This is only true for ArrayRules.

I think, the internal structure of a SparseArray-Object knows exactly, which and how many different objects had ever been explicitly assigned. That number should be accessible at no time.

I strongly suspect that "NonzeroValues" does not create any copies, it simply accesses the internal structure of the sparse array.

But what really matters is actual performance, not the reason behind it, so let's test it!

In[13]:= sa = SparseArray@RandomChoice[{9, 1} -> {0, 1}, 1000000];

In[14]:= Do[
  Length@ArrayRules[sa],
  {100}
  ] // AbsoluteTiming

Out[14]= {3.58351, Null}

In[15]:= Do[
  Length@sa["NonzeroValues"],
  {100}
  ] // AbsoluteTiming

Out[15]= {0.000092, Null}

I hope this convinced you that this is indeed the most efficient solution.

POSTED BY: Szabolcs Horvát
Answer
3 months ago

Thank you very much, Szabolcs,

you convinced me that your Length[myArray["NonzeroValues"]] is far better (since faster) than the official Length[ArrayRules[myArray]]-1. I did these timing test myself meanwhile. Actually I don't understand why your solution is faster, since myArray["NonzeroValues"] creates a list just like ArrayRules[myArray] does. The Wolfram interpreter seems to be pretty clever optimized and recognizes that in your case it needs not construct the list just to get it's length. I have no idea, how it is able to know that.

Anyway, it works. And it works quickly.

Maybe somebody from Wolfram reads our discussion. They should implement some function like SpecifiedElements[myArray] which would probably be a no-brainer. The SparseArray object knows of course directly how many elements had been explicitly assigned. Without counting anything.

POSTED BY: Werner Geiger
Answer
3 months ago

There remains a problem. You seem to think, that "Specified Elements" ist what it should be, i.e. the number of elements which had been explicitly assigned. I think, that this is not true. But it is the number of elements which are different from the default value instead.

Look at the following trivial code.

Step 1: Create a sparse array myArray with dimensions 10x10, default -1, and assigns some value not equal to -1 to element {5,4}. Then print all our ideas we have until now to get the number of specified elements. They all show 1 which is correct.

Step 2: Then we assign 0 (i.e different from the default) to element {4,3} and the specified elements are 2 now. With all our known solutions. Which is correct.

Step 3: Then we assign -1 (i.e the default) to element {3,2} and the specified elements are still 2 now. With all our known solutions. Which is wrong from my point of view. It should be 3. This proves that all our ideas for counting "specified elements" do not count those, but values different from the default instead.

myArray = SparseArray[{{5, 4} -> 4711}, {10, 10}, -1];
Print["1. ", myArray, ", ", Length[myArray["NonzeroValues"]], ", ", 
  Length[ArrayRules[myArray]] - 1];
myArray[[4, 3]] = 0;
Print["2. ", myArray, ", ", Length[myArray["NonzeroValues"]], ", ", 
  Length[ArrayRules[myArray]] - 1];
myArray[[3, 2]] = -1;
Print["3. ", myArray, ", ", Length[myArray["NonzeroValues"]], ", ", 
  Length[ArrayRules[myArray]] - 1];



1. SparseArray[Specified elements: 1
Dimensions: {10,10}
Default: -1

], 1, 1

2. SparseArray[Specified elements: 2
Dimensions: {10,10}
Default: -1

], 2, 2

3. SparseArray[Specified elements: 2
Dimensions: {10,10}
Default: -1

], 2, 2

This is a bit strange. But not a big problem. I will just use some nonsense-default for my sparse array which will never occur under the values stored within it. Say -1, Infinity, Null or the like.

POSTED BY: Werner Geiger
Answer
3 months ago

What I should have made clearer: I do not think there is any other way than "NonzeroValues" to do what you are looking for (i.e. efficient access). I would be happy to be proven wrong though.

POSTED BY: Szabolcs Horvát
Answer
3 months ago

Actually I don't understand why your solution is faster, since myArray["NonzeroValues"] creates a list just like ArrayRules[myArray] does.

Mathematica employs the copy-on-write paradigm. When we do

a = Range[10000];

it does indeed allocate memory for 10000 elements. But when we do

b = a;

it does not allocate memory for a further 10000 elements. b and a will point to the same area of memory.

Conceptually, a and b are two entirely separate variables. But the system will take the opportunity to share storage between them when possible.

Now if we change b using

b[[1]] = 42;

then a and b won't have the same value anymore, and won't be able to share storage. Only at this point—i.e. when changing b—will Mathematica allocate separate storage for b.

It's also interesting to read up on the Share[] command, which will attempt to detect duplicate data and share their storage to conserve memory.

Most of this sharing is fully transparent to the user. We do not need to be aware of it to get correct results. But it is useful to know about it when we want to write efficient programs.

POSTED BY: Szabolcs Horvát
Answer
3 months ago

"Mathematica employs the copy-on-write paradigm."

Ja, I understand that. I wrote several compilers and interpreters myself.

Hence myArray["NonzeroValues"] does not create a copy of that list, since it has that list already in the internal structure of a sparse array. But ArrayRules[myArray] are not directly available and have to be constructed.

Ja, that sounds understandable.

Thank you for your comments. I learned a lot from them.

POSTED BY: Werner Geiger
Answer
3 months ago

Group Abstract Group Abstract