Message Boards Message Boards

4
|
7168 Views
|
10 Replies
|
12 Total Likes
View groups...
Share
Share this post:

Possible bug: Map puts SparseArray in inconsistent state

Posted 7 years ago

It seems that Maping a function over a SparseArray can produce a result where one of the explicit values in the array is the same as the background value. It seems to me that this is a bug because other functions appear to assume that an explicit value will never be the same as the background value.

Example

Let us create a sparse array,

In[1]:= sa = SparseArray[{1, 0, 2, 0, 3}];

and check its explicit values.

In[2]:= sa["NonzeroValues"]
Out[2]= {1, 2, 3}

Let us now change the first element (which is nonzero) to have the background value.

In[3]:= sa[[1]] = 0;

And check explicit values again:

In[4]:= sa["NonzeroValues"]
Out[4]= {2, 3}

As you can see, the sparse array has been recomputed so that none of the explicit values is 0. It is in a consistent state. This is what I expected.

Now let us modify elements in a different manner. Let us map a function over the sparse array in such a way that it will transform some of the explicit values to zero:

In[5]:= sa = SparseArray[{1, 0, 2, 0, 3}];

In[6]:= sa2 = Map[If[# > 2, 0, #] &, sa];

Check explicit values in the result:

In[7]:= sa2["NonzeroValues"]
Out[7]= {1, 2, 0}

Oops! What is that 0 doing in the list? Is should have been removed from the explicit values list.

Is this a bug? Do decide, let us take a look at how other functions handle this sparse array.

ArrayRules works on both sparse and dense arrays. ArrayRules[arr, b] gives the positions of elements that are different from b. It works fine even if b is different from the background element of a sparse arr.

But here ArrayRules fails to give the result I expect, even though the second element was given explicitly:

In[8]:= ArrayRules[sa2, 0]
Out[8]= {{1} -> 1, {3} -> 2, {5} -> 0, {_} -> 0}

That {5} -> 0 should not be in the list.

See how ArrayRules works if the default element is set to 1. There is no problem this time:

In[9]:= ArrayRules[sa2, 1]
Out[9]= {{2} -> 0, {3} -> 2, {4} -> 0, {5} -> 0, {_} -> 1}

This suggests that either ArrayRules or SparseArray is misbehaving here. Which one should be considered the buggy one?

Does this SparseArray have a broken internal state? Or does ArrayRules fail to take a special case into account?


This array can be fixed using SparseArray[sa2].

POSTED BY: Szabolcs Horvát
10 Replies

My previous post is too long, so many will not read it. Let me put it more concisely.

Usually, ArrayRules[arr, b] will return the positions/values of elements in arr that are different from b.

But there is a special case: this may not be what happens if arr is sparse and its background element is precisely b.

This is a rarely occurring special case. It is not pointed out in the documentation. It is unavoidable that people will not think of it, and get rarely triggered bugs in their code because of it. This means that the current behaviour of ArrayRules is not user-friendly. A good API does not make it easy to write buggy code. I wonder if any Wolfram programmers fell into the same trap I have ...

POSTED BY: Szabolcs Horvát

That one line description should probably not be taken to supersede the ArrayRules refpage... anyway, I have it on good authority that this is expected behavior. The "NonzeroPositions" are not recomputed automatically, as doing so would be unreasonably expensive. One may call SparseArray again if desired, otherwise be prepared that what "NonzeroValues" returns may in fact contain a background element.

POSTED BY: Ilian Gachevski

Okay, "promises" was too strong a word.

The documentation is somewhat vague on ArrayRules, so one is left to make some guesses.

To interpret the documentation properly, it is also critical to distinguish between arguments described as list, which refer to dense arrays, and arguments described as s or SparseArray[...], which refer only to SparseArrays. I did not pay sufficient attention to this distinction.

The "NonzeroPositions" are not recomputed automatically, as doing so would be unreasonably expensive.

This is completely reasonable.

What ArrayRules actually does is, in my opinion, not so reasonable from a user-friendliness perspective.

Let me describe what it does precisely (as I understand it now).

ArrayRules[x] will:

  • If x is a SparseArray, then return the background element (which may be nonzero!) and explicit values (which may contain the background) in a specific format.
  • Otherwise, return positions/values that are nonzero.

The documentation says, "ArrayRules[list] gives rules for SparseArray[list]", which is a completely reasonable concise explanation, but of course it should not be taken literally, as evidenced by the difference between ArrayRules[{{1} -> 1, {4} -> 1}] and ArrayRules@SparseArray[{{1} -> 1, {4} -> 1}].

There is also a line saying "ArrayRules[list] assumes a default value of 0." Notice that this is only true for dense arrays (which is implied by the name list, but that is too easy to overlook).

ArrayRules[x, b] will:

  • If x is a SparseArray with background element b, then return the background element and explicit values (which may contain the background) in a specific format
  • If x is a SparseArray with background element other than b, then return the positions/values different from b
  • If x is not a SparseArray then return positions/values different from b.

This may be okay from the implementor's perspective, but it is disturbingly inconsistent from a user's perspective. Why is there a special casing for whether the second argument is equal to the background element of the first one?

  1. ArrayRules[x] could be a function that just reveals the structure of a SparseArray: explicit values/positions and background element. In this case it should not work on a dense array and should not have a second argument. There is no need for a second argument because that background element is what it is. This would be a simple to understand function that operates on a sparse array data structure.

  2. Otherwise ArrayRules[x, b] could be a function which returns the values/positions of elements different from b. The default b could be taken to be 0 (for the one-argument form). This ArrayRules would not care about what data structure x is (dense or sparse). It would operate at a higher, conceptual level. It is also simple to understand.

Instead, we have an ArrayRules which is a confusing mix of these two things.

All these things are not precisely stated in the documentation. The behaviour is sufficiently complicated that it would be hard to describe it concisely. So I am left to guess. Given that ArrayRules works on dense arrays and that it has a second argument, guessing that the two-argument form does what I described in 2. above was completely reasonable.

POSTED BY: Szabolcs Horvát

ArrayRules[arr, x] promises to return the positions of only those elements of arr which are different from x.

Out of curiosity, where exactly is this promise stated?

POSTED BY: Ilian Gachevski

In tutorial/SparseArrays-LinearAlgebra it says: "the rules for nonzero elements in an array". But it is not said in the documentation of ArrayRules itself I think..

POSTED BY: Sander Huisman

I am retyping this response because after I typed up most of it, it simply vanished with no warning (and no interaction on my part) ... not sure why.

I've always wondered if AdjacencyLists, Background, ColumnIndices, Density, MatrixColumns, MethodInformation, Methods, NonzeroPositions, NonzeroValues, PatternArray, Properties, and RowPointers are actually officially supported and should be used.

These are commonly used, and shown in some documentation examples (just search for them). You will also find them in a lot of source code that comes with Mathematica. I really doubt that they will go away.

But here I only used them to pinpoint the problem. The problem manifests itself in a practical way during the use of ArrayRules.

But yeah, seems to be a 'bug', though not a critical one, as your matrix is fine...

I think this is a critical problem because it leads to incorrect results. I am not sure that we can say that the matrix is fine if the various documented functions no longer work correctly with it. See the ArrayRules example.

I used ArrayRules to replace all zero elements of a sparse or dense matrix by something else. The implementation needed to work on huge sparse arrays, so converting to a dense format was not an option. This bug broke it.

Specifically, I needed to replace all zeros in a sparse adjacency matrix by Infinity, so I could use WeightedAdjacencyGraph to convert back the output of WeightedAdjacencyMatrix to a graph. The former uses Infinity to represent nonexistent edges, while the latter uses zero (0 or 0.0). I am not sure why there is a mismatch between the two. It is only an inconvenience.

I Mapped a cutoff function onto the weighted adjacency matrix to replace some small weights with zeros (and thus remove those edges). That is how the inconsistent sparse matrix was created.

POSTED BY: Szabolcs Horvát

What I meant was, is that when you construct an array from the array rules the matrix is not 'wrong', it is however not stored most efficiently... Someone should have a look at it though...

POSTED BY: Sander Huisman

My point is that ArrayRules[arr, x] promises to return the positions of only those elements of arr which are different from x. This is regardless of whether arr is sparse or dense or what its background element is when it is sparse.

This is not what it does here. ArrayRules[arr, 0] is giving me the positions of some zeros.

That's a wrong result, in other words: a bug. This is not about inefficient storage. It is about a function giving an incorrect result.

POSTED BY: Szabolcs Horvát

Aaah yeah, that is true of course! I think however it is SparseArray that is to blame. The 'guts' of SparseArray is not updated properly. Triggering and evaluation of sparsearrays solves it indeed:

sa = SparseArray[{1, 0, 2, 0, 3}];
FullForm[sa]
sa2 = Map[If[# > 2, 0, #] &, sa];
FullForm[sa2]
sa2 = SparseArray[sa2];
FullForm[sa2]

It seems the 'Map' does not update the guts nicely, as you noticed before... I think ArrayRules purely uses the guts of SparseArray...

POSTED BY: Sander Huisman

I've always wondered if AdjacencyLists, Background, ColumnIndices, Density, MatrixColumns, MethodInformation, Methods, NonzeroPositions, NonzeroValues, PatternArray, Properties, and RowPointers are actually officially supported and should be used. I've used some of these myself, but I always have in the back of my mind the idea that some day it might not work anymore... I could not find any of the properties in the documentation...

But yeah, seems to be a 'bug', though not a critical one, as your matrix is fine...

POSTED BY: Sander Huisman
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract