Message Boards Message Boards

11
|
11664 Views
|
1 Reply
|
11 Total Likes
View groups...
Share
Share this post:

Associations and immutability

NOTE: Please see the original version of this post HERE. Cross-posted here per suggestion of Vitaliy Kaurov. Also note, that there is a related discussion on effective applications of associations, which, in my view, nicely complements this one.


Preamble

Discussion below will make clear what immutability means, both in general and in the context of Associations.

General

A few general words on immutability

Associations are immutable data structures. This means that they carry no state, and a copy of an Association is another completely independent Association. This is similar to Lists in Mathematica, but while List copying is expensive (when, for example, it is not just copying but some, however small, modification), the copying / making small modifications to Associations is cheap. In particular,

Append[lst, elem]

will have the complexity proportional to Length[lst], while

Append[assoc, key -> value]

will be roughly constant time. Again, in both cases the result is a new independent data structure carrying no state. But while iterative accumulation of lists isn't practical, iterative accumulation of Associations is (top-level iteration overhead notwithstanding), precisely due to the above complexity arguments.

Because immutable structures have no internal state (at least as far as the end user is concerned), their identity is completely defined by their structure. You can't change an immutable structure without changing its identity, i.e. without producing a new, different structure. This is in contrast to mutable structs or objects, which formally preserve their identity (reference, place in memory where the reference points to) even when they are internally changed. For example, in Java, if you write

 HashMap<String,String> map = new HashMap<String, String>();
 map.put("first","firstValue");

and then execute

 map.put("second","secondValue")

then the state of the object has changed, but it remained the same object in terms of its identity (can be accessed via the same reference), etc. This also means that if you use such an object in several places in your program, and change it in one place, it will also be "felt" in all other places, since all of them actually point to the same object.

But when you modify an Association, you get a brand new Association, which is a different object. So, looking at it from this angle, we can see that the core difference between mutable and immutable data structures is in their approach to identity: mutable structures define identity via the reference to the object (place in memory, etc), but not its content. For the above Java example, this is reflected by the necessity to define methods equals and hashCode in a mutually consistent way, to be able to use an object as a key in a hash map. For the same reason, in Python you can't use lists as keys in dictionaries (lists are mutable, and therefore don't preserve their structural identity over time), but can use special data structures like tuples and frozen sets (which are immutable).

Immutable data structures OTOH directly associate their identity with their internal (user-level) structure, and don't care about references, memory locations, etc. The code that uses immutable structures tends to be more modular and easier to argue about and debug, because immutable structures, being stateless, don't depend on the environment. This automatically makes functions using them more composable, and also individually testable. Not all problems are easily amenable to immutable data structures, but many are, particularly those related to complicated data transformations.

Complications

Immutable structures are nice, but in many cases using them in their pure form would be impractical. In particular, one of the most common needs is to be able to mutate (change) some specific elements in a list, without copying the entire list. The main reason to need this is, of course, efficiency.

So, Lists in Mathematica are given a limited form of mutability: if you assign a list to a Symbol, then you can mutate its elements in-place. In fact, this is true for general expressions, and on arbitrary level (elements at any depth), not just lists. Strictly speaking, this does not really make lists or general expressions mutable - because there are no references, but rather constructs a more efficient short-cut to an operation like

var  = doSomethingWithSomePartOf[var]

In other words, I think that the right way to think about this mutability is that the resulting contents of a given variable is a different list (expression), but there is a syntax to perform part replacements efficiently, de-facto changing elements in place. This doesn't change the overall programming model of Mathematica, which is based on immutable expressions and the absence of explicit references - it just makes certain specific (part) modifications efficient when done in-place for structures assigned to Symbols.

What is peculiar about this capability is that it can only be achieved via the Part command (and a few overloaded functions such as AddTo, AppendTo, etc.), and only if the argument of Part is a Symbol. In other words, there doesn't exist general reference / pointer semantics in Mathematica, but rather a very limited form of it, the minimal form necessary to provide mutability. What this means is that while you can mutate some List (or general expression), like

lst = Partition[Range[10],2]
lst[[2,1]] = 100

You can't do this in two steps like

With[{part = lst[[2]]}, part[[1]] = 200]

During evaluation of In[27]:= Set::setps: {100,4} in the part assignment is not a symbol. >>

(* 200  *)

because lst[[2]] is not by itself an L-value that can be assigned - it evaluates to an immutable value which can't be further assigned. So, what we have here is certain form of mutability without actual pass-by-reference mechanism. Note by the way that this works:

lst[[2]][[1]] = 200

200

But not because there are general references, but because Part has been specially overloaded.

The same story happens with Associations. They too have been given some form of mutability, similar to Lists. If we start with your example:

x = 
 {<|"firstValue" -> True, "isFirstValueTrue" -> False|>, 
  <|"firstValue" -> True, "isFirstValueTrue" -> True|>, 
  <|"firstValue" -> False, "isFirstValueTrue" -> False|>, 
  <|"firstValue" -> False, "isFirstValueTrue" -> True|>}

Then, using

xcopy = x;

We can see that this will work:

xcopy[[1, Key["isFirstValueTrue"]]] = xcopy[[1, Key["firstValue"]]]

(* True *)

But for Associations, this doesn't:

xcopy[[1]]["isFirstValueTrue"] = xcopy[[1]]["firstValue"]

During evaluation of In[45]:= Association::setps: <|firstValue->True,isFirstValueTrue->True|> in the part assignment is not a symbol. >>

(* True *)

What works here is:

xcopy[[1]][[Key["isFirstValueTrue"]]] = xcopy[[1]]["firstValue"]

(* True *)

It may well be that at some later point the first form will also be made to work.

The role of Hold - attributes

It is important to clarify the role of Hold-attributes in this context. We know that, if we want to pass some argument by reference, we need to set the appropriate Hold attribute. A simplest example here would be writing our own Increment function:

ClearAll[increment];
SetAttributes[increment, HoldFirst]
increment[x_]:=x=x+1;

There are two things which I think are important to realize here. One is that Hold- attributes carry out a much more general task in Mathematica: they allow one to modify the evaluation process and work with unevaluated code blocks (pass them around, delay their evaluation, analyze them, etc). Their role in providing a pass-by-reference semantics is a very special case of their general functionality.

The second thing to note here is that Hold-attributes in fact don't provide a full pass-by-reference mechanism. They can't, because there are no references in Mathematica. By a reference, I mean here a handle, which can exist in between evaluations, while providing a way to mutate an object it points to. Symbols by themselves can't serve as references, because they evaluate immediately, once allowed to. One can certainly emulate references with the top-level Mathematica code, but the point here is that they are not natively present in the core computational model in Mathematica (and that would mean tight integration with the language and its many constructs).

The way Hold-attributes work is by delaying evaluation of expressions until they are substituted verbatim into the body of the functions, to which they get passed. This is possible because parameter-passing semantics in Mathematica makes pretty much all functions work like macros, which assemble their bodies before executing them. The reason this is not a true pass by reference semantics is that the property of delayed evaluation is attached not to arguments being passed (which would make them true references), but to certain functions we pass those argument to. And this is exactly the reason why this style of programming is hard in Mathematica: if we pass arguments through the chain of functions, we have to explicitly make sure that they will be preserved unevaluated by all intermediate functions in the chain. In practice, this is too much hassle to be worth it, in all cases except a few.

The case at hand

Mutable approach

In any case, given your particular problem, one way to solve it would be:

xcopy = x;
xcopy[[All, Key["isFirstValueTrue"]]] = xcopy[[All, Key["firstValue"]]];
xcopy

(you could've used x instead of xcopy - I maintain a copy to preserve x unchanged, because I am using it here several times).

The above considerations also tell us that this will work:

ClearAll[setFV];
SetAttributes[setFV, HoldAll];
setFV[x_?ListQ] := Map[setFV[x[[#]]] &, Range[Length[x]]];
setFV[x_?AssociationQ] := x[[Key["isFirstValueTrue"]]] = x["firstValue"];

So that you can also do:

xcopy = x;
setFV[xcopy];
xcopy

However, already here one can notice that we start to swim upstream. In particular, we must pay careful attention to preserve certain parts in held form (for reasons I outlined above, so that they can kind of serve as references), and there are subtle things here, which, if changed slightly, will break this code. For example, if I used patterns _List and _Association in place of _?ListQ and _?AssociationQ, this wouldn't work.

Even for lists, such mutable approach is rarely worth using (except in massive assignments at the same time), and for Associations, this is even more so, because, in contrast with Lists, they are cheap to modify element-wise.

A better approach

A much better approach, in my view, is to embrace immutability. In this particular case, this would mean using something like:

xcopy = x;
xcopy = Map[
  Function[assoc,
    MapAt[assoc["firstValue"] &, assoc, {Key["isFirstValueTrue"]}]
  ]
]@ xcopy 

which is, perform operations on immutable data, and assign the result to the initial variable.

Conclusions

Because Mathematica adds a limited form of mutability to otherwise immutable data structures, but lacks the general reference / pointer semantics, mutable code is rarely the best approach (it does have its uses though). Very often, it may be better to use a generally more idiomatic approach based on transforming immutable structures. This may be particularly true for Associations, which are cheap to modify.

POSTED BY: Leonid Shifrin

enter image description here - Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

POSTED BY: EDITORIAL BOARD
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract