Uses for the new Association feature

The Raspberry Pi version has a completely new data structure: Association

This is very similar to the dictionary data structure found in many other languages, except that it does seem to preserve the order of its keys.  A dictionary doesn't seem to make anything possible that wasn't possible before (??), but it does finally give us this data structure without having to resort to hacks (like Dispatch or DownValues), which is a big improvement.
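To make the comparison concrete, here is a minimal sketch of basic Association use next to the older DownValues idiom. It assumes the syntax of a released version (10 or later); the snapshot may behave differently. The names assoc and dict are just illustrative.

```wolfram
(* An Association keeps its keys in insertion order *)
assoc = <|"b" -> 2, "a" -> 1, "c" -> 3|>;
Keys[assoc]            (* {"b", "a", "c"} *)
assoc["a"]             (* 1 *)
Lookup[assoc, "z", 0]  (* 0, the default for a missing key *)

(* The older DownValues idiom: works, but the "dictionary" is a set of
   definitions attached to a symbol rather than a value you can pass around *)
ClearAll[dict];
dict["b"] = 2; dict["a"] = 1;
dict["a"]              (* 1 *)
```

The practical difference is that assoc is an ordinary expression that can be returned from functions, mapped over, and nested, while dict is global state tied to a symbol.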

Question: Have any of you already found any practical uses for this data structure in your day-to-day work?  I'd be curious how people are using it, especially in cases where it makes code significantly nicer (shorter or easier to follow).  I will need some time until I get used to it and it becomes a key part of my active vocabulary, so I'm asking for some ideas/thoughts ...

This data structure seems to be a very natural fit for the output of many functions.  It would be very natural, for example, for Tally[] and for GatherBy[].  It seems that instead of keeping the same functionality tied to the same functions, completely new names are being made up.  For example, the twin of Tally[] is called CountBy and GatherBy has OrganizeBy (they're not 100% equivalent but there's a large intersection).  Generally, it should be a good output format for many of the "TransformBy" type functions I talked about here, and indeed the naming of the new functions follows this.
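A small side-by-side sketch of the list-based functions and their Association-returning counterparts. This assumes the names used in released versions (Counts, CountsBy, GroupBy), which may differ from the snapshot names mentioned above.

```wolfram
list = {1, 2, 3, 4, 5, 6, 7, 8, 9};

(* List output: pairs of {element, count} *)
Tally[Mod[list, 3]]
(* {{1, 3}, {2, 3}, {0, 3}} *)

(* Association output: element -> count *)
Counts[Mod[list, 3]]
(* <|1 -> 3, 2 -> 3, 0 -> 3|> *)

CountsBy[list, Mod[#, 3] &]
(* <|1 -> 3, 2 -> 3, 0 -> 3|> *)

(* List output: the groups only, the key of each group is lost *)
GatherBy[list, Mod[#, 3] &]
(* {{1, 4, 7}, {2, 5, 8}, {3, 6, 9}} *)

(* Association output: key -> group *)
GroupBy[list, Mod[#, 3] &]
(* <|1 -> {1, 4, 7}, 2 -> {2, 5, 8}, 0 -> {3, 6, 9}|> *)
```

The Association versions keep the key next to each group or count, which GatherBy in particular throws away.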

I'm not entirely satisfied with the performance at this moment though.  As of today, both GatherBy and Tally seem to perform significantly faster than the Association versions.  Take for example this implementation and compare its performance to the seemingly entirely equivalent PositionIndex.  When there are a large number of duplicates, GatherBy is orders of magnitude faster; with no duplicates it's still faster but the difference is much smaller.  The same goes for Tally vs CountBy.  I'm hoping the performance will be improved eventually.
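For anyone who wants to repeat this kind of comparison, here is a rough sketch. The helper positionIndexGB is a hypothetical GatherBy-based stand-in written for this illustration, not the implementation referred to above, and the timings will depend heavily on version and platform.

```wolfram
(* Hypothetical GatherBy-based position index: value -> list of positions *)
positionIndexGB[list_List] :=
  Association[
    (list[[First[#]]] -> #) & /@
      GatherBy[Range[Length[list]], list[[#]] &]
  ];

(* Many duplicates: only 10 distinct values among 10^5 elements *)
data = RandomInteger[{1, 10}, 10^5];

First@AbsoluteTiming[positionIndexGB[data];]
First@AbsoluteTiming[PositionIndex[data];]

(* Sanity check that both approaches build the same Association *)
positionIndexGB[data] === PositionIndex[data]
```

Replacing data with RandomSample[Range[10^5]] gives the no-duplicates case.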

EDIT: Several of the doc pages mention map-reduce, which points to applications in distributed computing.
POSTED BY: Szabolcs Horvat
Answer
11 months ago
Re performance, I believe some of the implementation code will be revisited. That said, I'm not sure offhand if this will hit the specific functions you mention.
POSTED BY: Daniel Lichtblau
Answer
11 months ago
I apologize that I can't go into more detail about this functionality, but I would like to make a couple of notes about the Linux-ARM port. First off, for reasons of technical feasibility this release represents a snapshot, so any function that you find, and *especially* functions that did not exist before, could have their performance and behavior change drastically within the next year. So, for that reason alone, some of what you find might not be fully representative of language features we release in the future. That said, the character of performance on the Raspberry Pi is also significantly different because it simply doesn't have the same level of machine optimizations we have in our x86 builds. Those optimizations are coming over time, but for this reason I would caution you against judging the performance of functionality in general from this platform.

As a hypothetical example, a developer writing new functionality might rely heavily on functions that have high-performance, CPU-specific optimizations applied in our current development build. While this would be a good practice for building future-facing functionality for the product, the Raspberry Pi port might not have such optimizations available and might instead rely on an implementation of a given function that performs orders of magnitude slower than on our optimized build, even with the difference in performance between a modern x86 and an ARM11 processor taken into account. So, in this hypothetical example, functions that predate those optimizations might outperform the newer code because they were not written with that optimization in mind. I can't speak in detail about this given functionality, but for reasons like this I think it would be very easy to mischaracterize the performance of language features judging by the Raspberry Pi build alone. I strongly recommend waiting for the next official release of the product to scrutinize the behaviors and *especially* performance of new functionality.
POSTED BY: Alex Newman
Answer
11 months ago
I'm sorry if my post was misunderstood as criticism; it wasn't meant to be.  I understand that this is not finalized functionality, but that's exactly why it's so exciting for many users, including me :-)
POSTED BY: Szabolcs Horvat
Answer
11 months ago
I don't have a Pi to play with, but offhandedly, Associations allow the most straightforward OO-like behavior in terms of syntax. It's possible the people at WR had this in mind, given the sugar for string keys:
obj = <|"x" -> 1, "y" -> 1|>
Print[obj|:x]
If there was a way to give Associations default values (like the third argument to Lookup, but automatic), you could even have a basic form of inheritance (so-called prototypal inheritance, sans 'this'/'self'). Hopefully no one listens to me though, because the thought of OO-y Mathematica, even of an immutable flavor, brings great disgust to my loins. Although, because Associations are immutable, you can actually accommodate inheritance just by adding onto the parent. Whoaaah. [1] This might end up being a lot more useful than it seems at first glance.
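A minimal sketch of that idea, assuming the Join and Lookup behavior of a released version (10 or later). The names parent, child, and getWithProto are made up for illustration; this is one possible way to emulate it, not a built-in inheritance mechanism.

```wolfram
(* "Parent" object: an immutable Association *)
parent = <|"x" -> 1, "y" -> 1|>;

(* "Inherit" by adding onto the parent: later keys override earlier ones,
   and parent itself is left untouched *)
child = Join[parent, <|"y" -> 2, "z" -> 3|>];

child["x"]   (* 1, taken from parent *)
child["y"]   (* 2, overridden in child *)
parent["y"]  (* 1, parent unchanged *)

(* Explicit default for a missing key via Lookup's third argument *)
Lookup[child, "w", 0]  (* 0 *)

(* Hypothetical helper: fall back to a prototype when the key is absent *)
getWithProto[obj_, proto_, key_] := Lookup[obj, key, Lookup[proto, key]];
getWithProto[<|"z" -> 3|>, parent, "x"]  (* 1 *)
```

Join copies the parent's entries at creation time, so later changes to parent are not seen by child; the getWithProto variant looks the key up in the prototype at call time instead.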

But even just in terms of basic functionality, I do have a couple programs where Associations would straightforwardize things. It's definitely a "quality of life" improvement, since otherwise you have to choose between intentfulness of declaration (Rules) vs ease of use (Functions).

[1] 
Answer
11 months ago