Message Boards Message Boards


Re-order two GroupBy operations to transpose Dataset

Posted 25 days ago
0 Replies
1 Total Likes

In this post I want to revisit the Titanic Dataset example to point out some unexpected GroupBy behavior.

Let's load the example Dataset as so:

titanic = ExampleData[{"Dataset", "Titanic"}]

We'll calculate the survival ratio as a function of gender and cabin class:

Query[GroupBy["sex"], GroupBy["class"], Boole /* Mean /* N, 

Which results in this named row and named column Dataset:

Dataset with survival ratio by gender (first row) and class (second row)

Nothing out of the ordinary, but what if we switch the GroupBy order to transpose the Dataset?

Query[GroupBy["class"], GroupBy["sex"], Boole /* Mean /* N, 

What we'll achieve is not exactly what I expect by switching the GroupBy operations.

Dataset with survival ratio by cabin class (row) and gender (column)

So what if we use the Transpose operation on the initial Dataset and see if works anyway:

 Query[GroupBy["sex"], GroupBy["class"], Boole /* Mean /* N, 

Now this is what I wanted to achieve when switching the order of the GroupBy operations:

Dataset with survival ratio by cabin class (row) and gender (column)

So why was I not able to switch the GroupBy operations to transpose the Dataset?

Let's look into the input form (i.e. association of associations) of the Transposed Dataset and compare it with the input form of the switched GroupBy operations.

Transposing the Dataset gives

<|"1st" -> <|"female" -> 0.965278, "male" -> 0.340782|>, 
 "2nd" -> <|"female" -> 0.886792, "male" -> 0.146199|>, 
 "3rd" -> <|"female" -> 0.490741, "male" -> 0.15213|>|>

Switching GroupBy operations gives:

<|"1st" -> <|"female" -> 0.965278, "male" -> 0.340782|>, 
 "2nd" -> <|"male" -> 0.146199, "female" -> 0.886792|>, 
 "3rd" -> <|"male" -> 0.15213, "female" -> 0.490741|>|>

The difference is subtle: it's in the order of the gender keys. Switching the GroupBy operations did not result in sorted keys, so as a workaround we should sort the keys as so:

Query[GroupBy["class"], GroupBy["sex"] /* KeySort, Boole /* Mean /* N,

Indeed, sorting keys transposed the Dataset:

Switching GroupBy operations with KeySort

As a last note, the workaround does not work if the keys are Symbols. For example try this:

Query[GroupBy["class"], GroupBy["survived"] /* KeySort, Length, "age"]@titanic


Switching GroupBy operations for class and survived with KeySort

The workaround involves changing the Boolean symbols to Strings in the Dataset first and then sort the keys.

The question is is this behavior a bug?

In other words, I expect by changing the order of the GroupBy operations to obtain a transposed Dataset.



Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
or Discard

Group Abstract Group Abstract