Message Boards Message Boards

1
|
2776 Views
|
0 Replies
|
1 Total Likes
View groups...
Share
Share this post:

Re-order two GroupBy operations to transpose Dataset

Posted 3 years ago

In this post I want to revisit the Titanic Dataset example to point out some unexpected GroupBy behavior.

Let's load the example Dataset as so:

titanic = ExampleData[{"Dataset", "Titanic"}]

We'll calculate the survival ratio as a function of gender and cabin class:

Query[GroupBy["sex"], GroupBy["class"], Boole /* Mean /* N, 
  "survived"]@titanic

Which results in this named row and named column Dataset:

Dataset with survival ratio by gender (first row) and class (second row)

Nothing out of the ordinary, but what if we switch the GroupBy order to transpose the Dataset?

Query[GroupBy["class"], GroupBy["sex"], Boole /* Mean /* N, 
  "survived"]@titanic

What we'll achieve is not exactly what I expect by switching the GroupBy operations.

Dataset with survival ratio by cabin class (row) and gender (column)

So what if we use the Transpose operation on the initial Dataset and see if works anyway:

Transpose@
 Query[GroupBy["sex"], GroupBy["class"], Boole /* Mean /* N, 
   "survived"]@titanic

Now this is what I wanted to achieve when switching the order of the GroupBy operations:

Dataset with survival ratio by cabin class (row) and gender (column)

So why was I not able to switch the GroupBy operations to transpose the Dataset?

Let's look into the input form (i.e. association of associations) of the Transposed Dataset and compare it with the input form of the switched GroupBy operations.

Transposing the Dataset gives

<|"1st" -> <|"female" -> 0.965278, "male" -> 0.340782|>, 
 "2nd" -> <|"female" -> 0.886792, "male" -> 0.146199|>, 
 "3rd" -> <|"female" -> 0.490741, "male" -> 0.15213|>|>

Switching GroupBy operations gives:

<|"1st" -> <|"female" -> 0.965278, "male" -> 0.340782|>, 
 "2nd" -> <|"male" -> 0.146199, "female" -> 0.886792|>, 
 "3rd" -> <|"male" -> 0.15213, "female" -> 0.490741|>|>

The difference is subtle: it's in the order of the gender keys. Switching the GroupBy operations did not result in sorted keys, so as a workaround we should sort the keys as so:

Query[GroupBy["class"], GroupBy["sex"] /* KeySort, Boole /* Mean /* N,
   "survived"]@titanic

Indeed, sorting keys transposed the Dataset:

Switching GroupBy operations with KeySort

As a last note, the workaround does not work if the keys are Symbols. For example try this:

Query[GroupBy["class"], GroupBy["survived"] /* KeySort, Length, "age"]@titanic

Gives:

Switching GroupBy operations for class and survived with KeySort

The workaround involves changing the Boolean symbols to Strings in the Dataset first and then sort the keys.

The question is is this behavior a bug?

In other words, I expect by changing the order of the GroupBy operations to obtain a transposed Dataset.

Cheers,

Dave

POSTED BY: Dave Middleton
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract