# Re-order two GroupBy operations to transpose Dataset

Posted 25 days ago
171 Views
|
0 Replies
|
1 Total Likes
|
 In this post I want to revisit the Titanic Dataset example to point out some unexpected GroupBy behavior.Let's load the example Dataset as so: titanic = ExampleData[{"Dataset", "Titanic"}] We'll calculate the survival ratio as a function of gender and cabin class: Query[GroupBy["sex"], GroupBy["class"], Boole /* Mean /* N, "survived"]@titanic Which results in this named row and named column Dataset:Nothing out of the ordinary, but what if we switch the GroupBy order to transpose the Dataset? Query[GroupBy["class"], GroupBy["sex"], Boole /* Mean /* N, "survived"]@titanic What we'll achieve is not exactly what I expect by switching the GroupBy operations.So what if we use the Transpose operation on the initial Dataset and see if works anyway: Transpose@ Query[GroupBy["sex"], GroupBy["class"], Boole /* Mean /* N, "survived"]@titanic Now this is what I wanted to achieve when switching the order of the GroupBy operations:So why was I not able to switch the GroupBy operations to transpose the Dataset?Let's look into the input form (i.e. association of associations) of the Transposed Dataset and compare it with the input form of the switched GroupBy operations.Transposing the Dataset gives <|"1st" -> <|"female" -> 0.965278, "male" -> 0.340782|>, "2nd" -> <|"female" -> 0.886792, "male" -> 0.146199|>, "3rd" -> <|"female" -> 0.490741, "male" -> 0.15213|>|> Switching GroupBy operations gives: <|"1st" -> <|"female" -> 0.965278, "male" -> 0.340782|>, "2nd" -> <|"male" -> 0.146199, "female" -> 0.886792|>, "3rd" -> <|"male" -> 0.15213, "female" -> 0.490741|>|> The difference is subtle: it's in the order of the gender keys. Switching the GroupBy operations did not result in sorted keys, so as a workaround we should sort the keys as so: Query[GroupBy["class"], GroupBy["sex"] /* KeySort, Boole /* Mean /* N, "survived"]@titanic Indeed, sorting keys transposed the Dataset:As a last note, the workaround does not work if the keys are Symbols. For example try this: Query[GroupBy["class"], GroupBy["survived"] /* KeySort, Length, "age"]@titanic Gives:The workaround involves changing the Boolean symbols to Strings in the Dataset first and then sort the keys.The question is is this behavior a bug?In other words, I expect by changing the order of the GroupBy operations to obtain a transposed Dataset.Cheers,Dave