Message Boards Message Boards

The main function I miss in Mathematica after reading a Python Pandas Book

Posted 4 years ago

Yesterday I finished Python for Data Analysis, 2nd Edition. A very nice book by Wes McKinney, the creator of panda's library for Python, that explores basic data manipulation using Python.

For me (a Mathematica enthusiast), it was a great experience to see how data manipulation is handled in other languages. I love the elegance of the Mathematica environment with It clean and sharp syntax, and yes, the plots are incomparable more beautiful, but at the same time, that are some interesting learnings with pandas.

Going straight to the point, if there is a main function from pandas that I miss in Mathematica, It's the pivot_table function. See how simple is the syntax using python: enter image description here

As you can see, it's as simple as dataset.pivot_table(row, columns, value, total option)

I remember this very old post from Stack Exchange from 2011 about this case. IMHO, pivot is a very basic core operation for data manipulation. It would be nice to have an implementation in some future Mathematica versions.

PS: Dataset can pivot very simple cases, but it has an uncontrollable behavior. It's changes if all the second-level association have the same keys or not, and you can't choose witch rows and columns you want to Pivot. For example:

titanic = ExampleData[{"Dataset", "Titanic"}]    
pivot = titanic[GroupBy["sex"], GroupBy["class"], N@*Mean, "age"]

Will return us: enter image description here That's cool, but if I simulate a missing value in the dataset:

titanic2 = titanic[Select[!(#class ==  "1st" && #sex == "female")&]];
pivot2 = titanic2[GroupBy["sex"], GroupBy["class"], N@*Mean, "age"]

We get the unpivoted form:

enter image description here

To return the desired result, I need to do something like this:

pivot3 = Normal@pivot2;
pivot3[[All]] = KeyUnion[Values@pivot3, 0 &];
Dataset@pivot3

enter image description here

What I believe is very unintuitive, and don't solve the case with more Columns and Rows.

UPDATE - strange result

If I remove N from:

titanic2[GroupBy["sex"], GroupBy["class"], N@*Mean, "age"]

to:

titanic2[GroupBy["sex"], GroupBy["class"], Mean, "age"]

We get:

enter image description here

I tried another functions in the 3rd argument, and I changes the pivot form.

POSTED BY: Rodrigo Murta
6 Replies

Load DataReshape.m

This can be done with the package "DataReshape.m":

Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/DataReshape.m"]

(It is in my TODO list to submit a few Wolfram Function Repository entries based on that package.)

Titanic data

In order to convert to wide form with ToWideForm we specify the ID column ("sex"), the Variable column ("class"), the Value column ("age"), and the aggregation function (Mean):

ToWideForm[titanic2, "sex", "class", "age", "AggregationFunction" -> Mean]

enter image description here Here we replace the missing values with 0:

% /. Missing[] -> 0

enter image description here

Tips data

See the attached notebook how to massage the data and apply ToWideForm to the "tips" table in the question / book in order to obtain the posted in the question result.

enter image description here

Attachments:
POSTED BY: Anton Antonov

Many categories of programming languages are found across the history of computer instruction, including chipset (assembler), list processing, structured, typed, untyped, relational, and objected oriented. These should not be confused with execution environments (e.g. single program compilers, runtime/interactive compilers) nor with integrated development environments (IDE's).

Python and Wolfram Language are examples of computer instruction languages. Mathematica is an IDE. In my opinion comparing the two is non-sequitur.

POSTED BY: Richard Frost

Hi Richard, I could have used Wolfram Language in the topic, but paraphrasing Fermat with some poetic licence: "the topic character limit is too short to contain it". So, I used Mathematica.

POSTED BY: Rodrigo Murta

Once the data is 'ragged' (which you did by doing a key drop) there is no 'uniform' way of doing a pivot I think. Note also that the Python example is not ragged, it just has NaN (i.e. Missing[] equivalent), so it is not ragged then… I'm thinking that GroupBy (or GatherBy for non-dataset) can achieve this, but I'm not familiar enough by the Python implementation to copy it in Mathematica.

POSTED BY: Sander Huisman

Hi Sander. I updated the way I simulated the ragged case, I believe that now is more realistic.

POSTED BY: Rodrigo Murta

I completely agree!

POSTED BY: l van Veen
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract