
46 | 59612 Views | 27 Replies | 62 Total Likes

A primer on Association and Dataset

Attachments:
POSTED BY: Seth Chandler
27 Replies

Congratulations! This post is now a Staff Pick as distinguished on your profile! Thank you, keep it coming!

POSTED BY: Moderation Team

It would be great if this -- or something with similar depth -- made it into the official Wolfram Mathematica documentation.

Definitely. That would be really welcome.

POSTED BY: Arno Bosse
Posted 6 years ago

Excellent! thanks for sharing it!

POSTED BY: Andres Aldana

Often, it's not necessary to use Slot for positional dereferencing, e.g., Query[2,f,3] evaluates the same as Query[#[[2]]&,f,#[[3]]&]. Similarly, Span works as well.
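
A minimal sketch of the positional form, assuming a small made-up matrix (the name `data` and its values are hypothetical, chosen only for illustration):

```mathematica
(* Hypothetical 3x3 list of lists, used only for illustration *)
data = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

(* Bare integers act as Part specifications at successive levels *)
Query[2, 3][data]                  (* 6, the same as data[[2, 3]] *)

(* The explicit Slot version gives the same value here *)
Query[#[[2]] &, #[[3]] &][data]    (* 6 *)

(* Span works the same way: third element of rows 1 and 2 *)
Query[1 ;; 2, 3][data]             (* {3, 6}, the same as data[[1 ;; 2, 3]] *)
```

The integer form is shorter and reads like an ordinary Part specification, which is probably why it is usually preferred.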

P.S. For those interested, I'm close to finishing my book Functional Data Workflow, which is based on real-world methods and data collected as part of large time-motion/UX/EHR studies at two large healthcare organizations. Email me if you'd like to see sample chapter preprints.

POSTED BY: Alan Calvitti

Hi Alan, thanks for the offer. I'd love to see those sample chapter preprints. My email address is ruben dot garcia at jic dot ac dot id

Great resource, thanks!

Be patient! A long book on the topic is coming, before the end of 2019.

POSTED BY: Seth Chandler

Hi Seth, Any updates on the book? I know such things take longer than expected, but it will be very useful. WCC

POSTED BY: W. Craig Carter
Posted 2 years ago

Hi, Seth,

Is your book published?

POSTED BY: Pred Liu

Is the notebook file attached? I didn't see it.

POSTED BY: Stephen Wandzura

This is a fantastic resource, many thanks.

I have a question concerning Datasets. When comparing associations to lists, I have found that the efficiency gain of using associations instead of lists can be very, very substantial.

Is there an analogous strong incentive for using Datasets instead of, for example, lists of lists or associations of associations?

Thanks,

Francisco

Posted 2 years ago

Thank you for sharing your Dataset Primer.

Initially, I used Datasets by trial and error. The Mathematica Reference Documentation is a great resource, but this post shows again that we may need a more extensive, hands-on tutorial.

Your Primer and numerous resources on StackExchange or some books helped me on my way with Datasets.

Cheers,

Dave

POSTED BY: Updating Name
Posted 2 years ago

In most cases, queries that refer to keys by name and queries that use slots give identical results.

However, in some cases the way Mathematica handles names and slots can lead to different results.

I ran into this example:

Query[All, All, Delete@"class"]@
 GroupBy[#class &]@ExampleData[{"Dataset", "Titanic"}]

This drops the class column, like so:

[image: resulting Dataset with the class column removed]

If we try this using the Slot notation we get a different result:

Query[All, All, Delete@#class &]@
 GroupBy[#class &]@ExampleData[{"Dataset", "Titanic"}]

[image: resulting Dataset]

I think there may be a semantic difference between the two notations. I suspect the name notation refers to the whole column (position), whereas the slot notation refers to the items in the Dataset under that name. In most cases, the two lead to equivalent results.
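
One possible reading, offered as a sketch rather than a definitive account: `@` binds more tightly than `&`, so `Delete@#class &` parses as a function whose body is `Delete[#class]`. Applied to a row, it looks up the value stored under "class" and returns an unapplied `Delete` operator instead of a shortened row. The `row` below is a made-up stand-in for one Titanic record, not data taken from the example set.

```mathematica
(* Hypothetical single row standing in for one Titanic record *)
row = <|"class" -> "1st", "age" -> 29, "sex" -> "female"|>;

(* How the Slot version parses *)
FullForm[Hold[Delete@#class &]]
(* Hold[Function[Delete[Slot["class"]]]] *)

(* The operator form used in the first query: acts on the key itself *)
Delete["class"][row]

(* The Slot version looks up the value first, leaving an unapplied operator *)
(Delete@#class &)[row]             (* Delete["1st"] *)

(* KeyDrop states the intent directly *)
KeyDrop["class"][row]              (* <|"age" -> 29, "sex" -> "female"|> *)
```

If this reading is right, the difference comes from operator precedence and from what the pure function returns, not from a column-versus-item distinction.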

POSTED BY: Dave Middleton
Posted 2 years ago

Why is the notebook missing? Are you allowed to repost the notebook? I noticed that no book is forthcoming from him yet. Is there any other related primer?

POSTED BY: Andrew Meit
Posted 2 years ago

Seth, thank you for this wonderful primer.

As you point out,

We can also have a Dataset that just has a single Association inside. Mathematica presents the information with the keys and values displayed vertically.

Is this considered a feature or a bug? Super annoying to have computers doing random unexpected stuff. No option or setting to control this behavior. Or am I missing something?

Thanks!

Allan

POSTED BY: A Cooper

Dataset seems to be evolving rapidly. In V 12.0 it had no Options; in V 12.3 it has over a dozen. I suspect that would-be documenters are having trouble keeping up.
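
A quick way to check this on your own installation (a sketch only; the result depends on the version you run, so no output is shown):

```mathematica
(* Version of the running kernel *)
$Version

(* Number of options Dataset accepts in this version *)
Length[Options[Dataset]]

(* The option names themselves *)
Keys[Options[Dataset]]
```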

POSTED BY: Stephen Wandzura
Posted 2 years ago

Stephen, thanks.

If there are indeed documenters out there reading this, here's another one. Dataset has a kind of cool behavior showing the item path just below the display, and it really needs to be paired with an option for PathDisplayFunction or equivalent.

Allan

POSTED BY: A Cooper
Posted 2 years ago

The notebook needs to be restored, please. Why is this taking so long?

POSTED BY: Andrew Meit
Posted 2 years ago

I agree! Where is the notebook?

POSTED BY: Douglas Kubler
Posted 2 years ago

So this post gets bumped yet again; but no notebook yet. Seth, please restore your notebook. Thank you. And yet also no book from you. Or is this post what is in the notebook and so no need for the notebook? Frustrated and confused.

POSTED BY: Andrew Meit

Thanks to @Seth Chandler, author of this post, the notebook is restored again. You can find it attached to the main post and to this message too.

Attachments:
POSTED BY: Ahmed Elbanna

Andrew, you can find the notebook currently attached to the main post.

POSTED BY: Ahmed Elbanna
Posted 2 years ago

Finally, thank you.

Something to consider: buttons to jump to the top of the page and to the end of the original post, instead of a lot of scrolling, or putting the attachment link at the end of the last post as well. When one is visually impaired, navigation can sometimes be a problem.

POSTED BY: Andrew Meit

Hi. Did you get your book published?

Posted 4 months ago

It was announced here in this community; the link to the book is: https://www.wolfram.com/language/query-getting-information-from-data-with-the-wolfram-language/

POSTED BY: Dave Middleton
Posted 4 months ago

Seth is presenting a Wolfram-U webinar on the book.

POSTED BY: Rohit Namjoshi