How to understand the three parameters in Dataset

Posted 1 year ago

Hello,

A dataset is

dataset = Dataset[{
   <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
   <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>,
   <|"a" -> 3, "b" -> "z", "c" -> {3}|>,
   <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>,
   <|"a" -> 5, "b" -> "y", "c" -> {5, 6, 7}|>,
   <|"a" -> 6, "b" -> "z", "c" -> {}|>}]

Apply a function f to every element in every row:

dataset[All, All, f]

Partition the dataset based on a column, applying further operators to each group:

dataset[GroupBy["b"], Catenate, "c"]

If Dataset can accept three parameters and the first one is the operation on the rows, how should we understand the second and third ones?

And why does this not work?

dataset[GroupBy["b"], "c"]
POSTED BY: Zhenyu Zeng
Posted 1 year ago
POSTED BY: Eric Rimbey
Posted 1 year ago

Thanks a lot. In the case of data[f], what kind of function f can operate on a dataset? Could you give me an example? Could you also give an example with four or five parameters?

POSTED BY: Zhenyu Zeng
Posted 1 year ago

In the case of data[f], what kind of function f can operate on a dataset?

It can be any kind. It just depends on what you're trying to analyze. Let's use a very general dataset:

dataset = 
  Dataset[
    <|"b0a1140" -> <|"a" -> 1, "b" -> "x", "c" -> {1}|>, 
      "a250c" -> <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>, 
      "d74df75" -> <|"a" -> 3, "b" -> "z", "c" -> {3}|>, 
      "f93bdfe2" -> <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>, 
      "a78710f" -> <|"a" -> 5, "b" -> "y", "c" -> {5, 6, 7}|>, 
      "976c" -> <|"a" -> 6, "b" -> "z", "c" -> {}|>|>]

Maybe you just want to know how many records there are:

dataset[Length]
(* 6 *)

Maybe you're interested in the keys for some reason:

dataset[Keys]


Maybe you want to filter on the keys:

dataset[KeySelect[StringMatchQ["a*"]]]


POSTED BY: Eric Rimbey
Posted 1 year ago

This is too difficult for me to understand. Could you first explain what this means?

dataset[All, "c", 1]
POSTED BY: Zhenyu Zeng

In this case, the 1 is treated as an index (a part specification), so the first element of each c is extracted. Try

dataset[All, "c", 2]

I highly recommend reading Seth Chandler's book Query: Getting Information from Data with the Wolfram Language. A free notebook edition is available for download.

POSTED BY: Rohit Namjoshi
Posted 1 year ago

You should really just play around with it. Try this:

dataset[z, y, x, w]

Notice where each operator is applied. If an operator can be interpreted as a part specification (e.g. All or an integer), then it is applied that way. If an operator can be interpreted as a filter, then it is applied that way (and this is a "descending" operator). If an operator can be interpreted as an aggregator, then it is applied that way after the lower-level operators are applied (and this is an "ascending" operator). And there are a few other special forms. But there is no point duplicating the documentation here. You really need to just wrestle with it for a while.
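
As a minimal sketch of those three roles, the queries below use Query directly on a plain list of associations rather than on a Dataset, so the results print as ordinary expressions. The name rows is only an illustration (it is not from the posts above), and the data is copied from the original question.

```wolfram
(* Plain list of associations, mirroring the rows from the original post. *)
(* "rows" is a hypothetical name used only for this sketch. *)
rows = {
   <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
   <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>,
   <|"a" -> 3, "b" -> "z", "c" -> {3}|>,
   <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>,
   <|"a" -> 5, "b" -> "y", "c" -> {5, 6, 7}|>,
   <|"a" -> 6, "b" -> "z", "c" -> {}|>};

(* Part specification: All keeps every row, "a" extracts that key from each row. *)
Query[All, "a"][rows]
(* {1, 2, 3, 4, 5, 6} *)

(* Descending operator: Select filters the rows before "a" is extracted. *)
Query[Select[#a > 2 &], "a"][rows]
(* {3, 4, 5, 6} *)

(* Ascending operator: Total is applied after "a" has been extracted from each row. *)
Query[Total, "a"][rows]
(* 21 *)
```

The same operators can be passed to a Dataset built from this list, e.g. Dataset[rows][Total, "a"]; the only difference should be that the result is displayed as a Dataset where it is not a plain number.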

POSTED BY: Eric Rimbey
Posted 1 year ago

I have tried this

dataset[z, y, x, w]

This is very difficult to understand.

POSTED BY: Zhenyu Zeng
Posted 1 year ago
POSTED BY: Eric Rimbey
Posted 1 year ago

I tried it and found that Normal can be removed in

Dataset[GroupBy[#["b"] &][Normal[dataset]]]
POSTED BY: Zhenyu Zeng
Posted 1 year ago

I think that in dataset[GroupBy["b"], "c"], GroupBy works on the rows and "c" works on the column. What is the meaning of

There is no key "c" available to the next level of the query. You need to first extract the values from the lists.

And why can All in dataset[GroupBy["b"], All, "c"] extract the values from the lists? What does "the values" mean here?

POSTED BY: Zhenyu Zeng