How to merge list with Select in Dataset?

Posted 1 year ago

Hello, I want to use SetDelayed (:=) to define a function.

dataset = Dataset[{
   <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
   <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>,
   <|"a" -> 3, "b" -> "z", "c" -> {3}|>,
   <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>,
   <|"a" -> 5, "b" -> "y", "c" -> {5, 6, 7}|>,
   <|"a" -> 6, "b" -> "z", "c" -> {}|>}]
dataset1 = {"a", "b", "c"}
dataset2[test_] := dataset[Select[#dataset1[[test]] > 3 &]]
dataset2[1]

The result is shown in a screenshot (image not preserved here). It seems that I can't use a list when there is a # symbol. How can I resolve this problem?

POSTED BY: Zhenyu Zeng
2 Replies
Posted 1 year ago

When doing dataset queries, imagine each subsequent query function as being applied to elements of the dataset at deeper and deeper levels. Since you're using just one selector, it's applied to each "row" of the dataset. The next thing to understand is that the #dataset1 notation is just a slightly fancier slot expression that you can use in functions. A plain # means the first argument passed to the function (typically used when there is only one argument). #n (where n is a positive integer literal) means the nth argument. ## means the sequence of all the arguments. And finally, #hello (where hello is a name written without quotes) means the part of the first argument stored under the key "hello" (where it's assumed that the argument is an association with string keys).
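The slot forms described above can be tried outside of any dataset. This is a minimal sketch; the association row and its values are made up for illustration and are not part of the original question.

```wolfram
(* A sample "row"; the values are made up for illustration *)
row = <|"a" -> 4, "b" -> "x"|>;

(# + 1 &)[10]          (* plain #: the first argument, gives 11 *)
(#2 &)[10, 20]         (* #n: the nth argument, gives 20 *)
({##} &)[10, 20, 30]   (* ##: all of the arguments, gives {10, 20, 30} *)
(#a &)[row]            (* named slot: the value under the key "a", gives 4 *)
(#b &)[row]            (* the value under the key "b", gives "x" *)
```

Note that a named slot always uses the literal name written after #, so it cannot be combined with a variable that holds the key.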

Putting that together, we can do this

dataset[Select[#a > 3 &]]

This finds all "rows" of your dataset (each row is an association) such that the value in that row associated with the key named a is greater than 3. But you could also do any of these to get the same result:

dataset[Select[#[["a"]] > 3 &]] (* Part works with keys *)
dataset[Select[#["a"] > 3 &]] (* Associations can look like functions *)
dataset[Select[#[[1]] > 3 &]] (* Part with an integer gives the value at that position in the association *)
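To see why these variants agree, it may help to apply the same access forms to a single row. The sketch below assumes a row shaped like the fourth entry of the dataset in the question; the symbol row is just an illustrative name.

```wolfram
(* One row, shaped like the fourth entry of the dataset in the question *)
row = <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>;

row[["a"]]    (* Part with a key, gives 4 *)
row["a"]      (* association applied like a function, gives 4 *)
row[[1]]      (* Part with a position, gives 4 because "a" is the first key *)
(#a &)[row]   (* named slot, gives 4 *)
row[[2]]      (* position 2 holds the value under "b", gives "x" *)
```

The positional form depends on the order of the keys, while the key-based forms do not.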

Now, based on your attempted definition of dataset2,

dataset2[test_] := dataset[Select[#dataset1[[test]] > 3 &]]

it looks like you're trying to select rows based on the key determined by using test to find the desired column header in dataset1. But what does your function really mean? Well, first off, #dataset1 means "the value associated with the key named dataset1". But of course there are no keys named dataset1 in dataset, so nothing will match and everything after that will fail.

Instead, you could try this:

dataset2[test_] := dataset[Select[#[dataset1[[test]]] > 3 &]]

In this definition, # just refers to the whole row (which is an association), dataset1[[test]] will fetch the value from dataset1 corresponding to test (which will need to be an integer between 1 and 3), and this value will be used as a key to look into each row from dataset.
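As a quick check of this definition, the sketch below repeats the data from the question and calls dataset2[1]. Only column "a" holds numbers, so test = 1 is the sensible choice for a comparison with 3; Normal and Lookup are used here only to turn the result into plain lists for inspection.

```wolfram
dataset = Dataset[{
   <|"a" -> 1, "b" -> "x", "c" -> {1}|>,
   <|"a" -> 2, "b" -> "y", "c" -> {2, 3}|>,
   <|"a" -> 3, "b" -> "z", "c" -> {3}|>,
   <|"a" -> 4, "b" -> "x", "c" -> {4, 5}|>,
   <|"a" -> 5, "b" -> "y", "c" -> {5, 6, 7}|>,
   <|"a" -> 6, "b" -> "z", "c" -> {}|>}];
dataset1 = {"a", "b", "c"};

(* Look up the key name in dataset1 first, then use it as a key into each row *)
dataset2[test_] := dataset[Select[#[dataset1[[test]]] > 3 &]]

dataset1[[1]]                      (* the key used for test = 1, gives "a" *)
Normal[dataset2[1]]                (* the selected rows as a list of associations *)
Lookup[Normal[dataset2[1]], "a"]   (* should give {4, 5, 6} *)
```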

You could have also just done this:

dataset2[test_] := dataset[Select[#[[test]] > 3 &]]

No need for the indirection of looking at dataset1, since the values/keys are in the same order. In this case, we're just asking for the indexed part of each row specified by test.

I'm assuming that you're just experimenting, because your example looks very similar to examples from the documentation. If you wanted to product-ize this, I'd have suggestions for making it more robust and maintainable.

POSTED BY: Eric Rimbey
Posted 1 year ago

Thanks a lot. Your answer is very understandable. I have been busy these days, so my reply is late.

POSTED BY: Zhenyu Zeng