When doing dataset queries, imagine each successive query operator as being applied to elements of the dataset at deeper and deeper levels. Since you're using just one selector, it is applied to each "row" of the dataset. The next thing to understand is that the #dataset1 notation is just a slightly fancier slot expression that you can use in pure functions. A plain # means the first argument passed to the function (typically used when there is only one argument). #n (where n is a positive integer) means the nth argument. ## means the sequence of all the arguments. And finally, #hello (where hello is a name) means the part of the first argument named hello (where it's assumed that the argument is an association); it is shorthand for #["hello"].
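As a quick illustration of the four slot forms, here is a minimal sketch you can paste into a notebook. The argument values and the association are made up for the example.

```wolfram
(* # is the first (here: only) argument *)
(# + 1) &[5]                          (* 6 *)

(* #1 and #2 are the first and second arguments *)
(#1 - #2) &[10, 3]                    (* 7 *)

(* ## is the whole sequence of arguments *)
{##} &[1, 2, 3]                       (* {1, 2, 3} *)

(* #a looks up the key "a" in the first argument, which must be an association *)
(#a + 1) &[<|"a" -> 5, "b" -> 7|>]    (* 6 *)
```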
Putting that together, we can do this:
dataset[Select[#a > 3 &]]
This finds all "rows" of your dataset (each row is an association) such that the value associated with the key named a in that row is greater than 3. But you could also do any of these to get the same result:
dataset[Select[#[["a"]] > 3 &]] (* Part works with keys *)
dataset[Select[#["a"] > 3 &]] (* Associations can look like functions *)
dataset[Select[#[[1]] > 3 &]] (* Part also accepts a position, so this matches as long as "a" is the first key in each row *)
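If you want to convince yourself that these forms agree, here is a small sketch. The data is hypothetical (I don't have your actual dataset); the only assumption that matters is that "a" is the first key in every row.

```wolfram
(* Hypothetical data: three rows with keys "a", "b", "c", in that order *)
dataset = Dataset[{
    <|"a" -> 1, "b" -> 5, "c" -> 2|>,
    <|"a" -> 4, "b" -> 0, "c" -> 9|>,
    <|"a" -> 6, "b" -> 3, "c" -> 1|>}];

(* The four ways of writing the same row test *)
variants = {
   dataset[Select[#a > 3 &]],
   dataset[Select[#[["a"]] > 3 &]],
   dataset[Select[#["a"] > 3 &]],
   dataset[Select[#[[1]] > 3 &]]};

(* Normal strips the Dataset wrapper so the results can be compared directly *)
SameQ @@ (Normal /@ variants)
```

If the four queries select the same rows, the last line gives True.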
Now, based on your attempted definition of dataset2,
dataset2[test_] := dataset[Select[#dataset1[[test]] > 3 &]]
it looks like you're trying to select rows based on a key that you find by using test to look up the desired column header in dataset1. But what does your function really mean? Well, first off, #dataset1 means "the value associated with the key named dataset1 in the row". But of course there are no keys named dataset1 in the rows of dataset, so the lookup comes back empty for every row, nothing will match, and everything after that will fail.
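You can see this outside of any dataset by looking at how the expression is parsed and then applying the slot to a single row. The row below is made up for illustration.

```wolfram
(* How the parser reads the expression: a named slot, then Part *)
FullForm[Hold[#dataset1[[test]]]]
(* Hold[Part[Slot["dataset1"], test]] *)

(* A made-up row with no key called "dataset1" *)
row = <|"a" -> 4, "b" -> 2, "c" -> 7|>;

(* The named slot is a key lookup, and the key is not there *)
#dataset1 &[row]
(* Missing["KeyAbsent", "dataset1"] *)
```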
Instead, you could try this:
dataset2[test_] := dataset[Select[#[dataset1[[test]]] > 3 &]]
In this definition, # just refers to the whole row (which is an association), dataset1[[test]] will fetch the value from dataset1 corresponding to test (which will need to be an integer between 1 and 3), and this value will be used as a key to look into each row from dataset.
You could have also just done this:
dataset2[test_] := dataset[Select[#[[test]] > 3 &]]
No need for the indirection of looking at dataset1, since the values/keys are in the same order. In this case, we're just asking for the indexed part of each row specified by test.
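Here is a sketch comparing the two definitions side by side. Everything in it is an assumption on my part: I'm taking dataset1 to be a plain list of the column names, the data is invented, and byKey and byIndex are just illustrative names for the two versions of dataset2.

```wolfram
(* Assumed: dataset1 holds the column names, in the same order as the row keys *)
dataset1 = {"a", "b", "c"};

(* Invented data with all-numeric columns so that the > 3 test is meaningful *)
dataset = Dataset[{
    <|"a" -> 1, "b" -> 5, "c" -> 2|>,
    <|"a" -> 4, "b" -> 0, "c" -> 9|>,
    <|"a" -> 6, "b" -> 3, "c" -> 1|>}];

(* Version 1: look the key up in dataset1, then use it on each row *)
byKey[test_] := dataset[Select[#[dataset1[[test]]] > 3 &]]

(* Version 2: use test directly as a position within each row *)
byIndex[test_] := dataset[Select[#[[test]] > 3 &]]

(* With test = 2, both versions test the "b" column *)
Normal[byKey[2]]
Normal[byIndex[2]]
```

With this data both calls should return the single row whose "b" value is 5. The two versions only agree while the key order in the rows matches dataset1, so the key-based version is the safer one if the columns might ever be reordered.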
I'm assuming that you're just experimenting, because your example looks very similar to examples from the documentation. If you wanted to productize this, I'd have suggestions for making it more robust and maintainable.