
How to query a Dataset based on key values?

Posted 10 years ago

I have been working with a notebook that uses a hierarchically structured Dataset. I am looking for the correct syntax to query the Dataset so that it returns all records for which a lower-level key takes a particular value. Essentially, I want a new Dataset that is a subset of the master, containing only the records that satisfy the test criteria. Ideally this could be applied at any level, but the pseudo-code example below only needs to extract the top-level key-value pairs for which the criterion at the lower level is true.

Any and all help is very welcome!

(* structured associations *)
as = Association[
  {
   x -> Association[{a -> 1, b -> 2}],
   y -> Association[{a -> 2, b -> 2}],
   z -> Association[{a -> 2, b -> 1}]
   }
  ]

(* a dataset *)
ds = Dataset[as];

(* how to construct a query *)
Query[a == 2 ?? ?]@ds

(* to get the equivalent of this? *)
Association[
 {
  y -> Association[{a -> 2, b -> 2}],
  z -> Association[{a -> 2, b -> 1}]
  }
 ]
POSTED BY: David Keith

Dear David,

Yes, I understand. I was not sure how to achieve this with Query. The function I defined is a bit cryptic and took some trial-and-error development :-)

It is quite possible to understand what the function is doing if you start at the innermost command and work your way outwards. But you are right that it would be preferable to use the built-in functionality. Also, I was not quite sure whether I understood your problem fully.

Perhaps we'll find a better way...

Best wishes,

Marco

POSTED BY: Marco Thiel

Try this:

 In[1]:= as = 
 Association[{x -> Association[{a -> 1, b -> 2}], 
   y -> Association[{a -> 2, b -> 2}], 
   z -> Association[{a -> 2, b -> 1}]}]

Out[1]= <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, 
 z -> <|a -> 2, b -> 1|>|>

In[2]:= ds = Dataset[as]

Out[2]= Dataset[
Association[
 x -> Association[a -> 1, b -> 2], y -> Association[a -> 2, b -> 2], 
  z -> Association[a -> 2, b -> 1]], 
TypeSystem`Assoc[TypeSystem`AnyType, 
TypeSystem`Assoc[TypeSystem`AnyType, 
TypeSystem`Atom[Integer], 2], 3], 
Association["ID" -> 96684043424021]]

In[31]:= ds[Select[#[a] == 2 &]]

Out[31]= Dataset[
Association[
 y -> Association[a -> 2, b -> 2], z -> Association[a -> 2, b -> 1]], 
TypeSystem`Assoc[TypeSystem`AnyType, 
TypeSystem`Assoc[TypeSystem`AnyType, 
TypeSystem`Atom[Integer], 2], TypeSystem`AnyLength], 
Association["Origin" -> HoldComplete[
Query[
Select[#[a] == 2& ]][
Dataset`DatasetHandle[96684043424021]]], "ID" -> 246303524148496]]
POSTED BY: Jason Grigsby

Yes, Jason is spot on here. You can see it by running Normal on his last line:

ds[Select[#[a] == 2 &]] // Normal
Out[] = <|y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>
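The same Select operator can also be wrapped explicitly in Query, which is the form the original question sketched. This is a minimal sketch that assumes the symbols a, b, x, y, z are unassigned, as in the thread. The names q, fromDataset and fromAssoc are illustrative only. Normal unwraps the Dataset result, and the same Query applied directly to the raw Association should give the same association.

```mathematica
(* a, b, x, y, z are left unassigned, as in the thread *)
as = <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>;
ds = Dataset[as];

(* ds[op] is shorthand for Query[op][ds]; here the operator is Select *)
q = Query[Select[#[a] == 2 &]];

fromDataset = Normal[q[ds]]; (* run the query on the Dataset, then unwrap it *)
fromAssoc = q[as];           (* the same Query also runs on the raw Association *)
```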

POSTED BY: Vitaliy Kaurov

The key to this is that Select acts at the first level. You are selecting top-level keys based on a criterion applied to their values, hence

ds[Select[...]]

Inside that, you want to pick based on a key within each value, hence

ds[Select[#[a]==2&]]

In #[a]==2&, the # refers to the value associated with the top-level key in question (itself an Association), and [a] looks up the value associated with the key a inside it.
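To see what # is bound to, you can apply the test by hand to individual values of the plain Association before handing it to Select. This is a small sketch, and the names crit, atY, atX and kept are illustrative only.

```mathematica
as = <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>;

(* the test Select applies to each value of the top-level association *)
crit = #[a] == 2 &;

(* for key y, # is the inner association <|a -> 2, b -> 2|>, so #[a] gives 2 *)
atY = crit[as[y]];
(* for key x, #[a] gives 1, so the test fails *)
atX = crit[as[x]];

(* Select keeps exactly the key -> value pairs whose value passes the test *)
kept = Select[as, crit];
```

Here atY is True and atX is False, so kept contains only the entries for y and z.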

POSTED BY: Jason Grigsby
Posted 10 years ago

Yes! That seems to work. Thanks Jason.

In[12]:= out1 = ds[Select[#[a] == 2 &]]; out1 // Normal

Out[12]= <|y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>

In[13]:= out2 = ds[Select[#[a] == 2 && #[b] == 1 &]]; out2 // Normal

Out[13]= <|z -> <|a -> 2, b -> 1|>|>

Now, suppose the key I was trying to select for is at a lower level in the dataset. Can this be generalized to generate a dataset which is a top-level subset, but based on key values lower than level 1, or even a boolean combination of key values at different levels?

POSTED BY: David Keith

Yes. If you are still picking keys at the top level, the Select stays at the top. You just work your way down to the value you want to test:

In[1]:= as = <|x -> <|a -> 1, b -> <|c -> 1, d -> 2|>|>, 
  y -> <|a -> 1, b -> <|c -> 2, d -> 2|>|>|>

Out[1]= <|x -> <|a -> 1, b -> <|c -> 1, d -> 2|>|>, 
 y -> <|a -> 1, b -> <|c -> 2, d -> 2|>|>|>

In[2]:= ds = Dataset[as]

Out[2]= Dataset[
Association[
 x -> Association[a -> 1, b -> Association[c -> 1, d -> 2]], 
  y -> Association[a -> 1, b -> Association[c -> 2, d -> 2]]], 
TypeSystem`Assoc[TypeSystem`AnyType, 
TypeSystem`Assoc[TypeSystem`AnyType, TypeSystem`AnyType, 2], 2], 
Association["ID" -> 168461536877204]]

In[3]:= ds[Select[#[b, c] == 2 &]]

Out[3]= Dataset[
Association[
 y -> Association[a -> 1, b -> Association[c -> 2, d -> 2]]], 
TypeSystem`Assoc[TypeSystem`AnyType, 
TypeSystem`Assoc[TypeSystem`AnyType, TypeSystem`AnyType, 2], 
  TypeSystem`AnyLength], 
Association["Origin" -> HoldComplete[
Query[
Select[#[b, c] == 2& ]][
Dataset`DatasetHandle[168461536877204]]], "ID" -> 216114199026345]]
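Jason's query reaches the inner value with #[b, c]. As an alternative, the same test can be written as a chained lookup #[b][c], which relies only on ordinary single-key Association lookup. Tests at different levels then combine with && (or ||) inside one pure function. This sketch uses the plain Association from In[1] above, and byC and mixed are illustrative names.

```mathematica
as = <|x -> <|a -> 1, b -> <|c -> 1, d -> 2|>|>,
   y -> <|a -> 1, b -> <|c -> 2, d -> 2|>|>|>;

(* chained lookup: #[b] is the inner association, #[b][c] is its c value *)
byC = Select[as, #[b][c] == 2 &];

(* a first-level test and a second-level test combined with And *)
mixed = Select[as, #[a] == 1 && #[b][c] == 1 &];
```

byC keeps only y, and mixed keeps only x, because both of its conditions hold only for x.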
POSTED BY: Jason Grigsby
Posted 10 years ago

EXCELLENT!

I think this is going to make for much more readable notebooks. Thank you, Jason.

POSTED BY: David Keith
Posted 10 years ago

This is just what I needed:

In[43]:= as2 = <|
   x -> <|a -> 1, b -> <|c -> 1, d -> <|e -> 4, f -> 5|>|>|>, 
   y -> <|a -> 1, b -> <|c -> 2, d -> <|e -> 4, f -> 6|>|>|>, 
   z -> <|a -> 2, b -> <|c -> 3, d -> <|e -> 5, f -> 6|>|>|>|>;

In[44]:= ds2 = Dataset[as2];

(* first level key only *)
ds2[Select[#[a] == 2 &]] // Normal

Out[45]= <|z -> <|a -> 2, b -> <|c -> 3, d -> <|e -> 5, f -> 6|>|>|>|>

(* both first and second level keys *)
ds2[Select[#[a] == 1 && #[b, c] == 2 &]] // Normal

Out[46]= <|y -> <|a -> 1, b -> <|c -> 2, d -> <|e -> 4, f -> 6|>|>|>|>
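The same pattern extends one level further down. This sketch reuses as2 and ds2 from above, re-created here so the block runs on its own. It combines a first-level test with a third-level one and uses || as well as &&. The names deep and mixed are illustrative only.

```mathematica
as2 = <|
   x -> <|a -> 1, b -> <|c -> 1, d -> <|e -> 4, f -> 5|>|>|>,
   y -> <|a -> 1, b -> <|c -> 2, d -> <|e -> 4, f -> 6|>|>|>,
   z -> <|a -> 2, b -> <|c -> 3, d -> <|e -> 5, f -> 6|>|>|>|>;
ds2 = Dataset[as2];

(* third-level test: follow b, then d, then read f *)
deep = Normal@ds2[Select[#[b][d][f] == 6 &]];

(* level 1 AND (level 3 OR level 2) *)
mixed = Normal@ds2[Select[#[a] == 1 && (#[b][d][f] == 6 || #[b][c] == 1) &]];
```

deep keeps y and z (both have f -> 6). mixed keeps x (via c -> 1) and y (via f -> 6), and drops z because its a is 2.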
POSTED BY: David Keith
Posted 10 years ago

Yes, Marco. Thanks. Your function does just what I wanted. To be clear, though, I was only assuming Query was the right approach. I must admit I am finding it difficult to understand the syntax and intended usage of the various operators and functions that act on associations and datasets. I will study your function as an example.

I have often created and used data structures in notebooks, where the meaning of an entry in the structure was defined by its location in a structured hierarchy of lists. I have been hoping that using associations and datasets with the key->value relationships would allow me to write more self-documenting code. In this particular case, I have been importing a binary structure and storing it in such a dataset. I then want to write functions which can extract information by keys and values. Very little of what I try works, and I find the documentation so obscure that I have been trying to learn the best usages by experimentation.

Your function certainly works, but to be honest I was hoping for a more readable built-in capability in the Wolfram Language, one that would also permit convenient boolean combinations of criteria.

Best regards, David

POSTED BY: David Keith

Dear David,

Just so I understand: are you looking for a search pattern in Query that does essentially this?

(* keep the top-level entries of ds whose sub-key key has the value value *)
dsQuery[ds_, key_, value_] := Normal[KeyTake[ds, Keys@Select[ds[[All, Key[key]]], # == value &]]]

If you use the example that you gave above:

as = Association[{x -> Association[{a -> 1, b -> 2}], y -> Association[{a -> 2, b -> 2}], z -> Association[{a -> 2, b -> 1}]}];
ds = Dataset[as];

the command

dsQuery[ds, a, 2]

(i.e. in ds, look for the entries whose sub-key a has the value 2) gives:

<|y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|> 

which might be what you are looking for.
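Following the suggestion to read the function from the innermost command outwards, here is a step-by-step sketch of what dsQuery does. It runs on the plain Association as rather than on the Dataset, and the names step1, step2, step3 and result are illustrative only. Part with All keeps the top-level keys, which is what allows KeyTake to rebuild the subset.

```mathematica
as = <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>;

(* step 1: for every top-level key, pull out the value of sub-key a *)
step1 = as[[All, Key[a]]];       (* <|x -> 1, y -> 2, z -> 2|> *)
(* step 2: keep only the entries whose a value is 2 *)
step2 = Select[step1, # == 2 &]; (* <|y -> 2, z -> 2|> *)
(* step 3: remember which top-level keys survived *)
step3 = Keys[step2];             (* {y, z} *)
(* step 4: take those entries, with all their sub-keys, from the original *)
result = KeyTake[as, step3];
```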

Is this what you want to achieve with the built in Query function?

Cheers,

M.

POSTED BY: Marco Thiel
Posted 10 years ago

Hope springs eternal! Anyone?

POSTED BY: David Keith