
# How to query a Dataset based on key values?

Posted 10 years ago
I have been working with a notebook that uses a hierarchically structured Dataset. I am looking for the correct syntax to query the Dataset and return all records for which a lower-level key takes a particular value. Essentially, I want a new Dataset that is a subset of the master but contains only the records that satisfy the test criterion. Ideally this could be applied at any level, but the pseudo-code below extracts the top-level key-value pairs for which a criterion at the lower level is true. Any and all help is very welcome!

    (* structured associations *)
    as = Association[{
        x -> Association[{a -> 1, b -> 2}],
        y -> Association[{a -> 2, b -> 2}],
        z -> Association[{a -> 2, b -> 1}]}]

    (* a dataset *)
    ds = Dataset[as];

    (* how to construct a query *)
    Query[a == 2 ?? ?]@ds

    (* to get the equivalent of this? *)
    Association[{
        y -> Association[{a -> 2, b -> 2}],
        z -> Association[{a -> 2, b -> 1}]}]
11 Replies
Posted 10 years ago
Dear David,

Yes, I understand. I was not sure how to achieve this with Query. The function I defined is a bit cryptic and involved some trial-and-error development :-) It is quite possible to understand what the function is doing if you start at the innermost command and then work your way outwards. But you are right that it would be preferable to use the built-in functionality. Also, I was not quite sure whether I had understood your problem fully. Perhaps we'll find a better way...

Best wishes,
Marco
Posted 10 years ago
Try this:

    In[1]:= as = Association[{
        x -> Association[{a -> 1, b -> 2}],
        y -> Association[{a -> 2, b -> 2}],
        z -> Association[{a -> 2, b -> 1}]}]

    Out[1]= <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>

    In[2]:= ds = Dataset[as]

    Out[2]= Dataset[
      Association[
        x -> Association[a -> 1, b -> 2],
        y -> Association[a -> 2, b -> 2],
        z -> Association[a -> 2, b -> 1]],
      TypeSystem`Assoc[TypeSystem`AnyType,
        TypeSystem`Assoc[TypeSystem`AnyType, TypeSystem`Atom[Integer], 2], 3],
      Association["ID" -> 96684043424021]]

    In[31]:= ds[Select[#[a] == 2 &]]

    Out[31]= Dataset[
      Association[
        y -> Association[a -> 2, b -> 2],
        z -> Association[a -> 2, b -> 1]],
      TypeSystem`Assoc[TypeSystem`AnyType,
        TypeSystem`Assoc[TypeSystem`AnyType, TypeSystem`Atom[Integer], 2],
        TypeSystem`AnyLength],
      Association[
        "Origin" -> HoldComplete[
          Query[Select[#[a] == 2 &]][Dataset`DatasetHandle[96684043424021]]],
        "ID" -> 246303524148496]]
Posted 10 years ago
Yes, Jason is spot on here. You can see it by applying Normal to his last line:

    ds[Select[#[a] == 2 &]] // Normal

    Out[]= <|y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>
Posted 10 years ago
The key to this is that Select acts at the first level: you are choosing which top-level keys to keep, so Select is the outermost operator, as in ds[Select[...]]. The criterion itself depends on a key inside each value, hence ds[Select[#[a] == 2 &]]. In #[a] == 2 &, the # refers to the value associated with the top-level key in question (itself an Association), and [a] looks up the value associated with the key a inside it.
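Since the original question asked specifically about Query, here is a minimal sketch of how the same selection might be written with an explicit Query operator. The names test, viaQuery and onAssoc are only illustrative, and the sketch assumes that none of the symbols a, b, x, y, z has a value in the session.

```wolfram
(* Same data as in the original question; the keys are plain symbols *)
as = <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>;
ds = Dataset[as];

(* The test receives each top-level value, which is itself an association *)
test = #[a] == 2 &;

(* Explicit Query operator applied to the Dataset *)
viaQuery = Query[Select[test]][ds];

(* Query can also be applied to the plain association; it then returns an association *)
onAssoc = Query[Select[test]][as];

(* Compare the two results after stripping the Dataset wrapper *)
Normal[viaQuery] === onAssoc
```

The "Origin" entry in Jason's output suggests that ds[Select[...]] is stored internally as exactly this kind of Query, so the two spellings should be interchangeable.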
Posted 10 years ago
Yes! That seems to work. Thanks, Jason.

    In[12]:= out1 = ds[Select[#[a] == 2 &]];
    out1 // Normal

    Out[12]= <|y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>

    In[13]:= out2 = ds[Select[#[a] == 2 && #[b] == 1 &]];
    out2 // Normal

    Out[13]= <|z -> <|a -> 2, b -> 1|>|>

Now, suppose the key I am trying to select on is at a lower level in the dataset. Can this be generalized to generate a dataset that is a top-level subset, but based on key values below level 1, or even on a boolean combination of key values at different levels?
Posted 10 years ago
Yes, if you are still picking keys at the top level, the Select stays on top. You just work your way down to the test you want to apply:

    In[1]:= as = <|
        x -> <|a -> 1, b -> <|c -> 1, d -> 2|>|>,
        y -> <|a -> 1, b -> <|c -> 2, d -> 2|>|>|>

    Out[1]= <|x -> <|a -> 1, b -> <|c -> 1, d -> 2|>|>, y -> <|a -> 1, b -> <|c -> 2, d -> 2|>|>|>

    In[2]:= ds = Dataset[as]

    Out[2]= Dataset[
      Association[
        x -> Association[a -> 1, b -> Association[c -> 1, d -> 2]],
        y -> Association[a -> 1, b -> Association[c -> 2, d -> 2]]],
      TypeSystem`Assoc[TypeSystem`AnyType,
        TypeSystem`Assoc[TypeSystem`AnyType, TypeSystem`AnyType, 2], 2],
      Association["ID" -> 168461536877204]]

    In[3]:= ds[Select[#[b, c] == 2 &]]

    Out[3]= Dataset[
      Association[
        y -> Association[a -> 1, b -> Association[c -> 2, d -> 2]]],
      TypeSystem`Assoc[TypeSystem`AnyType,
        TypeSystem`Assoc[TypeSystem`AnyType, TypeSystem`AnyType, 2],
        TypeSystem`AnyLength],
      Association[
        "Origin" -> HoldComplete[
          Query[Select[#[b, c] == 2 &]][Dataset`DatasetHandle[168461536877204]]],
        "ID" -> 216114199026345]]
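For readers who find #[b, c] opaque, the nested test can also be spelled as a chain of single lookups. The sketch below applies Select directly to the plain association rather than to the Dataset, and assumes the symbols used as keys have no values; the name deep is only illustrative.

```wolfram
(* Same two-level data as in the reply above *)
as = <|x -> <|a -> 1, b -> <|c -> 1, d -> 2|>|>,
       y -> <|a -> 1, b -> <|c -> 2, d -> 2|>|>|>;

(* Chained lookup: #[b] gives the inner association, [c] reads its value *)
deep = Select[as, #[b][c] == 2 &];

(* Top-level keys that survived the test *)
Keys[deep]
```

The Select still sits at the top level; only the test reaches further down.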
Posted 10 years ago
EXCELLENT! I think this is going to make for much more readable notebooks. Thank you, Jason.
Posted 10 years ago
This is just what I needed:

    In[43]:= as2 = <|
        x -> <|a -> 1, b -> <|c -> 1, d -> <|e -> 4, f -> 5|>|>|>,
        y -> <|a -> 1, b -> <|c -> 2, d -> <|e -> 4, f -> 6|>|>|>,
        z -> <|a -> 2, b -> <|c -> 3, d -> <|e -> 5, f -> 6|>|>|>|>;

    In[44]:= ds2 = Dataset[as2];

    (* first level key only *)
    ds2[Select[#[a] == 2 &]] // Normal

    Out[45]= <|z -> <|a -> 2, b -> <|c -> 3, d -> <|e -> 5, f -> 6|>|>|>|>

    (* both first and second level keys *)
    ds2[Select[#[a] == 1 && #[b, c] == 2 &]] // Normal

    Out[46]= <|y -> <|a -> 1, b -> <|c -> 2, d -> <|e -> 4, f -> 6|>|>|>|>
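The same pattern should extend to other boolean operators and to the third level. As a sketch, using the same as2 data and chained lookups (the name picked is only illustrative, and the key symbols are assumed to have no values):

```wolfram
(* Same three-level data as above *)
as2 = <|
   x -> <|a -> 1, b -> <|c -> 1, d -> <|e -> 4, f -> 5|>|>|>,
   y -> <|a -> 1, b -> <|c -> 2, d -> <|e -> 4, f -> 6|>|>|>,
   z -> <|a -> 2, b -> <|c -> 3, d -> <|e -> 5, f -> 6|>|>|>|>;
ds2 = Dataset[as2];

(* Or-combination of a second-level test (c) and a third-level test (e) *)
picked = ds2[Select[#[b][c] == 1 || #[b][d][e] == 5 &]];

(* Top-level keys that satisfy either criterion *)
Keys[Normal[picked]]
```

Here x qualifies through c and z through e, while y matches neither test.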
Posted 10 years ago
Yes, Marco. Thanks. Your function does just what I wanted. To be clear, though, I was just assuming Query was the right approach. I must admit I am finding it difficult to understand the syntax and intended usage of the various operators and functions that act on associations and datasets. I will study your function as an example.

I have often created and used data structures in notebooks where the meaning of an entry was defined by its location in a structured hierarchy of lists. I have been hoping that using associations and datasets with key->value relationships would allow me to write more self-documenting code. In this particular case, I have been importing a binary structure and storing it in such a dataset. I then want to write functions that extract information by keys and values. Very little of what I try works, and I find the documentation so obscure that I have been trying to learn the best usages by experimentation.

Your function certainly works, but to be honest I was hoping for a more readable built-in capability in the Wolfram Language, and one that would permit convenient boolean combinations for the criteria as well.

Best regards,
David
Posted 10 years ago
Dear David,

Just for me to understand: are you looking for a search pattern in Query that does essentially this?

    dsQuery[ds_, key_, value_] :=
      Normal[KeyTake[ds, Keys@Select[ds[[All, Key[key]]], # == value &]]]

If you use the example that you gave above:

    as = Association[{
        x -> Association[{a -> 1, b -> 2}],
        y -> Association[{a -> 2, b -> 2}],
        z -> Association[{a -> 2, b -> 1}]}];
    ds = Dataset[as];

the command

    dsQuery[ds, a, 2]

i.e. look in ds for the entries in which the sub-key "a" has the value two, gives:

    <|y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>

which might be what you are looking for. Is this what you want to achieve with the built-in Query function?

Cheers,
M.
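For comparison, the key/value interface of dsQuery could probably also be expressed as a thin wrapper around Select. The helper selectByKey below is a hypothetical name, not a built-in and not Marco's function; the sketch assumes the key symbols have no values in the session.

```wolfram
as = <|x -> <|a -> 1, b -> 2|>, y -> <|a -> 2, b -> 2|>, z -> <|a -> 2, b -> 1|>|>;
ds = Dataset[as];

(* Hypothetical helper: builds a Select operator that tests one sub-key *)
selectByKey[key_, value_] := Select[#[key] == value &];

(* Same call shape as dsQuery[ds, a, 2], but the result stays a Dataset until Normal *)
result = Normal[ds[selectByKey[a, 2]]];

(* Chaining two helpers narrows the selection, here to a == 2 and then b == 1 *)
both = ds[selectByKey[a, 2]][selectByKey[b, 1]];
Keys[Normal[both]]
```

Because the helper returns an operator rather than a finished result, it can be applied to a Dataset or directly to a plain association.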
Posted 10 years ago
 Hope springs eternal! Anyone?