

Missing["KeyAbsent", somekey] (How to extract somekey)?

Posted 10 years ago

Hello, I am using the KeyUnion function to combine a couple of associations whose values are lists. In some cases, a key exists in one association but not in the other.

Here is an example:

assoc1 = <|1 -> {1, a, aa},3 -> {3, c, cc}|>

assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>

I do a KeyUnion...

assocList = KeyUnion[{assoc1, assoc2}]

...to get something like this...

assocList = { <|1 -> {1, a, aa},3 -> {3, c, cc}, 5 -> Missing["KeyAbsent", 5]|>, <|1 -> {1, 10, 100, 1000},  
     3 -> Missing["KeyAbsent", 3], 5 -> {5, 50, 500, 5000}|>}

Now I want to merge the two associations by key, replacing the missing values with some key-value pair that includes a "dummy" value, such as 5 -> {-9999}. The following piece of code, for example, works nicely for an individual replacement (e.g. key 3):

Merge[{assocList[[1]], assocList[[2]]}, Identity]
merged = Values[%] /. _Missing -> {3, -9999};

To get...

{{{1, a, aa}, {1, 10, 100, 1000}}, {{3, c, cc}, {3, -9999}},...}

But you also get...

 {...,{{3,-9999}, {5, 50, 500, 5000}},...}

Which is not what we want. (The ultimate goal is to make a Union between the two lists associated with each key, where the first item in the list is also the same as the key.)

What I am looking for is a way to make a {Key, Value} pair replacement for each missing key. Is there a way to pick out the "index" part of Missing["KeyAbsent", index] so that I can write a replacement rule that iterates over the keys?
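A minimal sketch of one way to do this (not taken from the replies below): give the second argument of Missing a pattern name, k_, and use RuleDelayed (:>) so that every Missing["KeyAbsent", k] is replaced by a pair built from its own key. It reuses assoc1 and assoc2 from above; -9999 is just the dummy value suggested in the question.

```mathematica
(* Data from the question; a, aa, c, cc are undefined symbols *)
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;

(* KeyUnion fills every absent key k with Missing["KeyAbsent", k] *)
assocList = KeyUnion[{assoc1, assoc2}];

(* k_ captures the absent key; :> builds the pair after k is bound,
   so each key gets its own dummy entry instead of a fixed {3, -9999} *)
merged = Values[Merge[assocList, Identity]] /.
   Missing["KeyAbsent", k_] :> {k, -9999}
(* {{{1, a, aa}, {1, 10, 100, 1000}},
    {{3, c, cc}, {3, -9999}},
    {{5, -9999}, {5, 50, 500, 5000}}} *)
```

The only change from the code in the question is the left-hand side of the rule: `_Missing` matches any Missing expression, while `Missing["KeyAbsent", k_]` also names the key so the right-hand side can use it.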

POSTED BY: Caitlin Ramsey
6 Replies
Posted 10 years ago

Yes, please do. The source is not confidential. I am attaching a text file.

Attachments:
POSTED BY: Caitlin Ramsey

I could not do much with this. I downloaded the file to the local machine and could open it with MS Excel, but SemanticImport fails:

In[15]:= dS = SemanticImport[FileNameJoin[{NotebookDirectory[], "others", "ramsey", "timeseries_fy14.csv"}]]
Out[15]= $Failed

In[16]:= FileNameJoin[{NotebookDirectory[], "others", "ramsey", "timeseries_fy14.csv"}]
Out[16]= "N:\\Udo\\Abt_N\\others\\ramsey\\timeseries_fy14.csv"

In[22]:= FileNames[FileNameJoin[{NotebookDirectory[], "others", "ramsey", "*"}]]
Out[22]= {"N:\\Udo\\Abt_N\\others\\ramsey\\Example_Missing.txt", \
          "N:\\Udo\\Abt_N\\others\\ramsey\\timeseries_fy14.csv"}

In[18]:= sales = SemanticImport["ExampleData/RetailSales.tsv"]
Out[18]= $Failed

SemanticImport fails (returns $Failed) even with the example from the documentation (RetailSales.tsv). Is the user supposed to preprocess the file before calling SemanticImport?

In[25]:= Clear[dS]
dS = SemanticImport[FileNameJoin[{NotebookDirectory[], "others", "ramsey", "timeseries_fy14.csv"}],
  Automatic, "Rows", "CharacterEncoding" -> "Unicode", HeaderLines -> 1]
Out[26]= $Failed
POSTED BY: Udo Krause
Posted 10 years ago

I've mulled this one over, and unless I am missing something, you are handling the "missing" values by not generating "missing" in the first place. Perhaps that is my fault, for constructing a "toy example" that is not well suited to illustration. Usually, I would be starting with a dataset from an outside source, "from the wild," where I do not have any control over the code or system that actually generates the sample data.

Here, I will present a typical example I would encounter importing real-world data:

In my sample data, I have two kinds of Missing ...

Missing["Unrecognized", "21.799999"]

Missing["Empty"]

To give a little context, these are being imported as a Dataset (in the Mathematica 10 sense) using SemanticImport. I can extract values from a column called "Days to Start a Business" that would typically contain a numeric value:

In:= $CurrentData[All, "Days to Start a Business"] // Normal
Out= {5, 4, Missing["Unrecognized", "21.799999"], 18, 32, 13, 5, 10, 101, 15, 22, 60, 15, 101, 8, 17, 8,
  Missing["Unrecognized", "16.5"], 84, 15, 27, 2, 14, Missing["Unrecognized", "19.5"], 16, 9, 19, 97, 14,
  Missing["Unrecognized", "26.1"], Missing["Unrecognized", "75.5"], 32, 31, 8, 92, 29,
  Missing["Unrecognized", "4.5"], 8, 40, 11, 19, 11, 11, 13, 17, 36, 17,
  Missing["Unrecognized", "30.799999"], 19, 53, 35, 36, 7, 9, 5, 6, 12, 9,
  Missing["Empty"], 11, 36, 38, 13, 33, 19, 32, 21,
  Missing["Unrecognized", "7.5"], 34, 40, Missing["Unrecognized", "7.5"], 90}

For values such as Missing["Unrecognized", "7.5"], one might choose to map some function onto the "unrecognized" values (e.g. a rounding strategy that converts each one to the nearest integer) and substitute the result. I should note that I haven't yet explored an alternative: specifying the type explicitly as a decimal in the SemanticImport statement. This function and its options are new to me.

For the values that are simply Missing["Empty"] a different approach would be required. In practice, the empty or "missing" values might be replaced by either the Min or Max of the non-empty members of the set.
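One possible sketch of the two strategies just described, using a short hypothetical excerpt of the column rather than the real import: the pattern s_String captures the text stored inside Missing["Unrecognized", s] (the same idea as extracting the key from Missing["KeyAbsent", key]), which is parsed and rounded; Missing["Empty"] is then filled with the minimum of the numeric entries. The names days, parsed and filledDays are illustrative only.

```mathematica
(* Hypothetical excerpt of the imported column, for illustration only *)
days = {5, 4, Missing["Unrecognized", "21.799999"], 18,
   Missing["Unrecognized", "16.5"], Missing["Empty"], 90};

(* Step 1: parse the string stored in Missing["Unrecognized", s] and round it.
   ToExpression evaluates the string, so only use it on trusted input. *)
parsed = days /. Missing["Unrecognized", s_String] :> Round[ToExpression[s]];

(* Step 2: replace Missing["Empty"] by the minimum of the numeric entries *)
filledDays = parsed /. Missing["Empty"] -> Min[Select[parsed, NumericQ]]
(* {5, 4, 22, 18, 16, 4, 90} *)
```

Note that Round sends numbers of the form x.5 to the nearest even integer, so "16.5" becomes 16 and "19.5" would become 20; Floor[x + 1/2] rounds halves upward instead. Swapping Min for Max gives the other filling strategy.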

But in explaining this in more detail, I am prompted to look into the documentation on "Missing Values" under SemanticImport, and will follow up if/when I find a solution.

POSTED BY: Caitlin Ramsey

you are handling the "missing" values by not generating "missing" in the first place

Right.

a typical example I would encounter importing real-world data

If the source is not confidential, can you please attach it as-is so we can keep experimenting with it?

POSTED BY: Udo Krause

This

    In[37]:= Clear [assoc1, assoc2]
    assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
    assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;

    In[41]:= assocList = KeyUnion[{assoc1, assoc2}, <|1 -> {1}, 3 -> {3}, 5 -> {5}|>]
    Out[41]= {<|1 -> {1, a, aa}, 3 -> {3, c, cc}, 5 -> {5}|>,
              <|1 -> {1, 10, 100, 1000}, 3 -> {3}, 5 -> {5, 50, 500, 5000}|>}

    In[44]:= Merge[assocList, Identity]
    Out[44]= <|1 -> {{1, a, aa}, {1, 10, 100, 1000}}, 3 -> {{3, c, cc}, {3}}, 5 -> {{5}, {5, 50, 500, 5000}}|>

Doesn't this help? From the documentation for KeyUnion:

The missing function can be an association:

the only thing yet to be done is creating the missing function association automatically from all the present keys, which seems possible, but now I have to unwrap the gifts ... Merry Christmas everyone!
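A minimal sketch, relying only on the documented form KeyUnion[{assoc1, assoc2, ...}, f], which uses f[key] as the value for each missing key: a pure function can stand in for the hand-written association, so nothing has to be built from the keys beforehand.

```mathematica
Clear[assoc1, assoc2]
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;

(* f = {#} &: an absent key k is filled with {k}, whatever keys occur *)
assocList = KeyUnion[{assoc1, assoc2}, {#} &]
(* {<|1 -> {1, a, aa}, 3 -> {3, c, cc}, 5 -> {5}|>,
    <|1 -> {1, 10, 100, 1000}, 3 -> {3}, 5 -> {5, 50, 500, 5000}|>} *)
```

An explicit association is still useful when different keys need different placeholders; the pure function is simply the shortest way to get one placeholder rule for every key.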

POSTED BY: Udo Krause

the only thing yet to be done is creating the missing function association automatically

In[19]:= Clear [assoc1, assoc2, assoc3, assoK]
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;
assoc3 = <|2 -> {4, 8, 16}, 4 -> {16, 64}, 7 -> {"The", "whole", "nine", "yards"}|>;
assoK = Association @@ (Rule[#, {#}] & /@ Union[Flatten[Keys /@ {assoc1, assoc2, assoc3}]])
Out[23]= <|1 -> {1}, 2 -> {2}, 3 -> {3}, 4 -> {4}, 5 -> {5}, 7 -> {7}|>

In[24]:= Clear[assocList]
assocList = KeyUnion[{assoc1, assoc2, assoc3}, assoK]
Out[25]= {<|1 -> {1, a, aa}, 3 -> {3, c, cc}, 5 -> {5}, 2 -> {2}, 4 -> {4}, 7 -> {7}|>, 
          <|1 -> {1, 10, 100, 1000}, 3 -> {3}, 5 -> {5, 50, 500, 5000}, 2 -> {2}, 4 -> {4}, 7 -> {7}|>, 
          <|1 -> {1}, 3 -> {3}, 5 -> {5}, 2 -> {4, 8, 16}, 4 -> {16, 64}, 7 -> {"The", "whole", "nine", "yards"}|>}

In[26]:= Merge[assocList, Identity]
Out[26]= <|1 -> {{1, a, aa}, {1, 10, 100, 1000}, {1}}, 
           3 -> {{3, c, cc}, {3}, {3}}, 
           5 -> {{5}, {5, 50, 500, 5000}, {5}}, 
           2 -> {{2}, {2}, {4, 8, 16}}, 
           4 -> {{4}, {4}, {16, 64}}, 
           7 -> {{7}, {7}, {"The", "whole", "nine", "yards"}}|>

Of course, in defining assoK one could use a value more reminiscent of a missing value, like SQL's NULL ...
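For the original goal (one combined list per key, with the key as its first element), a possible sketch, assuming the {#} & placeholder so that the filled-in entries contribute nothing but the key: merge with a function that joins the lists and drops duplicates. DeleteDuplicates keeps the first occurrence of each element, so the key stays in front and the other elements keep their order.

```mathematica
Clear[assoc1, assoc2, assoc3]
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;
assoc3 = <|2 -> {4, 8, 16}, 4 -> {16, 64}, 7 -> {"The", "whole", "nine", "yards"}|>;

(* Fill absent keys with {key}, then for each key join its lists
   and keep only the first occurrence of every element *)
combined = Merge[KeyUnion[{assoc1, assoc2, assoc3}, {#} &],
   DeleteDuplicates[Join @@ #] &]
(* <|1 -> {1, a, aa, 10, 100, 1000}, 3 -> {3, c, cc}, 5 -> {5, 50, 500, 5000},
     2 -> {2, 4, 8, 16}, 4 -> {4, 16, 64}, 7 -> {7, "The", "whole", "nine", "yards"}|> *)
```

Union @@ # & would also remove the duplicates, but it sorts each result, so for example the words under key 7 would no longer keep their original order.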

POSTED BY: Udo Krause