Yes, please do. The source is not confidential. I am attaching a text file.
I've mulled this one over, and unless I am missing something, you are handling the "missing" values by not generating "missing" in the first place. Perhaps that is my fault for constructing a "toy example" that is not well suited to illustration. Usually I would be starting with a dataset from an outside source, "from the wild," where I have no control over the code or system that actually generates the sample data. Here is a typical example of what I encounter when importing real-world data. My sample data contain two kinds of Missing values:
Missing["Unrecognized", "21.799999"]
Missing["Empty"]
To give a little context, these are being imported as a Dataset (in the Mathematica 10 sense) using SemanticImport[]. I can extract values from a column called "Days to Start a Business", which would typically contain a numeric value:
In := $CurrentData[All, "Days to Start a Business"] // Normal
Out = {5, 4, Missing["Unrecognized", "21.799999"], 18, 32, 13, 5, 10, 101, 15,
 22, 60, 15, 101, 8, 17, 8, Missing["Unrecognized", "16.5"], 84, 15, 27, 2, 14,
 Missing["Unrecognized", "19.5"], 16, 9, 19, 97, 14, Missing["Unrecognized", "26.1"],
 Missing["Unrecognized", "75.5"], 32, 31, 8, 92, 29, Missing["Unrecognized", "4.5"],
 8, 40, 11, 19, 11, 11, 13, 17, 36, 17, Missing["Unrecognized", "30.799999"],
 19, 53, 35, 36, 7, 9, 5, 6, 12, 9, Missing["Empty"], 11, 36, 38, 13, 33, 19,
 32, 21, Missing["Unrecognized", "7.5"], 34, 40, Missing["Unrecognized", "7.5"], 90}
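Before deciding how to treat each kind of Missing value, it can help to tally the first argument (the "reason") of every Missing[...] in the column. A minimal sketch, using a short hypothetical excerpt `col` of the output above rather than the full Dataset column:

```mathematica
(* Hypothetical excerpt of the "Days to Start a Business" column shown above *)
col = {5, 4, Missing["Unrecognized", "21.799999"], 18,
   Missing["Unrecognized", "16.5"], 84, Missing["Empty"], 11};

(* Extract the reason from every Missing[...] and count how often each occurs *)
reasons = Counts[Cases[col, Missing[reason_, ___] :> reason]]
(* <|"Unrecognized" -> 2, "Empty" -> 1|> *)
```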
For the Missing["Unrecognized", ...] values, such as Missing["Unrecognized", "7.5"], one might map a function onto the unrecognized strings (e.g., apply a rounding strategy to convert them to the nearest integer) and substitute the results. I should note that I haven't yet explored an alternative: specifying the type explicitly as a decimal in the SemanticImport statement. This function and its options are new to me. The values that are simply Missing["Empty"] require a different approach; in practice, they might be replaced by either the Min or the Max of the non-empty members of the set. Explaining this in more detail has prompted me to look into the documentation on "Missing Values" under SemanticImport, and I will follow up if/when I find a solution.
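The two replacement strategies described above can be sketched on a plain list, such as the result of `// Normal`. This is only an illustration under assumptions: `days` is a hypothetical excerpt, the unrecognized strings are assumed to be plain decimal numbers that `ToExpression` can parse, and the Min policy is used for empty entries (swap in `Max` for the other choice). Dropping `Round` would keep the decimal values instead.

```mathematica
(* Hypothetical excerpt of the column; the real data come from SemanticImport *)
days = {5, 4, Missing["Unrecognized", "21.799999"], 18, Missing["Empty"], 90};

(* Step 1: parse each unrecognized string as a number and round it to the nearest integer.
   Assumption: the strings are plain decimals, so ToExpression is acceptable here. *)
step1 = days /. Missing["Unrecognized", s_String] :> Round[ToExpression[s]];

(* Step 2: fill each empty slot with the minimum of the numeric entries (use Max for the other policy) *)
fill = Min[Select[step1, NumericQ]];
cleaned = step1 /. Missing["Empty"] -> fill
(* {5, 4, 22, 18, 4, 90} *)
```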
|
|
you are handling the "missing" values by not generating "missing" in the first place
Right.
a typical example I would encounter importing real-world data
If the source is not confidential, can you please attach it as is, so I can keep experimenting with it?
|
|
Does this not help?
In[37]:= Clear[assoc1, assoc2]
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;
In[41]:= assocList = KeyUnion[{assoc1, assoc2}, <|1 -> {1}, 3 -> {3}, 5 -> {5}|>]
Out[41]= {<|1 -> {1, a, aa}, 3 -> {3, c, cc}, 5 -> {5}|>,
 <|1 -> {1, 10, 100, 1000}, 3 -> {3}, 5 -> {5, 50, 500, 5000}|>}
In[44]:= Merge[assocList, Identity]
Out[44]= <|1 -> {{1, a, aa}, {1, 10, 100, 1000}}, 3 -> {{3, c, cc}, {3}}, 5 -> {{5}, {5, 50, 500, 5000}}|>
From the documentation for KeyUnion: "The missing function can be an association."
The only thing left to do is create the missing-function association automatically from all the keys present, which seems possible, but now I have to unwrap the gifts... Merry Christmas, everyone!
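KeyUnion is documented as applying its missing function to the absent key, as f[key]. Assuming that behavior, the built-in List already maps a key such as 5 to {5}, so it reproduces the hand-written <|1 -> {1}, 3 -> {3}, 5 -> {5}|> for any set of keys without building that association first. A sketch reusing the same assoc1 and assoc2:

```mathematica
Clear[assoc1, assoc2];
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;

(* KeyUnion calls the missing function as f[key]; List[5] evaluates to {5},
   so List plays the role of the explicit association for every absent key *)
assocList = KeyUnion[{assoc1, assoc2}, List];
merged = Merge[assocList, Identity]
(* <|1 -> {{1, a, aa}, {1, 10, 100, 1000}}, 3 -> {{3, c, cc}, {3}}, 5 -> {{5}, {5, 50, 500, 5000}}|> *)
```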
|
|
the only thing yet to be done is creating the missing function association automatically
In[19]:= Clear[assoc1, assoc2, assoc3, assoK]
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;
assoc3 = <|2 -> {4, 8, 16}, 4 -> {16, 64}, 7 -> {"The", "whole", "nine", "yards"}|>;
assoK = Association @@ (Rule[#, {#}] & /@ Union[Flatten[Keys /@ {assoc1, assoc2, assoc3}]])
Out[23]= <|1 -> {1}, 2 -> {2}, 3 -> {3}, 4 -> {4}, 5 -> {5}, 7 -> {7}|>
In[24]:= Clear[assocList]
assocList = KeyUnion[{assoc1, assoc2, assoc3}, assoK]
Out[25]= {<|1 -> {1, a, aa}, 3 -> {3, c, cc}, 5 -> {5}, 2 -> {2}, 4 -> {4}, 7 -> {7}|>,
<|1 -> {1, 10, 100, 1000}, 3 -> {3}, 5 -> {5, 50, 500, 5000}, 2 -> {2}, 4 -> {4}, 7 -> {7}|>,
<|1 -> {1}, 3 -> {3}, 5 -> {5}, 2 -> {4, 8, 16}, 4 -> {16, 64}, 7 -> {"The", "whole", "nine", "yards"}|>}
In[26]:= Merge[assocList, Identity]
Out[26]= <|1 -> {{1, a, aa}, {1, 10, 100, 1000}, {1}},
3 -> {{3, c, cc}, {3}, {3}},
5 -> {{5}, {5, 50, 500, 5000}, {5}},
2 -> {{2}, {2}, {4, 8, 16}},
4 -> {{4}, {4}, {16, 64}},
7 -> {{7}, {7}, {"The", "whole", "nine", "yards"}}|>
Of course, when defining assoK one could instead use a value that more clearly signals a missing entry, like SQL's NULL...
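For comparison, KeyUnion called without a missing function already inserts a Missing[...] placeholder for each absent key, which is close in spirit to SQL's NULL and keeps such entries distinguishable from real data. A short sketch reusing assoc1 and assoc2 (redefined so the snippet runs on its own); since the exact Missing form may depend on the version, the code matches on `_Missing` rather than on a specific reason string:

```mathematica
assoc1 = <|1 -> {1, a, aa}, 3 -> {3, c, cc}|>;
assoc2 = <|1 -> {1, 10, 100, 1000}, 5 -> {5, 50, 500, 5000}|>;

(* No second argument: KeyUnion fills each absent key with a Missing[...] value *)
merged = Merge[KeyUnion[{assoc1, assoc2}], Identity];

(* Key 3 is absent from assoc2, so the second entry for key 3 is a Missing[...] placeholder *)
merged[3]

(* Later, drop the placeholders from every merged value with an ordinary pattern *)
cleaned = DeleteCases[#, _Missing] & /@ merged
```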
|
|