# Best way to process data using SemanticImport?

Posted 9 years ago
Hi,

I am trying to import some large datasets with mixed types using SemanticImport. I'd like to normalise many of the columns that contain numeric data, and I'm wondering what the best way to do this is.

I understand that I can apply a specific function to individual columns, e.g.:

    dataset[All, {"C2" -> f, "C3" -> f, "C4" -> f}]

However, the data has hundreds of columns (of which only a large range needs transforming), so manually specifying the transformation function for each column is very tedious. I could generate the transformation list first and then apply that, but I'm wondering if there is an easier way to specify a range of columns to apply a single function to, rather than having to specify each one individually.

I also tried using MapAt. While

    dataset[All, MapAt[f, {2}]]

applies f to each of the elements in the second column,

    dataset[All, MapAt[f, {{2}, {3}}]]

fails.

Thanks in advance for any help.

Jon
4 Replies
Posted 9 years ago
Hello Jon,

I found one way of doing this (as usual, the Mathematica language is so rich that there might be other solutions).

First I configure my data set:

    nrows = 3; ncolumns = 10;

Then the data values are generated:

    (data = Table[n + 100 (m - 1), {m, 1, nrows}, {n, 1, ncolumns}]) // TableForm

Next the keys are defined:

    keys = Array["k" <> ToString[#] &, ncolumns]

Here the dataset is built:

    dataset = Dataset[Map[Association[Thread[keys -> #]] &, data]]

This defines the columns that need transformation. You can use Range or list them explicitly:

    col2trf = Join[Range[1, 4], {6}, {8, 9}]

The selected columns will be transformed using f1; the remaining columns are passed to f2, which should just leave the values as they are:

    f10 = Map[If[MemberQ[col2trf, #], f1, f2] &, Range[ncolumns]]

f2 can be defined in the following way:

    f2[x_] := x

Finally, f10 is applied to the dataset (of course you have to define f1 according to your transformation first):

    dataset[All, Thread[keys -> f10]]

Regards,
Michael
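A shorter variant of the same idea is sketched below. It builds rules only for the selected columns, relying on the behaviour described in the question that columns without a rule are left unchanged, so no identity function f2 is needed. The data mirrors the example above; the definition of f1 (division by 100) is only a placeholder assumption, not a transformation anyone in this thread specified.

```wolfram
(* Example data, same shape as in the reply above *)
nrows = 3; ncolumns = 10;
data = Table[n + 100 (m - 1), {m, 1, nrows}, {n, 1, ncolumns}];
keys = Array["k" <> ToString[#] &, ncolumns];
dataset = Dataset[AssociationThread[keys -> #] & /@ data];

(* Placeholder transformation; substitute your own normalisation *)
f1[x_] := x/100;

(* Positions of the columns to transform *)
col2trf = Join[Range[1, 4], {6}, {8, 9}];

(* Thread pairs every selected key with the same function:
   {"k1" -> f1, "k2" -> f1, ...} *)
rules = Thread[keys[[col2trf]] -> f1];

(* Columns that have no rule are returned unchanged *)
result = dataset[All, rules]
```

Because the rule list is generated from col2trf, changing the set of columns only requires editing that one line.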
Posted 9 years ago
Is this what you're aiming for?

    dataset[All, MapAt[f, Range[2, 5]]]
Posted 9 years ago
Hi Jesse and Michael,

Thanks for your suggestions. I tried Jesse's suggestion, but I get "Failure: Part {All, 2, 3, 4, 5} ... does not exist", even though the association is much longer than 5 elements.

My current solution is to do this:

    dataset = SemanticImport["filename"]
    dataset[All, Map[Rule[#, f] &, Keys[dataset[1][[2 ;; 5]]]] // Normal]

While this works, it seems an odd and slow way to apply the same function over a large matrix of elements.

The problem seems to me to be a slight difference in behaviour between Part selection on lists and on associations. Running MapAt on a list applies the function to the selected elements but returns all the elements of the list. On the associations in the dataset, only the selected elements are returned, so the size changes.

Here's what I mean:

    MapAt[f, {a, b, c, d, e}, {2 ;; All}]
    {a, f[b], f[c], f[d], f[e]}

    MapAt[f, Association[{1 -> a, 2 -> b, 3 -> c, 4 -> d, 5 -> e}], {2 ;; All}]
    <|2 -> f[b], 3 -> f[c], 4 -> f[d], 5 -> f[e]|>

Notice how the first element is dropped in the second example.

Jon
Posted 9 years ago
Hi Jon,

I found a way to do the trick with associations:

    MapAt[f, Association[{1 -> a, 2 -> b, 3 -> c, 4 -> d}], {{3}, {4}}]

This returns all elements, with f applied only to the columns given in the position list.
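To cover a whole range of columns this way, the position list can be generated rather than typed out. A flat list such as Range[2, 5] is read by MapAt as one nested position, which is consistent with the "Part {All, 2, 3, 4, 5}" failure reported earlier; wrapping each index in its own list gives separate positions. The sketch below is a minimal illustration on plain associations, with made-up keys "C1" to "C5". It assumes that integer positions in MapAt address association entries by position, as the example above suggests, and it has not been checked against Dataset itself.

```wolfram
(* Hypothetical row with string keys, as SemanticImport might produce *)
assoc = <|"C1" -> a, "C2" -> b, "C3" -> c, "C4" -> d, "C5" -> e|>;

(* One single-element position per column: {{2}, {3}, {4}} *)
pos = List /@ Range[2, 4];

(* Apply f at those positions; the other entries are kept *)
single = MapAt[f, assoc, pos];

(* The operator form MapAt[f, pos] can be mapped over a list of rows *)
rows = {assoc, assoc};
all = Map[MapAt[f, pos], rows];
```

If the rows come from a Dataset, Normal[dataset] gives a list of associations that can be processed like rows here and wrapped in Dataset again afterwards.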