Best way to process data using SemanticImport?

Posted 10 years ago
POSTED BY: Jon McCormack
4 Replies
Posted 10 years ago

Hello Jon,

I found one way of doing this (as usual, Mathematica is rich enough that there may well be other solutions). First I set the size of my data set:

nrows = 3;
ncolumns = 10;

Then the data values are generated:

(data = Table[
    n + 100 (m - 1), {m, 1, nrows}, {n, 1, ncolumns}]) // TableForm

Next the keys are defined:

keys = Array["k" <> ToString[#] &, ncolumns]

Then the dataset is built:

dataset = Dataset[Map[Association[Thread[keys -> #]] &, data]]

Next, specify the columns that need to be transformed. You can use Range or list them explicitly:

col2trf = Join[Range[1, 4], {6}, {8, 9}]

The selected columns will be transformed using f1. The remaining columns are handled by f2, which should simply leave the values as they are:

f10 = Map[If[MemberQ[col2trf, #], f1, f2] &, Range[ncolumns]] 

f2 can be defined in the following way:

f2[x_] := x

Finally, f10 is applied to the dataset (of course, you first have to define f1 according to your transformation):

dataset[All, Thread[keys -> f10]]
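A shorter variant of the same idea, as a sketch only: if rules are given just for the selected keys, the identity function f2 and the full list f10 are not needed. The definition of f1 below (doubling each value) is a made-up placeholder, not part of the original question; everything else reuses the toy data from the steps above.

```wolfram
(* Same toy data as above: 3 rows, 10 columns keyed "k1" ... "k10" *)
nrows = 3; ncolumns = 10;
data = Table[n + 100 (m - 1), {m, 1, nrows}, {n, 1, ncolumns}];
keys = Array["k" <> ToString[#] &, ncolumns];
dataset = Dataset[Map[Association[Thread[keys -> #]] &, data]];

(* Hypothetical transformation, chosen only for illustration *)
f1[x_] := 2 x;

(* Build rules only for the selected columns; Thread turns
   {"k1", "k2", ...} -> f1 into {"k1" -> f1, "k2" -> f1, ...} *)
col2trf = Join[Range[1, 4], {6}, {8, 9}];
dataset[All, Thread[keys[[col2trf]] -> f1]]
```

The list of rules is built once, outside the query, so only the application of f1 is repeated per row.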

Regards,

Michael

POSTED BY: Michael Helmle

Is this what you're aiming for?

dataset[All, MapAt[f, Range[2,]
POSTED BY: Jesse Friedman

Hi Jesse and Michael, thanks for your suggestions. I tried Jesse's suggestion, but I get "Failure: Part {All, 2, 3, 4, 5} ... does not exist", even though the association is much longer than 5 elements. My current solution is to do this:

dataset = SemanticImport["filename"]
dataset[All, Map[Rule[#, f] &, Keys[dataset[1][[2 ;; 5]]] // Normal]]

While this works, it seems an odd and slow way to apply the same function over a large matrix of elements.

The problem seems to me to be a slight difference in behaviour between Part selection on lists and on associations. Running MapAt on a list applies the function to the selected elements only, but returns all the elements of the list. Part on the associations in the dataset returns only the selected entries, so the size changes.

Here's what I mean:

MapAt[f, {a, b, c, d, e}, {2 ;; All}]
{a, f[b], f[c], f[d], f[e]}
MapAt[f, Association[{1 -> a, 2 -> b, 3 -> c, 4 -> d, 5->e}], {2 ;; All}]
<|2 -> f[b], 3 -> f[c], 4 -> f[d], 5-> f[e]|>

Notice how the first element is dropped in the second example.
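One possible way to keep every key while transforming only some of them, sketched here on a single association rather than on a Dataset: take the selected entries with KeyTake, map the function over them, and Join the result back onto the original. The keys "a" to "e", the list sel, and the undefined symbol f are placeholders for illustration.

```wolfram
(* Toy association; the keys "a" ... "e" and the symbol f are placeholders *)
assoc = <|"a" -> 1, "b" -> 2, "c" -> 3, "d" -> 4, "e" -> 5|>;

(* Keys whose values should be transformed *)
sel = {"b", "c", "d", "e"};

(* Map f over the selected entries only, then merge them back into assoc.
   For keys present in both associations, Join uses the value from the second one. *)
Join[assoc, Map[f, KeyTake[assoc, sel]]]
```

Because the merge starts from the full association, the unselected key "a" is still present in the result with its original value.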

Jon

POSTED BY: Jon McCormack
Posted 10 years ago
POSTED BY: Michael Helmle