Modify and overwrite a Dataset?

Posted 5 months ago
4 Replies
2 Total Likes

Greetings, everyone. I hope you are doing ok during these strange times.

My question, "How do you modify and overwrite a Dataset?", is motivated by the following problem:

I have imported a dataset (.csv file attached) using SemanticImport. I then added two columns to it with the Append function, creating a new dataset each time. As you can see, these columns were computed from data already in the dataset (the procedure is explained in this post):

d2sv6 = Append[#, 
    "theta40" -> #["Theta_0i"] + #["m0i"]*(40 - #["A"])] & /@ d1sv6

d3sv6 = Append[#, 
    "theta80" -> #["Theta_0i"] + #["m0i"]*(80 - #["A"])] & /@ d2sv6

Now I need to add a large number of columns (ranging from 20 to 100) to this dataset. Unlike the examples above, the columns I need to add require data that is not already in the dataset. For instance, I need one column to be generated for each of the 50 values in the list "test80":

test80 = {-4.465980214, -4.300742326, -4.135504438, -3.97026655, \
-3.805028661, -3.639790773, -3.474552885, -3.309314997, -3.144077109, \
-2.978839221, -2.813601333, -2.648363445, -2.483125556, -2.317887668, \
-2.15264978, -1.987411892, -1.822174004, -1.656936116, -1.491698228, \
-1.32646034, -1.161222452, -0.995984563, -0.830746675, -0.665508787, \
-0.500270899, -0.335033011, -0.169795123, -0.004557235, 0.160680653, 
  0.325918542, 0.49115643, 0.656394318, 0.821632206, 0.986870094, 
  1.152107982, 1.31734587, 1.482583758, 1.647821647, 1.813059535, 
  1.978297423, 2.143535311, 2.308773199, 2.474011087, 2.639248975, 
  2.804486863, 2.969724751, 3.13496264, 3.300200528, 3.465438416, 

The calculation takes each value of the columns "theta40" or "theta80" (each one represents a scenario) and computes a number using each of the 50 values of the list "test80". For instance, taking just the "theta40" scenario, 50 columns should be created: one for each row-value in "theta40" in combination with each of the 50 values in the list "test80".

Of course, it becomes impractical to create a separate object for each added column, as I did in the first part of the problem. Ideally, after all columns are created, a single new dataset should exist for each scenario. I have two main questions about this problem:

  1. How do I append and overwrite the dataset so it includes the values from the list?
  2. How do I get each new column to have a different name? Should I perhaps use an array instead of a list?

For those of you interested, this problem is relevant to the fields of psychometrics and education. The numbers to be calculated are probabilities of a correct response (following what is known as a Rasch model). The values in the list "test80" can be thought of as item difficulties. The data to be generated simulate performance on a test. The need to add columns to a dataset is common in education; in fact, the problem is trivial in a spreadsheet. The trouble is that calculating huge numbers of formulas in a spreadsheet seems to be very inefficient.

4 Replies

Thanks again, Rohit. Outer does exactly the work required. This is a really nice way to define transformations in columns of data for the work I am doing. (the "i" in my post was intended to index each element in the list "test80"; thanks for your careful reading). The important part was defining the operation (exp(theta40-test80)), and that works as intended.

Posted 5 months ago

Hi Jorge,

You can use Outer to do that. E.g.

row = {1, 2}; column = {3, 4, 5};

Outer[f, column, row]
(* {{f[3, 1], f[3, 2]}, {f[4, 1], f[4, 2]}, {f[5, 1], f[5, 2]}} *)

$\left( \begin{array}{cc} f(3,1) & f(3,2) \\ f(4,1) & f(4,2) \\ f(5,1) & f(5,2) \\ \end{array} \right)$

For your example this should work, assuming that by i you mean the imaginary unit I. If not, what does item80i mean?

matrix = Outer[Exp[#1 - #2 I] &, theta40, test80]
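
If the i in item80i is instead an index over test80 rather than the imaginary unit, the same Outer call works with a real-valued function. Below is a minimal sketch on made-up values, assuming the usual Rasch form $P = e^{\theta - b}/(1 + e^{\theta - b})$; the names abilities, difficulties, odds and probs are placeholders for illustration, not symbols from the original notebook.

```wolfram
(* Hypothetical toy inputs standing in for theta40 and test80 *)
abilities = {0.5, -1.0};
difficulties = {-1.0, 0.0, 1.0};

(* One row per ability, one column per item difficulty *)
odds = Outer[Exp[#1 - #2] &, abilities, difficulties];

(* Rasch probability of a correct response; arithmetic threads elementwise *)
probs = odds/(1 + odds);

Dimensions[probs]
(* {2, 3} *)
```

With theta40 as the first list and test80 as the second, the result would have one row per person and one column per item.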
Posted 5 months ago

Thank you, Rohit. That is, indeed, quite a nice solution to this problem. The way you interpreted the combination of the list-values with the column-values of the dataset is very much the kind of operation I need to do, something "Kronecker-like".

Now I'm wondering: how can I get an operation that behaves Kronecker-like with a twist, i.e. one that applies an arbitrary function to each pair of numbers instead of just multiplying them, as happens in this case? In particular, I am interested in exp(theta40-item80i). Would it perhaps be better to apply a transformation to the whole "matrix" array after it has been generated?

Posted 5 months ago

Hi Jorge,

d3sv6 can be computed in one step from d1sv6.

d1sv6 = Import["~/Downloads/D1SV6_.csv", "Dataset", HeaderLines -> 1];

d3sv6 = d1sv6[All, <|#,
    "theta40" -> #["Theta_0i"] + #["m0i"]*(40 - #["A"]), 
    "theta80" -> #["Theta_0i"] + #["m0i"]*(80 - #["A"])|> &] 

There are 1000 theta40 values and 50 test80 values. Can you explain what you mean by

50 columns should be created: one for each row-value in "theta40" in combination with each of the 50 values in the list "test80".

How are the 1000 theta40 values "combined" with the 50 test80 values to create a 1000 x 50 matrix? If you can construct that matrix then it is easy to add it to the dataset.

As an example, use KroneckerProduct to multiply each value in theta40 by each value in test80 and construct the matrix:

theta40 = d3sv6[All, "theta40"] // Normal;
matrix = KroneckerProduct[theta40, test80];
Dimensions[matrix]
(* {1000, 50} *)

Construct names for the new columns and generate an Association for each row:

columnNames = Table["test80_" <> ToString[i], {i, 1, Length@test80}];
result = AssociationThread[columnNames -> #] & /@ matrix;

Final result

ds4v6 = MapThread[Append, {d3sv6 // Normal, result}] // Dataset
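
On the "overwrite" part of the question, one option is simply to assign the extended Dataset back to the same symbol instead of creating a new one for every added column. The following is a minimal sketch on a toy dataset, not the attached file; ds, items and the "p40_" column prefix are made up for illustration, and the Rasch probability is used in place of the plain product.

```wolfram
(* Hypothetical toy data standing in for d3sv6 and test80 *)
ds = Dataset[{<|"id" -> 1, "theta40" -> 0.5|>,
              <|"id" -> 2, "theta40" -> -1.0|>}];
items = {-1.0, 0.0, 1.0};

(* Rows = people, columns = items *)
theta = Normal[ds[All, "theta40"]];
probs = Outer[1/(1 + Exp[-(#1 - #2)]) &, theta, items];

(* One distinct name per new column *)
names = "p40_" <> ToString[#] & /@ Range[Length[items]];
newCols = AssociationThread[names -> #] & /@ probs;

(* "Overwrite": rebind ds to the dataset with the new columns joined on *)
ds = Dataset[MapThread[Join, {Normal[ds], newCols}]];

Keys[First[Normal[ds]]]
(* {"id", "theta40", "p40_1", "p40_2", "p40_3"} *)
```

Repeating the same steps with a different prefix (for example one derived from "theta80") would give the second scenario its own set of columns, or its own dataset if the result is assigned to a different symbol.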