# Modify and overwrite a Dataset?

Posted 5 months ago
Greetings, everyone. I hope you are doing okay during these strange times. My question, "How do you modify and overwrite a Dataset?", is motivated by the following problem.

I have imported a dataset (.csv file attached) using SemanticImport. I have added two columns to this dataset and created new corresponding datasets using the Append function. As you can see, these columns were created by operating on data already in the dataset (the procedure is explained in this post: https://community.wolfram.com/groups/-/m/t/313491).

```mathematica
d2sv6 = Append[#, "theta40" -> #["Theta_0i"] + #["m0i"]*(40 - #["A"])] & /@ d1sv6
d3sv6 = Append[#, "theta80" -> #["Theta_0i"] + #["m0i"]*(80 - #["A"])] & /@ d2sv6
```

Now I need to add a large number of columns (between 20 and 100) to this dataset. Unlike the examples above, these columns require data that is not already in the dataset. For instance, I need one column to be generated for each of the 50 values in the list "test80":

```mathematica
test80 = {-4.465980214, -4.300742326, -4.135504438, -3.97026655, -3.805028661,
  -3.639790773, -3.474552885, -3.309314997, -3.144077109, -2.978839221,
  -2.813601333, -2.648363445, -2.483125556, -2.317887668, -2.15264978,
  -1.987411892, -1.822174004, -1.656936116, -1.491698228, -1.32646034,
  -1.161222452, -0.995984563, -0.830746675, -0.665508787, -0.500270899,
  -0.335033011, -0.169795123, -0.004557235, 0.160680653, 0.325918542,
  0.49115643, 0.656394318, 0.821632206, 0.986870094, 1.152107982,
  1.31734587, 1.482583758, 1.647821647, 1.813059535, 1.978297423,
  2.143535311, 2.308773199, 2.474011087, 2.639248975, 2.804486863,
  2.969724751, 3.13496264, 3.300200528, 3.465438416, 3.630676304}
```

The calculation takes each value of the column "theta40" or "theta80" (each one represents a scenario) and computes a number using each of the 50 values of the list "test80". For instance, taking just the "theta40" scenario, 50 columns should be created: one for each row-value in "theta40" in combination with each of the 50 values in the list "test80". Of course, it is impractical to create a separate object for each added column, as I did in the first part of the problem. Ideally, after all columns are created, a single new dataset should be created for each scenario.

I have two main questions about this problem: how do I append to and overwrite the dataset so that it includes the values computed from the list, and how do I give each new column a different name? Should I perhaps use an array instead of a list?

For those of you interested, this problem is relevant to the fields of psychometrics and education. The numbers to be calculated are probabilities of correct response (following what is known as a Rasch model). The values of the list "test80" can be thought of as item difficulties. The data to be generated simulates performance on a test. The need to add columns to a dataset is common in education; in fact, the problem is trivial in a spreadsheet. The catch is that evaluating huge numbers of formulas in a spreadsheet seems to be very inefficient.
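For reference, the standard dichotomous Rasch model gives the probability that person $n$, with ability $\theta_n$, answers item $i$, with difficulty $b_i$, correctly. Here the "theta40"/"theta80" values are assumed to play the role of $\theta_n$ and the "test80" values the role of $b_i$:

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}$$

Under that reading, each of the 50 new columns for a scenario holds this probability for one item, evaluated at every row's $\theta_n$.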
Posted 5 months ago
Hi Jorge,

d3sv6 can be computed in one step from d1sv6:

```mathematica
d1sv6 = Import["~/Downloads/D1SV6_.csv", "Dataset", HeaderLines -> 1];
d3sv6 = d1sv6[All, <|#,
    "theta40" -> #["Theta_0i"] + #["m0i"]*(40 - #["A"]),
    "theta80" -> #["Theta_0i"] + #["m0i"]*(80 - #["A"])|> &]
```

There are 1000 theta40 values and 50 test80 values. Can you explain what you mean by "50 columns should be created: one for each row-value in 'theta40' in combination with each of the 50 values in the list 'test80'"? How are the 1000 theta40 values "combined" with the 50 test80 values to create a 1000 x 50 matrix? If you can construct that matrix, then it is easy to add it to the dataset.

As an example, use KroneckerProduct to multiply each value in theta40 by each value in test80 to construct the matrix:

```mathematica
theta40 = d3sv6[All, "theta40"] // Normal;
matrix = KroneckerProduct[theta40, test80];
Dimensions@matrix
(* {1000, 50} *)
```

Construct names for the new columns and generate an Association for each row:

```mathematica
columnNames = Table["test80_" <> ToString[i], {i, 1, Length@test80}];
result = AssociationThread[columnNames -> #] & /@ matrix;
```

Final result:

```mathematica
ds4v6 = MapThread[Append, {d3sv6 // Normal, result}] // Dataset
```
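A possible variation, sketched under assumptions: instead of building the matrix separately and threading it back onto the rows, a Dataset query can let each row compute its own new columns directly from its "theta40" entry. The toy dataset `toy`, the list `toyItems`, and the names `toyNames`/`toyWide` below are hypothetical stand-ins for d3sv6, test80, and columnNames, chosen small so the output can be checked by hand.

```mathematica
(* Hypothetical toy stand-ins for d3sv6 and test80 *)
toy = Dataset[{<|"id" -> 1, "theta40" -> 2|>, <|"id" -> 2, "theta40" -> 3|>}];
toyItems = {1, 2, 3};
toyNames = Table["test80_" <> ToString[i], {i, Length[toyItems]}];

(* Row-wise query: multiply this row's theta40 by every item value and
   join the results onto the row as new, individually named columns *)
toyWide = toy[All, Join[#, AssociationThread[toyNames, #["theta40"]*toyItems]] &];

Normal[toyWide]
(* {<|"id" -> 1, "theta40" -> 2, "test80_1" -> 2, "test80_2" -> 4, "test80_3" -> 6|>,
    <|"id" -> 2, "theta40" -> 3, "test80_1" -> 3, "test80_2" -> 6, "test80_3" -> 9|>} *)
```

With the real data this would read `d3sv6[All, Join[#, AssociationThread[columnNames, #["theta40"]*test80]] &]`, which should give the same numbers as the KroneckerProduct route without the separate MapThread step.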
Posted 4 months ago
Thank you, Rohit. That is indeed quite a nice solution to this problem. The way you interpreted the combination of the list values with the column values of the dataset is very much the kind of operation I need to do, something "Kronecker-like".

Now I'm wondering: how can I make an operation behave Kronecker-like with a twist, i.e. apply an arbitrary function of the two numbers instead of just multiplying them, as happens in this case? In particular, I am interested in exp(theta40 - item80i). Or would it be better to transform the whole "matrix" array after it has been generated?
Hi Jorge,

You can use Outer to do that. For example:

```mathematica
ClearAll@f;
row = {1, 2};
column = {3, 4, 5};
Outer[f, column, row]
(* {{f[3, 1], f[3, 2]}, {f[4, 1], f[4, 2]}, {f[5, 1], f[5, 2]}} *)
```

For your example, this should work (assuming that by i you mean the imaginary unit I; if not, what does item80i mean?):

```mathematica
matrix = Outer[Exp[#1 - #2 I] &, theta40, test80]
```
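If, as the Rasch-model description in the question suggests, item80i instead means the i-th difficulty in test80 (not the imaginary unit), the pure function would simply be `Exp[#1 - #2] &`, and the Rasch probability itself is `LogisticSigmoid[#1 - #2] &`. The sketch below assumes that interpretation and also produces one new dataset per scenario. The helper `addRaschColumns`, its column-name prefixes, and the toy data are hypothetical, not something from the thread.

```mathematica
(* Hypothetical helper: returns a copy of ds with one Rasch-probability
   column per item difficulty.
   thetaKey - name of the ability column, e.g. "theta40" or "theta80"
   items    - list of item difficulties, e.g. test80
   prefix   - prefix for the new column names, e.g. "p40_" *)
addRaschColumns[ds_Dataset, thetaKey_String, items_List, prefix_String] :=
  Module[{theta, probs, names},
    theta = Normal[ds[All, thetaKey]];
    (* Exp[theta - b]/(1 + Exp[theta - b]) == LogisticSigmoid[theta - b],
       evaluated for every (row, item) pair: a Length[theta] x Length[items] matrix *)
    probs = Outer[LogisticSigmoid[#1 - #2] &, theta, items];
    names = Table[prefix <> ToString[i], {i, Length[items]}];
    (* Attach row k of the matrix to row k of the dataset *)
    Dataset[MapThread[Join, {Normal[ds], AssociationThread[names, #] & /@ probs}]]
  ]

(* Hypothetical toy data standing in for d3sv6 and test80 *)
toy = Dataset[{<|"theta40" -> 1., "theta80" -> 2.|>, <|"theta40" -> 0., "theta80" -> 1.|>}];
toyItems = {1., 2.};

(* One new dataset per scenario *)
toy40 = addRaschColumns[toy, "theta40", toyItems, "p40_"];
toy80 = addRaschColumns[toy, "theta80", toyItems, "p80_"];
```

With the real data, the calls would look like `addRaschColumns[d3sv6, "theta40", test80, "p40_"]`. As for transforming the matrix afterwards: because Exp is Listable, `Exp[Outer[Subtract, theta40, test80]]` gives the same exp(theta - b) matrix as `Outer[Exp[#1 - #2] &, theta40, test80]`, so either route works; passing the function to Outer just avoids holding the intermediate difference matrix.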