WOLFRAM COMMUNITY

Modify and overwrite a Dataset?

Jorge Mahecha, Boston College

Posted 6 years ago

Greetings, everyone. I hope you are doing OK during these strange times.

My question, "How do you modify and overwrite a Dataset?", is motivated by the following problem. I have imported a dataset (.csv file attached) using SemanticImport. I then added two columns to it with the Append function, creating a new dataset each time. As you can see, these columns were computed from data already in the dataset (the procedure is explained in this post: https://community.wolfram.com/groups/-/m/t/313491).

    d2sv6 = Append[#, "theta40" -> #["Theta_0i"] + #["m0i"](40 - #["A"])] & /@ d1sv6
    d3sv6 = Append[#, "theta80" -> #["Theta_0i"] + #["m0i"](80 - #["A"])] & /@ d2sv6

Now I need to add a large number of columns (from 20 to 100) to this dataset. Unlike the examples above, the columns I need require data that is not already in the dataset. For instance, I need one column for each of the 50 values in the list "test80":

    test80 = {-4.465980214, -4.300742326, -4.135504438, -3.97026655, -3.805028661,
      -3.639790773, -3.474552885, -3.309314997, -3.144077109, -2.978839221,
      -2.813601333, -2.648363445, -2.483125556, -2.317887668, -2.15264978,
      -1.987411892, -1.822174004, -1.656936116, -1.491698228, -1.32646034,
      -1.161222452, -0.995984563, -0.830746675, -0.665508787, -0.500270899,
      -0.335033011, -0.169795123, -0.004557235, 0.160680653, 0.325918542,
      0.49115643, 0.656394318, 0.821632206, 0.986870094, 1.152107982,
      1.31734587, 1.482583758, 1.647821647, 1.813059535, 1.978297423,
      2.143535311, 2.308773199, 2.474011087, 2.639248975, 2.804486863,
      2.969724751, 3.13496264, 3.300200528, 3.465438416, 3.630676304}

The calculation takes each value of the column "theta40" or "theta80" (each one represents a scenario) and computes a number using each of the 50 values of the list "test80". For instance, taking just the "theta40" scenario, 50 columns should be created: one for each row-value in "theta40" in combination with each of the 50 values in the list "test80". Of course, creating a separate object for each added column, as I did in the first part of the problem, becomes impractical. Ideally, after all columns are created, there should be a single new dataset for each scenario.

I have two main questions about this problem:

- How do I append to and overwrite the dataset so that it includes the values from the list?
- How do I give each new column a different name? Should I perhaps use an array instead of a list?

For those of you who are interested, this problem is relevant to the fields of psychometrics and education. The numbers to be calculated are probabilities of a correct response (following what is known as a Rasch model). The values of the list "test80" can be thought of as item difficulties. The data to be generated simulates performance on a test. The need to add columns to a dataset is common in education; in fact, the problem is trivial in a spreadsheet. The trouble is that calculating huge numbers of formulas in a spreadsheet seems to be very inefficient.

Attachments: D1SV6_.csv

4 Replies

Jorge Mahecha, Boston College

Posted 6 years ago

Thanks again, Rohit. `Outer` does exactly the work required. This is a really nice way to define transformations on columns of data for the work I am doing. (The "i" in my post was intended to index each element of the list "test80"; thanks for your careful reading.) The important part was defining the operation exp(theta40 - test80), and that works as intended.
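For readers following the Rasch-model motivation from the original question, the confirmed operation can be carried one step further to a probability. This is a minimal sketch, assuming the standard Rasch form exp(theta - b)/(1 + exp(theta - b)); the thread itself only confirms exp(theta40 - test80), and the names `thetaToy`, `itemsToy`, and `raschP` and their values are illustrative, not taken from the attached data.

```wolfram
(* Toy person abilities and item difficulties; illustrative values only *)
thetaToy = {-1.0, 0.0, 1.5};
itemsToy = {-0.5, 0.0, 0.5, 1.0};

(* Assumed Rasch probability exp(theta - b)/(1 + exp(theta - b)),
   which is the same as LogisticSigmoid[theta - b] *)
raschP = Outer[LogisticSigmoid[#1 - #2] &, thetaToy, itemsToy];

(* One row per ability value, one column per item *)
Dimensions[raschP]  (* {3, 4} *)
```

An ability equal to the item difficulty gives a probability of 0.5, which is a quick sanity check on the orientation of the resulting matrix.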

Rohit Namjoshi

Posted 6 years ago

Hi Jorge,

You can use `Outer` to do that. E.g.

    ClearAll@f
    row = {1, 2};
    column = {3, 4, 5};
    Outer[f, column, row]
    (* {{f[3, 1], f[3, 2]}, {f[4, 1], f[4, 2]}, {f[5, 1], f[5, 2]}} *)

$\left( \begin{array}{cc} f(3,1) & f(3,2) \\ f(4,1) & f(4,2) \\ f(5,1) & f(5,2) \end{array} \right)$

For your example this should work (assuming that by `i` you mean the imaginary number `I`; if not, what does `item80i` mean?):

    matrix = Outer[Exp[#1 - #2 I] &, theta40, test80]

Updating Name

Posted 6 years ago

Thank you, Rohit. That is, indeed, quite a nice solution to this problem. The way you interpreted the combination of the list values with the column values of the dataset is very much the kind of operation I need, something "Kronecker-like". Now I'm wondering: how can I make the operation behave Kronecker-like with a twist, i.e. not just multiplying the two numbers, as happens in this case, but applying any function of the two numbers? In particular, I am interested in exp(theta40-item80i). Would it perhaps be better to transform the whole "matrix" array after it has been generated?
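On the closing question about transforming the whole array afterwards: since `Exp` is `Listable`, one option is to build the matrix of differences first and then apply `Exp` to it in a single step. The sketch below compares that route with applying the function pair by pair; `a` and `b` are made-up placeholders, not the thread's data.

```wolfram
a = {0.5, 1.0, 2.0};   (* stand-in for the "theta40" values *)
b = {-1.0, 0.0};       (* stand-in for the "test80" values *)

(* Route 1: apply the two-argument function to every pair *)
m1 = Outer[Exp[#1 - #2] &, a, b];

(* Route 2: build all differences a_i - b_j, then transform the whole matrix *)
m2 = Exp[Outer[Subtract, a, b]];

(* Both routes should agree up to floating-point error *)
Max[Abs[m1 - m2]] < 10^-12   (* True *)
```

Route 2 only works this simply because the target function factors into a pairwise difference followed by a `Listable` function; a function of the two numbers that does not decompose that way would need the pairwise form.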

Rohit Namjoshi

Posted 6 years ago

Hi Jorge,

`d3sv6` can be computed in one step from `d1sv6`.

    d1sv6 = Import["~/Downloads/D1SV6_.csv", "Dataset", HeaderLines -> 1];

    d3sv6 = d1sv6[All,
      <|#,
        "theta40" -> #["Theta_0i"] + #["m0i"](40 - #["A"]),
        "theta80" -> #["Theta_0i"] + #["m0i"](80 - #["A"])|> &]

There are 1000 `theta40` values and 50 `test80` values. Can you explain what you mean by

"50 columns should be created: one for each row-value in "theta40" in combination with each of the 50 values in the list "test80""?

How are the 1000 `theta40` values "combined" with the 50 `test80` values to create a 1000 x 50 matrix? If you can construct that matrix, then it is easy to add it to the dataset. As an example, use `KroneckerProduct` to multiply each value in `theta40` by the values in `test80` to construct the matrix:

    theta40 = d3sv6[All, "theta40"] // Normal;
    matrix = KroneckerProduct[theta40, test80];
    Dimensions@matrix
    (* {1000, 50} *)

Construct names for the new columns and generate an `Association` for each row of the matrix:

    columnNames = Table["test80_" <> ToString[i], {i, 1, Length@test80}];
    result = AssociationThread[columnNames -> #] & /@ matrix;

Final result:

    ds4v6 = MapThread[Append, {d3sv6 // Normal, result}] // Dataset
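The steps above can be tried end to end on a toy dataset without the attached CSV. This is only a sketch under stated assumptions: `dsToy`, `itemsToy`, and the `"item_"` column prefix are hypothetical stand-ins, the matrix is built with `Outer` and the exp(theta - b) operation discussed elsewhere in the thread rather than `KroneckerProduct`, and `Join` is used to merge each row with its new columns.

```wolfram
(* Toy stand-ins; the real d3sv6 and test80 come from the thread *)
dsToy = Dataset[{
    <|"id" -> 1, "theta40" -> 0.5|>,
    <|"id" -> 2, "theta40" -> 1.0|>,
    <|"id" -> 3, "theta40" -> 2.0|>}];
itemsToy = {-1.0, 0.0};

(* Extract the column as a plain list and build the 3 x 2 matrix *)
thetaToy = Normal[dsToy[All, "theta40"]];
matrixToy = Outer[Exp[#1 - #2] &, thetaToy, itemsToy];

(* One name per new column: "item_1", "item_2", ... *)
namesToy = "item_" <> ToString[#] & /@ Range[Length[itemsToy]];

(* Turn each matrix row into an association, then merge it with the original row *)
newCols = AssociationThread[namesToy -> #] & /@ matrixToy;
resultToy = Dataset[MapThread[Join, {Normal[dsToy], newCols}]];

Keys[Normal[resultToy[1]]]  (* {"id", "theta40", "item_1", "item_2"} *)
```

Generating the names from the length of the item list keeps the naming in step with the number of columns, so the same code covers 20 or 100 items; one such dataset can be built per scenario by swapping the column passed to `Outer`.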
