
Can I Replace A Value In A Dataset[], Without Rebuilding the Whole Thing?

Posted 10 years ago

I was playing around with a simple implementation of a neural network, in which interconnected neurons are stimulated by incrementing their action potential. Once a neuron's potential passes a certain threshold, it fires and its action potential resets to 0.

I thought I would use Dataset[] to store neuron characteristics and values. But I worry about scaling up to large numbers of neurons:

Am I right in thinking the only way to replace or update a value in a Dataset[] always effectively involves making a new copy of the whole dataset?

Is this true even in cases where what I want to update is of fixed length (like an Integer)?

Are there any plans to change this?

Assuming the answer to these questions is Yes, Yes and No, is there a better storage mechanism in the Wolfram Language to allow me to achieve my goal?

POSTED BY: Brad Varey
9 Replies

Dear Brad,

yes, I see what you are saying. I guess that what is even more problematic is this:

ByteCount[dsBig]
(*384000080*)

as compared to

ByteCount[arBig]
(*8000152*)
ByteCount[SparseArray@arBig]
(*8880*)

I am not convinced that Dataset is the best structure for what you want to achieve. I do not know whether ReplacePart actually makes a new copy of the data set, but I doubt it. Arrays or SparseArrays seem to be more useful for your problem. I think that "rule based" programming might be too slow for that.
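As a rough illustration of the array-based approach, one update step of such a network might look like the sketch below. The names (potential, weights, threshold, fired) and the firing rule are assumptions made for the example, not something taken from Brad's code: each neuron that reaches the threshold fires, adds its outgoing weight to its targets, and resets to 0.

```wolfram
(* Minimal sketch, assuming n neurons stored as flat arrays; all names are hypothetical *)
n = 5;
threshold = 3;
potential = {0, 2, 3, 1, 4};

(* weights[[i, j]] is the stimulus neuron i sends to neuron j when it fires *)
weights = SparseArray[{{1, 2} -> 1, {2, 3} -> 1, {3, 1} -> 1, {5, 4} -> 1}, {n, n}];

(* 1 where a neuron has reached the threshold, 0 elsewhere *)
fired = UnitStep[potential - threshold];

(* reset the neurons that fired, then add the stimulus they send out *)
potential = potential (1 - fired) + Normal[fired . weights];
potential
(* {1, 2, 0, 2, 0} *)
```

Here the whole step is a handful of vector operations, so no single element is ever updated one at a time.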

Best wishes,

Marco

PS: This is a little bit faster still, particularly for large arrays:

arBig = SparseArray@ConstantArray[0, {10, 10}];
AbsoluteTiming[arBig[[2, 2]] = 100;][[1]]
arBig = SparseArray@ConstantArray[0, {100, 100}];
AbsoluteTiming[arBig[[2, 2]] = 100;][[1]]
arBig = SparseArray@ConstantArray[0, {1000, 1000}];
AbsoluteTiming[arBig[[2, 2]] = 100;][[1]]
POSTED BY: Marco Thiel

If you have a large number of elements (Neurons) use a key to identify each one and let the value be another association which contains your properties. Essentially you would have an Association of Associations.

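(* ___ matches any number of arguments, so both randomproperties[] and randomproperties[i] evaluate *)
randomproperties[___] := AssociationThread[{"Property1", "Property2"}, RandomChoice[Range[100], 2]];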
randomproperties[]
<|"Property1" -> 74, "Property2" -> 60|>

Creating Association of Associations

AbsoluteTiming[data = AssociationMap[randomproperties, Range[1, 1000000]];];
data[[997 ;; 999]]
<|997 -> <|"Property1" -> 91, "Property2" -> 35|>,  998 -> <|"Property1" -> 67, "Property2" -> 88|>, 
 999 -> <|"Property1" -> 79, "Property2" -> 20|>|>

Then reset a property of a given element.

AbsoluteTiming[data[998]["Property1"] = 449]
{0.0000198775, 449}

The data has been modified

 data[[997 ;; 999]]
 <|997 -> <|"Property1" -> 91, "Property2" -> 35|>,  998 -> <|"Property1" -> 449, "Property2" -> 88|>, 
 999 -> <|"Property1" -> 79, "Property2" -> 20|>|>

Even creating a new copy of your data doesn't really make much of a difference to the timing.

In[95]:= AbsoluteTiming[data[998]["Property1"] = 5000; x = data;  x[998]] 
Out[95]= {0.0000222869, <|"Property1" -> 5000, "Property2" -> 88|>}
POSTED BY: Emerson Willard

You should run some experiments and see what happens with Dataset.

If you use an Association then any modification of the association would be done very efficiently while still preserving immutability. You could still query this data structure using the new query syntax.
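A small sketch of what that could look like for the neuron case, assuming an association of associations with made-up keys ("n1", "n2") and property names ("potential", "threshold"):

```wolfram
(* Hypothetical neuron records; the key and property names are illustrative only *)
neurons = <|
   "n1" -> <|"potential" -> 0, "threshold" -> 3|>,
   "n2" -> <|"potential" -> 2, "threshold" -> 3|>
   |>;

(* stimulate one neuron in place via nested Part assignment *)
neurons[["n2", "potential"]] += 1;

(* find the neurons that have reached their threshold *)
firing = Keys[Select[neurons, #potential >= #threshold &]];

(* reset each firing neuron's potential to 0 *)
Do[neurons[[k, "potential"]] = 0, {k, firing}];
```

The assignments go through the symbol neurons, which is what lets Part assignment work; the same expressions would not work on a Dataset wrapper.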

This video may be relevant: http://www.wolfram.com/broadcast/video.php?c=377&p=1&v=1240

POSTED BY: Emerson Willard

Hi,

what about:

ds = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>}]


which we take as an example. Now you can replace the b in the second row by:

ds = ReplacePart[ds, {2, "b"} -> 5 ]


Cheers,

Marco

POSTED BY: Marco Thiel
Posted 10 years ago

Marco, thanks.

But I think the following code proves ReplacePart[] is doing something pretty time-consuming, like an underlying copy itself...

buildElement[value_Integer] := <| "A" -> 1, "B" -> value|>

dsBig = buildElement[#] & /@ Range[100]; AbsoluteTiming[ReplacePart[dsBig, {2, "B"} -> 5]] dsBig = buildElement[#] & /@ Range[10000]; AbsoluteTiming[ReplacePart[dsBig, {2, "B"} -> 5]] dsBig = buildElement[#] & /@ Range[1000000]; AbsoluteTiming[ReplacePart[dsBig, {2, "B"} -> 5]]

When I run it, the three returned timings are 0.000014, 0.000275 and 0.032637.

If, in place of a Dataset, I use an array built with ConstantArray, performance is much better, but it still degrades quite a bit as the array grows:

arBig = ConstantArray[0, {10, 10}];
AbsoluteTiming[arBig[[2, 2]] = 100;]
arBig = ConstantArray[0, {100, 100}];
AbsoluteTiming[arBig[[2, 2]] = 100;]
arBig = ConstantArray[0, {1000, 1000}];
AbsoluteTiming[arBig[[2, 2]] = 100;]

Running the above, I get

{9.*10^-6, Null}
{0.000015, Null}
{0.006307, Null}

Respectively.

POSTED BY: Brad Varey
Posted 10 years ago

Sorry, in my last post my first block of code seems to have got mangled. It should have been...

dsBig = buildElement[#] & /@ Range[100];
AbsoluteTiming[ReplacePart[dsBig, {2, "B"} -> 5]]
dsBig = buildElement[#] & /@ Range[10000];
AbsoluteTiming[ReplacePart[dsBig, {2, "B"} -> 5]]
dsBig = buildElement[#] & /@ Range[1000000];
AbsoluteTiming[ReplacePart[dsBig, {2, "B"} -> 5]]
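For comparison, here is a sketch that times ReplacePart (which returns a new expression) against a Part assignment on the same symbol. It assumes the rows are held as a plain list of associations in a variable called rows, which is a name chosen for this example. Timings depend on the machine and version, so none are quoted here.

```wolfram
buildElement[value_Integer] := <|"A" -> 1, "B" -> value|>;
rows = buildElement /@ Range[100000];

(* ReplacePart leaves rows untouched and builds a modified result *)
First@AbsoluteTiming[copy = ReplacePart[rows, {2, "B"} -> 5];]

(* Part assignment modifies the value held by the symbol rows *)
First@AbsoluteTiming[rows[[2, "B"]] = 5;]
```

Repeating both lines for several sizes of Range should show whether either cost grows with the number of rows.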
POSTED BY: Brad Varey
Posted 10 years ago

This is a good example of one of the many things I find confusing in the new Association syntax. I have found it a useful addition, but every time I use it I feel like I'm having to learn it all over again. I rarely accomplish anything without a great deal of re-study and experimentation.

In many ways, the syntax tries to reuse familiar structures, but often only up to a point. For example, here is how Part could be used to perform this task with a list structure. Up to a point, it works with a Dataset -- but then fails in the end:

In[1]:= (* With lists *)

In[2]:= list = {{1, 2}, {2, 1}};

In[3]:= list[[1, 2]]

Out[3]= 2

In[4]:= list[[1, 2]] = 7;

In[5]:= list

Out[5]= {{1, 7}, {2, 1}}

In[6]:= (* With a dataset *)

In[7]:= ds = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>}]

Out[7]= Dataset[{__Association}]

In[8]:= ds[[1, "a"]]

Out[8]= 1

In[9]:= ds[[1, "a"]] = 7

During evaluation of In[9]:= Set::partd: Part specification ds[[1,a]] is longer than depth of object. >>

Out[9]= 7

In[10]:= ds

Out[10]= Dataset[{__Association}]
POSTED BY: David Keith
Posted 10 years ago

"Updating" -- that is very useful. But I am still finding these waters to be murky.

In[1]:= ds = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>}];

In[2]:= ds[[1, "a"]]

Out[2]= 1

In[3]:= (* this doesn't work the same way *)
ds[[1, "a"]] = 7

During evaluation of In[3]:= Set::partd: Part specification ds[[1,a]] is longer than depth of object. >>

Out[3]= 7

In[4]:= ds[1]["a"]

Out[4]= 1

In[5]:= ds[1]["a"] = 7

During evaluation of In[5]:= Set::write: Tag Dataset in [Dataset row: a -> 1, b -> 2; 1 level, 2 elements][a] is Protected. >>

Out[5]= 7
POSTED BY: David Keith

Look at the normal form of your data: you have a list of Associations, not an Association of Associations. You will not be able to set the value in your case. Taliesin Beynon explains some of the different possible data structures and different uses of part syntax in this video

In[2]:= Normal@Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>}]

Out[2]= {<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>}
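One possible workaround, sketched under the assumption that it is acceptable to unwrap and rewrap the data: keep the normal form in its own symbol (called rows here, a name picked for the example), assign into that with Part, and build the Dataset again afterwards.

```wolfram
ds = Dataset[{<|"a" -> 1, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>}];

(* unwrap to a plain list of associations held by a symbol *)
rows = Normal[ds];

(* Part assignment works here because rows is an ordinary list *)
rows[[1, "a"]] = 7;

(* rewrap for querying *)
ds = Dataset[rows];
Normal[ds]
(* {<|"a" -> 7, "b" -> 2|>, <|"a" -> 2, "b" -> 1|>} *)
```

This does rebuild the Dataset wrapper on every round trip, so it suits occasional edits more than a tight update loop.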
POSTED BY: Emerson Willard