Message Boards

Initializing data structures so application works in the cloud

I have notebooks that start by pulling data in from local spreadsheets and databases (such as PostgreSQL). The notebook then proceeds to perform tasks using this data.

At a point, the data I use becomes stable. I want to publish this notebook to the cloud with data embedded in the notebook so the active parts of the notebook continue to work such as manipulates, and tooltips, without requiring the notebook to continue to import the data from external sources.

I want to be able to access my cloud notebooks from an iPad.

How do I code a notebook to allow this functionality?

POSTED BY: Lawrence Winkler
10 Replies
Posted 4 years ago

Lawrence:

Have you considered deleting the resource objects before you deploy a new one?

Check out DeleteObject[].
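If you want to clear every stale deployment in one pass, a minimal sketch (assuming the resource name from Lawrence's code above) might be:

DeleteObject /@ ResourceSearch["ElectionCloseDataset", "Objects"]

That maps DeleteObject over every ResourceObject the search finds, so the next CloudDeploy starts from a clean slate.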

POSTED BY: Mike Besso

I started with ResourceObject, ResourceData, and the like. Giving up seems like a cop-out. Using these functions has got to work, doesn't it?

POSTED BY: Lawrence Winkler

Your assumption about multiple rows in the ResourceSearch results is correct. If you deploy the same content multiple times to the cloud, it can appear multiple times in the search results.

To be clearer, each time you evaluate ResourceObject[<|...|>] it creates a new ResourceObject with a unique uuid (use ResourceObject[...]["UUID"] to see the uuid). ResourceObjects with different uuids are treated as unique resources and will appear separately in search results.

DeleteObject[ResourceObject[...]] removes the stored content of only that specific ResourceObject, identified by its uuid. It will not delete other deployments of the same name. It should remove both local and cloud caches. Based on your "I've also tried several ways..." paragraph, it is possible there is a bug, either in DeleteObject or ResourceSearch, which maintains its own local cache of cloud metadata. Which $Version are you using?
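To see the uuid behavior concretely, a quick sketch (the name and content here are just illustrative):

ro1 = ResourceObject[<|"Name" -> "MyData", "Content" -> Range[5]|>];
ro2 = ResourceObject[<|"Name" -> "MyData", "Content" -> Range[5]|>];
ro1["UUID"] === ro2["UUID"]

The last line should return False: two evaluations with identical content still produce distinct resources, which is why repeated deployments pile up in search results.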

Looking at the big picture, if your goal is only to make this data available for yourself in multiple environments, I agree with Jan that CloudPut or CloudExport are more straightforward.

If the goal is to make something to share with others that includes documentation with examples and metadata, then ResourceObject is the way to go. In that case, I encourage using the definition notebook interface CreateNotebook["DataResource"]. That notebook was overhauled in version 12.1 and adds a lot of explanation that is missing from the programmatic CloudDeploy[ResourceObject[...]].

If you are dedicated to your current workflow, I recommend using ResourceData@ResourceObject[CloudObject[...]] with the CloudObject returned by CloudDeploy in order to ensure that you are using the correct deployment.
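A minimal sketch of that pattern, assuming the same named deployment as in the code above:

co = CloudDeploy[ro, "ElectionCloseDataset"];
(* in another notebook, reconstruct the same CloudObject by its named path *)
ds = ResourceData[ResourceObject[CloudObject["ElectionCloseDataset"]]]

Because the CloudObject pins down one specific deployment, this sidesteps the ambiguity of picking a row out of ResourceSearch results.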

POSTED BY: Bob Sandheinrich

I remain in a quandary about using ResourceObjects. I'm adding and correcting information in my local spreadsheets, then importing the CSV versions into WL -- using the basic code referenced above.

Each time I run the code to preserve the data changes using CloudDeploy, ResourceSearch returns a Dataset with many rows. I only want to preserve the latest version; instead, the cloud seems to be preserving every version. I suppose that is okay, except I cannot tell which ResourceObject is the latest one -- and that's the only version I'm interested in.

I've also tried several ways to delete all the versions out on the cloud before doing another CloudDeploy. Nothing seems to work. I've executed DeleteObject on each ResourceObject for each Dataset row returned from the ResourceSearch. The ResourceObjects are deleted, it seems, but ResourceSearch still returns a Dataset of many rows referencing ResourceObjects that no longer exist.

Thus, I still seem stuck finding the latest Resource Object I'm interested in.

With 50 years of computing under my belt, I think I have an idea of how something this trivial should appear to work.

I really need this to work. It's taking too much of my time.

POSTED BY: Lawrence Winkler
Posted 4 years ago

Hi Lawrence,

Is there a reason for not using CloudPut and CloudGet, as suggested by @Jan Poeschko?

POSTED BY: Rohit Namjoshi

Nearing a solution?

Performing

ResourceSearch["ElectionCloseDataset"]

I get a dataset back, now with two rows, with a column heading of "ResourceObject" whose contents seem to be the dataset I assigned to the "Content" label when I created the resource object (see the code above). Why do I have two rows returned? My guess is I've run CloudDeploy twice, so I have two copies of my data in the cloud. ResourceUpdate seems to work only for local resources.

Get[ResourceSearch["ElectionCloseDataset", "Objects"] // First]

works to get my original dataset back. Hooray! Still in hack mode, but perhaps getting closer.

POSTED BY: Lawrence Winkler

So, I'm now trying to use ResourceObject to push my data into the cloud. Then I want to reference that data in another notebook to process it. Seems simple enough.

importSpreadsheet[] := Module[{items, names, data, assoc},
  (* first CSV row holds the column names; the rest is data *)
  items = Import["/Users/wkr/Documents/Elections/Closing Graph.csv", "CSV"];
  names = First[items];
  data = Rest[items];
  assoc = AssociationThread[names, #] & /@ data;
  Dataset[assoc]
  ];

buildDataResource[ds_] :=
  ResourceObject[<|
    "ResourceType" -> "DataResource",
    "Name" -> "ElectionCloseDataset",
    "Content" -> ds,
    "Description" -> "Dataset created from the Closing Graph Spreadsheet"
    |>];

storeGraphDataset[] := buildDataResource[importSpreadsheet[]];

Now, generate a resource object and deploy to the cloud

ro = storeGraphDataset[];
CloudDeploy[ro, "ElectionCloseDataset"]

The above seems to work.

Now, in order to use this resource, I want get back the dataset portion of the resource, with code in a different NB.

ro = ResourceSearch["ElectionCloseDataset", ResourceObject]

I get back the resource object but none of my attempts to return the dataset that should be in the ResourceObject works.

What is the correct syntax?

POSTED BY: Lawrence Winkler

Maybe ResourceData is the function you're looking for?
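For example, combining it with the search you already have (a sketch; it assumes the search finds at least one matching object):

ro = First[ResourceSearch["ElectionCloseDataset", "Objects"]];
ds = ResourceData[ro]

ResourceData extracts the "Content" that was stored when the resource was created.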

If all you want to do is store an expression in the cloud (e.g. coming from your local data sources) and get it back later (e.g. in a cloud notebook), CloudPut and CloudGet might be all you need. There's nothing wrong with ResourceObject, it just seems a little more "heavy-weight" for this use case.
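A sketch of that lighter-weight round trip (the cloud path name is arbitrary):

CloudPut[ds, "ElectionCloseData"];
(* later, e.g. in a cloud notebook or on the iPad *)
ds = CloudGet["ElectionCloseData"]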

Both approaches would store the data as a separate cloud object. If you want to access the data from different places, that's probably what you want, anyway. But if you want to make data truly part of a notebook, you'll need something else...

A convenient way to do this (fairly new and still marked as experimental) is PersistentValue. For instance, you can say

PersistentValue["myvalue", "Notebook"] = 100!;
Dynamic[PersistentValue["myvalue", "Notebook"]]

and it will use data persisted in the notebook itself. When you copy the notebook to the cloud (using CopyFile, CloudDeploy, CloudPublish, etc.), the value will still be there. Under the hood, the "Notebook" persistence location just uses a notebook option, "NotebookPersistence", to store the data in the notebook. Before PersistentValue existed, people sometimes did something similar with the notebook option TaggingRules.

Another way to persist data in a notebook, which I should point out here, is the Initialization option (of DynamicModule, Manipulate, and similar constructs). It's a more imperative way of managing your data dependencies (essentially saying "when this interface appears, make sure to evaluate X"). SaveDefinitions / IncludeDefinitions use it under the hood to pull in dependencies, which happens by default in CloudDeploy and CloudPublish. So if your Manipulate depends on pure data that is already in your local kernel, it might "just work" by default when deployed to the cloud using these functions. (But if the function inside your Manipulate loads data from some external resource on the fly, e.g. by connecting to a database or reading a file, that's when you need more fine-grained control using the mechanisms I described here.)
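A sketch of the Initialization approach, with an illustrative file path; the point is that the data load is re-run whenever the interface comes alive:

Manipulate[
 ListLinePlot[Take[data, n]],
 {n, 1, 100, 1},
 Initialization :> (
   data = Import["/path/to/Closing Graph.csv", "CSV"]
   )
 ]

If data instead already lives in the kernel, SaveDefinitions -> True would embed its current value into the deployed Manipulate.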

Hope this helps. Let me know if you have more questions or run into any issues.

POSTED BY: Jan Poeschko

I've been playing around with WL ResourceObjects and related functions. I'm guessing I might need a notebook to pull in the data and push it out to the cloud. Then a notebook to find the resource objects in the cloud, pull it in and run the processing.

I think I made a mistake using New -> Repository Item -> Data Repository Item to create a resource object. It seems too complicated for what I wanted: it's a notebook template for documenting the data, with one button to deploy and another to submit. The submit turned into a request asking Wolfram to make the data available to others -- as though my data would be of interest to anyone but me. I really didn't want to do that.

Simple and stupid is all I can muster at the present.

With Mathematica, I'm not yet much of a risk-taker willing to just try something and see if it works. I don't have the knowledge to know what problems I could be causing when stuff doesn't work, or how to debug it. I don't want to clutter up Wolfram's cloud with my poor decisions.

POSTED BY: Lawrence Winkler
Posted 4 years ago

Lawrence:

This is a great question and I will also soon need a solution to this.

I did google "Mathematica store data in notebook" and there are a few suggestions.

And, if I remember correctly, one of the recent upgrades added the ability for a dataset to store its data inside of a notebook.

Check out $NotebookInlineStorageLimit
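If I understand it correctly, $NotebookInlineStorageLimit is the size threshold (in bytes) above which large expressions are stored outside the notebook rather than inline, so raising it keeps larger datasets embedded -- treat the exact semantics as an assumption worth checking in the documentation:

$NotebookInlineStorageLimit (* inspect the current limit *)
$NotebookInlineStorageLimit = 10^8; (* allow larger expressions to be stored inline *)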

I am hoping others will chime in with some workflows and best practices.

POSTED BY: Mike Besso