Community RSS Feed
https://community.wolfram.com
RSS Feed for Wolfram Community showing any discussions in tag Graphics and Visualization, sorted by active

[WSC19] 3D Cellular Automata
https://community.wolfram.com/groups//m/t/1732558
#Introduction
I first became interested in cellular automata when I learned about Conway's Game of Life, and I was also drawn to the Wolfram Language's machine learning and graphics capabilities, so I was very excited for this project. The 3D Cellular Automata project uses machine learning to classify the general shape of 3D models generated by cellular automata, and specifically looks for rules that generate irregular shapes. To achieve this, I trained a function to recognize familiar shapes like spheres and cubes from 3D models, generated both a training and a testing set of cellular automaton models to run through the machine learning, and constructed a function that applies the rules of a specific cellular automaton to generate the locations of each cell.
![rule 182 iterations][1]
The end result of my code is a classification of cellular automata with the possible classifications of Cube, Sphere, Less Interesting Irregular, and Interesting Irregular.
![rule 182][2]
#What are Cellular Automata?
A cellular automaton is a set of rules iteratively applied to a configuration of cells. This means that the automaton changes during each iteration as the states of the cells change based on the rules. For example, if one cell is alive at the beginning and the rule is to change a cell to become alive if its neighboring cells are alive, then in the next iteration the four cells next to the first cell will be alive.
![cellular automaton explanation][3]
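The growth rule described above can be sketched in the Wolfram Language using a weighted outer-totalistic rule. This is an illustrative rule for a 2D automaton, not one from the project: with these weights, rule number 1022 encodes "a cell is alive if it or any of its four neighbors was alive", so a single live cell grows into a plus sign.

    (* 2-state 2D rule: weighted sum = 2*(N+S+E+W neighbors) + self;
       rule 1022 in base 2 has digit 1 at every position >= 1, so any
       nonzero weighted sum makes the cell alive *)
    CellularAutomaton[
     {1022, {2, {{0, 2, 0}, {2, 1, 2}, {0, 2, 0}}}, {1, 1}},
     {{{1}}, 0}, (* a single live cell on an empty background *)
     1]          (* returns the initial configuration and the first step *)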
Rules 0–255 are a set of deterministic rules for cellular automata. In my project, I study only rules 2–254 in steps of 4, since these rules operate on the idea that each cell has two possible states.
#3D Cellular Automata
To generate the figures made by cellular automata, I used the CellularAutomaton function in the Wolfram Language. This function returns a list of lists with numbers, so the Image3D function is used to make the output into an image. Therefore, I made the following function to generate the 3D models of cellular automata, and I used it to make both the test and training set.
caVal[{n_, k_, t_}] :=
 Image3D[#, ImageSize -> 100] & /@
  CellularAutomaton[{n, {k, 1}, {1, 1, 1}}, {{{{1}}}, 0}, {{t}}]
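Assuming the definition above, a single call renders the model at a given iteration; for example, for rule 182 (the rule shown in the figures) at a hypothetical iteration 20:

    (* the 3D model of totalistic rule 182 (k = 2) at iteration 20;
       the result is a one-element list of Image3D objects *)
    caVal[{182, 2, 20}]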
![cube][4]
I also used code from "A New Kind of Science" and Wolfram Documentation to generate other 3D cellular automata.
![Cube, Game of Life, Sphere][5]
#Machine Learning
To make the training set, I used a Manipulate to generate figures using cellular automata. Then, I classified the figures based on their shape. At first, I had Cube, Sphere, and Irregular as the classes. However, after meeting with Stephen Wolfram, we decided to divide the Irregular class into Less Interesting Irregular and Interesting Irregular. The purpose of this was to filter out models that were not intriguing to examine (they were just not a cube or a sphere). Thus, the possible classes are Cube, Sphere, Less Interesting Irregular, and Interesting Irregular.
Manipulate[
 Image3D[#, ImageSize -> 250] & /@
  CellularAutomaton[{x, {2, 1}, {1, 1, 1}}, {{{{1}}}, 0}, {{t}}], {x,
  k2rules[[Key[2]]]}, {t, 2, 35, 1}]
![Manipulate of different cellular automata and iterations][6]
To make the training data, I stored each figure and its classification in an association.
tST = Join[
  Thread[sT -> Table["Sphere", Length[sT]]],
  Thread[cT -> Table["Cube", Length[cT]]],
  Thread[IIT -> Table["Interesting Irregular", Length[IIT]]],
  Thread[IBT -> Table["Less Interesting Irregular", Length[IBT]]]
 ]
![association values for interesting irregular class][7]
Next, I put the training set into the Classify function with the PerformanceGoal of Quality to make the machine learning function.
totalisticClassifier = Classify[tST, PerformanceGoal -> "Quality"]
After testing the function and looking at information on it, I found that it classified figures constructed according to cellular automata rules with 83% accuracy.
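Once trained, the classifier can be applied directly to a newly generated model. A sketch, assuming `caVal` and `totalisticClassifier` as defined above and an arbitrary rule and iteration:

    (* classify the model generated by rule 182 at iteration 20; caVal
       returns a one-element list, so take its first element. Returns one
       of "Cube", "Sphere", "Less Interesting Irregular", or
       "Interesting Irregular" *)
    totalisticClassifier[First @ caVal[{182, 2, 20}]]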
![testData classifications][8]
![more testData classifications][9]
Also, I wrote functions to return only the cellular automata models of a particular classification.
interesting[n_] := Cases[n, {"Interesting Irregular", _}] // TableForm
lessInteresting[n_] :=
Cases[n, {"Less Interesting Irregular", _}] // TableForm
cube[n_] := Cases[n, {"Cube", _}] // TableForm
sphere[n_] := Cases[n, {"Sphere", _}] // TableForm
![only interesting cellular automata][10]
#Extensions
I created functions that make it easier for a user to classify rules of cellular automata.
I made a function that takes a rule number and an iteration count and returns the classification for each step up to the final iteration.
classifyByRuleIteration[{r_, i_}] :=
 Table[Image3D[#, ImageSize -> 100] & /@
    CellularAutomaton[{r, {2, 1}, {1, 1, 1}}, {{{{1}}},
      0}, {{t}}], {t, i}] // totalC
I also made a function that returns a specific iteration's shape classification when given the rule number and desired iteration.
classifyRI[{r_, i_}] :=
 {Image3D[#, ImageSize -> 200] & /@
    CellularAutomaton[{r, {2, 1}, {1, 1, 1}}, {{{{1}}}, 0}, {{i}}]} //
  totalC
Additionally, I made a Manipulate so that 3D cellular automata can be classified in a user-friendly way.
Manipulate[ classifyRI[{x, t}], {x, 2, 254, 4}, {t, 0, 50, 1}]
![Manipulate that classifies the cellular automata generated][11]
Finally, my mentor, Sylvia Haas, helped me make a Manipulate to display 3D cellular automata in rainbow colors!
Manipulate[
Graphics3D[
Riffle[Table[
ColorData["Rainbow", c], {c, 0, 1, 1/Length[#]}], #] & [
Cuboid /@ Position[#, 1]] & /@
CellularAutomaton[{x, {2, 1}, {1, 1, 1}}, {{{{1}}},
0}, {{t}}]], {x, k2rules[[Key[2]]]}, {t, 2, 35, 1}]
![rainbow Manipulate][12]
#Summary
The 3D Cellular Automata project has a machine learning function that can classify 3D cellular automaton iterations as cubes, spheres, less interesting irregulars, and interesting irregulars with 83% accuracy. The classification is specifically for iterations of rules 2–254 in steps of 4 with a k value of 2, meaning that each cell can be in two different states. The classifier was tested with 20 iterations of all the cellular automaton rules that have a k value of 2. Using this function, classifications for different iterations of cellular automata with the same rule can be compared to find patterns in the evolving shape. I also wrote a function that displays only the cellular automata with a particular classification; for example, one could see only the cubes in a set. To make classifying cellular automata easier for the user, the program has several functions that can be used to classify rules: one classifies figures for a specific rule number and iteration, and another for a rule number and a range of iterations. In addition, I created a function that generates animations of the evolution of each cellular automaton rule and a Manipulate that displays the cellular automaton figure in rainbow colors.
#Future Work
In the future, I would like to classify figures for states other than 2. This would mean that k could equal 3, 4, or any other number. Furthermore, the current model classifies based on shapes created when one cell is alive in the beginning and the rest are null. In the future, the classification would include figures created with more than one live cell at the start so that cellular automata figures from different starting configurations of the same rule can be compared. Additionally, the current model includes only four different classes: cube, sphere, less interesting irregular, and interesting irregular. With the future development of the project, there would be more classes like rectangular prisms, which would be added with the varying starting configurations.
![rainbow gif][13]
#Acknowledgements
Thank you to my amazing mentor, Sylvia Haas, for providing support during the entire process! Also, thank you to all of the mentors at Wolfram Summer Camp for helping me!
Github link: https://github.com/adriennewlai21/WSSTemplate
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at6.43.02PM.png&userId=1725145
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=rule182.gif&userId=1725145
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at6.51.16PM.png&userId=1725145
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at6.55.56PM.png&userId=1725145
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at6.57.58PM.png&userId=1725145
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at4.24.29PM.png&userId=1725145
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at7.01.49PM.png&userId=1725145
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at7.07.04PM.png&userId=1725145
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at7.08.03PM.png&userId=1725145
[10]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at7.35.13PM.png&userId=1725145
[11]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at4.58.08PM.png&userId=1725145
[12]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ScreenShot20190711at4.58.56PM.png&userId=1725145
[13]: https://community.wolfram.com//c/portal/getImageAttachment?filename=rainbow.gif&userId=1725145

*Posted by Adrienne Lai, 2019-07-12*

[WSC19] Fitting the World: Physical Scale from Satellite Images
https://community.wolfram.com/groups//m/t/1732295
![Cover image][1]
## What are we trying to do?
The goal of this project is to produce an algorithm which can find the physical scale of any satellite image: the relationship between the content of the image, and the area it covers in kilometers.
![Examples of training data][2]
This is a very challenging problem because of the characteristics of the world's terrain. At different zoom levels, it's highly self-similar, like a fractal. We need to find a method to extract useful information from these images while ignoring the features that repeat across zoom levels.
There are several ways to approach the problem. Some of the more manual methods involve trying to label image parts which only show up at certain scales. For instance, you could rely on the fact that all rooftops are generally roughly the same size, all roads have similar widths, and so on.
![Examples of man-made structure][3]
Those approaches all need lots of manual work, though. We attempted to solve the problem automatically, exploring ideas that might let us predict map zoom without relying on hardcoded rules.
Although we explored several approaches to this problem, the most successful solution used feature extraction to convert each image into a relatively small vector, then trained a small feedforward neural network to predict zoom from each set of features. While the self-similarity of each image makes more traditional CV approaches very difficult, a neural network can learn the right features to approximate the zoom very well.
## Methodology
We wound up gathering several datasets of satellite images over the course of the project. The most useful one was a dataset of three thousand images of random locations in the continental United States, all taken at different zoom levels:
```
(* define a function to download images of a specific location and zoom *)
satelliteImage[location_, size_?NumericQ] :=
 GeoImage[
  GeoDisk[location, Quantity[size/2, "km"]],
  GeoServer -> "DigitalGlobe",
  RasterSize -> 400
 ]
sampleCount = 3000;
(* Get a list of 3000 random locations in the continental US *)
randomLocations =
 RandomGeoPosition[Entity["Country", "UnitedStates"],
   sampleCount] /. GeoPosition[locs_List] :> GeoPosition /@ locs;
(* Get a list of 3000 random zoom levels, from 1 to 200 km *)
randomZooms = Table[RandomReal[{1, 200}], sampleCount];
(* create a dataset of image specifications *)
samples = MapThread[
   <|"Location" -> #1, "Zoom" -> #2|> &,
   {randomLocations, randomZooms}
  ] // Dataset;
(* to see the precise scripts used to download each image, download the attached notebook; they have dynamic progress indicators and checkpoint the data to disk, so they're a little complicated *)
```
The majority of the time on this project was spent trying several different methods of extracting information from the images.
One attempt was visual preprocessing: running segmentation or binarization on each image before training a model or extracting some metric. Every time we did this, although the images looked really neat, the accuracy of our predictions was far worse.
![Examples of image preprocessing][4]
All of those preprocessing methods reduce the dimensionality of the images, but it turns out that they focus on the wrong features in the process. Our network was supposed to pick up on the large patterns: rivers, landmasses, and so on. When we segmented images, the network was forced to focus on the small details we highlighted. All in all, preprocessing using traditional CV was a failure.
We also tried training a convolutional neural network from scratch on the images, which also failed. The network would, at first, immediately overfit the training set. When we reduced its size, it would fail to converge at all. Despite hours of tweaking, this method did not work either.
The next idea we had was just to throw the entire dataset into `Predict[]` and see what would happen. This worked shockingly well, considering how easy it was:
![Predict results][5]
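As a sketch of what that first experiment amounted to (assuming `imageSet` is a Dataset pairing each downloaded image with its zoom, and `newSatelliteImage` is a hypothetical input):

```
(* train a predictor directly on image -> zoom rules; "Image" and
   "Zoom" columns are assumed names for the downloaded data *)
zoomPredictor = Predict[Normal @ imageSet[All, #Image -> #Zoom &]];
(* estimate the physical scale (in km) of a new satellite image *)
zoomPredictor[newSatelliteImage]
```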
So the next question was: how? What was `Predict[]` doing behind the scenes? After some research, we discovered a couple of starting points for our own neural network model:
- `Predict[]` preprocesses images using WL's feature extraction functionality. This is based on the first few layers of the trained `ImageIdentify` convnet, combined with an autoencoder.
- `Predict[]` generally trains small feedforward networks, without convolutional layers.
Our general approach was to replicate this setup, with several tweaks and optimizations for better performance.
Our first large improvement was a preprocessing step: image augmentation. We applied several different constant crops, translations, rotations, and reflections to each image, increasing the number of examples by a factor of 10. We then performed feature extraction on the augmented data.
```
(* use the built-in augmentation functionality *)
(* note: this can't go in the network itself because we need to extract features with FeatureExtraction[] after this step *)
augmentor = ImageAugmentationLayer[
  (* final image dimensions from cropping *)
  {200, 200},
  (* 50% chance of either reflection *)
  "ReflectionProbabilities" -> {0.5, 0.5},
  "Input" -> NetEncoder[{"Image", {400, 400}}], (* 400x400 input images *)
  "Output" -> NetDecoder["Image"] (* output an image for the feature extractor *)
 ]
(* each original image gets 10 augmented images generated from it *)
augmentationMultiplier = 10;
(* actually augment the image set *)
augmentedImages = Join @@ Table[
    imageSet[All, <|
       #,
       "Image" -> augmentor[#Image, NetEvaluationMode -> "Train"]
      |> &],
    augmentationMultiplier
   ] // RandomSample;
```
Our next approach was to replicate this: use `FeatureExtraction[]` to reduce the dimensionality of the images, and train a small, mostly-linear neural network on the result.
```
(* create a feature extractor trained with the first 5000 images *)
fExtractor = FeatureExtraction[augmentedImages[;; 5000, "Image"]];
(* extract features from the images *)
features = imageSet[All, <|#, "Features" -> fExtractor[#Image]|> &]
```
When you plot the feature vectors of a few images, the variance between them is clear:
![Featureimage matrix][6]
Although the network structure was fairly simple, we automated the search for good hyperparameters. We trained hundreds of slightly different networks and evaluated their statistical performance. Out of many permutations of layer count, layer size, activation function, training speed, and so on, we picked the one that worked best. The final network design we settled on was this:
```
geoNet = NetChain[
  {
   200, Ramp, DropoutLayer[0.3],
   100, Ramp,
   20,
   1
  },
  "Input" -> {324}, (* our feature vectors are 324-dimensional *)
  "Output" -> NetDecoder["Scalar"] (* decode into a single number: zoom in km *)
 ];
geoNet = NetTrain[
  geoNet,
  (* featureTrain and featureTest both have the shape {features -> zoom, ...} *)
  featureTrain,
  ValidationSet -> featureTest,
  TargetDevice -> "GPU",
  MaxTrainingRounds -> 600,
  (* bail if loss stops decreasing *)
  TrainingStoppingCriterion -> <|"Criterion" -> "Loss", "Patience" -> 50|>
 ]
```
In the end, the general architecture of the most successful model looked like this:
```
Original dataset             (generated from GeoImage[])
        |
        v
Augmented dataset            (generated with ImageAugmentationLayer[])
        |
        v
Extracted feature vectors    (using FeatureExtraction[])
        |
        v
Smallish neural network
        |
        v
Final prediction
```
( https://textik.com/#e650301054ce435f )
## Results
When we evaluated each image in our test set with this network, we got the following results:
![Plot of prediction results][7]
This plot shows actual zoom levels on the $x$-axis and estimated zoom on the $y$-axis. A reference line shows what a perfect prediction would look like. Analyzed statistically, the network's predictions had a standard deviation of 30.37 km and an $r^2$ value of 0.732.
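The quoted statistics can be recomputed from the network's predictions in a few lines; the variable names here follow the `featureTest` set defined above, whose elements have the shape `features -> zoom`:

```
(* predicted and actual zooms for the held-out test set *)
predicted = geoNet[featureTest[[All, 1]]];
actual = featureTest[[All, 2]];
residuals = predicted - actual;
(* standard deviation of the prediction error *)
StandardDeviation[residuals]
(* coefficient of determination r^2 *)
1 - Total[residuals^2]/Total[(actual - Mean[actual])^2]
```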
This network clearly "gets the gist" of the data it's presented. However, these results were not portable to different satellite image datasets. When we evaluated a separate test set, gathered from Wolfram satellite imagery, we got this result:
![Plot of prediction results on Wolfram satellite data][8]
It is clear that the network is learning something specific to the DigitalGlobe dataset we used to train it. We would hesitate to call that overfitting, because it can extrapolate to locations it has never seen before, but it relies on the specific look and tone of the DigitalGlobe data. After all, the two image sets look very different:
![Satellite image comparison][9]
It is possible that the network is just confused by the new colors, rather than confused by the structure of the terrain in the Wolfram dataset. Either way, though, the scope of this result is limited to images similar to the satellite dataset on which it was trained.
We used another dataset of 20,000 mostly-overlapping images of Massachusetts and exploited overfitting to achieve a much more accurate (yet fragile) prediction. By deliberately overfitting the terrain, the model could achieve an $r^2$ of up to 0.99, but it completely fell apart on any other dataset. In theory, you could take this brute-force approach with the entire planet, overfitting deliberately to learn the terrain, but that was beyond the scope of this project.
## Future work
One of the larger problems with this research was gathering data. In future attempts, it would be better to gather a much larger dataset (somewhere around 50,000 images) from several different satellite providers. To stop the network from overfitting on the fine-grained style of the images, we would need to find satellite providers whose data look significantly different.
One other option is to improve the model's feature extraction layer. Right now, the FeatureExtraction function uses the first few convolutional layers of the Wolfram ImageIdentify classifier as a starting point. By training our own convolution step specific to satellite images on a much larger dataset, we might be able to get more accurate results.
# Final thoughts
This research was successful in a very limited scope. Future attempts at cracking this problem will have to generalize to the entire globe, across several satellite image providers: a problem requiring a lot of time, and access to large computational resources, to solve.
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=coverimage.png&userId=1619260
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=fig_examples.png&userId=1619260
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=2123fig_structure.png&userId=1619260
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=fig_processing.png&userId=1619260
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=2381fig_predict.png&userId=1619260
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=fig_featurematrix.png&userId=1619260
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=fig_resultplot.png&userId=1619260
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=fig_resultplotwolfram.png&userId=1619260
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=fig_satcomparison.png&userId=1619260

*Posted by William Goodall, 2019-07-11*

[WSC19] Automatic Metrical Scansion of Latin Poetry in Dactylic Hexameter
https://community.wolfram.com/groups//m/t/1732445
## Introduction ##
I've been studying Latin since eighth grade (I'm currently going into twelfth), and every year I fall more in love with the language. Last school year, I took AP Latin and was immediately drawn to the complex world of Latin poetry. At the same time, I entered an Artificial Intelligence class and continued to develop a passion for computer science. The idea of coding something to do with Latin (especially with poetry) has been simmering in my mind for a while, and I finally came up with and created this project over the past two weeks at camp.
Many significant works of Latin poetry follow the format of dactylic hexameter, meaning that every line is composed of some combination of six metrical feet, each of which is either a spondee (two long syllables) or a dactyl (one long syllable followed by two short syllables). The determination of the length of each syllable is based on the word itself, its position in the line, and the surrounding words and metrical feet. Scansion is the process of identifying the pattern of metrical feet in a line of Latin poetry. Having a quick way to scan lines makes reciting Roman poetry aloud easier, allowing an audience beyond just "Latin people" to appreciate the beauty of Latin rhythmical patterns, as well as the beauty of the language itself. In this project, I used machine learning to scan lines of Latin poetry in dactylic hexameter.
![image 1][1]
## Scansion ##
Latin poetry primarily uses two types of metrical feet: spondees, or two long syllables, and dactyls, or one long syllable followed by two short syllables. Different meters mandate varying quantities of total feet or of specific types of feet. In dactylic hexameter, the last two of the six total feet are almost always a dactyl and a spondee, and the first four can be either dactyls or spondees. The rest of the feet are determined using a series of rules and eventually process of elimination. Because the syllable lengths and endings of each word change based on its function in a sentence, scansion becomes complicated to automate by simply using a Latin dictionary.
A few basic rules for understanding the result of the project:
- Some vowels are designated long or short by nature of the word itself
- Diphthongs are always long
- Vowels followed by two consonants are usually long
- When a word ending with a vowel (or a vowel and the letter "m") is followed by a word starting with a vowel (or the letter "h" and a vowel), the last syllable of the first word "disappears" in what's called an elision
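The elision rule above can be checked mechanically with string patterns. This is a simplified illustrative sketch, not the project's code, and it ignores macrons and diphthongs; `elisionQ` is a hypothetical helper name:

```
(* crude elision test: word1 ends in a vowel or vowel + "m",
   word2 begins with a vowel or "h" + vowel *)
vowel = Characters["aeiou"];
elisionQ[w1_String, w2_String] :=
 StringMatchQ[w1, ___ ~~ Alternatives @@ vowel ~~ ("m" | "")] &&
  StringMatchQ[w2, ("h" | "") ~~ Alternatives @@ vowel ~~ ___]
(* e.g. elisionQ["monstrum", "horrendum"] -> True *)
```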
## Method ##
I originally planned to implement automatic scansion using the rules I'd learned in Latin class. However, I already knew how I could accomplish that, and, ultimately, I came to camp to learn and to explore coding in the Wolfram Language, not to do something I could have done at home. Hence, I embarked on an exciting (and probably less successful objectively) journey through neural networks in the Wolfram Language.
After finding and parsing data, I experimented with four different input/output combinations when developing my neural network and fiddled with the numbers and layers for each. I ended up using a sequence-to-sequence network with an input of a line of text and an output of a list of probabilities that each character is a long vowel; a short vowel; or a consonant, punctuation mark, or ignored vowel. From there, I converted the neural network output into a color-coded line with metrical markings.
## Data ##
I got training data from AP Latin resources on scanning Vergil's The Aeneid on www.handsupeducation.org. The lines included syllable lengths and metrical foot breaks and were formatted like this: "Ārmă vĭrūmquĕ cănō, | Trōiaē quī prīmŭs ăb ōrīs."
After removing markings for metrical feet, I converted the line and its markings into a list of integers (one for each character) in which 1 indicated a long vowel; 2 a short vowel; and 3 a consonant, punctuation mark, or ignored vowel. I then converted the file into a list of rules with the plain line leading to its associated list of character "lengths."
formatData[lines_] := Table[plainLine[x] -> allCharLengths[removeFootMarks[x]], {x, lines}]
Here is an example of a data point I used:
"Hinc via Tartarei quae fert Acherontis ad undas" -> {3,1,3,3,3,3,2,2,3,3,1,3,3,2,3,2,1,3,3,3,3,1,3,3,1,3,3,3,2,3,3,2,3,1,3,3,2,3,3,2,3,3,1,3,3,1,3}
## Neural Network ##
I used 824 lines as my training data and 20 lines as my testing data, and trained a sequence-to-sequence neural network. I varied the EmbeddingLayer argument and MaxTrainingRounds to find the best combination, and ended up using 12 and 100, respectively.
net = NetChain[{EmbeddingLayer[12], NetBidirectionalOperator[LongShortTermMemoryLayer[32]], NetMapOperator[LinearLayer[3]], SoftmaxLayer[]}, "Input" -> NetEncoder[{"Characters", {{"!", "(", ")", ".", ",", ";", "?", ":", "“", "”", "-"} -> 1, " ", CharacterRange["a", "z"], _}, "IgnoreCase" -> True}]]
result = NetTrain[net, trainingData, All, LossFunction -> CrossEntropyLossLayer["Index"], ValidationSet -> testData, MaxTrainingRounds -> 1000]
The neural network I obtained had an error of 2.9% on the validation data, so an incorrectly classified character appears only every few lines.
## Formatting the Neural Network Output ##
The neural network outputs a list of probabilities for classes 1, 2, and 3 for each character, like this:
{{0.233388, 0.747286, 0.0193251}, {0.0237603, 0.102153,
0.874087}, {0.000247855, 0.00735917, 0.992393}, {0.0196234,
0.932275, 0.0481015}, {0.000736693, 0.0109938,
0.98827}, {0.000217165, 0.00133861, 0.998444}, {0.912511, 0.0845969,
0.00289202}, {0.00648514, 0.0000726016, 0.993442}, {0.941853,
0.00259834, 0.0555489}, {0.000389534, 0.0000140205,
0.999596}, {0.000185425, 0.000205691, 0.999609}, {0.00244669,
0.0108199, 0.986733}, {0.165211, 0.616111, 0.218678}, {0.000199178,
0.0000279639, 0.999773}, {0.0000528181, 3.42483*10^6,
0.999944}, {0.920206, 0.000883119, 0.0789105}, {0.00396218,
8.48394*10^7, 0.996037}, {0.992206, 0.0000137013,
0.00778067}, {0.00125184, 5.09544*10^7, 0.998748}, {0.000426706,
1.91505*10^6, 0.999571}, {0.000130377, 4.82552*10^6,
0.999865}, {0.0000298044, 9.46518*10^6, 0.999961}, {0.172772,
0.0202387, 0.80699}, {0.942826, 0.0060539, 0.05112}, {0.30773,
0.0000756251, 0.692195}, {0.850813, 0.000620744,
0.148566}, {0.000179449, 3.39269*10^6, 0.999817}, {0.0000538242,
0.00011357, 0.999833}, {0.000553092, 0.0406213,
0.958826}, {0.00931922, 0.985319, 0.00536204}, {0.0000223564,
0.000134355, 0.999843}, {0.00933962, 0.0149729,
0.975688}, {0.000112443, 0.000270864, 0.999617}, {0.470264,
0.527769, 0.00196643}, {0.000150167, 0.0000335595,
0.999816}, {0.239256, 0.209718, 0.551026}, {0.0000601962,
0.000211833, 0.999728}, {0.0000575157, 0.00195887,
0.997984}, {0.0931677, 0.556272, 0.35056}, {0.00254616, 0.00115129,
0.996302}, {0.0340716, 0.00185011, 0.964078}, {0.987841, 0.00371802,
0.00844058}, {0.0313373, 0.000806895, 0.967856}, {0.752181,
0.226585, 0.021234}, {0.00223789, 0.00123329, 0.996529}}
I formatted the probabilities into a list of the most probable integers for each character.
calcOutput[probs_]:= Table[If[Max[x] == x[[1]], 1, If[Max[x] == x[[2]], 2, 3]], {x, probs}]
From the list of integers, I formatted and color-coded the newly scanned line.
markings = {Style["¯", Bold], Style["u", Smaller]}
overscripts[line_, lengths_] :=
scannedLine =
Table[If[lengths[[x]] == 3, StringPart[line, x],
Overscript[StringPart[line, x], markings[[lengths[[x]]]]]], {x,
Range[Length[lengths]]}]
colors[line_, lengths_] := (
newLine =
Table[If [lengths[[x]] == 3, line[[x]],
Style[line[[x]], If[lengths[[x]] == 1, Blue, Red]]], {x,
Range[Length[lengths]]}];
Style[Row[newLine], Large]
)
## Final Product ##
Given a line, the program guesses the syllabic pattern and outputs the color-coded line with markings.
scan[line_]:=(
out = calcOutput[trainedNet[line]];
colors[overscripts[line, out], out]
)
![example of a scanned line][2]
## Conclusions and Future Extensions ##
Through this project, I was able to create a neural network that successfully determines and formats the metrical pattern of lines of Latin poetry in dactylic hexameter (2.9% error). The program takes a plain line of poetry and outputs the resulting color-coded line with metrical markings.
In the future, markings between metrical feet could be added to the displayed lines. The program could also be extended to other types of meter or to poetry in other languages.
## Acknowledgements ##
I would like to thank my wonderful mentor, Christian Pasquel, for his endless support throughout the process. I would also like to thank the other Wolfram Summer Camp mentors and the Wolfram Summer School mentors and students.
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=newpresentationimage.PNG&userId=1725027
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=presentationline.png&userId=1725027

*Posted by Laney Moy, 2019-07-12*

[WSC19] A Computational Method to Predict X-Ray Diffraction (XRD) Patterns
https://community.wolfram.com/groups//m/t/1732535
![Predicted vs Experimental Silver XRD Pattern. Experimental plot obtained from: Koohpeima, Fatemeh, Mokhtari, Mohammad, & Khalafi, Samaneh (2017). The effect of silver nanoparticles on composite shear bond strength to dentin with different adhesion protocols. Journal of Applied Oral Science, 25, 367-373. 10.1590/1678775720160391.][1]
## A Computational Method to Predict X Ray Diffraction (XRD) Patterns ##

## Background ##
Ever wondered how DNA's double-helix structure was discovered? How drugs are investigated? Welcome to the world of X-ray diffraction! Thirteen Nobel Prizes have been awarded for developments involving this old but effective technique, in fields ranging from physics to medicine. But how is it so effective?
XRD is a powerful technique employed across the sciences to determine the chemical makeup, and thereby the physical properties, of various structures. Each lattice structure has its own "XRD fingerprint" which keys scientists in to its chemical makeup. This fingerprint is characterized by peaks of different intensities at different angles. Here is a sample for a face-centered cubic copper lattice structure:
![Copper XRD pattern: from X-Ray Diffraction Studies of Copper Nanopowder (arXiv:1003.6068v1 [physics.gen-ph])][2]
The first image in this post is a comparison of experimental and predicted results for a Silver crystal structure.
However, predicting these fingerprints given little experimental data is a mathematically involved procedure. This summer, as part of the Wolfram High School Summer Camp, I implemented a framework for predicting these fingerprints for various cubic lattice structures.
## Creating the Program ##

## Getting the Bragg Peak Positions ##
You might be wondering what the numbers on the top of the peaks mean. These numbers are Miller indices, which are descriptions of the planes in a unit cell that are producing the peaks. The first step is to use these planes to generate the Bragg peak positions:
$$d=\frac{a}{\sqrt{h^2+k^2+l^2}}$$
$$ \theta =2\arcsin{\left(\frac{\lambda}{2 d}\right)} $$
Here, $a$ denotes the lattice constant (the side length of the cubic unit cell), $(h,k,l)$ denotes the Miller indices, and $\lambda$ denotes the wavelength of the X-ray used. This is based on Bragg's law; see https://demonstrations.wolfram.com/BraggsLaw/ .
However, certain $(h,k,l)$ are forbidden in some structures. For example, in a body-centered cubic structure, $h+k+l$ must be even. The function `PossiblePlanes` accounts for these rules and has access to an extensive dataset of compounds and their structures.
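The two formulas above, combined with the selection rules, are enough to generate peak positions. Here is a language-agnostic sketch in Python (illustrative only; the post's own `PossiblePlanes` function and its compound dataset are not reproduced here):

```python
import math

def allowed(hkl, lattice):
    """Selection rules for cubic lattices: BCC needs h+k+l even;
    FCC needs h, k, l all even or all odd."""
    h, k, l = hkl
    if lattice == "bcc":
        return (h + k + l) % 2 == 0
    if lattice == "fcc":
        return len({h % 2, k % 2, l % 2}) == 1
    return True  # simple cubic: no systematic extinctions

def two_theta_deg(hkl, a, wavelength):
    """Bragg peak position 2*theta in degrees, from d = a/sqrt(h^2+k^2+l^2)
    and theta = arcsin(lambda / (2 d))."""
    d = a / math.sqrt(sum(i * i for i in hkl))
    return 2 * math.degrees(math.asin(wavelength / (2 * d)))
```

For example, for copper (a ≈ 3.615 Å) with Cu Kα radiation (λ ≈ 1.5406 Å), the allowed FCC (111) reflection lands near 2θ ≈ 43°, matching the strongest peak in the copper pattern above.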
To make coding easier, a list of associations was made with a certain $\theta$ being the key for a list of $hkl$ values.
grouped[elementlist_, n_] :=
GroupBy[ PossiblePlanes[elementlist, n],
1/Sqrt[(#[[1]]^2 + #[[2]]^2 + #[[3]]^2)] &]
association[elementlist_, n_, wavelength_] :=
Sort[MapThread[#1 -> #2 &, {ToTheta[wavelength, elementlist, n],
grouped[elementlist, n] // Values}]]
## Atomic Form Factor ##
To account for different electron densities, atomic form factors were calculated using a dataset tabulated by the International Tables for Crystallography: http://it.iucr.org/Cb/ch6o1v0001/. These form factors vary with angle; shown below is copper's:
![Copper's Atomic Form Factor][3]
These atomic form factors are then used in the structure factor calculation, which is directly proportional to the square root of intensity. For unary systems, the structure factor calculation is relatively easy. For binary systems, however, the parity of the Miller indices must be taken into account.
evenodd[b_, elementlist_, theta_, w_] :=
If[b, Total[atomdata[#, theta, w] & /@ elementlist],
Differences[atomdata[#, theta, w] & /@ elementlist] // First]
Here, `atomdata` gives the atomic form factor at a specific point. This function is mapped over a set of True/False (even or not) values and returns the structure factor. For a face-centered cubic cell, if the parity of $hkl$ is even, the atomic form factors are summed; if the parity is odd, they are subtracted.
## Multiplicities ##
Now, back to the Miller indices. Take a look at the following graphic:
![Miller Indices Felix Kling.svg. (n.d.). Retrieved from WikiMedia ][4]
You might notice that if we reflect $(100)$ we can get $(010)$ and $(001)$. We can also get negative indices, usually denoted $\overline{1}$ instead of $-1$. This gives us 6 total planes that are symmetry-equivalent and correspond to the same peak. Hence, we say that the class of Miller indices $(h00)$ has a multiplicity of 6. These multiplicities range from 6 to 48 for a cubic lattice structure, but can be as low as 2 for less symmetric structures.
Therefore, instead of calculating the contributed intensity of each plane, we count them as one plane and multiply the resultant intensity by a specific multiplicity. This multiplicity is then used to calculate peak intensity.
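Counting symmetry-equivalent planes can be done by brute force: enumerate every permutation and sign combination of $(h,k,l)$ and count the distinct results. A small sketch of that idea (Python, not part of the original post):

```python
from itertools import permutations, product

def multiplicity(hkl):
    """Number of distinct signed permutations of (h, k, l), i.e. the number
    of symmetry-equivalent plane orientations in a cubic lattice."""
    variants = set()
    for perm in permutations(hkl):
        for signs in product((1, -1), repeat=3):
            variants.add(tuple(s * v for s, v in zip(signs, perm)))
    return len(variants)
```

This reproduces the numbers quoted above: `multiplicity((1, 0, 0))` is 6, `multiplicity((1, 1, 1))` is 8, and a general index like `(1, 2, 3)` gives the cubic maximum of 48.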
## Intensity Calculation ##
$$I_{hkl}=\underbrace{\frac{1+\cos^2 (2\theta)}{\sin^2\theta \, \cos\theta}}_{\text{Lorentz Polarization Correction}} \times \ \ \text{Multiplicity}_{hkl} \ \ \times \underbrace{F_{hkl}^2}_{\text{Structure Factor}}$$
The Lorentz polarization correction was introduced to improve accuracy and match experimental conditions, as X-rays will not be completely polarized at every angle.
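One common form of the Lorentz polarization factor, which includes an extra $\cos\theta$ from the diffractometer geometry, is $(1+\cos^2 2\theta)/(\sin^2\theta\,\cos\theta)$. A quick sketch (Python, written in terms of the diffraction angle $2\theta$):

```python
import math

def lorentz_polarization(two_theta_rad):
    """Lorentz-polarization factor (1 + cos^2(2θ)) / (sin^2 θ · cos θ),
    a standard form for an unpolarized lab X-ray source."""
    theta = two_theta_rad / 2
    return (1 + math.cos(two_theta_rad) ** 2) / (
        math.sin(theta) ** 2 * math.cos(theta))
```

The factor diverges at low angles, which is why low-angle peaks are strongly enhanced relative to high-angle ones.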
intensity[w_, elementlist_, n_] :=
Transpose@{(association[Flatten @ elementlist, n, w] //
Keys), (.5 (1 + (Cos[#])^2)/(Sin[#/2]^2 *
Cos[#/2])) & /@ (association[Flatten @ elementlist, n, w] //
Keys) *
(multiplicity /@ (Last /@ (association[Flatten @ elementlist, n,
w] // Values)))*(structurefactor[elementlist,
w, (association[Flatten @ elementlist, n, w])]) ^2 }
`intensity` gives a list of Bragg peak positions and their respective intensities using the aforementioned formula. This intensity function is then fed into a function which finally plots the diffraction pattern.
peak[{theta_, intensity_}] :=
intensity * Exp[-10000 Pi (t - theta)^2]
Where $t$ is the variable to be plotted against.
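Summing one such Gaussian per Bragg peak gives the full pattern. A minimal sketch (Python; the width constant 10000π mirrors the `peak` function above, and note the negative exponent that makes each term a narrow, decaying Gaussian):

```python
import math

def peak(t, theta, intensity):
    """A single Gaussian peak of the given height, centered at theta."""
    return intensity * math.exp(-10000 * math.pi * (t - theta) ** 2)

def pattern(t, peaks):
    """Total diffracted intensity at angle t from a list of
    (theta, intensity) Bragg peaks."""
    return sum(peak(t, theta, i) for theta, i in peaks)
```

Evaluating `pattern` over a range of `t` and plotting it reproduces the peak-shaped traces shown in the comparison figures.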
Here is a comparison of the predicted XRD pattern vs the real diffracted pattern for a Copper FCC structure:
![Experimental vs. Predicted XRD Pattern][5]
The absolute intensities have little use, as relative intensities are primarily used to analyze these patterns.

## Future Research ##
For future research, I have many ideas I want to implement. Thanks to Mr. Wolfram, I certainly have a lot to do this summer! Perhaps the most ambitious of my future plans is doing the inverse problem: predicting the lattice structure from a given XRD pattern.
## Acknowledgements ##
I would like to thank my mentor, Eryn Gillam, for helping me throughout my project. I would also like to thank the other mentors for their help, and Mohammad Bahrami for his lectures. Wolfram Summer Camp truly gave me an outlet to express my creativity in novel ways, and the two weeks I spent here were invaluable. Wolfram Summer Camp gave me a novel perspective on how to approach all aspects of life, and key insight into how computational thinking can change the world. For these reasons and more, I am beyond grateful to have been a part of this camp, and am looking forward to applying my new skills.
## Computational Essay ##
https://github.com/hamza314/WSSTemplate/blob/master/Final%20Project/Final%20Submission/Hamza%20Alsamraee%20WSC19.nb
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=main.PNG&userId=1725111
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=copperrealxrd.png&userId=1725111
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Capture.PNG&userId=1725111
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=360pxMiller_Indices_Felix_Kling.svg.png&userId=1725111
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=dual.PNG&userId=1725111

Hamza Alsamraee, 2019-07-12T00:56:45Z

[WSC19] Use Machine Learning to Predict if Topography is Land or Ocean
https://community.wolfram.com/groups//m/t/1731676
##Goal of the Project##
Analyze and study algorithms for predicting whether topography is land or ocean, using simple image processing to bring out the differences between on-land relief plots and underwater relief plots before handing the images to machine learning.
##What is Topography?##
Topography is a broad term for the detailed study of the Earth's surface. This includes changes in the surface, such as mountains and valleys, as well as features such as rivers and roads. It can also cover the surfaces of other planets, the Moon, asteroids, and meteors. Topography is closely linked to surveying, the practice of determining and recording the positions of points in relation to one another. For example, here is the topography of a random place.
![enter image description here][1]
##Collecting Data##
Mathematica provides lots of data about various places around the world, so I took advantage of this to scrape data on on-land and underwater features for my topography training set. The training data set includes:
- 200 on-land features such as capital cities, cities, buildings, and mountains.
buildings = EntityList["Building"]
buildings = Take[buildings, 50]
mountains = MountainData[All]
mountains = Take[mountains, 50]
cities = EntityList["City"] // Flatten
cities = Take[cities, 50]
- 200 underwater features such as trenches, basins, seamounts, etc.
underseafeatures =
DeleteMissing[
EntityValue[EntityClass["Ocean", "Seas"], "UnderseaFeatures"]] //
Flatten
underseafeatureslist = Take[underseafeatures, 200]
##Image Processing##
I tried various approaches for this image processing step, but only one proved suitable.
**First Proposal**
First, I tried to calculate the slope of each topography with the following formula:
![enter image description here][2]
However, this will not work effectively if an on-land topography and an underwater one happen to have the same slope; the machine would then be unable to differentiate land from ocean.
**Second Proposal**
Therefore, I moved on to my second proposal, suggested by my mentor Harshal: using the Discrete Fourier Transform (DFT) to convert an image into a matrix.
![enter image description here][3]
As you can see, the transformed images using the DFT do not help much, because I cannot distinguish the differences between the images even though the right one is land and the left one is ocean.
**Third Proposal**
Finally, my solution was to apply a high-frequency filter to the images. The high-frequency-filtered output was easy to distinguish, since the surface above water is generally rough compared to the surface underwater.
To start, take 2 relief plots: one of Mount Everest and one of the Mariana Trench.
![enter image description here][4]
From this point, I can clearly see that the left one is Everest and the right one is the Mariana Trench. The reason I can differentiate those two images is that the image on the left is rough, which means it has a high probability of being land, while the image on the right is smooth, which implies it has a high probability of being ocean. So, let's find a way to programmatically pick up these differences!
capitalcitieselevationPlots =
ReliefPlot[
Reverse[GeoElevationData[#, GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic]], ImageSize -> Medium,
ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ capitalcities
buildingselevationPlots =
ReliefPlot[
Reverse[GeoElevationData[#, GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic]], ImageSize -> Medium,
ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ buildings
mountainselevationPlots =
ReliefPlot[
Reverse[GeoElevationData[#, GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic]], ImageSize -> Medium,
ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ mountains
citieselevationPlots =
ReliefPlot[
Reverse[GeoElevationData[#, GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic]], ImageSize -> Medium,
ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ cities
underseafeatureselevationPlots =
ReliefPlot[
Reverse[GeoElevationData[#, GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic]], ImageSize -> Medium,
ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ underseafeatureslist
The lines of code above turn all the locations we collected into relief plots.
In the next step, I apply a high-frequency filter to the images. For each relief plot, I converted it to grayscale, made a blurred copy, took the difference between the two, and then dilated the result. This is how I did it in code.
finalcapitalcitieslist =
Dilation[ImageDifference[Blur[#, 20], #], 10] & /@
capitalcitieselevationPlots
finalbuildingslist =
Dilation[ImageDifference[Blur[#, 20], #], 10] & /@
buildingselevationPlots
finalmountainslist =
Dilation[ImageDifference[Blur[#, 20], #], 10] & /@
mountainselevationPlots
finalcitieslist =
Dilation[ImageDifference[Blur[#, 20], #], 10] & /@
citieselevationPlots
finalunderseafeatureslist =
Dilation[ImageDifference[Blur[#, 20], #], 10] & /@
underseafeatureselevationPlots
This returns:
![enter image description here][5]
These two images show Mount Everest and the Mariana Trench after applying the high-frequency filter. As mentioned above, the left one is Mount Everest and the right one is the Mariana Trench. The difference is the many "bubbles" in the left image compared to the right one, where the "bubbles" represent abrupt changes in the initial topography.
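The blur-difference trick is a classic high-pass filter: subtracting a blurred copy removes the smooth, low-frequency content and keeps only the "bubbles". Here is a NumPy sketch of the same idea (a simple box blur stands in for Wolfram's `Blur`; this is illustrative, not the original code):

```python
import numpy as np

def box_blur(img, k=5):
    """Crude k x k mean blur (a stand-in for the Blur[..., 20] step)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def high_pass(img):
    """|image - blurred image|, the analogue of ImageDifference[Blur[#], #]."""
    return np.abs(img.astype(float) - box_blur(img))
```

A perfectly smooth surface gives a response near zero, while rough terrain lights up, which is exactly the land/ocean cue used above.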
##Creating Training Set##
I gathered all the separated training sets (capital cities, cities, buildings, mountains, undersea features) into a larger training set.
finalonlandlist =
Join[finalcapitalcitieslist, finalcitieslist, finalbuildingslist,
finalmountainslist]
finalunderseafeatureslist = finalunderseafeatureslist
finaltraininglist = Join[finalonlandlist, finalunderseafeatureslist]
##Generating Classify Function##
I created the classify function based on the training set and the label of each element in the set; the machine found the best fit for my model using the logistic regression method.
landExamplesAndClasses = (# -> "Land" &) /@ finalonlandlist;
seaExamplesAndClasses = (# -> "Sea" &) /@ finalunderseafeatureslist;
predict =
Classify[Union[landExamplesAndClasses, seaExamplesAndClasses]]
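The training data passed to `Classify` is just a list of example -> label rules. The same pairing step, sketched generically in Python (with placeholder examples; this is not the original code):

```python
def make_training_set(land_examples, sea_examples):
    """Pair each filtered image (placeholders here) with its class label,
    mirroring (# -> "Land" &) /@ finalonlandlist and the "Sea" analogue."""
    return ([(x, "Land") for x in land_examples]
            + [(x, "Sea") for x in sea_examples])
```

Any binary classifier (logistic regression included) can then be fit on these labeled pairs.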
##Creating the Final Function to Return Probabilities##
I created the final function, which returns the probabilities of both land and sea given the topography of an arbitrary area as input.
predictor[img_] :=
predict[Dilation[ImageDifference[Blur[img, 20], img], 10],
"Probabilities"] // Normal
test = ReliefPlot[topography, Frame -> False, PlotRangePadding -> None]
predictor[test]
##An Application: "Generating Temperature Map and Smooth Plot"##
For this extra challenge, I randomly chose a place and generated both a heat map and a smooth map of it.
**Generating Temperature Map**
What I did follows this list:
- Get the coordinates of the current location.
- From the current location, move right 15 times, 8000 meters per step, to get a list of new coordinates.
- From each coordinate in the list above, move downwards 15 times, 8000 meters per step, to get a list of new coordinates.
- Rearrange the list of coordinates vertically instead of horizontally.
- Join all the lists of coordinates above into one big list.
- Make a relief plot for each coordinate in the list above.
- For each plot, apply the final function to get the probability of land and water.
- Collect the land probability of each plot and arrange the values into a 16x16 matrix.
- From the 16x16 matrix, convert each number into ThermometerColors, ranging from -0.5 to 0.5.
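The stepping scheme can be sketched with a flat-earth approximation (Python; the actual code below uses `GeoDestination`, which handles geodesics properly, so these degree offsets are only illustrative):

```python
import math

def grid_points(lat0, lon0, n=16, step_m=8000):
    """n x n grid of (lat, lon) pairs, moving east along columns and south
    along rows, using an equirectangular approximation
    (~111,320 m per degree of latitude)."""
    dlat = step_m / 111_320.0
    dlon = step_m / (111_320.0 * math.cos(math.radians(lat0)))
    return [(lat0 - r * dlat, lon0 + c * dlon)
            for r in range(n) for c in range(n)]
```

Feeding each of the 256 points through the relief-plot/classifier pipeline and reshaping the "Land" probabilities into a 16x16 matrix reproduces the heat map.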
vitri = GeoPosition[Here]
vitringang =
Table[GeoDestination[GeoPosition[vitri], {dist, 90}], {dist, 0,
120000, 8000}]
vitridoc =
Table[GeoDestination[GeoPosition[#], {dist, 180}], {dist, 8000,
120000, 8000}] & /@ vitringang
vitridocdachinhsua = Table[vitridoc[[All, n]], {n, 1, 15}]
tatcavitri = Join[vitridocdachinhsua, vitringang] // Flatten
bando = ReliefPlot[
Reverse[GeoElevationData[{#, {(# /. GeoPosition[x_] -> x)[[1]] -
0.05, (# /. GeoPosition[x_] -> x)[[2]] + 0.05}},
GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic, GeoCenter -> Automatic]],
ImageSize -> Medium, ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ (GeoPosition /@ tatcavitri)
topo1 = predictor[#] & /@ bando // Flatten
bandomoi =
Partition[
Values[topo1][[#]] & /@ Flatten[Position[Keys[topo1], "Land"]], 16]
MatrixPlot[bandomoi - .5,
ColorFunction -> ColorData["ThermometerColors"]]
And it returns:
![enter image description here][6]
Let's compare it to the actual map:
![enter image description here][7]
The temperature map approximately matches the actual map. However, at some points, for instance in the sea, the pixel color is more red than blue. I tried to figure out the reason and found that the seabed there might be rough because it became a water body only recently, perhaps a few tens of thousands of years ago.
**Generating Smooth Plot**
To do this, I applied much the same process as for the temperature map above.
point1 = GeoPosition[{11.111457, 107.636774}]
point2 = GeoPosition[{11.111457, 109.803591}]
point3 = GeoPosition[{8.991341, 107.636774}]
point4 = GeoPosition[{8.991341, 109.803591}]
rong = UnitConvert[GeoDistance[point1, point2], "Meters"] // QuantityMagnitude
dai = UnitConvert[GeoDistance[point1, point3], "Meters"] // QuantityMagnitude
soluongbuocnhayngang = rong/8000 + 1 // Floor
soluongbuocnhaydoc = (dai - 8000)/8000 + 1 // Floor
vitringang1 =
Table[GeoDestination[GeoPosition[point1], {dist, 90}], {dist, 0,
rong, 8000}]
vitridoc1 =
Table[GeoDestination[GeoPosition[#], {dist, 180}], {dist, 8000, dai,
8000}] & /@ vitringang1
vitridoc1dachinhsua =
Table[vitridoc1[[All, n]], {n, 1, soluongbuocnhaydoc}]
tatcavitri1 = Join[vitridoc1dachinhsua, vitringang1] // Flatten
bando1 = ReliefPlot[
Reverse[GeoElevationData[{#, {(# /. GeoPosition[x_] -> x)[[1]] -
0.05, (# /. GeoPosition[x_] -> x)[[2]] + 0.05}},
GeoRange -> Quantity[4, "Kilometers"],
GeoProjection -> Automatic, GeoCenter -> Automatic]],
ImageSize -> Medium, ColorFunction -> GrayLevel, Frame -> False,
PlotRangePadding -> None] & /@ (GeoPosition /@ tatcavitri1)
topo11 = predictor[#] & /@ bando1 // Flatten
bandomoi1 =
Partition[
Values[topo11][[#]] & /@ Flatten[Position[Keys[topo11], "Land"]],
soluongbuocnhayngang] // Flatten
kethopvitrichiso = MapThread[Rule, {tatcavitri1, bandomoi1}]
GeoSmoothHistogram[kethopvitrichiso,
ColorFunction -> ColorData["ThermometerColors"]]
And here it is: the smooth map of a part of Vietnam containing both sea and land.
![enter image description here][8]
Red regions in the heat map show land and blue regions show ocean, as is visible above.
##Wrapping-up##
To summarize: in the first step, I collected topographies from both on land and underwater. In the second and most complicated step, I first tried using the Discrete Fourier Transform (DFT) to turn a topography image into a matrix, but the problem was that all the Fourier images looked very similar. I then tried applying a high-frequency filter to the images; the filtered output was easy to distinguish, since the surface above water is generally rough compared to the surface underwater. After the image processing step, I created two training sets, one of on-land features and one of underwater features. I then built the classify function and the final function that returns the probabilities of "Land" and "Sea". Finally, as an extra challenge and a nice application of my project, I generated a temperature map of a part of Boston, compared it to the actual map, and found the generated heat map similar to the real one.
##Future Works##
As we can see, from an arbitrary relief plot of a location, it is possible to turn the relief plot into a thermometer surface, where blue represents "Sea" and red represents "Land", using the probabilities the classifier returns. The first thing to do next is to think of other crucial parameters that could be used, like water bodies around buildings, to increase the accuracy. After increasing the accuracy, an interesting thing to do would be to analyze the topography of other celestial bodies (like Mars) and predict regions where liquids flowed on the surface.
**Are these the images of wet Mars in the past and current Mars? Who knows.**
![enter image description here][9]
##Special Thanks##
I would like to give my special thanks to my mentor Mr. Harshal Gajjar for guiding me and also Mr. Mads Bahrami for giving me more challenges!
[Final Project GitHub link][10]
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img6.jpg&userId=1725013
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img1.png&userId=1725013
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img12.png&userId=1725013
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img4.png&userId=1725013
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img11.png&userId=1725013
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img8.png&userId=1725013
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img9.png&userId=1725013
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=4910img10.png&userId=1725013
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=img7.jpg&userId=1725013
[10]: https://github.com/mkhangg/WSSFinalProject

Minh Khang Nguyễn, 2019-07-11T20:52:51Z

Which countries did @realDonaldTrump tweet about?
https://community.wolfram.com/groups//m/t/1727272
Introduction

A couple of days ago on 1 July [The Economist][1] tweeted [this][2]:
> Since he was elected in 2016 Donald Trump has made 1,384 mentions of foreign countries on Twitter. Can you guess which one he named most often?
[It claims][3] that in spite of the "special relationship" with the UK, it is only ranked 15th of the countries and territories tweeted about. It also says that Puerto Rico, Mexico and China are in fifth, fourth and third places respectively (countries and territories). According to The Economist North Korea is ranked in second place with 163 mentions.
A couple of years ago I read the excellent book "A Mathematician Reads the Newspaper" by John Allen Paulos, and I wonder how much of the daily news coverage we can check using the Wolfram Language. *In a future post I will talk about another project that we are doing with several members of this community that goes in a similar direction. We call it "computational conversations". With a bit of luck you might hear about it at the [Wolfram Technology Conference][4] later this year.*
Initial analysis

It turns out that I have been monitoring @realDonaldTrump's tweets using IFTTT since early 2017. I attach the Excel files to this post. To have a look at the first tweet, we first set the directory and load the raw data files:
SetDirectory[NotebookDirectory[]]
dataraw = Import /@ FileNames["Trump*.xlsx"];
As the first file (without a number) will be read in last (alphabetical order), this is the first tweet data:
dataraw[[5, 1, 1]] // TableForm
![enter image description here][5]
It is from January 26th, 2017, a couple of days after his inauguration.
In order to figure out which countries Mr Trump talks about, we use TextCases, a recently updated function:
tweettexts = Join[dataraw[[1, 1]], dataraw[[2, 1]], dataraw[[3, 1]], dataraw[[4, 1]], dataraw[[5, 1]]][[All, 2]];
locations = TextCases[StringJoin[tweettexts], "LocationEntity" -> "Interpretation", VerifyInterpretation -> True];
I find
Length@locations
5768 locations; these include not only direct mentions of countries but also locations within countries. These locations are in Entity form:
locations[[1;;20]]
![enter image description here][6]
Let's get that apart. First we make a list of all countries in the world:
purecountries = # -> {#} & /@ EntityList[EntityClass["Country", "Countries"]];
If we select all direct mentions of countries we obtain:
Select[locations, MemberQ[purecountries[[All, 1]], #] &] // Length
3624 mentions; if we exclude the 1349 mentions of the US, we are left with 2275 country mentions. Despite our list starting with later tweets, we obtain substantially more mentions of countries than The Economist (1,384). We can now generate a table of the mentions of all countries:
TableForm[Flatten /@ Transpose[{Range[Length[#] - 1], Delete[#, 5]}] &@({#[[1]], #[[2]]} & /@
Normal[ReverseSort[Counts[CommonName@(Select[locations, MemberQ[purecountries[[All, 1]], #] &])]]])]
![enter image description here][7]
(This is only the top of the list.) Note that North Korea is missing, but it will be very prominent in the next table... Next we check for "indirect" mentions of a country, i.e. Louvre would count as a mention of France, etc. We will find many more entities and first generate a list of substitution rules:
countriesrules = # -> Check[GeoIdentify["Country", #], {#}] & /@ (Complement[DeleteDuplicates[locations], EntityList[EntityClass["Country", "Countries"]]]);
We will ignore the error messages for now. We can then generate a table that includes the "indirect" mentions, too:
TableForm[Flatten /@ Transpose[{Range[Length[#] - 1], Delete[#, 5]}] &@({#[[1]], #[[2]]} & /@
Normal[ReverseSort[Counts[CommonName@(DeleteMissing[Flatten[locations /. countriesrules]])]]])]
![enter image description here][8]
Note that at rank 4 we find Media, which is not a country. It would be easy to clean out, but I leave it in to show the performance of the code so far. We can now make typical representations such as GeoBubbleCharts:
GeoBubbleChart[Counts[DeleteMissing[Flatten[locations /. countriesrules]]], GeoBackground -> "Satellite"]
![enter image description here][9]
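The substitution-rule step boils down to a lookup from location to country, with unresolved locations dropped (as `DeleteMissing` does). A toy Python version with a hypothetical lookup table standing in for `GeoIdentify`:

```python
from collections import Counter

# Hypothetical stand-in for GeoIdentify["Country", location]
LOCATION_TO_COUNTRY = {
    "France": "France", "Paris": "France", "Louvre": "France",
    "North Korea": "North Korea", "Pyongyang": "North Korea",
}

def country_mentions(locations):
    """Count country mentions, resolving sub-locations to their country
    and silently dropping anything the lookup cannot place."""
    return Counter(LOCATION_TO_COUNTRY[loc]
                   for loc in locations if loc in LOCATION_TO_COUNTRY)
```

For example, `country_mentions(["Louvre", "France", "Pyongyang", "Atlantis"])` credits France twice, North Korea once, and drops the unresolvable entry.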
We can now make a BarChart (on a logarithmic scale) selecting "purecountries" like so:
BarChart[ReverseSort@<|
Select[Normal@
Counts[DeleteMissing[Flatten[locations /. countriesrules]]],
MemberQ[purecountries[[All, 1]], #[[1]]] &]|>,
ScalingFunctions -> "Log",
ChartLabels -> (Rotate[#, Pi/2] & /@
CommonName[
ReverseSortBy[
Select[Normal@
Counts[DeleteMissing[Flatten[locations /. countriesrules]]],
MemberQ[purecountries[[All, 1]], #[[1]]] &], Last][[All,
1]]]), PlotTheme -> "Marketing",
LabelStyle -> Directive[Bold, 15]]
![enter image description here][10]
We can also represent that on a world wide map:
styling = {GeoBackground -> GeoStyling["StreetMapNoLabels",
GeoStylingImageFunction -> (ImageAdjust@ColorNegate@ColorConvert[#1, "Grayscale"] &)],
GeoScaleBar -> Placed[{"Metric", "Imperial"}, {Right, Bottom}], GeoRangePadding -> Full, ImageSize -> Large};
GeoRegionValuePlot[
Log@<|Select[Normal@Counts[DeleteMissing[Flatten[locations /. countriesrules]]], MemberQ[purecountries[[All, 1]], #[[1]]] &]|>, Join[styling, {ColorFunction -> "TemperatureMap"}]]
![enter image description here][11]
Further analysis

We can of course look at many other features of the tweets. One is a simple sentiment analysis. I am not at all convinced that the results of this attempt are useful or represent an actual pattern, but this is what we could do:
emotion[text_] := "Positive" - "Negative" /. Classify["Sentiment", text, "Probabilities"]
and then
tweetssentiments = emotion /@ tweettexts;
ListPlot[tweetssentiments, PlotRange -> All, LabelStyle ->
Directive[Bold, 15], AxesLabel -> {"tweet number", "sentiment"}]
![enter image description here][12]
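The `emotion` function just subtracts the two class probabilities, giving a score in $[-1, 1]$. Equivalently, as a Python sketch with a made-up probability association:

```python
def emotion(probabilities):
    """Sentiment score = P(Positive) - P(Negative), ranging from -1 to 1."""
    return probabilities["Positive"] - probabilities["Negative"]
```

For instance, `emotion({"Positive": 0.8, "Neutral": 0.1, "Negative": 0.1})` gives a score of about 0.7, a clearly positive tweet.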
Using a SmoothHistogram, we see a pattern of "extremes", negative, neutral, positive:
SmoothHistogram[tweetssentiments, PlotTheme -> "Marketing",
FrameLabel -> {"sentiment", "probability"},
LabelStyle -> Directive[Bold, 16], ImageSize -> Large]
![enter image description here][13]
We can also ask for less relevant information, such as the colours mentioned in the tweets:
textcasesColor = TextCases[StringJoin[tweettexts], "Color" -> "Interpretation", VerifyInterpretation -> True]
![enter image description here][14]
So there is a lot of white, some black, red and green:
ReverseSort@Counts[textcasesColor]
![enter image description here][15]
Let's blend these colours together:
Graphics[{Blend[textcasesColor], Disk[]}]
![enter image description here][16]
We can also look for "profanity" in tweets:
textcasesProfanity = TextCases[StringJoin[tweettexts], "Profanity"];
and represent these tweets in a table:
Column[textcasesProfanity, Frame -> All]
![enter image description here][17]
It is not quite clear to me why some of the tweets are classified as containing profanity; for some of them it is relatively obvious, I think.
Twitter handles

Another interesting analysis is to look at the twitter handles that @realDonaldTrump uses:
textcasesTwitterHandle = TextCases[StringJoin[tweettexts], "TwitterHandle"];
Here are counts of the 50 most common handles:
twitterhandles50 = Normal[(ReverseSort@Counts[ToLowerCase /@ textcasesTwitterHandle])[[1 ;; 50]]]
![enter image description here][18]
Last but not least we can make a BarChart of that:
BarChart[<|twitterhandles50|>, ChartLabels -> (Rotate[#, Pi/2] & /@ twitterhandles50[[All, 1]]),
LabelStyle -> Directive[Bold, 14]]
![enter image description here][19]
and to compare the same on a logarithmic scale:
BarChart[<|twitterhandles50|>, ChartLabels -> (Rotate[#, Pi/2] & /@ twitterhandles50[[All, 1]]),
LabelStyle -> Directive[Bold, 14], ScalingFunctions -> "Log"]
![enter image description here][20]
A little word cloud

Just to finish off we will generate a little word cloud like so:
allwords = Flatten[TextWords /@ tweettexts];
WordCloud[ToLowerCase /@ DeleteCases[DeleteStopwords[ToString /@ allwords], "&amp;"]]
![enter image description here][21]
The cloud picks up on "witch hunt" and "collusion", "@foxandfriends" and "Russia", "fake", and "border", as well as other terms that are indeed relatively prominent in the media.
Conclusion

The main objective of this post was to try to reproduce, at least qualitatively, The Economist's analysis of @realDonaldTrump's tweets using the Wolfram Language. We used a slightly different period of tweets and looked at both direct mentions and "indirect" ones. I have not made any manual comparison of the results; I am not sure how well the recognition has worked, and I only post this as a first cursory analysis.
It was relatively easy to go beyond the analysis and look at other features of the tweets, too.
[1]: https://www.economist.com
[2]: https://twitter.com/TheEconomist/status/1145467208950329344
[3]: https://www.economist.com/graphicdetail/2019/06/04/theworldaccordingtodonaldtrump
[4]: http://www.wolfram.com/events/technology-conference/2019/
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.12.59.png&userId=48754
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.27.41.png&userId=48754
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.34.11.png&userId=48754
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.41.34.png&userId=48754
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.43.14.png&userId=48754
[10]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.45.38.png&userId=48754
[11]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.24.25.png&userId=48754
[12]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.53.33.png&userId=48754
[13]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071000.55.22.png&userId=48754
[14]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.05.39.png&userId=48754
[15]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.07.12.png&userId=48754
[16]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.08.07.png&userId=48754
[17]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.09.33.png&userId=48754
[18]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.12.15.png&userId=48754
[19]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.13.26.png&userId=48754
[20]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.14.50.png&userId=48754
[21]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot2019071001.33.04.png&userId=48754

Marco Thiel 2019-07-10T00:40:42Z

[Numberphile] James ❤️ A Card Trick Simulation
https://community.wolfram.com/groups//m/t/1718762
####*Please download the notebook at the end of the discussion including dynamic feature*####

![simulation][1]
July 4th is a fabulous holiday for American families to enjoy the summer with road trips and BBQ parties. You can use a deck of cards to entertain your kids and family friends with [James Grime's][2] new card trick on [Numberphile][3].
##Background##
You are given $10$ cards, from Ace to 10. Shuffle them and split them into $2$ rows with five cards in each. Sort one row in ascending order, the other in descending order. Both orders are oriented from left to right. Calculate the absolute values of the differences between corresponding cards in the two rows and add them up. You can play the game multiple times and you will find the sum is always $25$. If you do not have cards at hand, please take a look at the animation above.
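The claimed invariant is easy to check by brute force. A small simulation (here in Python, since the trick itself is language-agnostic) shuffles the cards many times and confirms the sum is always 25; more generally, for cards 1..2m the sum is m²:

```python
import random

def trick_sum(n=10):
    """Shuffle cards 1..n, split into two rows, sort one ascending and
    the other descending, and sum the absolute pairwise differences."""
    cards = list(range(1, n + 1))
    random.shuffle(cards)
    row1 = sorted(cards[: n // 2])                 # ascending
    row2 = sorted(cards[n // 2 :], reverse=True)   # descending
    return sum(abs(a - b) for a, b in zip(row1, row2))

# For n = 2m the result is always m^2 (25 for a 10-card deck)
assert all(trick_sum(10) == 25 for _ in range(1000))
assert all(trick_sum(12) == 36 for _ in range(1000))
```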
##Code##
Let's work on the problem with Mathematica to help us understand. In the code I use the icons from this website:
$icon="https://www.iconfinder.com/icons/1331558/card_game_diamond_gambling_game_jewelry_poker_sport_icon";
then I directly assign them to `spade` and `diamond` in a notebook:
![spadediamond][4]
Use `Framed` to create a nice and simple poker card:
cardpicture1[{suit_, val_}] := Framed[Column[{val, suit}, Center, 0, ItemSize -> {6, 2},
ItemStyle -> Directive[24, "Label", Bold]], RoundingRadius -> 8]
![poker][5]
Use `Map` to create a list of poker cards:
n=10;
numbers = Range[n];
pokers1 = AssociationMap[cardpicture1[{spade, #}] &, numbers];
differences = AssociationMap[cardpicture1[{diamond, #}] &, numbers];
Now let's shuffle the deck of 10 cards:
shuffle=Permute[numbers,RandomPermutation[10]]
(* {3,6,7,5,9,10,8,4,1,2} *)
I use this generic permutation method because you can replace `numbers` with another list, such as the first 10 prime numbers.
Then split it into two rows:
{row1,row2}={#[[1;;n/2]],#[[n/2+1;;]]}&[shuffle]
(* {{3,6,7,5,9},{10,8,4,1,2}} *)
Style[Grid[Map[pokers1, {row1, row2}, {2}]], ImageSizeMultipliers -> {1, 1}]
![deck][6]
The last two steps are to sort the two rows and find the sum of the absolute values of the pairwise differences between them. I use `Grid` and `Style` to make the output neat.
With[{r1 = Sort@row1, r2 = Sort[row2, Greater]},
Module[{sum}, sum = Inactive[Plus] @@ Abs[r2 - r1];
Style[Grid[{
pokers1 /@ r1,
pokers1 /@ r2,
(differences /@ Abs[r2 - r1]),
{Null, SpanFromLeft},
{Style[Row@{sum, " = ", Activate[sum]}, Large, Italic], SpanFromLeft}}],
ImageSizeMultipliers -> {1, 1}]
]
]
![sorteddeck][7]
##Proof##
You might have already noticed variables like `pokers1` and `cardpicture1` in the code above. I am definitely hiding something from you.
The companion deck I have, namely `pokers2`, is a deck of super-power cards whose styles get updated when the face value is greater than $5$:
pokers2 = AssociationMap[cardpicture2[{spade, #}] &, numbers];
where
cardpicture2[{suit_, val_}] :=
Framed[Column[{val, suit}, Center, 0, ItemSize -> {6, 2},
ItemStyle -> Directive[24, "Label", Bold]],
FrameStyle -> Directive[If[val > 5, Dashed, Black]],
RoundingRadius -> 8]
Once we use the super-power cards, we can see a pattern that matches exactly what James claimed in the Numberphile video (a proof by contradiction based on stacking the dashed/purple numbered cards):
![hint][8]
Let's take a look at one frame:
![direction][9]
The third diamond line comes from the following flipping transition (dash-solid or purple-black):
![transition][10]
$25 = 7 + 3 + 2 + 5 + 8$
$= (10 - 3) + (8 - 5) + (6 - 4) + (7 - 2) + (9 - 1)$
$= (10 + 9 + 8 + 7 + 6) - (1 + 2 + 3 + 4 + 5)$
The sorted rows tell you that all dashed cards must sit together; the same holds for the solid-frame cards. Therefore there is exactly one transition from solid -> dash or dash -> solid in each row.
If dashed cards overlapped vertically, it would mean we either have at least 6 such cards or one row is not properly sorted. This is a contradiction.
##Comment##
There are some nice functions used here:
- Use `Inactive[Plus]@@{...}` and then activate the sum expression with `Activate[]`: this prints a nice held sum formula and delays evaluation
- `AssociationMap` to create a key-poker face dictionary
- Animation is generated by [Refresh][11]. Attached at the end of the thread.
- Poker design code is from [this demonstration][12]
Code snippet just for the calculation (minimum visualization):
n = 12;
{row1, row2} = {#[[1 ;; n/2]], #[[n/2 + 1 ;;]]} &[Permute[Range[n], RandomPermutation[n]]]
f[m_] := If[m > n/2, Framed[m], m]
{Grid[{row1, row2}], Sequence @@ With[{r1 = Sort@row1, r2 = Reverse@Sort@row2}, {Grid[Map[f, {r1, r2}, {2}]], Abs[r1 - r2] // Total}]}
![shortcode][13]
Formula:
Sum[k, {k, 2*m}] - 2*Sum[k, {k, m}] // Simplify
(* m^2 *)
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=simulation.gif&userId=23928
[2]: http://bit.ly/grimevideos
[3]: https://www.youtube.com/watch?v=_Wv_qw3nQnI
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=input.png&userId=23928
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=input2.png&userId=23928
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=usdeck.png&userId=23928
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=sdeck.png&userId=23928
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=simulation_hint.gif&userId=23928
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=labeld2.png&userId=23928
[10]: https://community.wolfram.com//c/portal/getImageAttachment?filename=labeld3.png&userId=23928
[11]: https://reference.wolfram.com/language/ref/Refresh.html#Examples
[12]: http://demonstrations.wolfram.com/Poker/
[13]: https://community.wolfram.com//c/portal/getImageAttachment?filename=quick.png&userId=23928

Shenghui Yang 2019-07-03T18:35:12Z

Anamorphic Writing and a Birthday card
https://community.wolfram.com/groups//m/t/1724556
While surfing a web page of the [Ephemera Society by Richard D. Sheaff][1], I came across these amazingly strange "puzzle cards", written in "anamorphic writing" and allegedly popular back in the 19th century.
![puzzle card][2]
These anamorphic writings are an application of perspective anamorphism. A well defined viewpoint or view direction is needed to see these images correctly. In this case, the view direction has to be at a very sharp angle, almost flat across the surface of the text image.
Another, less extreme, example of this are the "STOP" or "BUS" signs painted on city streets. Here the correct viewpoint is at an angle of approximately 25 degrees, as is the case for a driver approaching the signs.
![stop sign][3]
To create similar cards with Mathematica, a hefty vertical stretching of the text (about 25 times) is needed. We can use `ScalingTransform[{1, 25}]` as I did in my Wolfram Demonstration "Anamorphic Text Messages".
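As a rough back-of-the-envelope model (my own, not from the original post): viewing the surface at angle θ compresses the vertical direction by roughly sin θ, so the compensating stretch is about 1/sin θ. A ~25x stretch then corresponds to looking almost flat across the card:

```python
import math

def stretch_factor(view_angle_deg):
    """Vertical stretch needed to undo foreshortening when the card is
    viewed at the given angle above its surface (simple sin-theta model)."""
    return 1.0 / math.sin(math.radians(view_angle_deg))

print(stretch_factor(2.3))   # roughly 25: a very flat viewing angle
print(stretch_factor(25))    # roughly 2.4: the street-sign case
```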
This is code for a similar "Puzzle card" with two perpendicular words stretched in two directions.
Module[{g1, g2, gt1GP, gt2GP},
{g1, g2} = (First[
First[ImportString[
ExportString[
Style[#1, FontFamily -> "Arial", FontSize -> 72,
FontTracking -> "Extended"], "PDF"]]]] &) /@ {"TREACHEROUS",
"PERSPECTIVES"};
{gt1GP, gt2GP} = (GeometricTransformation[#1,
ScalingTransform[{1, 25}]] &) /@ {g1, g2};
Graphics[Rotate[{gt1GP, Rotate[gt2GP, \[Pi]/2]}, \[Pi]/4]]]
![enter image description here][4]
And this is a GIF showing how to read the card in both directions by rotating it around a vertical axis while looking almost parallel to the card's surface:
![perspectives][5]
More than 2 directions for the stretched text are possible.
Module[{g1, g2, g3, gt1GP, gt2GP, gt3GP},
{g1, g2, g3} = (First[
First[ImportString[
ExportString[
Style[#1, FontFamily -> "Arial", FontSize -> 72,
FontTracking -> "Extended"], "PDF"]]]] &) /@ {"A HAPPY",
"BIRTHDAY", "LOVE YOU"};
{gt1GP, gt2GP, gt3GP} = (GeometricTransformation[#1,
ScalingTransform[{1, 25}]] &) /@ {g1, g2, g3};
gr = Graphics[{gt1GP, Rotate[gt2GP, \[Pi]/3],
Rotate[gt3GP, (2 \[Pi])/3]}]]
![happy bday][6]
The above can be used in an "*anamorphic puzzle birthday card*" (file attached). The card may require some deciphering by the recipient. You can adapt the texts in the code and make your own secret card to surprise your nerd friends!
Graphics[{gr[[1]], FaceForm[{Blue, Opacity[.12]}],
EdgeForm[{Darker[Blue, .5], AbsoluteThickness[6]}],
Rectangle[{400, 100}, {600, 1500}],
Text[Style["A secret message for you today...", Darker[Red, .35],
15], Scaled[{.5, .95}]],
Text[Style[
"Instruction: Close one eye, hold the card at an angle and look \
flat over the surface...\nlook in the 3 directions...", Black, 6],
Scaled[{.5, .05}]]}]
![enter image description here][7]
[1]: http://www.sheaffephemera.com/list/anamorphic_writing.html
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=3872puzzlecard2.jpg&userId=68637
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=busprintonstreet.jpg&userId=68637
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=2656treacherousperspectives.png&userId=68637
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=treacherousperspecives.gif&userId=68637
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=10420happybday.png&userId=68637
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=birthdaycard.jpeg&userId=68637

Erik Mahieu 2019-07-08T14:35:27Z

Dashed lines in points
https://community.wolfram.com/groups//m/t/1742916
Hi,
I made a graphical representation of the function f(x) = 2x and I want to mark some of the points with dashed lines to the x and y axes. Can anybody help me, please? Below I give you my code so far.
Thank you.
Plot[{2 x}, {x, 0, 10}, PlotLabel -> "Γραφική Παράσταση" (* "Graph" *), AxesLabel -> {"Σοκολάτες", "Χρήματα"} (* "Chocolates", "Money" *),
Epilog -> {PointSize[Large], Point[{{0, 0}, {1, 2}, {2, 4}, {3, 6}, {6, 12}, {10, 20}}]},
Ticks -> {{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {0, 1, 2, 3, 5, 10, 12, 15, 20}}]

Dimitris PAPAIOANNOU 2019-07-21T19:24:28Z

[WSS19] Monitoring the development and spread of cities
https://community.wolfram.com/groups//m/t/1727982
# Introduction
Cities all over the world grow over time at different rates and in different ways. It is beneficial for citizens, companies, governments, etc. to get an estimate of these rates and the direction of the development. In this study we explore several ways to collect and process data (manually and through an API), and try to find a suitable way to predict the development of a city using satellite images.
# Data
Satellite images are becoming more available than ever across the globe. They offer images with different information layers that highlight areas of interest, accompanied by numeric and nominal data. There are several ways to obtain this kind of dataset. One is purchasing it from a provider, but there are also free options: for example, downloading images through the Google Earth Pro application, which lets the user explore images of a given position through time in a simple way and switches between images from different satellites as the user changes parameters like zoom level and time. Another way to get free satellite images is through a data provider's API, which is more time efficient and can provide pre-highlighted areas of interest in the given images.
### Manual collection of satellite images
The following animation is a time lapse of the development of Dubai and its surrounding area over the interval 1984 - 2016, made from images we collected manually from Google Earth Pro and stored as a GIF file.
![enter image description here][1]
Using such a dataset, especially if it contains sand-covered areas, allows limited control in choosing a specific class to study. For example, if the desired class is urban change, then separating developed urban areas from sand areas based on color classification may produce inaccurate data even with high-resolution images.
If the desired study is of the overall development of the city, including all the classes, then this dataset can be used by binarizing the image differences over time.
imgList = Import["your list of collected images.jpg"];
diffList = Differences[imgList];
ImageMeasurements[Binarize[#, .1], "Total"] & /@ diffList // ListLinePlot
To monitor the change rate between images over time, we plotted the sum of the data values in the binarized image differences.
diffdata = ImageMeasurements[Binarize[#, .1], "Total"] & /@ diffList;
diffdata // ListLinePlot
![enter image description here][2]
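The frame-differencing idea (difference consecutive frames, binarize, count changed pixels) can be sketched with NumPy arrays standing in for the imported images; the tiny synthetic frames below are illustrative, not the actual satellite data:

```python
import numpy as np

def change_counts(frames, threshold=0.1):
    """Per-step count of pixels whose value changed by more than
    `threshold` between consecutive frames (a Binarize + Total analogue)."""
    diffs = np.abs(np.diff(frames, axis=0))        # like Differences[imgList]
    return (diffs > threshold).sum(axis=(1, 2))    # pixels flipped per step

# Tiny 2x2 example: one pixel changes in step 1, two pixels in step 2
frames = np.array([
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 0.0]],
])
print(change_counts(frames))  # [1 2]
```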
The spikes here are due to anomalies in the images. In this specific dataset, the spikes come from wind or sand storms that changed the surface structure of some sand-covered areas; these changes appeared in two of the image differences. In order to remove these anomalies, we removed the outliers from the list of totals in the binarized image differences.
q = Quartiles[diffdata];
maxq = q[[3]] + 1.5 (q[[3]] - q[[1]]);
minq = q[[1]] - 1.5 (q[[3]] - q[[1]]);
newimgList =
Delete[imgList, Position[diffdata, _?(minq > # || # > maxq &)]];
newdiffList = Differences[newimgList];
newdiffdata =
ImageMeasurements[Binarize[#, .1], "Total"] & /@ newdiffList;
newdiffdata // ListLinePlot
![enter image description here][3]
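The outlier rule used above is the standard Tukey fence: discard points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. A generic sketch with Python's `statistics` module (its quartile convention differs slightly from Mathematica's `Quartiles`, but the fences behave the same):

```python
import statistics

def iqr_filter(data):
    """Keep only values inside the Tukey fences Q1 - 1.5*IQR .. Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # the three quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if lo <= x <= hi]

data = [10, 12, 11, 13, 12, 95, 11]   # 95 is a storm-like spike
filtered = iqr_filter(data)
print(filtered)                       # the spike is removed
```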
ListAnimate[Accumulate[Binarize[#, .1] & /@ newdiffList],
ImageSize -> Large]
![enter image description here][4]
In order to compare the change rate for multiple areas (with the same parameters values) we calculated the average of change rate over time.
Mean@newdiffdata
10852.4
### API utilization for collecting Satellite images
We were able to connect to the API of NASA's website and get images with acceptable resolution and many classes. We first got the classes' color codes, and then started to retrieve images of the metropolitan area of Shanghai, since it is the city with the highest change rate over time among the most populated ones in the world.
rawLegend =
Import["https://gibs.earthdata.nasa.gov/colormaps/v1.3/MODIS_\
Combined_IGBP_Land_Cover_Type_Annual.xml"];
legend = Association@
Cases[rawLegend,
XMLElement[
"LegendEntry", {"rgb" -> color_, "tooltip" -> name_,
"id" -> _}, {}] :>
RGBColor[ToExpression /@ StringSplit[color, ","]/255] -> name,
Infinity];
legendReverse = AssociationMap[Reverse, legend]
topcities =
EntityList[
EntityClass["MetropolitanArea", "Population" -> TakeLargest[10]]]
This gets the list: Tokyo, Mexico City, Seoul, Mumbai, Sao Paulo, Manila, New York-Northern New Jersey-Long Island (NY-NJ-PA), Jakarta, New Delhi, Shanghai.
GeoListPlot[
EntityClass["MetropolitanArea", "Population" -> TakeLargest[10]][
"Position"], GeoRange -> "World", GeoProjection -> "Robinson"]
![enter image description here][5]
cityImgs10 =
Function[p,
GeoImage[GeoDisk[p, Quantity[70, "Miles"]], GeoZoomLevel -> 8,
GeoServer -> \
{"https://gibs.earthdata.nasa.gov/wmts/epsg3857/best/MODIS_Combined_\
L3_IGBP_Land_Cover_Type_Annual/default/" <>
DateString[#, {"Year", "-", "Month", "-", "Day"}] <>
"/GoogleMapsCompatible_Level8/`1`/`3`/`2`.png",
"ZoomRange" -> {1, 8}}, Background -> Black] & /@
DateRange[DateObject[{2001, 1, 1}], DateObject[{2017, 1, 1}],
"Year"]] /@
EntityClass["MetropolitanArea", "Population" -> TakeLargest[10]][
"Position"];
binaryMaps =
Map[Binarize[
ColorDetect[#, legendReverse["Urban and Built-up Lands"]],
0.9999] &, cityImgs10, {2}];
newdiffList10 = Differences[#] & /@ binaryMaps;
newdiffdata10 = ImageMeasurements[#, "Total"] & /@ newdiffList10;
ListLinePlot[newdiffdata10,
PlotLegends ->
ReplacePart[EntityValue[topcities, "Name"], 7 -> "NY NJ"],
PlotLabel -> Style["change rate of urban growth", Black, 15]]
![enter image description here][6]
With[{data =
ReverseSort@
AssociationThread[
ReplacePart[EntityValue[topcities, "Name"], 7 -> "NY NJ"],
Mean /@ newdiffdata10]},
BarChart[data, ChartLabels -> (Rotate[#, Pi/2] & /@ Keys[data]),
PlotLabel ->
Style["Average change rate of urban growth", Black, 15]]]
![enter image description here][7]
{ListAnimate[cityImgs10[[10]]], ListAnimate[binaryMaps[[10]]]}
![enter image description here][8]
![enter image description here][9]
# Prediction of future urban development
To predict the next urban development in a certain area, we need the historical data we obtained in the previous section. Since the Wolfram Language provides several methods for prediction, we looked for a method suitable for our study. In the case of classified images, only 17 years of historical data (2001 - 2017) can be obtained via the API, which is not enough to train a neural network for prediction, so we decided to simulate a convolution layer and use it with a statistical model. First we use the binarized classified images, which limits the study to the urban areas. Then we transform each image into binary vectors, one vector per row of pixels: a pixel is one if it is part of the urban area, and zero otherwise. Next we divide the training images into smaller parts; each part and its historical versions are used to predict a corresponding pixel.
We divided the training set into groups of five consecutive images with an offset of one. Each group of five is used such that the first four images lead to the fifth one in prediction; this increased the size of the training data.
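The windowing scheme described above (length-5 windows, offset 1; the first four frames as input, the fifth as target) mirrors `Partition[frames, 5, 1]` and can be sketched generically:

```python
def training_pairs(frames, window=5):
    """Slide a length-`window` window with offset 1 over the frame list;
    each window yields (history, target) = (first window-1 frames, last frame)."""
    return [(frames[i:i + window - 1], frames[i + window - 1])
            for i in range(len(frames) - window + 1)]

frames = list(range(16))          # stand-ins for the 16 training images
pairs = training_pairs(frames)
print(len(pairs))                 # 12 windows from 16 frames
print(pairs[0])                   # ([0, 1, 2, 3], 4)
```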
### Methods
Decision trees are one of the suitable methods for this kind of prediction. We used the first sixteen images to predict the following seven images.
binaryMapsShanghai = binaryMaps[[10]];
rules = Catenate[
Function[maps,
Module[{flatMatrices},
flatMatrices =
Catenate@
Transpose[
Partition[ImageData[#, "Bit"], {3, 3}, {1, 1}, 2, 0] & /@
Most[maps], {3, 1, 2, 4, 5}];
Thread[flatMatrices > Catenate@ImageData[Last[maps], "Bit"]]
]] /@ Partition[binaryMapsShanghai[[;; -2]], 5, 1]
];
pf = Predict[rules, Method -> "DecisionTree"]
predictimg[pf_] := Module[{output, predimg},
output =
Map[pf, Transpose[
Partition[ImageData[#, "Bit"], {3, 3}, {1, 1}, 2, 0] & /@
binaryMaps[[-4 ;; -1]], {3, 1, 2, 4, 5}], {2}];
predimg = Binarize[Image@output, 0.1];
AppendTo[binaryMaps, predimg];
];
predictimg[pf];
Every time the function `predictimg` is called, it generates a new predicted binary image and adds it to the list of binary images of the area under study. Here is an example of seven images generated by this function.
![enter image description here][10]
A closer look.
![enter image description here][11]
In order to see how good the model is, we fitted a linear regression model to the change rate of urban development for Shanghai and then compared the expected results with the ones we got from our model.
rateChange =
LinearModelFit[Transpose[{Range[16], newdiffdata10[[10]]}], x, x]
ListPlot[Transpose[{(rateChange[#] & /@ Range[17, 22]), {271.`, 152.`,
88.`, 57.`, 34.`, 15.`}}], Filling -> Axis,
PlotLabel ->
Style["Relation between expected and generated change rate", Black,
15]]
![enter image description here][12]
The model was quite cautious in deciding the next spread parts, but we believe that with some tweaks to the inputs we can get better results in future versions of the model. We tried other statistical methods, for example linear regression and random forest, but they did not show any change in the predicted images. We also tried the nearest-neighbor method, which needs much more time to produce an output; due to time limitations, we could not produce an output with it.
# Future plan
The model is in its early stages and needs more investigation of the available methods. We still do not know how the nearest-neighbor method will behave. There may also be other APIs that allow more data to be imported, which may give better results. Tweaking the inputs may also affect the outputs, and with some optimizations we may make better use of the data at hand.
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=9161Dubai2.gif&userId=1700735
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=spikes.png&userId=1700735
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=clean.png&userId=1700735
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=4117dubaibinary.gif&userId=1700735
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=cities.png&userId=1700735
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=changerate.png&userId=1700735
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=changerate2.png&userId=1700735
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=shang.gif&userId=1700735
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=binshang.gif&userId=1700735
[10]: https://community.wolfram.com//c/portal/getImageAttachment?filename=8462results.gif&userId=1700735
[11]: https://community.wolfram.com//c/portal/getImageAttachment?filename=8195closerlook.gif&userId=1700735
[12]: https://community.wolfram.com//c/portal/getImageAttachment?filename=deviation.png&userId=1700735

Ahmed Elbanna 2019-07-10T04:49:48Z

[WSC19] Analysing Global Emoji Usage
https://community.wolfram.com/groups//m/t/1735657
Introduction:

Emojis over time have become an important form of expressing emotions, especially in the social media world. Understanding how people use emojis and how the usage varies across countries can help us understand human behaviour better. The main aim of this project was to explore which emojis are dominantly popular in a specific geographic area.
Further, a short exploration was made into how different socioeconomic factors may affect emoji usage.
Methodology:


All the data was obtained in the form of geolocation tagged tweets from Twitter. Further, the emojis were isolated from each tweet (also based on country), and their usage was analysed.
###Initial Attempts
Initially I started off by developing a pseudo dataset, forming a list of countries and their respective emoji usage frequency, randomly generated. This was done as an attempt to better understand possible approaches to solving the problem.
In the beginning, the goal of the study was to explore the emoji usage patterns for a few countries, say the ‘top 10 twitter using’ countries. However, being able to extract emojis from tweets is a more powerful tool than I originally anticipated. So, I decided to do it for all the countries that the Wolfram Language has in its repository.
###Extracting Data from Twitter
I used `ServiceConnect`, a built-in function of the Wolfram Language, to connect to Twitter, and then used the "TweetSearch" request to extract Twitter data from 226 countries. Unfortunately, around 22 countries' data was not available, so they are excluded from the current study.
twitter = ServiceConnect["Twitter"]
twitter["TweetSearch", "Query" -> "", GeoLocation -> #,
MaxItems -> 2000]
countries = CountryData[];
Block[{results, filename},
results =
twitter["TweetSearch", "Query" -> "", GeoLocation -> #,
MaxItems -> 2000];
filename =
FileNameJoin[{"/Users/anweshadas/Desktop/Wolfram emoji \
project/Extracting the Emoji/", "Results_" <> #["Name"] <> ".m"}];
Export[filename, results];
Pause[120];] & /@ countries
All the data that was collected was stored on a local folder on the computer.
###Extracting Emoji from Tweets
Extracting emojis from a tweet was by far the most difficult part of the problem. The issue is that all tweets on Twitter are encoded in UTF-16, whereas the Wolfram Language `FromCharacterCode` function uses UTF-8.
As a first step, I imported the unicodes for all the emojis into the Mathematica notebook.
string = Import[
"http://unicode.org/Public/emoji/12.0/emoji-data.txt"];
goodLines =
Select[StringTrim /@ StringSplit[string, "\n"] /. "" -> Nothing,
StringTake[#, 1] =!= "#" &];
codes = First /@ (StringSplit[#] & /@ goodLines);
allCodesUNICOD =
Union@Select[
With[{d = FromDigits[#, 16] & /@ Flatten[StringSplit[#, "."] /. "" -> Nothing]},
If[Length[d] == 1, d, Range @@ d[[{1, -1}]]]] & /@ codes //
Flatten, # >= 9728 &];
Next, I imported the twitter data (which was stored on the computer in the previous step) into the notebook.
filenames = FileNames["*.m", NotebookDirectory[]];
missing = {"Canada", "Chad", "Democratic Republic of the Congo",
"Egypt", "Falkland Islands", "Libya", "Mauritania", "Mongolia",
"Myanmar", "Nicaragua", "Niger", "Norfolk Island", "Romania",
"Somalia", "Svalbard", "Sweden", "Syria", "Tonga", "Turkmenistan",
"Tuvalu", "Uzbekistan", "Zambia"};
missingEntity = Interpreter["Country"][#] & /@ missing;
cData = Complement[ StringCases[filenames, ___ ~~ "/Results_" ~~ x__ ~~ ".m" :> x] //
Flatten, missing];
newfilenames = filenames // Select[MemberQ[cData, StringCases[#, ___ ~~ "/Results_" ~~ x__ ~~ ".m" :> x] //
First] &];
alldata = Import[#] & /@ newfilenames;
Next, I created a list of all the countries for which the data was collected.
countries = CountryData[];
newcountries = Select[countries, ! MemberQ[missingEntity, #] &];
Then I extracted the text part from the tweets, and converted them into unicode.
alltext = alldata[[#]][All, "Text"] & /@ Range[alldata // Length];
allcodes = ToCharacterCode[alltext[[#]] // Normal, "Unicode"] & /@
Range[alltext // Length];
Then threaded them as an association.
allthread = AssociationThread[newcountries -> allcodes];
The final step to extracting the emojis was converting all of the UTF16 codes into UTF8, which allowed the Wolfram Language to interpret them.
toCodePoint[{a_, b_}] /; 16^^d800 <= a <= 16^^dbff && 16^^dc00 <= b <= 16^^dfff :=
(a - 16^^d800)*2^10 + (b - 16^^dc00) + 16^4.
datacleaned = Map[toCodePoint[#] /. _toCodePoint -> First@# & /@ Partition[#, 2, 1] &, allthread, {2}] /. r_Real :> Floor@r;
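The conversion above is the standard UTF-16 surrogate-pair formula: code point = (hi - 0xD800)*0x400 + (lo - 0xDC00) + 0x10000. The same decoding in Python:

```python
def from_surrogate_pair(hi, lo):
    """Decode a UTF-16 surrogate pair (high, low) into a Unicode code point."""
    assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF
    return ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000

# U+1F602 'face with tears of joy' is encoded as the pair D83D DE02
assert from_surrogate_pair(0xD83D, 0xDE02) == 0x1F602
assert chr(from_surrogate_pair(0xD83D, 0xDE02)) == "\U0001F602"
```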
The output of this step was a list of all the emojis used in all the Tweets for each country.
justEmojis = Cases[#, Alternatives @@ allCodesUNICOD, {2}] & /@ datacleaned;
###Exploration: GDP and Literacy Fraction
I feel that emojis are a very powerful outlet for understanding human behavior. Hence, I wondered if the socioeconomic conditions of a country might influence its emoji usage. In this project, I took the case of two very common social indicators: GDP (Gross Domestic Product) and literacy fraction.
Results


The next step was to get the number of emojis present in the tweets of each country.
sortedCountries = Length /@ justEmojis // Normal;
In order to have a better understanding of the data, I decided to calculate the emoji usage density (number of emojis per tweet) for each country.
a = sortedCountries // Values;
theactualnumberoftweets =
AssociationThread[newcountries -> Length /@ alldata] // Normal;
b = theactualnumberoftweets // Values;
theemojiratio =
Table[N[Part[a, n]/Part[b, n]], {n, Length@newcountries}];
I was also curious to find out what percentage of tweets, on average, contain an emoji. The results suggest that, on average, 63% of tweets use emojis. Next, I plotted a weighted map of emoji usage density.
GeoRegionValuePlot[AssociationThread[newcountries -> theemojiratio],
ImageSize -> 500]
![enter image description here][2]
###Density Plots by Continent
After this, I decided to have an emoji density map for each continent, in order to understand emoji usage frequency in specific regions.
####Asia
AsiaGeoRegion = Part[amapfortheemojiratio, #] & /@ Position[newcountries, #] & /@
Entity["GeographicRegion", "Asia"][EntityProperty["GeographicRegion", "Countries"]] // Flatten;
GeoRegionValuePlot[AsiaGeoRegion, ImageSize -> 500]
![enter image description here][3]
####Europe
EuropeGeoRegion = Part[amapfortheemojiratio, #] & /@ Position[newcountries, #] & /@ Entity["GeographicRegion", "Europe"][ EntityProperty["GeographicRegion", "Countries"]] // Flatten;
GeoRegionValuePlot[EuropeGeoRegion, ImageSize -> 500]
![enter image description here][4]
####Africa
AfricaGeoRegion = Part[amapfortheemojiratio, #] & /@ Position[newcountries, #] & /@
Entity["GeographicRegion", "Africa"][EntityProperty["GeographicRegion", "Countries"]] // Flatten;
GeoRegionValuePlot[AfricaGeoRegion, ImageSize -> 500]
![enter image description here][5]
####North America
NorthAmericaGeoRegion = Part[amapfortheemojiratio, #] & /@ Position[newcountries, #] & /@
Entity["GeographicRegion", "NorthAmerica"][EntityProperty["GeographicRegion", "Countries"]] // Flatten;
GeoRegionValuePlot[NorthAmericaGeoRegion, ImageSize -> 500]
![enter image description here][6]
####South America
SouthAmericaGeoRegion =
Part[amapfortheemojiratio, #] & /@ Position[newcountries, #] & /@
Entity["GeographicRegion", "South America"][EntityProperty["GeographicRegion", "Countries"]] // Flatten;
GeoRegionValuePlot[SouthAmericaGeoRegion, ImageSize -> 500]
![enter image description here][7]
###Most Common Emojis by Country
One of the most important parts of the project was to investigate the most common emoji for each country.
cuteEmojis = KeyMap[FromCharacterCode, #] & /@ ReverseSort /@ Counts /@ justEmojis;
ds = Dataset@cuteEmojis;
ds[1 ;;, Association@MaximalBy[Normal@#, Last] &]
![enter image description here][8]
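The "most common emoji per country" step above (`MaximalBy[..., Last]` on the per-country counts) reduces to a one-liner over plain dictionaries; the data below is a toy stand-in for `justEmojis`, not the collected tweets:

```python
from collections import Counter

# Toy country -> list-of-emojis data (hypothetical)
emoji_by_country = {
    "A": ["😂", "😂", "❤️"],
    "B": ["😭", "❤️", "❤️", "😂"],
}

# Most common emoji per country: an analogue of MaximalBy[Normal@#, Last]
top_emoji = {country: Counter(emojis).most_common(1)[0]
             for country, emojis in emoji_by_country.items()}
print(top_emoji)  # {'A': ('😂', 2), 'B': ('❤️', 2)}
```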
###Most Common Emoji Globally
To illustrate the global frequency of specific emoji usage, I plotted a bar chart for the 20 most common emojis.
BarChart[Last /@ Take[Reverse[SortBy[MostCommonEmojis, Last]], 20],
ChartLabels -> First /@ Take[Reverse[SortBy[MostCommonEmojis, Last]], 20],
ImageSize -> 750]
![enter image description here][9]
###GDP and Literacy Fraction
As mentioned in the methodology section, an exploratory question for the project was to see whether I could find any correlation between GDP and emoji usage, and between literacy fraction and emoji usage.
Unfortunately, no clear trend was observed in the GDP vs. emoji usage plot. However, for literacy fraction vs. emoji usage, a positive trend was observed, i.e. emoji usage goes up as the literacy fraction goes up.
####GDP vs EmojiRatio
gdp = CountryData[#, "GDP"] & /@ newcountries;
listplotdata = SortBy[Select[Thread[gdp -> theemojiratio], ! MissingQ[#[[1]]] &] //
Normal, Keys];
gdpdataplotted = ListPlot[List @@@ listplotdata, AxesLabel -> {"GDP", "emojiratio"},
ImageSize -> 500]
![enter image description here][10]
####Literacy Fraction vs EmojiRatio
literacyfraction = CountryData[#, "LiteracyFraction"] & /@ newcountries;
literaryfractionsorted = SortBy[Select[Thread[literacyfraction > theemojiratio], !
MissingQ[#[[2]]] &] // Normal, Keys];
literacyfractionplot = ListPlot[List @@@ literaryfractionsorted,
AxesLabel > {"literacyfraction", "theemojiratio"}, ImageSize > 500]
![enter image description here][11]
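The trend check behind these plots — drop missing values, then look for a correlation — can be sketched in Python; the (literacy fraction, emoji ratio) pairs below are invented placeholders, not the actual CountryData results:

```python
import math

# Hypothetical (literacy fraction, emoji ratio) pairs; None marks a
# missing value, which is dropped as in the MissingQ filter above.
pairs = [(0.99, 0.031), (0.86, 0.024), (None, 0.05), (0.72, 0.018), (0.95, 0.029)]
pairs = [(x, y) for x, y in pairs if x is not None and y is not None]

# Pearson correlation coefficient computed directly from the definition.
xs, ys = zip(*pairs)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in pairs)
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

print(round(r, 3))  # close to 1 for this toy data: a positive trend
```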
##Conclusion
In this project, around 220,000 tweets from 226 countries were used. The ‘tears of joy’ emoji was found to be the most popular globally, in addition to being the most used in 104 countries. It was followed by the ‘crying loudly’ emoji!
##Future Work
This work can of course be improved with more consistent data. In addition, one interesting extension of the project would be to explore how the first/official language of a country affects its emoji usage patterns. A possible approach is to look for a correlation between alphabet length (e.g. 26 for English) and emoji usage density.
##Acknowledgement
A big shout-out to my mentor, Rory Foulger, who made this project possible. Also, a big ‘Thank You’ to mentors Kyle Keane, Philip Maymin, Christian Pasquel and Mads Bahrami, who were also an integral part of this project.
##References
 http://getemoji.com
 https://unicode.org/emoji/charts-12.0/full-emoji-list.html
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=emojiusagedensitypage0.jpg&userId=1724965
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=emojiusagedensitypage0.jpg&userId=1724965
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.41.19PM.png&userId=1724965
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.43.05PM.png&userId=1724965
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.44.44PM.png&userId=1724965
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.46.05PM.png&userId=1724965
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.49.03PM.png&userId=1724965
[8]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.50.53PM.png&userId=1724965
[9]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at1.34.43PM.png&userId=1724965
[10]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.54.58PM.png&userId=1724965
[11]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Screenshot20190711at7.56.32PM.png&userId=1724965

Anwesha Das, 2019-07-12T23:15:35Z

ipynb2docx - Convert Jupyter notebooks to Word preserving LaTeX formatting
https://community.wolfram.com/groups//m/t/1740967
Wolfram Language and Jupyter notebooks are becoming well-integrated technologies, but neither of them can export Jupyter notebooks to a format that is still considered standard in the industrial sector: Word (docx).
To my knowledge, no single command-line utility properly converts a Jupyter notebook containing a mix of Markdown, HTML and (especially) LaTeX to docx, and it is rather difficult to find the correct sequence of conversions that produces a docx with the LaTeX equations in their original placement.
Therefore I developed the following utility, based on wolframscript and on the powerful string manipulation and format conversion tools present in the Wolfram Language.
https://github.com/Ludwiggle/ipynb2docx
Thank You.

Nicola Cattabiani, 2019-07-19T10:00:23Z

[WSS19] Time Series Summarization and Application In Weather Data
https://community.wolfram.com/groups//m/t/1729183
# Introduction
Suppose one wants to know the average temperature over a time series. With the existing resources, the only option is to average every single data point in the time interval. This is extremely inefficient, since every point must be retrieved to compute the average. To solve this problem, we examine how to divide the time series into years, months, weeks, dates, and hours, so that the average can be computed from pre-summarized temperature values for those periods, reducing the number of objects needed.
I start by extracting the complete years that occur during the time series; each extracted year becomes a single object representing all the data points for that year. I then continue this process, extracting the complete months from the time that does not assemble a complete year, and finally grouping the remaining dates into combinations of six-day weeks and single days.
# Time Series Summarization
I have decided that I will write out the time series as a combination of years, months, weeks, days, and hours. The logic that I will use to break down a time series will be the following:
![Logic Flow Of Time Series Summarization][1]
Notice that one unique feature of my optimization model is that it considers six-day weeks instead of the traditional seven-day format. I decided to do so for several reasons:
- It is actually more efficient to write out time series combinations when the weeks are six days long. I wrote a program that determines, for a week length of n days, how many week/day objects are needed to assemble a time series of m days, for all m less than the maximum number of days in a month (thus 1 <= m <= 30). When I sum the number of objects needed across all m values for each n, n = 6 has the lowest total, which means n = 6 is the most efficient. It also takes fewer objects to assemble periods of 30 or 31 days when n = 6, because 6 divides these values almost evenly.
- It eliminates the need to reconsider the optimization strategy at the beginning and end of the time series. If the traditional seven-day week format is used, the optimization takes one extra step: for the time series [June 1st, July 5th], grouping it into 5 seven-day weeks instead of the month of June plus 5 days means fewer objects are needed to calculate the average. Using the six-day week format eliminates such concerns: once the weeks and dates are extracted, we already have the most efficient combination for the time series.
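The week-length comparison described above can be reproduced with a short Python check: for each candidate week length n, count the week/day objects a greedy split needs for every span of m = 1…30 days, and total them.

```python
def objects_needed(m, n):
    # Greedy split of an m-day span into weeks of length n
    # plus leftover single days.
    return m // n + m % n

# Total object count across all span lengths 1..30 for each week length.
totals = {n: sum(objects_needed(m, n) for m in range(1, 31)) for n in range(2, 8)}

best = min(totals, key=totals.get)
print(best, totals)  # n = 6 gives the smallest total
```

Running this, n = 6 needs 140 objects in total, narrowly beating both n = 5 and n = 7 (141 each), which matches the post's claim.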
Based on the tree diagram, I can assemble the most efficient combination. I construct TimeSeriesCombination to obtain the optimal time series combination. The following is a combination of years (expressed as yyyy, where yyyy stands for the year), months (yyyymm), weeks (yyyymmddW), dates (yyyymmdd), and hours (yyyymmddhh), expressed in chronological order.
TimeSeriesCombination[DateObject[{2019, 6, 2, 1}],
DateObject[{2019, 7, 5, 0}]]
{"2019060201", "2019060202", "2019060203", "2019060204",
"2019060205", "2019060206", "2019060207", "2019060208", "2019060209",
"2019060210", "2019060211", "2019060212", "2019060213", "2019060214",
"2019060215", "2019060216", "2019060217", "2019060218", "2019060219",
"2019060220", "2019060221", "2019060222", "2019060223", "20190603W",
"20190609W", "20190615W", "20190621W", "20190627W", "20190703",
"20190704", "2019070500"}
To make this time series more readable, we can group it by the type of object: all consecutive hours are grouped together, then days, weeks, etc.
![enter image description here][2]
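That consecutive grouping can be sketched in Python with itertools.groupby, classifying each object by the yyyymm… encodings above (10 digits for hours, a trailing W for weeks, 8 digits for dates); the sample below uses a few entries from the TimeSeriesCombination output:

```python
from itertools import groupby

# A few entries from the TimeSeriesCombination output above.
combo = ["2019060201", "2019060202", "20190603W", "20190609W",
         "20190703", "20190704", "2019070500"]

def kind(obj):
    # Classify by the string encodings: trailing W = week,
    # 8 digits = date, 10 digits = hour.
    if obj.endswith("W"):
        return "week"
    return {8: "day", 10: "hour"}[len(obj)]

# groupby keeps consecutive runs of the same type together.
grouped = [(k, list(g)) for k, g in groupby(combo, key=kind)]
print(grouped)
```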
#Evaluation: In Application To Weather Data
Using this optimized time series summarization, I can find the maximum, minimum, and mean air temperature for a time series. For now, the functions only apply to Waltham, MA; an extra parameter would be needed to let users select their location. The temperature for the interval is determined by taking the objects generated by the TimeSeriesCombination function, using them as keys to look up generated weather temperature data (I have created a data file producing process that is intended to improve the Wolfram Knowledge Base), and using that data to calculate the overall maximum, minimum, and mean value for the time interval.
MaxTemperatureInterval[DateObject[{2016, 8, 13, 0}],
DateObject[{2018, 2, 23, 23}]]
Quantity[81.788, "DegreesFahrenheit"]
MinTemperatureInterval[DateObject[{2016, 8, 13, 0}],
DateObject[{2018, 2, 23, 23}]]
Quantity[0.710001, "DegreesFahrenheit"]
MeanTemperatureInterval[DateObject[{2016, 8, 13, 0}],
DateObject[{2018, 2, 23, 23}]]
Quantity[48.5953, "DegreesFahrenheit"]
#Evaluation: How Efficient Is This Function?
We conducted a test to see how much the new function improves the efficiency of finding data for a time series. We measured the time, in seconds, that our algorithm and the existing algorithm each take to determine the mean temperature over 300 time series, each ending at 2018/12/31 but with lengths of 1 day, 2 days, and so on. We find that initially the existing method is more efficient; however, as the number of days increases, my method starts to prevail.
![Comparison Of The Run Of The Two Functions][3]
Based on the plot, we can see that past 90 days my function is significantly more efficient than the existing method. I anticipate that as the number of days increases, my method will become more and more efficient compared to the existing method.
# Why This Function Is Actually More Important Than Just The Efficiency
To double-check that my code makes sense, I went back and compared the calculated value against the value calculated using the existing method. Looking at the values, there seem to be some differences.
![enter image description here][4]
The difference is even bigger when we are looking at smaller time intervals.
![enter image description here][5]
While wondering what was causing this difference, I remembered from my initial experiment on time series that the number of data points for each day is not equal:
![enter image description here][6]
Using the data files that I have, I calculated and confirmed that the existing system simply takes all the data points in the time interval and averages them. This means the calculation will favor some dates over others. In the case from 2018/11/13 to 2018/11/15, the result is larger because the date whose temperature is 46.5 F (higher than the other two dates) is favored in the calculation. Thus, my method provides a more accurate calculation of the mean, because every day has the same weight.
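The weighting effect can be illustrated with a toy example: when one day contributes far more raw points, a point-wise average drifts toward that day, while averaging the daily means gives every day equal weight. The counts and temperatures below are invented, not the actual Waltham data:

```python
# Invented readings: the first day has far more data points than the others.
readings = {
    "2018-11-13": [46.5] * 20,  # over-represented day
    "2018-11-14": [30.0] * 4,
    "2018-11-15": [32.0] * 4,
}

# Existing method: average every raw point; the 46.5 F day dominates.
all_points = [t for temps in readings.values() for t in temps]
pointwise_mean = sum(all_points) / len(all_points)

# Summarized method: average the daily means; each day counts once.
daily_means = [sum(t) / len(t) for t in readings.values()]
equal_weight_mean = sum(daily_means) / len(daily_means)

print(round(pointwise_mean, 2), round(equal_weight_mean, 2))
```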
Note that my mechanism is not perfect. The daily temperatures are calculated with the same method that caused the inaccuracies in the existing mechanism, and the unequal representation of hours is a glaring issue in calculating daily temperatures:
![enter image description here][7]
In this case, the temperature between 2:00 and 10:00 is not even included! When calculating averages, 18:00-24:00 is heavily favored, and the mean temperature for the date ends up a lot closer to the 70 F mark as a result. Thus, daily temperatures should eventually be calculated by averaging the values for all the hours.
# Future Work
I can examine whether adding other time intervals (quarters, decades, centuries, minutes, seconds) allows even more efficient processing, and implement these intervals if it is indeed more efficient to do so. I can also use the time series summarization to look at other data (the Dow Jones index, etc.) to generate relevant statistical analyses such as forecasting.
#Further Exploration
GitHub link: [https://github.com/altan4377/WSS2019][8]
[1]: https://community.wolfram.com//c/portal/getImageAttachment?filename=TreeDiagram.jpg&userId=1726252
[2]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ResultsColumnForm.jpg&userId=1726252
[3]: https://community.wolfram.com//c/portal/getImageAttachment?filename=ComparisonOfMethodsMineVSExisting.jpg&userId=1726252
[4]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Inaccuracy1.jpg&userId=1726252
[5]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Inaccuracy2.jpg&userId=1726252
[6]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Inaccuracy3.jpg&userId=1726252
[7]: https://community.wolfram.com//c/portal/getImageAttachment?filename=Inaccuracy3WithTitle.jpg&userId=1726252
[8]: https://github.com/altan4377/WSS2019

Yunchi Tang, 2019-07-10T18:47:16Z