Message Boards Message Boards

The Global Terrorism Database (GTD)

GROUPS:

In some of my recent posts I have looked at databases and information about crime and new articles world-wide. In particular news articles allow us to data mine incidents of specific events, such as terrorism related acts. In this post I will look at the data from a projet called the "Global Terrorism Database" (GTD) https://www.start.umd.edu/gtd/. The reference for the data base is here:

National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2016). Global Terrorism Database [Data file]. Retrieved from https://www.start.umd.edu/gtd

Here is one of the animations you can generate:

enter image description here

The data can be downloaded as an xlsx spreadsheet. I have used the file generated in June 2016. I understand that the data is updated about six months after the end of the respective year. So my data is for 2015. It turns out that without increasing the heap space the file cannot be imported, so I will use the following

Needs["JLink`"]
SetOptions[InstallJava, JVMArguments -> "-Xmx32g"]
SetOptions[ReinstallJava, JVMArguments -> "-Xmx32g"]
ReinstallJava[]

to get a bit more of it. Now I can import the file (takes a little while).

gtdb = Import["/Users/thiel/Desktop/globalterrorismdb_0616dist.xlsx"];

Let's see how many entries we have:

gtdb[[1]] // Dimensions

So that is 156k events and lots of information on each of them. The total dataset has a ByteCount of

ByteCount[gtdb]

which is slightly less than 1.2 GB - the file itself is only about 77MB. Let's look at the information provided for each event.

Grid[{TableForm /@ Partition[Rule @@@ Transpose[{Range[Length[gtdb[[1, 1]]]], gtdb[[1, 1]]}], 34]}, Frame -> All, Background -> LightGray]

enter image description here

There is a "codebook" (https://www.start.umd.edu/gtd/downloads/Codebook.pdf) where the GTDB team provide more information about all of these entries. Most of the entries are quite straight forward to read. I will not go through every entry of the list now, but only use the ones I need. If in doubt, you can always consult the codebook.

Where do attacks happen?

First we visualise where terrorism incidents happen in the world. Entries 14 and 15 are the longitude and latitude of an event. Here are the first 10 events:

gtdb[[1, -11 ;; -2, {14, 15}]]

enter image description here

I will use a representation I have now used in several other posts. I think that it is pretty intuitive to read.

toCoordinates[coords_] := FromSphericalCoordinates[{#[[1]], Pi/2 - #[[2]], Mod[Pi + #[[3]], 2 Pi, -Pi]}] & /@ (Flatten[{1., #/360*2 Pi}] & /@ coords)

lengths[inputdata_] := 2.*(inputdata/Max[inputdata])

myGeoHistogram[data_, radius_] := Show[SphericalPlot3D[radius, {u, 0, Pi}, {v, 0, 2 Pi}, Mesh -> None, TextureCoordinateFunction -> ({#5, 1 - #4} &), 
PlotStyle -> Directive[Specularity[White, 10], Texture[Import["/Users/thiel/Desktop/backgroundimage.gif"]]], Lighting -> "Neutral", Axes -> False, RotationAction -> "Clip", Boxed -> False, PlotPoints -> 100, PlotRange -> {{-3, 3}, {-3, 3}, {-3, 3}}, ImageSize -> Full], 
Graphics3D[Flatten[{Green, Thick, Line[{#[[1]], (radius + #[[2]])*#[[1]]}] & /@ Transpose[{toCoordinates[#[[All, 1]]], lengths[#[[All, 2]]]} &@data]}]]]

The background image is attached to this post. With that we can represent the events like so:

myGeoHistogram[Transpose@{DeleteCases[gtdb[[1, -2071 ;; -2, {14, 15}]], {"", ""}], ConstantArray[1, Length[DeleteCases[gtdb[[1, -2071 ;; -2, {14, 15}]], {"", ""}]]]}, 3.]

enter image description here

There is obviously a very high number of events in the Middle East and in Afghanistan. Here is another visualisation of the same data:

GeoListPlot[GeoPosition /@ DeleteCases[gtdb[[1, -2071 ;; -2, {14, 15}]], {"", ""}], GeoRange -> "World", GeoProjection -> "Robinson", ImageSize -> 800]

enter image description here

Number of people killed

In column 101 each entry contains information on the number of people that were killed in the attack. We can make a histogram of that.

ListLogLogPlot[SortBy[Select[DeleteCases[Tally[Round /@ gtdb[[1, 2 ;; -2]][[All, 101]]], {"", _}], #[[1]] >= 1 &], First], PlotTheme -> "Marketing", 
 ImageSize -> Large, LabelStyle -> Directive[Bold, 16], FrameLabel -> {"number of casulties", "number of incidents"}]

enter image description here

It turns out that occasionally the number of people killed is not an integer.

Select[SortBy[Select[DeleteCases[Tally[gtdb[[1, 2 ;; -2]][[All, 101]]], {"", _}], #[[1]] >= 1 &], First][[All, 1]], # - Floor[#] != 0 &]

enter image description here

In the codebook they say:

When several cases are linked together, sources sometimes provide a cumulative fatality total for all of the events rather than fatality figures for each incident. In such cases, the preservation of statistical accuracy is achieved by distributing fatalities across the linked incidents. Depending on the number of linked events and the cumulative total of fatalities, it is possible for fractions to appear in the fatality field for individual events.

This is why I used the "Round" function of the histogram. As expected there are fewer incidents with very high casualty numbers. In fact, the behaviour seems to be nearly "scale-free", i.e. a straight line in a log-log plot.

Country by country

We can also look at the data for a specific country. These are incidents in Germany all from December 2015:

Select[gtdb[[1, -1071 ;; -2]], #[[9]] == "Germany" &] // TableForm

enter image description here

Here are descriptions of what happened:

Select[gtdb[[1, -1071 ;; -2]], #[[9]] == "Germany" &][[All, 19]] // TableForm

enter image description here

Most of the attacks are on refugee shelters.

Attacks over the years

The dataset also contains historic case since 1970. We can extract the years from the database like this:

enter image description here

Note that there are no incidents reported for 1993. Let's sort this by year and tally:

attacksvsyear = Transpose[{DateObject[ToString[#]] & /@ DeleteCases[Range[1970, 2015], 1993], Length /@ GatherBy[gtdb[[1, All]], #[[2]] &][[2 ;;]]}]

enter image description here

Now we can plot the number of incidents from 1970 until 2015.

DateListPlot[attacksvsyear, Background -> Black, PlotStyle -> Directive[RGBColor[{0.7, 0, 0}], Thickness[0.005]], 
 LabelStyle -> Directive[Yellow, Bold, Medium, FontSize -> 22], Filling -> Bottom, FillingStyle -> RGBColor[1, 0.5, 0.5], 
 ImageSize -> Full, PlotRange -> All]

enter image description here

On the GTDB website they say that some time in the middle of the sequence the methodology of the data acquisition has changed, so the methodology might not be completely consistent. In particular at the end of 2011 the method has changed (see page 4 of the codebook); it has been tried though to be as consistent as possible.

Graphics of geo-distribution of attacks and casualties

The following image shows terrorism incidents on a map of the world; the sizes of the disks correspond to the number of people killed. Here are the deadly attacks of 2015:

GeoGraphics[Join[{Red}, GeoDisk[{#[[1]], #[[2]]}, Sqrt[#[[3]]]*Quantity[40, "Kilometers"]] & /@ 
Select[Cases[gtdb[[1, -14806 ;; -2, {14, 15, 101}]], {_Real, _Real, _Real}], #[[3]] > 0. &]], GeoRange -> "World", 
GeoProjection -> "Mercator", ImageSize -> Full, GeoBackground -> "CountryBorders"]

enter image description here

We can change the styling a little bit:

styling = {GeoBackground -> GeoStyling["StreetMapNoLabels", GeoStylingImageFunction -> (ImageAdjust@ColorNegate@ColorConvert[#1, "Grayscale"] &)], GeoScaleBar -> Placed[{"Metric", "Imperial"}, {Right, Bottom}], GeoRangePadding -> Full, ImageSize -> Large};
GeoGraphics[Join[{Red}, GeoDisk[{#[[1]], #[[2]]}, Sqrt[#[[3]]]*Quantity[40, "Kilometers"]] & /@ Select[Cases[gtdb[[1, -8000 ;; -2, {14, 15, 101}]], {_Real, _Real, _Real}], #[[3]] > 0. &]], GeoRange -> "World", GeoProjection -> "Equirectangular", ImageSize -> Full, styling]

enter image description here

We can now also plot this on a 3D representation of the world:

backgrnd = GeoGraphics[Join[{Red, GeoStyling[Opacity[0.4]]}, GeoDisk[{#[[1]], #[[2]]}, Sqrt[#[[3]]]*Quantity[40, "Kilometers"]] & /@ 
Select[Cases[gtdb[[1, -8000 ;; -2, {14, 15, 101}]], {_Real, _Real, _Real}], #[[3]] > 0. &]], GeoRange -> "World", GeoBackground -> "Satellite" , 
GeoProjection -> "Equirectangular", ImageSize -> Large]; SphericalPlot3D[3, {u, 0, Pi}, {v, 0, 2 Pi}, 
 Mesh -> None, TextureCoordinateFunction -> ({#5, 1 - #4} &), PlotStyle -> Directive[Specularity[Black], Texture[backgrnd]], 
 Lighting -> "Neutral", Axes -> False, RotationAction -> "Clip", Boxed -> False, PlotPoints -> 100, PlotRange -> {{-3, 3}, {-3, 3}, {-3, 3}}, ImageSize -> Large]

enter image description here

We can also build a function to deal with data of the form {lat,lon,nkilled}. First, we prepare the data:

datalonlatkill = Select[Cases[gtdb[[1, -80 ;; -2, {14, 15, 101}]], {_Real, _Real, _Real}], #[[3]] > 0. &];

then we generate the function:

myBlobPlot[data_] := Module[{}, backgrnd = GeoGraphics[
Join[{Red, GeoStyling[Opacity[0.4]]}, GeoDisk[{#[[1]], #[[2]]}, Sqrt[#[[3]]]*Quantity[40, "Kilometers"]] & /@ data], 
GeoRange -> "World", GeoBackground -> "Satellite" , GeoProjection -> "Equirectangular", ImageSize -> Large]; 
SphericalPlot3D[3, {u, 0, Pi}, {v, 0, 2 Pi}, Mesh -> None, TextureCoordinateFunction -> ({#5, 1 - #4} &), 
PlotStyle -> Directive[Specularity[Black], Texture[backgrnd]], Lighting -> "Neutral", Axes -> False, RotationAction -> "Clip", 
Boxed -> False, PlotPoints -> 100, PlotRange -> {{-3, 3}, {-3, 3}, {-3, 3}}, ImageSize -> Large]]

and plot:

myBlobPlot[datalonlatkill]

enter image description here

Let's try to plot this as a movie in 3 month windows. First we need to create data with also contains the year and month of the attack.

datalonlatkill = Select[Cases[gtdb[[1, ;; -2, {14, 15, 101, 2, 3}]], {_Real, _Real, _Real, _, _}], #[[3]] > 0. &];

Then we calculate a list of positions in the data of the "last case belonging to one of the particular time window.

threemonthwindows = Flatten[Table[{_, _, _, i, j}, {i, 1970., 2015., 1.}, {j, {3., 6., 9., 12.}}], 1];

Then we calculate the indices of all elements in each window.

borders = Prepend[Quiet[Table[Position[datalonlatkill, Cases[datalonlatkill, threemonthwindows[[k]]][[-1]]][[1, 1]], {k, 1, Length[threemonthwindows]}] /. {}[[1, 1]] -> Missing["NotAvailable"]], 0];

and then the bins

bins = {1, 0} + # & /@ Partition[borders, 2, 1];

Now we can make the 3D globe for all time points:

myBlobPlot[datalonlatkill[[#[[1]] ;; #[[2]]]][[All, 1 ;; 3]]] &@ bins[[22]]

Now we want to create a little animation. The globe should turn and show a sequence of states. I could use something like rotate or Viewpoint, but chose to do something a bit different: I will actually move the background image and map it onto a stable sphere. First of all we need to know the dimensions of the background image

backgrnd // ImageDimensions
(*{576,288}*)

Let's remove the bins without any events and count how many there are.

Cases[bins, {_?NumberQ, _?NumberQ}] // Length
(*179*)

We can now make a Manipulate structure to get the idea. As I move the slider the image is cut into to pieces and reassembled. It is actually quite responsive.

Manipulate[ImageAssemble[{ImageCrop[backgrnd, {576 - k, Full}, Left], ImageCrop[backgrnd, {k, Full}, Right]}], {k, 1, 575, 1}]

enter image description here

Now we can use this idea to "rotate" the background and map it onto the sphere. I generate the frames that will make up my movie.

Monitor[frames = 
   Table[backgrnd = 
     GeoGraphics[
      Join[{Red, GeoStyling[Opacity[0.4]]}, 
       GeoDisk[{#[[1]], #[[2]]}, 
          Sqrt[#[[3]]]*
           Quantity[40, 
            "Kilometers"]] & /@ (datalonlatkill[[#[[1]] ;; \
#[[2]]]][[All, 1 ;; 3]] &@ (Cases[
             bins, {_?NumberQ, _?NumberQ}][[k]]))], 
      GeoRange -> "World", GeoBackground -> "Satellite" , 
      GeoProjection -> "Equirectangular", ImageSize -> Large];
    backgrnd = 
     ImageAssemble[{ImageCrop[backgrnd, {576 - Mod[k*10, 576], Full}, 
        Left], ImageCrop[backgrnd, {Mod[k*10, 576], Full}, Right]}];
    Image[
     SphericalPlot3D[3, {u, 0, Pi}, {v, 0, 2 Pi}, Mesh -> None, 
      TextureCoordinateFunction -> ({#5, 1 - #4} &), 
      PlotStyle -> Directive[Specularity[Black], Texture[backgrnd]], 
      Lighting -> "Neutral", Axes -> False, RotationAction -> "Clip", 
      Boxed -> False, PlotPoints -> 100, 
      PlotRange -> {{-3, 3}, {-3, 3}, {-3, 3}}, ImageSize -> Large, 
      Background -> Black]], {k, 1, 179}];, k]

We can now animate the frames or export them.

Monitor[Do[Export["~/Desktop/GTDBmovie/frame" <> ToString[1000 + k] <> ".jpg", frames[[k]], ImageResolution -> 100], {k, 1, Length[frames], 1}],k]

The resulting video is shown at the top of this post.

Incidents by country

The following histogram looks at the number of cases in different countries; only the first 40 are displayed.

BarChart[#[[All, 2]], ChartLabels -> (Rotate[Style[#, 16], Pi/2] & @@@ 
Reverse[SortBy[Tally[gtdb[[1, -5000 ;;, 9]]], Last]][[1 ;; 40]]), ColorFunction -> Function[{height}, ColorData["Temperature"][height]], 
ImageSize -> Full, ScalingFunctions -> "Log"] & @ Reverse[SortBy[Tally[gtdb[[1, -5000 ;;, 9]]], Last]][[1 ;; 40]]

enter image description here

Other information in the database

Again, this is a very rich database and much more information can be extracted. Let's look at textual data. We can first find the positions of textual data in the entries like so:

textpos = Flatten[Position[gtdb[[1, 1]], #] & /@ Select[gtdb[[1, 1]], StringContainsQ[#, "_txt"] &]];

We can now create a lookup table for these columns.

Rule @@@ Transpose[{textpos, gtdb[[1, 1, textpos]]}]

enter image description here

In column 85 for example there are the weapon types.

gtdb[[1, 2 ;; 100, 85]]

enter image description here

Here is a tally of frequently used weapons.

Reverse@SortBy[Tally[RandomChoice[gtdb[[1, All, 85]], 5000]], Last] // TableForm

enter image description here

We can create a WordCloud to visualise this:

WordCloud[RandomChoice[gtdb[[1, All, 85]], 1000], ImageSize -> Large, WordOrientation -> {{0, \[Pi]/3}}]

enter image description here

Have the weapons changed over time? Let's look at the first 5000 attacks:

Reverse@SortBy[Tally[gtdb[[1, 2 ;; 5000, 85]]], Last] // TableForm

enter image description here

And the last 5000.

Reverse@SortBy[Tally[gtdb[[1, -5000 ;;, 85]]], Last] // TableForm

enter image description here

There seem to be more vehicle based attacks. Let's look at that more closely:

Reverse@SortBy[Tally[gtdb[[1, 2 ;;, 85]]], Last] // TableForm

enter image description here

Let's define

attacktypes = Reverse@SortBy[Tally[gtdb[[1, 2 ;;, 85]]], Last][[All, 1]]

enter image description here

We can now look at overlapping windows of 5000 and count the weapons used.

Monitor[weaponsovertime = Table[Count[gtdb[[1, 2 + k*50 ;; 5002 + k*50, 85]], #] & /@ attacktypes, {k, 0, 3000}], k]

Let's try to make this nice. First we normalise.

(1. #/Total[#]) & /@ (weaponsovertime[[1 ;; 3000]])

enter image description here

Now we plot that:

ListLinePlot[Transpose[Accumulate /@ ((1. #/Total[#]) & /@ (Reverse /@ weaponsovertime[[1 ;; 3000]]))], PlotRange -> All, 
Filling -> (#[[2]] -> {#[[1]]} & /@ Partition[Range[12], 2, 1]), ImageSize -> Large, PlotLegends -> Reverse@attacktypes, LabelStyle -> Directive[Bold, 14], AxesLabel -> {"Time / au", "percentage"}]

enter image description here

Note that the legend is in "reverse order" the top most curve in the list corresponds to the last entry in the legend.

Conclusion

This post is only scratching the surface of what is possible with this data set. I find it very exciting that with the Wolfram Language and freely available databases you can answer your own questions about many of the most urgent questions we face today.

POSTED BY: Marco Thiel
Answer
6 months ago

enter image description here - Congratulations! This post is now a Staff Pick! Thank you for your wonderful contributions. Please, keep them coming!

POSTED BY: Moderation Team
Answer
6 months ago

Cool! Thanks for sharing! Are you aware of GeoHistogram doc? Pretty nifty function. Might be useful for visualisations of distributions...

POSTED BY: Sander Huisman
Answer
6 months ago

Hi Sander,

thank you. Yes, I am ware of GeoHistogram, and I have used it before, e.g. here. I often don't get it to look exactly as I want it, particularly if the density of points is low.

For distributions I usually use, well a distribution rather than a histogram, like here. I have used that for the surface of a sphere, too. There are some nice figures for the representation in random matrix theory with that sort of distribution. I look at certain transition/probability matrices, i.e. with largest eigenvalue 1, so that they are on the surface of a sphere. Things like that have come up in models I developed in population matrix theory.

I do, however, agree that GeoHistogram is a very useful function.

I also think that I haven't even begun to properly analyse the GTD, I wanted to try some machine learning on it, now that GPUs are supported again on OSX. It would also be cool to combine the data here with what can be found in the GDELT database from my other post. GDELT also contains data on acts of terrorism, but complementary information. I'd love to see what can be learnt from that.

Thank you for your comment!

Cheers,

Marco

POSTED BY: Marco Thiel
Answer
6 months ago

I somehow missed your road accidents in NYC post. Also very nice! Yeah tweaking such that the appearance is good can be tricky indeed. I also like the smoothed histograms for certain distributions/uses...

I'm on mac OSX but I have an AMD graphics card unfortunately...

You've been on a posting-spree lately, very cool, every time I learn some things...

--SH

POSTED BY: Sander Huisman
Answer
6 months ago

Hi Sander,

yeah, representing data is more of an art than a science and I am not particularly good at that. I think it's crucial for story-telling with data, but it's difficult to get right.

Regarding the GPU issue: I am also on Mac, but the external Bizon Box was a Game changer for me. I have a couple of them and they work really nicely. One of the examples in the documentation runs a day on my CPU and in just over 3 minutes on my GPU. I'll try to post some benchmarks and explain how to set it up in one of my forthcoming posts.

Yes, I have been posting a bit. I have more things in the pipeline, but it often takes time to get the ok from the sites offering the data etc. I'd also live it if one could embed external videos into the websites. Sometimes it takes quite some time to "get the videos small".

You always have visually stunning images in your posts and indeed use very sexy code to get there. It's vwry nice to look forward to some stimulating distraction every time you post something!

I am looking forward to reading from you,

Marco

POSTED BY: Marco Thiel
Answer
6 months ago

Group Abstract Group Abstract