

Crime Rates in US major cities (2014)

Posted 1 year ago
8 Replies
25 Total Likes

A new Twitter storm was initiated by US President Donald Trump today when he referred to one of his critics in Congress, Representative Elijah Cummings.

This raises the question of which cities are actually the most dangerous in the United States.

Crime statistics in the US are compiled by the Federal Bureau of Investigation. The Uniform Crime Reporting Program (UCR) is a nationwide, voluntary effort by nearly 18,000 law enforcement agencies that report data on crimes.

Using the table-building tool we proceeded to download the police reports of all agencies in the US for the year 2014.

Data Preparation

All downloaded files were placed in a single directory for batch processing. The records of interest have 24 fields, so we use that fact to select the rows we need.

files= FileNames[All, "C:\\Users\\user\\Downloads\\Local Crime"];
vals = Flatten[
   Table[Cases[Import[files[[n]], "Data"], 
     a_ /; MatchQ[Length@a, 24]], {n, Length@files}], 1];
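
To see what the length filter does, here is a minimal sketch on made-up rows (the row contents are hypothetical, not taken from the UCR files). Only lists with exactly 24 elements survive, which drops title lines and footnotes. Note that a header row with 24 column names would also pass this test.

```wolfram
(* Hypothetical rows standing in for the contents of one imported file *)
rows = {
   {"Uniform Crime Reporting Statistics"},          (* title line: 1 field *)
   Join[{"Sample Police Dept", "XX"}, Range[22]],   (* data record: 24 fields *)
   {"Notes:", "Reporting is voluntary"}             (* footnote: 2 fields *)
   };

(* Same pattern as in the post: keep lists whose length is exactly 24 *)
kept = Cases[rows, a_ /; MatchQ[Length@a, 24]];
Length[kept]   (* 1 *)
```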

Keys for our dataset have been defined as follows.

keys = {"Agency", "State", "Months", "Population", 
   "ViolentCrimeTotal", "Murder", "LegacyRape", "RevisedRape", 
   "Robbery", "AggravatedAssault", "PropertyCrimeTotal", "Burglary", 
   "LarcenyTheft", "MotorVehicleTheft", "ViolentCrimeRate", 
   "MurderRate", "LegacyRapeRate", "RevisedRapeRate", "RobberyRate", 
   "AggravatedAssaultRate", "PropertyCrimeRate", "BurglaryRate", 
   "LarcenyTheftRate", "MotorVehicleTheftRate"};

Table[{n, keys[[n]]}, {n, Length@keys}]
(* {{1, "Agency"}, {2, "State"}, {3, "Months"}, {4, "Population"}, {5, 
  "ViolentCrimeTotal"}, {6, "Murder"}, {7, "LegacyRape"}, {8, 
  "RevisedRape"}, {9, "Robbery"}, {10, "AggravatedAssault"}, {11, 
  "PropertyCrimeTotal"}, {12, "Burglary"}, {13, "LarcenyTheft"}, {14, 
  "MotorVehicleTheft"}, {15, "ViolentCrimeRate"}, {16, 
  "MurderRate"}, {17, "LegacyRapeRate"}, {18, "RevisedRapeRate"}, {19,
   "RobberyRate"}, {20, "AggravatedAssaultRate"}, {21, 
  "PropertyCrimeRate"}, {22, "BurglaryRate"}, {23, 
  "LarcenyTheftRate"}, {24, "MotorVehicleTheftRate"}}*)

valsNew = vals;
valsNew[[All, 3 ;; 14]] = 
  ToExpression[vals[[All, 3 ;; 14]]] /. Null -> 0;
(* start at column 15 so that ViolentCrimeRate is converted as well *)
valsNew[[All, 15 ;;]] = 
  ToExpression[vals[[All, 15 ;;]]] /. Null -> Missing["NotAvailable"];
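
As a small illustration of why the Null replacement is needed (the input strings here are hypothetical): ToExpression turns an empty string into Null, so blank cells have to be mapped to 0 for counts or to Missing for rates.

```wolfram
(* Hypothetical cell contents: two numbers stored as strings and one blank cell *)
raw = {"12", "", "7.5"};

parsed = ToExpression /@ raw;                         (* {12, Null, 7.5} *)
asCounts = parsed /. Null -> 0;                        (* {12, 0, 7.5} *)
asRates = parsed /. Null -> Missing["NotAvailable"];   (* blank becomes Missing *)
```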

While analyzing the data we noticed several errors that needed to be corrected: some totals and some rates were missing from the downloaded tables. We therefore regenerated these values from the detail columns.

Recalculate Totals

valsNew[[All, 5]] = Total /@ valsNew[[All, 6 ;; 10]];
valsNew[[All, 11]] = Total /@ valsNew[[All, 12 ;; 14]];

Recalculate crime rates for those records where a population is given.

population = Flatten[Position[valsNew[[All, 4]], a_ /; a > 0]];
valsNew[[population, 15 ;;]] = 
  N[100000 valsNew[[population, 5 ;; 14]]/valsNew[[population, 4]]];
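
The rate calculation is offenses per 100,000 inhabitants. A minimal sketch with made-up numbers (both the populations and the counts are hypothetical) shows how dividing a matrix of counts by a vector of populations scales each row by its own population:

```wolfram
(* Hypothetical populations for two agencies *)
pops = {250000, 500000};

(* Hypothetical offense counts, one row per agency *)
counts = {{20, 500, 1000}, {10, 300, 1500}};

(* Division threads over the first level, so row i is divided by pops[[i]] *)
rates = N[100000 counts/pops]
(* {{8., 200., 400.}, {2., 60., 300.}} *)
```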


We are now ready to create the dataset.

usCrimes2014 = Dataset[AssociationThread[keys, #] & /@ valsNew]
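
For readers new to Dataset, the same construction on a tiny made-up table (the keys are a subset of the real ones; the rows are hypothetical) looks like this:

```wolfram
sampleKeys = {"Agency", "State", "Population"};

(* Hypothetical rows, in the same column order as the keys *)
sampleRows = {{"Sample Police Dept", "XX", 500000},
   {"Example Police Dept", "YY", 120000}};

(* Each row becomes an association key -> value *)
sample = Dataset[AssociationThread[sampleKeys, #] & /@ sampleRows];

(* Query: agencies serving at least 400,000 people *)
big = Normal[sample[Select[#Population >= 400000 &], "Agency"]]
(* {"Sample Police Dept"} *)
```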

enter image description here


Let's select the police departments that serve a population of at least 400,000 people and chart property crime rates (burglary, larceny, etc.) against violent crime rates (murder, rape, aggravated assault, etc.).

BubbleChart[
 usCrimes2014[Select[#Population >= 400000 &], 
  Callout[{#PropertyCrimeRate, #ViolentCrimeRate, #Population}, 
    (* label: agency name with the trailing "Police ..." part removed *)
    With[{pos = StringPosition[#Agency, "Police"]}, 
     Style[If[pos == {}, #Agency, 
       StringTake[#Agency, First@First@pos - 2]], FontSize -> 8]]] &], 
 PlotTheme -> "Detailed", 
 FrameLabel -> {"Property Crime Rate (per 100k)", 
   "Violent Crime Rate (per 100k)"}, 
 PlotLabel -> "Crime Rates by Police Department (2014)", 
 AspectRatio -> 1/3, ImageSize -> Full]
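
The label logic inside Callout can be tried on its own. In this sketch, shortLabel is a hypothetical helper name and the agency strings are illustrative, not necessarily spelled as in the dataset:

```wolfram
(* Keep everything before " Police"; leave the name unchanged if "Police" is absent *)
shortLabel[agency_String] := 
  With[{pos = StringPosition[agency, "Police"]}, 
   If[pos == {}, agency, StringTake[agency, First@First@pos - 2]]];

shortLabel["Baltimore City Police Dept"]         (* "Baltimore City" *)
shortLabel["Harris County Sheriff Department"]   (* unchanged *)
```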

enter image description here

Baltimore is among the 5 big cities with the worst violent crime rates (the top place goes to the Detroit police department's territory) and among the 15 worst with regard to property crime (the Seattle Police Department has the highest property crime rate).

The dataset is attached to this post for further analysis by the reader.

8 Replies

Congratulations! This post is now featured in our Staff Pick column, as distinguished by a Featured Contributor badge on your profile. Thank you, keep it coming, and consider contributing your work to The Notebook Archive!

I live in St. Louis, which frequently appears in crime statistics as one of the "most dangerous" cities. While city lines are important for the agencies reporting this data, they can lead to misleading statistics. Large metropolitan areas with small central cities tend to rank higher (more crime) on these lists than similar metropolitan areas where the central city is larger. This is not necessarily because there is more crime, but because the city includes fewer of the affluent suburbs. I'm not suggesting you are doing anything wrong in your post, just that we can go further.

Luckily there is more data to help!

The Census Bureau defines the "Metropolitan Statistical Area" (MSA) to try to measure exactly what the name says: the metropolitan area. The data from that link does not include crime stats, but we can use it along with the data you posted to see whether the hypothesis I laid out above may be true.

msadata = 
  Import["/Users/bobs/Downloads/cbsa-est2018-alldata.csv", "Dataset", 
   "HeaderLines" -> 1];
msa = msadata[Select[#LSAD === "Metropolitan Statistical Area" &]]

enter image description here

I made a little utility to handle names and included a few more agencies from your Dataset (I wanted St. Louis).

agencyToName[str_] := StringReplace[str, {" Police Dept" -> "", " City" -> "", 
" Sheriff Department" -> ""}];
notsmall = usCrimes2014[Select[#Population >= 200000 &]]

I made this somewhat hacky map from agency names to MSA names. City names seem to be unique enough that it worked pretty well.

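msanames = Normal[msa[All, "NAME"]]; 
(* The first line of this expression is missing from the post. The 
   DeleteMissing[AssociationMap[Function[name, ...]]] wrapper is an 
   assumption that matches the remaining brackets and the KeyExistsQ 
   test used later. *)
agencyToMSAMap = 
  DeleteMissing[AssociationMap[Function[name, 
     SelectFirst[msanames, StringContainsQ[#, agencyToName@name] &]], 
    Normal@notsmall[All, "Agency"]]]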
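
To show how the substring match behaves, here is a small sketch on a handful of sample strings. The agency and MSA names below are illustrative; whether they are spelled exactly as in the two files is an assumption. It also shows two ways the match can go wrong: a city name that occurs in more than one MSA name, and a name that gets shortened too far when " City" is stripped.

```wolfram
agencyToName[str_] := 
  StringReplace[str, {" Police Dept" -> "", " City" -> "", 
    " Sheriff Department" -> ""}];

(* Sample MSA names (assumed spellings) *)
sampleMSAs = {"St. Louis, MO-IL", "Kansas City, MO-KS", 
   "Cleveland-Elyria, OH", "Cleveland, TN"};

(* A clean match: exactly one MSA name contains "St. Louis" *)
stl = SelectFirst[sampleMSAs, 
   StringContainsQ[#, agencyToName["St. Louis Police Dept"]] &]

(* An ambiguous match: two names contain "Cleveland"; 
   SelectFirst would silently take whichever comes first *)
clev = Select[sampleMSAs, 
   StringContainsQ[#, agencyToName["Cleveland Police Dept"]] &]

(* Stripping " City" shortens this name to "Kansas" *)
kc = agencyToName["Kansas City Police Dept"]
```

Checking for more than one hit with Select before trusting SelectFirst is one way to spot cases like the Cleveland one.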

I combined the data into a single Dataset

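(* Several line openings of this definition are missing from the post. The 
   Association wrapper, the "ViolentCrimeRate" key and the agencyToMSAMap 
   lookup are assumptions based on how mydata is used below. *)
combinestats[crimedata_] := 
  Association[
    Normal /@ {KeyTake[
       crimedata, {"Agency", "Population", "PropertyCrimeRate", 
        "ViolentCrimeRate"}], 
      KeyTake[
       SelectFirst[msa, #NAME === 
           agencyToMSAMap[crimedata["Agency"]] &], {"POPESTIMATE2014"}]}] /; 
  KeyExistsQ[agencyToMSAMap, crimedata["Agency"]]
combinestats[_] := Missing[]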

mydata = DeleteMissing[combinestats /@ notsmall][
  Select[NumberQ[Total[KeyDrop[#, "Agency"]]] &]]

enter image description here

Finally I added a second set of bubbles to your plot so each city has both the city population and the MSA population:

(* The opening "BubbleChart[{mydata[All," and the second "mydata[" are 
   missing from the post and are assumed here. *)
BubbleChart[{mydata[All, 
    Callout[{#PropertyCrimeRate, #ViolentCrimeRate, #POPESTIMATE2014},
       agencyToName[#Agency]] &],
   mydata[
    All, {#PropertyCrimeRate, #ViolentCrimeRate, #Population} &]}, 
 FrameLabel -> {"Property Crime Rate (per 100k)", 
   "Violent Crime Rate (per 100k)"}, 
 PlotLabel -> "Crime Rates by Police Department (2014)", 
 PlotTheme -> "Detailed", AspectRatio -> 1/3, ImageSize -> Full]

enter image description here

If my hypothesis is correct, the highest-crime cities (top right) will show a lot of blue around their yellow, and in fact many of them do (Atlanta, St. Louis, Detroit). However, some cities in the lower-crime area also show a lot of blue. It turns out that none of these is the major center of its MSA (Newark is grouped with New York, Anaheim with L.A., Scottsdale with Phoenix, etc.). So the primary idea checks out: city crime data, while important for the agencies reporting it, is skewed by municipal lines and does not provide the standard measurement for comparing the crime rates of two areas that people outside those agencies probably care about.

Note: Just like every year in football, something went wrong with Cleveland. The yellow is bigger than the blue. Maybe there are two Clevelands in the MSA names?

Hi Bob, great work! Would you like to take a stab at recalculating the rates and redoing the bubble chart based on the census data you downloaded? Do the rankings change significantly between cities, or are their relative positions in this space maintained?

It looks like @Rohit Namjoshi already did it with the correct data. However, that data appears to have important holes where not all of the municipalities in an MSA report full data.

Visualizing more dimensions

I would like to show a multidimensional way of plotting the crime data using Chernoff Faces.

First we massage the dataset to keep only rows and numerical columns of interest.

dsSmall = dsUCRCrimesStats2014[Select[#Population >= 400000 && #Months == 12 &]];
dsSmall = Dataset[dsSmall[All, #[[ Join[{1}, Range[4, Length[#]]] ]] &]];
dsSmall = Dataset[dsSmall[All, KeySelect[#,  StringMatchQ[#, ___ ~~ "Agency" | "Rate" ~~ ___] &] &]];
dsSmall = Dataset[dsSmall[All, KeySelect[#, ! StringMatchQ[#, ___ ~~ "Legacy" ~~ ___] &] &]]
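
The two KeySelect steps can be checked on a single made-up row (the values are hypothetical; the key names follow the dataset above). The first step keeps "Agency" and every "...Rate" column, the second drops the "Legacy" columns:

```wolfram
(* Hypothetical row with a few of the dataset's keys *)
row = <|"Agency" -> "Sample Police Dept", "Murder" -> 10, 
   "MurderRate" -> 2.5, "LegacyRapeRate" -> 0., "RevisedRapeRate" -> 30.1|>;

step1 = KeySelect[row, StringMatchQ[#, ___ ~~ "Agency" | "Rate" ~~ ___] &];
step2 = KeySelect[step1, ! StringMatchQ[#, ___ ~~ "Legacy" ~~ ___] &];

Keys[step2]   (* {"Agency", "MurderRate", "RevisedRapeRate"} *)
```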

enter image description here

Here we import a package that makes an interactive Chernoff Faces data browser:


Here we make the browser:

ChernoffFacesDataBrowser[<|"Crimes" -> dsSmall|>, ImageSize -> {1300, 700}, "FaceImageSize" -> 85]

enter image description here

More details about that data browsing approach can be found in the MathematicaVsR project "Browsing data with Chernoff faces".

We can see from the plots:

  • that Detroit Police Dept. deals with the highest violent crime rates and rape rates (face length, eyes position), and

  • that Oakland Police Dept. deals with no rapes but with very high robbery and burglary rates (face length, eyes high up, eye size, iris size).

Experimenting with the application of different color schemes can bring attention to different outliers or correlations.

Other things to do

A few years ago Diego announced his RadarChart package here. Ideally, the data browser I showed above should make it easy to switch between ChernoffFace, RadarChart, and SectorChart for the multidimensional visualizations.

Data for 2017 is available here. The downloadable Excel file is in a different format, and the violent and property crime figures are missing for some MSAs.

data = Import[
   "~/Documents/Mathematica/US Crime/2017-table-6.xls", {"Dataset", 1, ;; -9}, 
   "SkipLines" -> 3, "HeaderLines" -> 1];

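(* The opening "data[Select[" is missing from the post and is assumed here. 
   Keep the row that names each area and the row that holds its rates. *)
extractRate = 
  data[Select[
    StringLength[StringTrim[#"Counties/principal cities"]] == 0 ||
    #"Counties/principal cities" == "Rate per 100,000 inhabitants" &]];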

crimeRate = 
  extractRate // Normal // Partition[#, 2] & // 
    Map[Merge[#, (SelectFirst[StringLength@StringTrim@ToString@# > 0 &])] &] // Dataset;
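
The Partition and Merge step can be followed on made-up rows. In this sketch the key names, values, and two-rows-per-area layout are assumptions for illustration, not the actual layout of the spreadsheet. Each pair of rows is merged into one by taking, for every key, the first value that is not blank:

```wolfram
(* Hypothetical rows: an area row followed by its rate row, twice *)
rows = {
   <|"Area" -> "Sample M.S.A.", "Population" -> 900000, "Violent" -> ""|>,
   <|"Area" -> "", "Population" -> "", "Violent" -> 512.3|>,
   <|"Area" -> "Example M.S.A.", "Population" -> 450000, "Violent" -> ""|>,
   <|"Area" -> "", "Population" -> "", "Violent" -> 301.|>};

(* Same combiner as in the post: first value whose string form is non-empty *)
firstNonBlank = SelectFirst[StringLength@StringTrim@ToString@# > 0 &];

merged = Merge[#, firstNonBlank] & /@ Partition[rows, 2]
(* {<|"Area" -> "Sample M.S.A.", "Population" -> 900000, "Violent" -> 512.3|>, 
    <|"Area" -> "Example M.S.A.", "Population" -> 450000, "Violent" -> 301.|>} *)
```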

cityState[text_] := 
 Module[{city, state, split = StringSplit[text, ","]},
  city = First@split;
  state = First@StringSplit@split[[2]];
  <|"City" -> city, "State" -> state, "CityState" -> city <> ", " <> state|>]
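
A quick usage check of cityState, repeated here so the snippet runs on its own. The input strings follow the "City, ST M.S.A." pattern, which is assumed to be how the area names appear in the spreadsheet:

```wolfram
cityState[text_] := 
 Module[{city, state, split = StringSplit[text, ","]},
  city = First@split;
  state = First@StringSplit@split[[2]];
  <|"City" -> city, "State" -> state, "CityState" -> city <> ", " <> state|>]

abq = cityState["Albuquerque, NM M.S.A."]
(* <|"City" -> "Albuquerque", "State" -> "NM", "CityState" -> "Albuquerque, NM"|> *)

(* A multi-state area keeps its combined state code *)
stl = cityState["St. Louis, MO-IL M.S.A."]
(* <|"City" -> "St. Louis", "State" -> "MO-IL", "CityState" -> "St. Louis, MO-IL"|> *)
```

A name without a comma would make split[[2]] fail, so such rows would need to be filtered out first.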

(* The opening of this call is missing from the post. BubbleChart and the 
   dataset name crimeRateByCity are assumptions: the dataset is taken to be 
   crimeRate with the cityState fields added to each row. *)
BubbleChart[
 crimeRateByCity[
  Select[#"Population" >= 400000 && Not@MissingQ[#"Property\ncrime"] &&
      Not@MissingQ[#"Violent\ncrime"] &], 
  Callout[{#"Property\ncrime", #"Violent\ncrime", #"Population"}, 
    Style[#"CityState", FontSize -> 9]] &], PlotTheme -> "Detailed", 
 FrameLabel -> {"Property Crime Rate (per 100k)", 
   "Violent Crime Rate (per 100k)"}, 
 PlotLabel -> "Crime Rates by City/State 2017", AspectRatio -> 1/3, 
 ImageSize -> Full]

Crime in Albuquerque, NM increased significantly between 2014 and 2017; it is now by far the highest for property crime and the second highest for violent crime. In the news.

enter image description here

Thanks, Diego, for this analysis; it is very instructive. However, as you point out at the beginning, it depends on a voluntary effort by nearly 18,000 law enforcement agencies. I did a similar analysis in 2015 on the number of black deaths by police and found that, of the 18,000 law enforcement agencies, only 1,200 bothered to report deaths at all. The US Bureau of Justice Statistics (BJS) also attempted a review but found that all attempts at verification failed, because even those police departments that reported gave different results when queried at later dates. So these UCR reports are highly biased. America is pretty much unique among western countries in that it doesn't have compulsory data collection from all law enforcement offices. Even China collects data from its crime bureaus, although it doesn't release it publicly. The best reference for US deaths due to the police is the Guardian's ongoing report called The Counted.

Indeed, one of the many oddities of the USA, although compulsory statistics collection is no guarantee of accuracy, to put it mildly.
