I live in St. Louis, which frequently appears in crime statistics as one of the "most dangerous" cities. While city lines are important for the agencies reporting this data, they can lead to misleading statistics. Large metropolitan areas with small central cities tend to rank higher (more crime) on these lists than similar metropolitan areas whose central city is larger. This is not necessarily because there is more crime, but because the city includes fewer of the affluent suburbs. I'm not suggesting you are doing anything wrong in your post, just that we can go further.
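To make the mechanism concrete, here is a purely hypothetical example (the numbers are made up, not taken from any dataset): two metros with identical populations and crime counts, whose central cities draw their limits differently. `ratePer100k` is just a helper name for this sketch.

```mathematica
(* crimes per 100,000 residents *)
ratePer100k[crimes_, population_] := 100000 crimes/population

(* hypothetical: both metros have 2.8 million people and 14,000 violent crimes *)
ratePer100k[14000, 2800000]  (* either metro as a whole: 500 *)

(* City A's limits hold 300,000 people and 6,000 of those crimes *)
ratePer100k[6000, 300000]    (* 2000 *)

(* City B's limits hold 1,200,000 people and 9,000 of those crimes *)
ratePer100k[9000, 1200000]   (* 750 *)
```

Same metro, same amount of crime, yet City A's rate comes out about 2.7 times City B's purely because of where the city line sits.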
Luckily, there is more data to help!
https://www.census.gov/data/tables/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html
The Census Bureau defines a "Metropolitan Statistical Area" (MSA) to measure exactly what the name says: the metropolitan area. The data at that link does not include crime statistics, but we can use it together with the data you posted to check whether the hypothesis I laid out above holds.
msadata =
Import["/Users/bobs/Downloads/cbsa-est2018-alldata.csv", "Dataset",
"HeaderLines" -> 1];
msa = msadata[Select[#LSAD === "Metropolitan Statistical Area" &]]
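I wrote a small utility to clean up agency names, and I included a few more agencies from your Dataset by keeping every agency that serves at least 200,000 people (I wanted St. Louis in there):
(* strip the agency suffix so only the city name is left *)
agencyToName[str_] := StringReplace[str,
   {" Police Dept" -> "", " City" -> "", " Sheriff Department" -> ""}];
(* keep agencies serving at least 200,000 people *)
notsmall = usCrimes2014[Select[#Population >= 200000 &]]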
Next I built this somewhat hacky map from agency names to MSA names. City names seem to be unique enough that it worked pretty well:
msanames = Normal[msa[All, "NAME"]];
(* map each agency to the first MSA title containing its cleaned city name;
   agencies with no match give Missing[] and are dropped *)
agencyToMSAMap =
  DeleteMissing[
   AssociationMap[
    Function[name,
     SelectFirst[msanames, StringContainsQ[#, agencyToName@name] &]],
    Normal@notsmall[All, "Agency"]]]
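Because `SelectFirst` stops at the first MSA title that contains the city name, a city whose name appears in several titles is silently matched to whichever title comes first. A quick sanity check is to collect *all* matching titles and flag cities with more than one. The helper names and the example titles below are for illustration only.

```mathematica
(* all MSA titles that contain a given (cleaned) city name *)
msaCandidates[city_String, names_List] :=
  Select[names, StringContainsQ[#, city] &]

(* illustrative list of MSA titles; with the real data this would be msanames *)
exampleNames = {"Cleveland, TN", "Cleveland-Elyria, OH", "St. Louis, MO-IL"};

msaCandidates["Cleveland", exampleNames]
(* {"Cleveland, TN", "Cleveland-Elyria, OH"} *)

(* cities matching more than one title, i.e. where SelectFirst
   just takes whichever title happens to come first *)
ambiguousCities[cities_List, names_List] :=
  Select[AssociationMap[msaCandidates[#, names] &, cities], Length[#] > 1 &]

ambiguousCities[{"Cleveland", "St. Louis"}, exampleNames]
(* <|"Cleveland" -> {"Cleveland, TN", "Cleveland-Elyria, OH"}|> *)
```

With the variables defined above, the real check would be `ambiguousCities[agencyToName /@ Normal@notsmall[All, "Agency"], msanames]`.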
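Then I combined the data into a single Dataset:
(* merge a city's crime record with its MSA's 2014 population estimate;
   only defined for agencies that were mapped to an MSA *)
combinestats[crimedata_] :=
 Association[
   Normal /@ {
     KeyTake[crimedata,
      {"Agency", "Population", "PropertyCrimeRate", "ViolentCrimeRate"}],
     KeyTake[
      SelectFirst[msa, #NAME === agencyToMSAMap[crimedata["Agency"]] &],
      {"POPESTIMATE2014"}]}] /;
  KeyExistsQ[agencyToMSAMap, crimedata["Agency"]]
(* unmapped agencies become Missing[] *)
combinestats[_] := Missing[]
(* drop unmapped rows and rows with non-numeric fields *)
mydata = DeleteMissing[combinestats /@ notsmall][
  Select[NumberQ[Total[KeyDrop[#, "Agency"]]] &]]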
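`combinestats` scans `msa` with `SelectFirst` once per agency. An alternative sketch (my own variation, not what produced the chart) builds a title-to-population lookup once and joins through it. The function name `attachMSAPop` and the sample record are hypothetical.

```mathematica
(* attach the MSA's 2014 population to one city record,
   or return Missing[] if the agency has no MSA mapping *)
attachMSAPop[row_Association, agencyMSA_Association, pops_Association] :=
 With[{m = Lookup[agencyMSA, row["Agency"]]},
  If[MissingQ[m], Missing["NoMSA", row["Agency"]],
   Append[
    KeyTake[row, {"Agency", "Population", "PropertyCrimeRate", "ViolentCrimeRate"}],
    "POPESTIMATE2014" -> Lookup[pops, m]]]]

(* hypothetical inputs shaped like agencyToMSAMap, the MSA table and a row of notsmall *)
exampleMap = <|"Exampleville Police Dept" -> "Exampleville, ST"|>;
exampleMSAPops = <|"Exampleville, ST" -> 500000|>;
exampleRow = <|"Agency" -> "Exampleville Police Dept", "Population" -> 160000,
   "PropertyCrimeRate" -> 5000., "ViolentCrimeRate" -> 800.|>;

attachMSAPop[exampleRow, exampleMap, exampleMSAPops]
```

On the real data the lookup would be `pops = AssociationThread[Normal@msa[All, "NAME"], Normal@msa[All, "POPESTIMATE2014"]]`, followed by `Dataset@DeleteMissing[attachMSAPop[#, agencyToMSAMap, pops] & /@ Normal@notsmall]`; the `NumberQ` filter can then be applied as before.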
Finally, I added a second set of bubbles to your plot so that each city shows both its MSA population (blue) and its city population (yellow):
BubbleChart[
 {(* MSA population bubbles, labeled with the city name *)
  Normal@mydata[All,
    Callout[{#PropertyCrimeRate, #ViolentCrimeRate, #POPESTIMATE2014},
      agencyToName[#Agency]] &],
  (* city population bubbles *)
  Normal@mydata[All, {#PropertyCrimeRate, #ViolentCrimeRate, #Population} &]},
 FrameLabel -> {"Property Crime Rate (per 100k)", "Violent Crime Rate (per 100k)"},
 PlotLabel -> "Crime Rates by Police Department (2014)",
 PlotTheme -> "Detailed", AspectRatio -> 1/3, ImageSize -> Full]
If my hypothesis is correct, then the highest-crime cities (top right) should show a lot of blue around their yellow, and in fact many of them do (Atlanta, St. Louis, Detroit). However, some cities in the lower-crime region also show a lot of blue. It turns out none of these is the main central city of its MSA (Newark is grouped with New York, Anaheim with L.A., Scottsdale with Phoenix, etc.). So the main idea checks out: city crime data (while important for the agencies reporting it) is skewed by municipal boundaries and is not a standard measure for comparing the crime rates of two areas, which is probably what people outside those agencies really care about.
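To put a number on "lots of blue around the yellow", one could compute each city's share of its MSA population; a low share means the city line excludes most of the metro. This is a sketch: `cityShare` is a hypothetical helper and the two rows use made-up numbers with the same keys as `mydata`.

```mathematica
(* fraction of the MSA population living inside the city limits *)
cityShare[row_Association] := N[row["Population"]/row["POPESTIMATE2014"]]

(* hypothetical rows with the same keys as mydata *)
exampleRows = {
   <|"Agency" -> "City A", "Population" -> 300000, "POPESTIMATE2014" -> 2800000|>,
   <|"Agency" -> "City B", "Population" -> 1200000, "POPESTIMATE2014" -> 2800000|>};

(* city -> share, sorted by share so the most suburb-heavy metros come first *)
Sort@AssociationThread[exampleRows[[All, "Agency"]], cityShare /@ exampleRows]
(* City A ≈ 0.107, City B ≈ 0.429 *)
```

On the real data the same idea is `mydata[All, <|"City" -> agencyToName[#Agency], "Share" -> N[#Population/#POPESTIMATE2014]|> &][SortBy[#Share &]]`. A share above 1 would mean the matched MSA is smaller than the city itself, which can only be a mismatch.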
Note: Just like every year in football, something went wrong with Cleveland: its yellow bubble is bigger than its blue one, meaning the matched MSA is smaller than the city itself. Maybe there are two Clevelands in the MSA names?
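One way to test the two-Clevelands guess and patch around it: list every MSA title containing the city name and keep the most populous one. This is a heuristic of my own, not part of the chart above; `bestMSA` and the example populations are hypothetical.

```mathematica
(* among all MSA titles containing the city name, pick the most populous;
   Missing[] when nothing matches *)
bestMSA[city_String, pops_Association] :=
 With[{cands = Select[Keys[pops], StringContainsQ[#, city] &]},
  If[cands === {}, Missing["NoMSA", city], First@MaximalBy[cands, pops]]]

(* illustrative titles with made-up populations *)
clevelandPops = <|"Cleveland, TN" -> 100000, "Cleveland-Elyria, OH" -> 2000000|>;
bestMSA["Cleveland", clevelandPops]
(* "Cleveland-Elyria, OH" *)

(* with the real data, rebuild the map from a NAME -> POPESTIMATE2014 lookup:
   pops = AssociationThread[Normal@msa[All, "NAME"], Normal@msa[All, "POPESTIMATE2014"]];
   agencyToMSAMap = DeleteMissing@AssociationMap[
     bestMSA[agencyToName[#], pops] &, Normal@notsmall[All, "Agency"]] *)
```

Picking the largest match is usually what you want for a big city, but it is still a guess, so it is worth eyeballing any city that had more than one candidate.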