Message Boards Message Boards

Searchable databases: Visualizing entry locations

Posted 9 years ago

There has been a lot of inquiry into how to create usable and searchable databases using Mathematica. I will attempt to do my best to detail how to go about doing something like that here.This process is intended to accomplish the tasks listed bellow. I have attached the CSV containing the data used and the corresponding notebook to this post.

  • Using the Wolfram Language import the data according to the named fields and create a data set

  • Create a form to search by name through the data set and return a notebook with a summary of the person's information

  • Create a form to search by zip code or city, state to locate and plot each person living within a 100 mile radius

It is relatively straightforward to generate a dataset object from such a neatly organized excel spreadsheet using SemanticImport[].

    data = SemanticImport["C:\\Users\\YottaTech\\Desktop\\us-500.csv"]

the output from SemanticImport

Search the first element of the database and look at the entry

data[1]

a single users data

Take the first name of the first entry utilizing the association keys

data[1]["first_name"]
"James"

It was not explicitly stated whether it was to be searchable by first, last, or full name. So I made a function for each that creates separate lists of all the first, last, and full names search through for patterns

firstNames = Table[data[i]["first_name"], {i, 1, Length[data]}];
lastNames = Table[data[i]["last_name"], {i, 1, Length[data]}];
fullNames = 
  Table[StringJoin[data[i]["first_name"], " ", 
    data[i]["last_name"]], {i, 1, Length[data]}];

list of first ten first names of the data

We now understand the data enough to make a function to parse through it for patterns.The SearchNameFunction takes a primary argument of the name and a secondary optional argument of the type of name you are submitting. If you don' t specify this option it will assume you are search by the person' s "Full" name. You can specify to search by either "Last" or "First" names as well.

SearchNameFunction[name_, type_: "Full"] :=
 Module[{positions, names},
  (
   Which[
        type == "Full",
        names = fullNames
    ,
        type == "First",
        names = firstNames
    ,
        type == "Last",
        names = lastNames
    ];
   positions = Flatten@Position[names, name];
   data[#] &@positions
   )
  ]
SearchNameFunction["Ruta", "Last"]

results of searching for last name Ruta

Search a user by their full name name - notice we don' t need to specify the type of the name for this method

SearchNameFunction["Art Venere"]

results of searching by the users full name

Create a FormFunction that takes in the name as a string from the form and passes it to the SearchNameFunction[]

form = FormFunction[{"name" -> "String"}, 
   SearchNameFunction[#name] &];

Run the form[] function and input the name you would like to search

form[]

Now I will go about visualizing the data meaningfully. I tried using the exact addresses but Mathematica didnt like the interpreter for that very much so I instead opted to use cities and zip codes.The only real issue I had when parsing the information given in the excel file was that a lot of the cities and zip codes were missing or corrupted. In order to make a working prototype that would effectively parse ALL users I decided on making a hierarchy where we would use whatever their location. This seemed to work well enough for the application in question.

This function takes a location input as either zip code or a city and plots points for every entry within a 100 mile radius of the location. This range can be changed as an optional variable but is automatically set to 100.

SearchCityFunction[locationInput_, range_: 100] :=
 (

  peoplesLocations =
   Module[{zip, location, output, disk},

    (*If the locationInput is all numbers consider it to be a zip code, else it will be interpreted as a city*)
    If[DigitQ[ToString@locationInput],
     location = ZIPCodeData[ToString@locationInput, "Coordinates"],
     location = Interpreter["Location"][locationInput],
     ];

    (*Starting with the zip code check if the entry is within 100 miles of the location Input. 
    If the zip is missingor invalid use the city, if the city is missing or invalid use the state. 
    Would be worth looking into though if the entry is within 100 miles store it, else do nothing. *)

    ParallelTable[
     If[
      GeoWithinQ[disk = GeoDisk[location, Quantity[range, "Miles"]], 
       output = 
        If[MissingQ[
          zip = ZIPCodeData[ToString@data[i]["zip"], "Coordinates"]],
         If[MissingQ[data[i]["city"]], 
          data[i]["state"]["Coordinates"], data[i]["city"]],
         zip]]
      ,
      output
      ,
      Nothing]
     , {i, 1, Length[data], 1}]];

  (*Plot the stored locations on a geographic map*)
  GeoGraphics[{Red, PointSize[Large], 
    Point[GeoPosition[peoplesLocations]]}]
  )

Run test function for a zip code near New Orleans (Where i am from!)

SearchCityFunction[70433]

results of searching zip code near New Orleans

SearchCityFunction["New York"]

results of searching New York

Create a FormFunction that takes in the name as a string from the form and passes it to the SearchCityFunction[] and test it.

form = FormFunction[{"location" -> "String"}, 
   SearchCityFunction[#location] &];
form[]

I know there is probably a lot better ways to go about doing this but I thought it was an interesting project! Let me know what you think and how I could go about optimizing the process I laid down.

Attachments:
POSTED BY: William Duhe
6 Replies

enter image description here - another post of yours has been selected for the Staff Picks group, congratulations !

We are happy to see you at the tops of the "Featured Contributor" board. Thank you for your wonderful contributions, and please keep them coming!

POSTED BY: EDITORIAL BOARD
Posted 9 years ago

Excellent post, William!

Not only is it useful, but it was well-written (for long-time beginners like me who don't get their daily fill of Mathematica). I understood the purpose of every piece of syntax and function call due to your verbose explanations. Thanks!

If I get some time, I'd love to create some battlefield maps of the world and regions for the various wars throughout history. (Heck, one could even throw in the TimeLinePlot for completeness.) With a flexible "range" variable, one could program your SearchCItyFunction to allow viewing battle locations by war, theater, or region--and even be selective about which battles to show from which wars.

How about naval battles? Or sunken merchant ships? Or locations of discoveries in 20th century physics? Or the number of new fast food restaurants in a growing suburb in the last 15 years?

I'm not exactly sure where to get these data files, but I don't think it would be too hard to find. Your tool opens a world of possible diversions. Good job, and thanks for sharing!

POSTED BY: Glen Deering
Posted 9 years ago

Thank you for the compliments! If you know of anywhere that has databases that would include the information you would like to use please feel free to link it here and I can help you throw something together. I think it would be really cool to visualize maps of pertinent battles around the globe and be able to filter through them like you mentioned. I am also interested in creating some more real time visualizations for shifting through data sets more fluidly - and that would be a cool place to start.

POSTED BY: William Duhe
Posted 9 years ago

Okay, William. I've got one for you.

http://www.civil-war-journeys.org/spreadsheetofcivilwarbattles.htm

There is a link to an Excel file at the bottom of the webpage. Among other things, it has the name of the battle, the state in which it was fought, the campaign it was a part of, names of Union/Confederate/Indian commanders, Union/Confederate victory, date ranges for each battle, number of casualties, and whether the battlefield has been preserved for visits today. These things could be indicated by Tooltip (for name of battle, campaign, and names of commanders), color code (for year), or symbol other than a disk (Stars and Bars for a Confederate victory and US flag for a Union victory--the bigger the flag, the more casualties inflicted--might have to be logarithmically scaled).

confederateFlag = 
 Interpreter["Image"][
  File[FindFile[
    "C:\\Users\\Glen\\Downloads\\Wolfram\\battle-flag-army-of-\
northern-virginia.jpg"]]]

GeoGraphics[{GeoMarker[
   Entity["City", {"Richmond", "Virginia", "UnitedStates"}], 
   confederateFlag]}]

enter image description here

But then, since a flag would be the symbol used in location, trying to color code it wouldn't make sense or look very good. Perhaps the Tooltip could be color coded for the year. As an example,

TooltipStyle -> {Background ->LightRed}

(However, since your Tooltip only shows up when you hover over the location, you'd lose the visual advantage of seeing a lot of information all over the map all at once.) What to do, what to do... Hmmm...

I'd plug it into your code myself for quick results, but the spreadsheet might need a little help first. Battlefields like Gettysburg, Richmond, and Atlanta will resort to their respective cities, but what about Gaines' Mill and Seven Pines? (I happen to know exactly where these latter two places are. Google Maps doesn't know.) Latitude and longitude coordinates might be necessary for viewing locations of battles. As a further example of the need for coordinates, the "Guide" tab in the spreadsheet explains that the Battle of Marietta, GA had four separate engagements at different places during a month-long period.

I'm sure with just a little more research, I can dig up all the coordinates I need. Well, coordinates for close to 400 listed battles might be more than a little research... I thought I'd give you heads up to see what ideas you had. I'll see what I can get to.

--Glen

POSTED BY: Glen Deering
Posted 9 years ago

A couple of good links for Civil War flags:

http://www.historynet.com/civil-war-flags

http://www.americancivilwarstory.com/civil-war-flags.html

http://www.loeser.us/flags/civil.html

This last one seems to be the most interesting. You can get regimental and unit flags here. Ah, that would really blow up this thread. (Still...)

--Glen

POSTED BY: Glen Deering
Posted 9 years ago

I just saw this comment today! Wow thanks for the the information and the cool example you showed. I will be checking out the data file you posted this weekend and playing with it a bit to see if I can come up with a spin off like you suggested in your post. I will curate the data and re-post it a form that should be very easy to visualize - post a couple additional attempts/examples - and we can tackle it together.

POSTED BY: William Duhe
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract