Download the notebook at the end of the thread
Abstract
We discuss in detail how to use the Wolfram Language and the Baidu Map API service in GIS-related data science projects focused on mainland China. The API service converts a given street address within mainland China to a geo position expressed as latitude and longitude.
Demo
For example, I can visualize the average dinner cost per person at a restaurant against its location with GeoBubbleChart. Without geocoding, I cannot pass street addresses to the plotting function directly. The same routine is also useful in commercial property planning in general.
Instruction
Start with a valid App Key (AK) for the API service, obtained as described in this document:
bdAPIkey = "7ha3**********************72g";
When you generate the App Key, you must choose how the server verifies the GET requests you send to retrieve data. Two options are available:
- An IP whitelist, or "0.0.0.0/0" to accept all IPs
- SN checksum
The first method is suitable only for testing, or when requests come from a static IP address (internal use or behind a reverse proxy). We will use the second method, which is more general.
The basic steps are:
- Encode a specific partial URL from the query
- Append a private key to the above result and encode again
- Compute the MD5 checksum of the new string to generate the SN required
- Attach the SN to the original query
- Send the GET HTTP request to the server and retrieve the XML/JSON result
- Parse the structured return value
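The signing steps above can be sketched as a small helper. This is a minimal sketch rather than Baidu's reference implementation: the name baiduSN and the placeholder path, AK and SK are hypothetical, and it assumes a Wolfram Language version in which Hash accepts the "HexString" output format.

```wolfram
(* Hypothetical helper: compute the SN for an already URL-encoded path+query string *)
baiduSN[encodedPathAndQuery_String, privateKey_String] :=
  Hash[URLEncode[encodedPathAndQuery <> privateKey], "MD5", "HexString"]

(* Placeholder values for illustration only; these are not real credentials *)
demoQuery = "/geocoder/v2/?address=test&output=xml&ak=DEMO_AK";
demoSN = baiduSN[demoQuery, "DEMO_SK"]
(* a 32-character hexadecimal string *)
```

The helper has no network dependency, so the signature can be checked separately before any request is sent.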
The domain is always like this:
domain = "http://api.map.baidu.com";
The path and the query are built with URLBuild, with the App Key as the last query parameter:
urlpartial = URLBuild[{"/geocoder", "v2/"},
  {"address" -> "上海市上海中心", "callback" -> "showLocation", "output" -> "xml", "ak" -> bdAPIkey},
  CharacterEncoding -> "UTF8"]
(* "/geocoder/v2/?address=%E4%B8%8A%E6%B5%B7%E5%B8%82%E4%B8%8A%E6%B5%B7%E4%B8%AD%E5%BF%83&callback=showLocation&output=xml&ak=7ha<SAMPLE_KEY>72g" *)
Shanghai Tower, a 632 m skyscraper, is used as the example input address. Note: this entity is curated in the Wolfram Language, and its geo position is available through an Entity[...] call.
The next step requires us to attach the private key/SK to the encoded partial URL:
urlpartial ~~ sk
(* "/geocoder/v2/?address=%E4%B8%8A%E6%B5%B7%E5%B8%82%E4%B8%8A%E6%B5%B7%E4%B8%AD%E5%BF%83&callback=showLocation&output=xml&ak=7ha<SAMPLE_KEY>72gHo<SAMPLE_KEY>HP" *)
where
sk = "Ho<SAMPLE_KEY>HP";
The signature (SN) used for verification is then generated as follows (see the section "All MD5's Are Created Equal" below):
sn = Hash[URLEncode[urlpartial ~~ sk], "MD5", "HexString"]
(* "c87<MD5 HEX Digest>d8d" *)
Next, append the signature (SN) to the original query. This can be done either with HTTPRequest[<URL>, <|"Query" -> {...}|>] or with URLBuild again:
fullURL = URLBuild[{"http://api.map.baidu.com", "geocoder", "v2/"},
  {"address" -> "上海市上海中心", "callback" -> "showLocation",
   "output" -> "xml", "ak" -> bdAPIkey, "sn" -> sn}]
(* "http://api.map.baidu.com/geocoder/v2/?address=%E4%B8%8A%E6%B5%B7%E5%B8%82%E4%B8%8A%E6%B5%B7%E4%B8%AD%E5%BF%83&callback=showLocation&output=xml&ak=7ha<SAMPLE_KEY>72g&sn=c87<MD5 HEX Digest>d8d" *)
Pass the URL string to the HTTPRequest function:
req = HTTPRequest[fullURL, <|Method -> "GET"|>]
and the resultant response, if everything goes well, is
xmlOBJ = URLExecute[req, "XML"]
You can inspect the returned XML object to confirm that the {lat,lon} information is available for the address above. Use the following code, based on the Cases function, to extract the geo position pair from the XML:
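SetAttributes[FindLatLonPair, HoldAll]
(* Extract the latitude/longitude pair from a geocoder XML response *)
FindLatLonPair[xmlOBJ_] := Module[{xmllocations},
   (* Collect the <lat> and <lng> elements at any depth *)
   xmllocations =
    Cases[xmlOBJ, XMLElement["lat", __] | XMLElement["lng", __], Infinity];
   (* Sorting puts "lat" before "lng", so Values[...] later yields {lat, lon} *)
   Association[Sort@xmllocations /. {
      XMLElement["lng", {}, {lng_}] :> Rule["Longitude", ToExpression@lng],
      XMLElement["lat", {}, {lat_}] :> Rule["Latitude", ToExpression@lat]
      }]
   ] /; Head[xmlOBJ] === XMLObject["Document"]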
Apply this function to the XML object retrieved above:
FindLatLonPair[xmlOBJ]
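As an offline check of the same extraction idea, the sketch below parses a hand-written XML string instead of a live response. The element layout is an assumption about the shape of the geocoder reply, the coordinates are illustrative values rather than API output, and firstNumber is a hypothetical helper that uses Interpreter["Number"] instead of ToExpression to avoid evaluating text received from a server.

```wolfram
(* Hand-written sample in the assumed shape of a geocoder reply; values are illustrative *)
sampleXML = ImportString[
   "<GeocoderSearchResponse><status>0</status><result><location>" <>
    "<lng>121.5</lng><lat>31.25</lat></location></result></GeocoderSearchResponse>",
   "XML"];

(* Hypothetical helper: first text value of the named element, converted to a number *)
firstNumber[xml_, tag_String] :=
  FirstCase[xml, XMLElement[tag, _, {s_String}] :> Interpreter["Number"][s],
   Missing["NotFound"], Infinity]

samplePair = <|"Latitude" -> firstNumber[sampleXML, "lat"],
   "Longitude" -> firstNumber[sampleXML, "lng"]|>
(* <|"Latitude" -> 31.25, "Longitude" -> 121.5|> *)
```

If an element is absent, the helper returns Missing["NotFound"] rather than an empty association, which makes failed lookups easier to filter out later.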
Code of the Demo
Assume I have curated some data for a list of restaurants in a region. The data include the street address and the average dinner cost per customer (food and service) for each restaurant.
Import the data (not attached with the notebook):
entitiesRaw = DeleteCases[Import["data.csv"], item_ /; item[[1]] === ""];
If you wrap all the API call steps shown above into a function and Map that function over all street addresses in the imported data sheet, you get a list of valid XML objects. Extract all lat-lon pairs:
geopos = FindLatLonPair /@ (resultsXMLObj);
(* {<|"Latitude" -> n1, "Longitude" -> n2|>, <|"Latitude" -> n3, "Longitude" -> n4|>, ...} *)
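The wrapper mentioned above might look like the following. This is a sketch under stated assumptions: baiduGeocodeXML is a hypothetical name, it reuses only the calls already shown in this post, and the variable addresses (the list of street-address strings taken from the imported data) is assumed, since the column layout of the data sheet is not shown here.

```wolfram
(* Hypothetical wrapper around the steps shown earlier; ak and sk are your own keys *)
baiduGeocodeXML[address_String, ak_String, sk_String] :=
  Module[{params, partial, sn},
   params = {"address" -> address, "callback" -> "showLocation",
     "output" -> "xml", "ak" -> ak};
   (* Path and query, URL-encoded; this exact string is what gets signed *)
   partial = URLBuild[{"/geocoder", "v2/"}, params];
   sn = Hash[URLEncode[partial <> sk], "MD5", "HexString"];
   (* Same parameters plus the signature, sent as a GET request *)
   URLExecute[
    HTTPRequest[
     URLBuild[{"http://api.map.baidu.com", "geocoder", "v2/"},
      Append[params, "sn" -> sn]], <|Method -> "GET"|>], "XML"]]

(* Assumed usage; it sends one request per address, so it is left commented out *)
(* resultsXMLObj = baiduGeocodeXML[#, bdAPIkey, sk] & /@ addresses; *)
```

Building the parameter list once and reusing it for both the signed string and the final URL keeps the two in the same order, which the signature check depends on.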
Use the following method to generate geo position -> value pairs:
bubbleChartPair = Thread[(GeoPosition[Values[#]] & /@ geopos) -> {dinnerCost1, dinnerCost2 .... } ];
(* {GeoPosition[{lat, lon}] -> dinnerCost, GeoPosition[{lat, lon}] -> dinnerCost, ...} *)
Then pass them to the GeoBubbleChart function to generate a spatial trend graphic, for instance:
GeoBubbleChart[bubbleChartPair]
Some of the geo positions are offset, either because of the difference in datum (BD-09 vs. Mathematica's default datum) or because of the limited precision of civilian GIS data.
All MD5's Are Created Equal
Baidu's documentation on SN generation gives several code snippets that demonstrate the MD5 hash computation. Their results are the same as those from Mathematica. In case you wonder, here is a check:
sn = Hash["wolfram", "MD5", "HexString"]
(* 5f7e6b1fa5f9740f66c5437b200425d8 *)
This matches what I get from the online interactive session on Python.org.
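An independent check that does not rely on another language is to compare against the published MD5 test vectors from RFC 1321. The sketch below assumes a Wolfram Language version in which Hash applied to a string hashes the string's bytes and supports the "HexString" format; under that assumption both comparisons should return True.

```wolfram
(* RFC 1321 test vectors: MD5 of the empty string and of "abc" *)
md5Empty = Hash["", "MD5", "HexString"] === "d41d8cd98f00b204e9800998ecf8427e"
md5ABC = Hash["abc", "MD5", "HexString"] === "900150983cd24fb0d6963f7d28e17f72"
```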
Location of Private Key
After you create the App Key (AK), you will be redirected to this page. The private key (SK) is outlined by the gold box.