One Way to Query Wolfram Alpha from Python

Posted 11 years ago
Recently I found this link very useful for Python users who are interested in integrating W|A into their Python code. I will quickly explain what the code does and show some modifications I have made so that you can use Python interactively as a plain-text interface to get information from W|A.

Before you start, you can register for a developer account on the Wolfram product page. You may also apply for a free API ID (appid) for non-commercial use from the W|A developer portal. You will need this ID to access the database and its XML output.

Let me first show you what it looks like when you run the modified script. The example is a "group of 8" query about an international group:
myMac: ~ shenghui$ ./WAPy.py 'group of 8'

The application then prints the input query and its URL-encoded form, which is just what you see in the URL bar after you search for this answer in the W|A web interface. It also prints a list of pod titles. They correspond to the pods shown in the web browser output. For instance, the tabular result for Demographics is available both through the web query and through the API.


The prompt at the bottom asks for a valid choice from the title list. If you want to check the demographics info for the member countries, just type the exact title or paste it into the input prompt:

I added a nice feature here for looking up pod info again within the current XML session. Because there is a cap on non-commercial/personal developer API IDs, I do not want to load the XML from the server again and waste my quota. Therefore I added this "try again" prompt so that I can reuse everything already on the local machine. Let's type y (yes) to get another piece of data from the XML:

Until you hit "n" or any letter other than 'y', you can squeeze the XML to the last drop of data. 

The Python code is as follows.
Preamble:
#!/usr/bin/python

import sys
import urllib
import urllib2
import httplib
from subprocess import call
from xml.etree import ElementTree as etree

call("clear")  # clear the terminal before printing anything

The body is simply a method call on a wolfram object:
appid = 'UQ7*********'
query = sys.argv[1]
print 'I am asking for: ', query
w = wolfram(appid)
w.search(query)

All the heavy lifting goes into the definition of the class "wolfram". The constructor takes my API ID (appid) as input:
class wolfram(object):   
    def __init__(self, appid):       
        self.appid = appid
        self.base_url = 'http://api.wolframalpha.com/v2/query?'
        self.headers = {'User-Agent':None}

The _get_xml method uses the urlencode function to turn my W|A input, i.e. the first argument after WAPy.py on the command line, into URL-encoded form. After the call, it gets the XML data from the W|A server:
    def _get_xml(self, ip):
        url_params = {'input':ip, 'appid':self.appid}
        data = urllib.urlencode(url_params)
        print data
        req = urllib2.Request(self.base_url, data, self.headers)
        xml = urllib2.urlopen(req).read()
        return xml
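As a rough illustration of the encoding step only, here is a minimal sketch in Python 3 syntax (the script in this post targets Python 2, where urlencode lives in urllib rather than urllib.parse). The helper name build_query_url and the placeholder appid are assumptions made for the example. It only builds a GET-style URL and sends nothing, whereas the method above passes the encoded string as request data.

```python
from urllib.parse import urlencode

# Same endpoint string as in the post.
BASE_URL = "http://api.wolframalpha.com/v2/query?"


def build_query_url(appid, query):
    """Return the URL for a query without contacting the server (hypothetical helper)."""
    params = {"input": query, "appid": appid}
    return BASE_URL + urlencode(params)


# "DEMO-APPID" is a placeholder, not a real API ID.
url = build_query_url("DEMO-APPID", "group of 8")
print(url)
```

The spaces in the query come out as plus signs, which matches the encoded form the script prints before it contacts the server.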

The core of this Python application is to extract the titles from the XML data:
    def _xmlparser(self, xml):
        data_dics = {}
        tree = etree.fromstring(xml)
        #retrieving every tag with label 'plaintext'
        for e in tree.findall('pod'):
            for item in [ef for ef in list(e) if ef.tag=='subpod']:
                for it in [i for i in list(item) if i.tag=='plaintext']:
                    if it.tag=='plaintext':
                        data_dics[e.get('title')] = it.text
        return data_dics

If you are not too familiar with the code, that's fine. Basically, it first creates a Python object <treeobj 0xpppppp> from the XML data and then looks for the proper tags. The following simply builds a new list from the children of e that satisfy a condition:
[ef for ef in list(e) if ef.tag=='subpod']

Specifically, the condition is that the XML tag must be <subpod> </subpod>.
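To see the tag filtering in isolation, here is a small sketch in Python 3 syntax that runs the same pod / subpod / plaintext walk over a hand-written XML string. The sample XML is a simplified stand-in that I made up for the example, and pod_plaintexts is a hypothetical name. Real W|A responses carry more attributes and elements.

```python
from xml.etree import ElementTree as etree

# Hand-written, simplified stand-in for a W|A response (an assumption).
SAMPLE_XML = """
<queryresult success="true">
  <pod title="Input interpretation">
    <subpod title=""><plaintext>Group of Eight</plaintext></subpod>
  </pod>
  <pod title="Members">
    <subpod title=""><plaintext>Canada | France</plaintext></subpod>
  </pod>
</queryresult>
""".strip()


def pod_plaintexts(xml):
    """Map each pod title to the text of its plaintext tag."""
    titles_to_text = {}
    root = etree.fromstring(xml)
    for pod in root.findall("pod"):
        # Keep only the children of the pod whose tag is 'subpod'.
        for subpod in [child for child in pod if child.tag == "subpod"]:
            for node in subpod.findall("plaintext"):
                titles_to_text[pod.get("title")] = node.text
    return titles_to_text


result = pod_plaintexts(SAMPLE_XML)
print(result)
```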

Finally, the human-readable data is stored in data_dics. This is a built-in Python data structure called a dictionary. Nothing fancy: it is just a list-like mapping from keys to values, e.g.
a = {'name': 'wolfram research', 'product': "wolfram alpha"}
and
a['name']
returns 'wolfram research'. Notice that you cannot use an integer index for a here.
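A quick way to check both statements, written in Python 3 syntax: lookup by key works, while lookup by position raises KeyError because 0 is not one of the keys. The variable name positional_lookup_works is only there for the illustration.

```python
a = {"name": "wolfram research", "product": "wolfram alpha"}

# Lookup by key returns the stored value.
print(a["name"])

# Lookup by position fails: 0 is treated as a key, and no such key exists.
try:
    a[0]
    positional_lookup_works = True
except KeyError:
    positional_lookup_works = False

print(positional_lookup_works)
```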
The last part waits for my input and prints the result under the chosen pod title. This method first calls the two wolfram methods above to get "xml" and "result_dics".
    def search(self, ip):
        xml = self._get_xml(ip)
        result_dics = self._xmlparser(xml)

        print 'Available Titles', '\n'
        titles = dict.keys(result_dics)
        for ele in titles : print '\t' + ele
        print '\n'
        tryAgain = 'y'
        while tryAgain == 'y':
            s = raw_input('Choose Pod Title: (type quit to terminate) ')
            if s == 'quit': quit()

            while (s not in titles):
                if s == 'quit': quit()
                print 'Not Valid Title'
                s = raw_input('Choose Pod Title Again: ')
            print result_dics[s]
            tryAgain = raw_input('\nTry other pod title(y/n): ')
        print '\nTerminate the query'

To print the pod titles once I have the dictionary object, I use the dict.keys method. In short, it does the following:
a = {'name': 'wolfram research', 'product': "wolfram alpha"}
dict.keys(a) ====> ['name', 'product']

The for loop exhausts the list of pod titles and prints them out.
for ele in titles : print '\t' + ele

The purpose of the next while loop is to wait until the user chooses to quit the program, and the inner loop checks whether the user has typed a valid pod title from the list of available pod titles.
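The two nested loops can be tried without a terminal or a network connection. The sketch below, in Python 3 syntax, is a hypothetical variant (browse_pods is my own name, not part of the script) that takes the reading and printing functions as parameters, so scripted answers can stand in for keyboard input. Unlike the script, it returns on "quit" instead of exiting the interpreter.

```python
def browse_pods(results, read=input, write=print):
    """Prompt for pod titles until the user stops; return the titles shown."""
    shown = []
    again = "y"
    while again == "y":
        choice = read("Choose Pod Title: (type quit to terminate) ")
        # Inner loop: keep asking until the title is valid or the user quits.
        while choice != "quit" and choice not in results:
            write("Not Valid Title")
            choice = read("Choose Pod Title Again: ")
        if choice == "quit":
            break
        write(results[choice])
        shown.append(choice)
        again = read("\nTry other pod title(y/n): ")
    return shown


# Scripted answers stand in for keyboard input; the pod data is made up.
answers = iter(["Demographic", "Demographics", "y", "Members", "n"])
output = []
shown = browse_pods(
    {"Demographics": "population table", "Members": "Canada | France"},
    read=lambda prompt: next(answers),
    write=output.append,
)
print(shown)
print(output)
```

The misspelled first answer triggers the inner loop once, after which both valid titles are served from the same dictionary without another request.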
Once you have the code saved to a .py file, use
chmod a+x myfile.py
to make it an executable.
Of course, this is just a very simple application of embedding W|A in Python code, yet it shows the basic collection of essential elements one needs to complete such a task. For a thorough manual on the XML structure, please go to this site.
POSTED BY: Shenghui Yang
8 Replies
@Carrettie, I have had no luck with the last three options in your list. They either print a timeout error or complain that some package is not supported. I will go with an actual cloud platform such as AWS to run the command remotely.

Also there are several issues with the original code: 
1. sys.argv[0] should move one step ahead (it holds the script name, so the arguments start at sys.argv[1]), and the code needs import sys
2. result_dics['Result'] does not work as a general option because not all queries return a "Result" title
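One possible way to address both points, sketched in Python 3 syntax: parse_args and pick_pod are hypothetical helpers of my own, not part of the pasted code. The first reads the appid and the query from positions 1 and 2 of the argument list. The second falls back to the first available pod when no "Result" title exists.

```python
def parse_args(argv):
    """Return (appid, query); argv[0] is the script name and is skipped."""
    if len(argv) != 3:
        raise SystemExit("usage: %s APPID QUERY" % argv[0])
    return argv[1], argv[2]


def pick_pod(results, preferred="Result"):
    """Return (title, text) for the preferred pod, else for the first pod, else None."""
    if preferred in results:
        return preferred, results[preferred]
    for title in results:
        return title, results[title]
    return None


# Stand-in for sys.argv; "DEMO-APPID" is a placeholder.
appid, query = parse_args(["WAPy.py", "DEMO-APPID", "group of 8"])

# Made-up parser output without a "Result" title.
picked = pick_pod({"Input interpretation": "Group of Eight", "Members": "Canada | France"})
print(appid, query, picked)
```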

Just FYI. Your pasted code should be left as is.
POSTED BY: Shenghui Yang

You just need to update the _xmlparser method with the following:

def _xmlparser(self, xml):
    data_dics = {}
    tree = etree.fromstring(xml)
    #retrieving every tag with label 'plaintext'
    for e in tree.findall('pod'):
        for item in [ef for ef in list(e) if ef.tag=='subpod']:
            for it in [i for i in list(item) if i.tag=='plaintext']:
                if it.tag=='plaintext':
                    mykey = e.get('title')
                    if mykey not in data_dics.keys():
                        data_dics[mykey] = [it.text]
                    else:
                        prev = data_dics[mykey]
                        data_dics[mykey] = prev + [it.text]
    return data_dics

Thanks for the test query. There was a bug in the original code because a Python dict keeps only the latest value for a given key. Say:

a["key1"] =3
a["key2"] =4
# a is {"key1":3 , "key2":4}

If you assign "key1" again with 6, you get

a["key1"] =6 
# a is updated to {"key1":6 , "key2":4}

For multiple values under a given key, we need to use a list as the value instead. After the update, you should be able to get all the results.
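Here is a compact sketch of the same idea in Python 3 syntax, using dict.setdefault to start an empty list the first time a title is seen. The sample XML is a hand-written stand-in for the quadratic query (an assumption, with two subpods under one "Solutions" pod), and pod_plaintext_lists is a hypothetical name.

```python
from xml.etree import ElementTree as etree

# Hand-written stand-in: one pod with two subpods (an assumption).
SAMPLE_XML = """<queryresult>
  <pod title="Solutions">
    <subpod><plaintext>x = -2</plaintext></subpod>
    <subpod><plaintext>x = -1</plaintext></subpod>
  </pod>
</queryresult>"""


def pod_plaintext_lists(xml):
    """Map each pod title to the list of all its plaintext values."""
    results = {}
    for pod in etree.fromstring(xml).findall("pod"):
        for node in pod.findall("subpod/plaintext"):
            # setdefault creates the list on first use, then we append to it.
            results.setdefault(pod.get("title"), []).append(node.text)
    return results


solutions = pod_plaintext_lists(SAMPLE_XML)["Solutions"]
print(solutions)
```

With a plain assignment in place of the append, only the last subpod would survive, which is the single-solution behavior reported in this thread.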

POSTED BY: Shenghui Yang
Posted 11 years ago

Thanks a lot! I had been looking for this for a long time. I just have one issue. If I have a quadratic equation in the query, the 'Solutions' key holds only one value of the unknown. How do I get both values of the unknown? Example query: Solve x^2+3*x+2=0 (actual solutions: x=-1 and x=-2), but the value of the 'Solutions' key is only x=-1. Please help me.
Thanks in advance.

POSTED BY: Nikhila B.S.
This is great, thanks for sharing! I wonder if this will work with cloud / online Python, similar to these sites:
Just in case, I am also giving the full code from the site you link to, to keep a record, since sites come and go.

import urllib2
import urllib
import httplib
from xml.etree import ElementTree as etree

class wolfram(object):
    def __init__(self, appid):
        self.appid = appid
        self.base_url = 'http://api.wolframalpha.com/v2/query?'
        self.headers = {'User-Agent':None}

    def _get_xml(self, ip):
        url_params = {'input':ip, 'appid':self.appid}
        data = urllib.urlencode(url_params)
        req = urllib2.Request(self.base_url, data, self.headers)
        xml = urllib2.urlopen(req).read()
        return xml

    def _xmlparser(self, xml):
        data_dics = {}
        tree = etree.fromstring(xml)
        #retrieving every tag with label 'plaintext'
        for e in tree.findall('pod'):
            for item in [ef for ef in list(e) if ef.tag=='subpod']:
                for it in [i for i in list(item) if i.tag=='plaintext']:
                    if it.tag=='plaintext':
                        data_dics[e.get('title')] = it.text
        return data_dics

    def search(self, ip):
        xml = self._get_xml(ip)
        result_dics = self._xmlparser(xml)
        #return result_dics
        #print result_dics
        print result_dics['Result']

if __name__ == "__main__":
    appid = sys.argv[0]
    query = sys.argv[1]
    w = wolfram(appid)
    w.search(query)
POSTED BY: Sam Carrettie
Posted 11 years ago

Thank you very much.
But I'm getting this error : 'wolfram' object has no attribute '_WAUnicodeCvtr'
I tried Googling the error (excluding the 'wolfram', of course), but I did not find anything.
Is there anything extra I have to import for this? Or is there any step I have missed?
Thanks in advance.

POSTED BY: Nikhila B.S.

@Nikhila, I have updated the code. I defined _WAUnicodeCvtr myself for non-unicode input and I have removed the function in the last piece of code.

POSTED BY: Shenghui Yang
Posted 11 years ago

Thanks a lot!
Since I am a beginner in Python, I did not know the use of self. I read up about it and I realized it was a separate method.
Could you please tell me what the method does? (I'm very sorry, I'm very new to Python and using WA with Python)
Thank you very much :)

POSTED BY: Nikhila B.S.

You are very welcome. Once you use Python more often, you will learn a lot about the language. Then you will find more ways to use the Wolfram Alpha API.

POSTED BY: Shenghui Yang