XPath Query: get URL from href attribute.

Posted 3 years ago

I want to extract the URL embedded in the href attribute, as shown below:

[Screenshot 1] [Screenshot 2]

The second screenshot contains the URL I want to extract (marked with circle 2). It opens after clicking the button marked with circle 1 in the first screenshot.

I tried the following, but it failed:

In[258]:= session = StartWebSession[]; 
WebExecute["OpenPage" -> "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"]; 
elements = WebExecute["LocateElements" -> "ITA Settings"]; 
WebExecute["ElementAttribute" -> {First@elements, "href"}]


Out[261]= Failure["InvalidInput", <|"MessageTemplate" -> "`command` \
failed.", "MessageParameters" -> <|"command" -> "ElementAttribute"|>, 
  "Element" -> "InvalidInput"|>]

How should I change my code to achieve this?

Regards,
Zhao

POSTED BY: Hongyi Zhao
5 Replies
Posted 3 years ago
POSTED BY: Hongyi Zhao
Posted 3 years ago

Hi Rohit Namjoshi,

Thank you for your wonderful comments and tips. They solve the problem perfectly, as shown below:

In[113]:= (*Use Visible->False to run the browser in "headless" mode, where the browser window does not actually become visible:*)
session = StartWebSession["Chrome", Visible -> False];
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"];
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
WebExecute[session, "ClickElement" -> Last@inputs];

anchors = WebExecute["LocateElements" -> "Tag" -> "a"];
Select[WebExecute["PageHyperlinks"], StringContainsQ["gnum="]]

Out[118]= \
{"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-getgen?gnum=\
007&what=gp", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=-a-c,b,a&unconv=P%201%20n%201&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=c,b,-a-c&unconv=P%201%20a%201&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=c,a,b&unconv=P%201%201%20a&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=a,-a-c,b&unconv=P%201%201%20n&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=-a-c,c,b&unconv=P%201%201%20b&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=b,c,a&unconv=P%20b%201%201&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=b,a,-a-c&unconv=P%20n%201%201&from=ita", \
"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&\
what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita"}
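Since the setting name is carried in the "unconv" query parameter of each link, one possible way to pick a single link out of this list is to decode that parameter and compare it. This is only a sketch: queryValue is a hypothetical helper name, and the two URLs are copied from the output above so the snippet runs without a browser session.

```wolfram
(* Two of the links returned above, copied verbatim *)
links = {
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,c,a&unconv=P%20b%201%201&from=ita",
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=c,a,b&unconv=P%201%201%20a&from=ita"
   };

(* queryValue is a hypothetical helper: look up one query parameter of a URL.
   URLDecode is applied explicitly so the comparison below sees plain spaces;
   a missing key yields the empty string *)
queryValue[url_String, key_String] :=
  URLDecode[Lookup[URLParse[url, "Query"], key, ""]]

(* Keep only the links whose "unconv" parameter names the setting P b 1 1 *)
Select[links, queryValue[#, "unconv"] === "P b 1 1" &]
```

The same Select can be applied directly to the result of WebExecute["PageHyperlinks"] in place of the hard-coded list.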

Another question is: Can I set an HTTP or SOCKS5 proxy for the browser started by StartWebSession?

POSTED BY: Hongyi Zhao

This is a good question, but very different from the original one. In order for it to be useful to others in the forum it should be posted as a separate question, one that has a subject heading appropriate to the new query. That way people with similar questions will have a better chance of locating it in a search (along with any responses that might come in).

POSTED BY: Daniel Lichtblau
Posted 3 years ago

Done! See here.

POSTED BY: Hongyi Zhao
Posted 3 years ago

Hi Hongyi

The element marked with circle 1 is not a URL; it is a button in an HTML form (you can see this with Chrome Developer Tools). "ITA Settings" is the last input element on the page, so you can submit its form like this:

session = StartWebSession[];
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"]
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"]
WebExecute[session, "ClickElement" -> Last@inputs]
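Relying on Last@inputs depends on the button staying last on the page. Because "LocateElements" expects a locator of the form "method" -> "value" (which is likely why passing the bare string "ITA Settings" failed in the question), an XPath locator may be a more direct alternative. The sketch below is untested against the live page: it assumes the button is an input element whose value attribute is exactly "ITA Settings", and clickITASettings is a hypothetical helper name.

```wolfram
(* Assumption: the button is an <input> whose value attribute is "ITA Settings";
   confirm this in the browser's developer tools before relying on it *)
itaLocator = "XPath" -> "//input[@value='ITA Settings']";

(* clickITASettings is a hypothetical helper; it expects a session
   in which the page has already been opened *)
clickITASettings[session_] := Module[{buttons},
  buttons = WebExecute[session, "LocateElements" -> itaLocator];
  If[MatchQ[buttons, {__}],
   (* at least one element matched: click the first one *)
   WebExecute[session, "ClickElement" -> First[buttons]],
   (* nothing matched, or the lookup itself failed *)
   Failure["ButtonNotFound",
    <|"MessageTemplate" -> "No element matched the XPath locator."|>]
   ]
  ]
```

After the "OpenPage" call above, clickITASettings[session] would take the place of the two "Tag" -> "Input" lines.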

That will take you to the page shown in the second screenshot (circle 2). On that page, each row of the table (apart from the header) has a link. To navigate to a particular one:

anchors = WebExecute["LocateElements" -> "Tag" -> "a"]
pb11 = anchors // 
  Select[StringStartsQ[WebExecute[session, "ElementText" -> #], "P b 1 1"] &]
WebExecute[session, "ClickElement" -> First@pb11]
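If the goal is to read the link target rather than follow it, the href can in principle be requested from the located anchor. The following is a minimal sketch, not a verified recipe: it assumes the "ElementAttribute" -> {element, name} form used in the original question works once it is given a valid element, and hrefOf and linkFor are hypothetical helper names.

```wolfram
(* hrefOf is a hypothetical helper. Assumption: "ElementAttribute" -> {element, name},
   the form used in the original question, succeeds for a located element *)
hrefOf[session_, element_] :=
  WebExecute[session, "ElementAttribute" -> {element, "href"}]

(* linkFor is a hypothetical helper: href of the first anchor
   whose visible text starts with the given prefix *)
linkFor[session_, prefix_String] := Module[{anchors, hits},
  anchors = WebExecute[session, "LocateElements" -> "Tag" -> "a"];
  If[! ListQ[anchors],
   (* the lookup failed: hand the failure back unchanged *)
   anchors,
   hits = Select[anchors,
     With[{text = WebExecute[session, "ElementText" -> #]},
       StringQ[text] && StringStartsQ[text, prefix]] &];
   If[hits === {},
    Missing["NotFound", prefix],
    hrefOf[session, First[hits]]]
   ]
  ]
```

With the session on the second page, linkFor[session, "P b 1 1"] would be the call to try.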
POSTED BY: Rohit Namjoshi