
XPath Query: get URL from href attribute.

Posted 1 year ago

I want to extract the URL from an href attribute, as shown below:

[Screenshot 1: the button marked by circle 1] [Screenshot 2: the URL to extract, marked by circle 2]

The second screenshot, which shows the URL I want to extract (circle 2), is the page that opens when I click the button marked by circle 1 in the first screenshot.

I tried the following, but it failed:

In[258]:= session = StartWebSession[]; 
WebExecute["OpenPage" -> "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"]; 
elements = WebExecute["LocateElements" -> "ITA Settings"]; 
WebExecute["ElementAttribute" -> {First@elements, "href"}]


Out[261]= Failure["InvalidInput", <|"MessageTemplate" -> "`command` failed.", "MessageParameters" -> <|"command" -> "ElementAttribute"|>, "Element" -> "InvalidInput"|>]

How should I change my code to extract that URL?

Regards,
Zhao

POSTED BY: Hongyi Zhao
Posted 1 year ago

I tried to set a proxy as follows, but it failed:

In[209]:= (*
https://mathematica.stackexchange.com/questions/242495/chrome-driver-error-in-startwebsession
$DefaultProxyRules
*)
(*Set proxy specifications for the HTTP, HTTPS and SOCKS protocols:*)
origUseProxy=$DefaultProxyRules["UseProxy"];
origHttpProxy=$DefaultProxyRules["HTTP"];
origHttpsProxy=$DefaultProxyRules["HTTPS"];
origSocksProxy=$DefaultProxyRules["Socks"];

$DefaultProxyRules["HTTP"] = {"127.0.0.1", 8080};
$DefaultProxyRules["HTTPS"] = {"127.0.0.1", 8080};
$DefaultProxyRules["Socks"] = {"127.0.0.1", 18889};

(*Because the "UseProxy" value acts like a master switch, these proxy specifications will not take effect unless "UseProxy" is set to Manual:*)
$DefaultProxyRules["UseProxy"] = Manual;


In[224]:= (*Use Visible->False to run the browser in "headless" mode, where the browser window does not actually become visible:*)
(*session = StartWebSession["Chrome", Visible -> False];*)
session = StartWebSession[]
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=227"];
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
WebExecute[session, "ClickElement" -> Last@inputs];

anchors = WebExecute["LocateElements" -> "Tag" -> "a"];
Select[WebExecute["PageHyperlinks"], StringContainsQ["gnum="]]
DeleteObject[session];

Out[224]= Failure["StartWebSession", <|"MessageTemplate" -> "Unable to start `driver` driver process", "MessageParameters" -> <|"driver" -> "Chrome"|>|>]

During evaluation of In[224]:= URLRead::invhttp: Empty reply from server.

During evaluation of In[224]:= WebExecute::argr: The arguments for WebExecute are not valid.

During evaluation of In[224]:= Last::normal: Nonatomic expression expected at position 1 in Last[$Failed].

During evaluation of In[224]:= WebExecute::argr: The arguments for WebExecute are not valid.

During evaluation of In[224]:= URLRead::invhttp: Empty reply from server.

During evaluation of In[224]:= URLRead::invhttp: Empty reply from server.

During evaluation of In[224]:= StringContainsQ::strse: String or list of strings expected at position 1 in StringContainsQ[gnum=][<|MessageTemplate->`command` failed.,MessageParameters-><|command->PageHyperlinks|>|>].

Out[229]= Failure[]

During evaluation of In[224]:= DeleteObject::nim: Cannot delete object Failure[Message: Unable to start Chrome driver process, Tag: StartWebSession].
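The block above saves the original values in origUseProxy, origHttpProxy and so on, but never puts them back. A minimal snapshot-and-restore sketch follows; one might use it to rule out the proxy settings before retrying StartWebSession, though that is an assumption about the cause, not a confirmed diagnosis. The name savedProxyRules is hypothetical.

```wolfram
(* Snapshot all current proxy rules before changing anything *)
savedProxyRules = $DefaultProxyRules;

(* Experimental settings, as in the post above *)
$DefaultProxyRules["HTTP"] = {"127.0.0.1", 8080};
$DefaultProxyRules["UseProxy"] = Manual;

(* ... StartWebSession / URLRead experiments would go here ... *)

(* Restore every key from the snapshot, one assignment per key *)
KeyValueMap[($DefaultProxyRules[#1] = #2) &, savedProxyRules];
```

Restoring key by key uses the same `$DefaultProxyRules[key] = value` form that the post itself uses, rather than replacing the whole variable.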
POSTED BY: Hongyi Zhao
Posted 1 year ago

Hi Rohit Namjoshi,

Thank you for the wonderful comments and tips. They solve the problem perfectly, as shown below:

In[113]:= (*Use Visible->False to run the browser in "headless" mode, where the browser window does not actually become visible:*)
session = StartWebSession["Chrome", Visible -> False];
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"];
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
WebExecute[session, "ClickElement" -> Last@inputs];

anchors = WebExecute["LocateElements" -> "Tag" -> "a"];
Select[WebExecute["PageHyperlinks"], StringContainsQ["gnum="]]

Out[118]= {"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-getgen?gnum=007&what=gp",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=-a-c,b,a&unconv=P%201%20n%201&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=c,b,-a-c&unconv=P%201%20a%201&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=c,a,b&unconv=P%201%201%20a&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=a,-a-c,b&unconv=P%201%201%20n&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=-a-c,c,b&unconv=P%201%201%20b&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,c,a&unconv=P%20b%201%201&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,a,-a-c&unconv=P%20n%201%201&from=ita",
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita"}
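The setting links above differ only in their query parameters, so instead of keeping whole URLs one can pull out the trmat and unconv values. A minimal sketch using URLParse, applied to two of the links copied verbatim from the output; the variable names links and settings are illustrative only.

```wolfram
(* Two of the nph-trgen links from the output above, copied verbatim *)
links = {
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=-a-c,b,a&unconv=P%201%20n%201&from=ita",
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,c,a&unconv=P%20b%201%201&from=ita"};

(* URLParse[url, "Query"] splits the query string into "key" -> "value" rules;
   wrapping them in Association makes lookup by key straightforward *)
settings = Association[URLParse[#, "Query"]] & /@ links;

(* Map each setting symbol (unconv) to its transformation matrix (trmat) *)
AssociationThread[Lookup[settings, "unconv"], Lookup[settings, "trmat"]]
```

The same code works unchanged on the full list returned by the Select call above.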

Another question: can I set an HTTP or SOCKS5 proxy for the browser started by StartWebSession?

POSTED BY: Hongyi Zhao

This is a good question, but very different from the original one. In order for it to be useful to others in the forum it should be posted as a separate question, one that has a subject heading appropriate to the new query. That way people with similar questions will have a better chance of locating it in a search (along with any responses that might come in).

POSTED BY: Daniel Lichtblau
Posted 1 year ago

Done! See here.

POSTED BY: Hongyi Zhao

Hi Hongyi

The element marked by circle 1 is not a link; it is a button in an HTML form (you can see this with Chrome Developer Tools), so it has no href attribute. You can submit the form by clicking the last input element, which is the "ITA Settings" button, like this:

session = StartWebSession[];
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"]
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"]
WebExecute[session, "ClickElement" -> Last@inputs]
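Since the thread title asks about XPath, the same button can presumably also be located with an XPath query on its value attribute instead of by position. A sketch, assuming the button is an `<input>` whose value attribute is exactly "ITA Settings" (worth confirming in Developer Tools); it also prints a few of the button's attributes to show that it is a form control rather than a link.

```wolfram
session = StartWebSession[];
WebExecute[session, "OpenPage" -> "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"];

(* Assumption: the button is an <input> whose value attribute is "ITA Settings" *)
buttons = WebExecute[session, "LocateElements" -> "XPath" -> "//input[@value='ITA Settings']"];

If[ListQ[buttons] && buttons =!= {},
  (* Print the type, value and href attributes; an <input> does not define href *)
  Print[WebExecute[session, "ElementAttribute" -> {First@buttons, #}] & /@ {"type", "value", "href"}];
  (* Clicking the button submits the form, like clicking Last@inputs *)
  WebExecute[session, "ClickElement" -> First@buttons],
  Print["No <input> with value 'ITA Settings' was found"]
]
```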

That takes you to the page marked by circle 2. On that page, each row of the table (apart from the header) does contain a link. To navigate to a particular one:

anchors = WebExecute["LocateElements" -> "Tag" -> "a"]
pb11 = anchors // 
  Select[StringStartsQ[WebExecute[session, "ElementText" -> #], "P b 1 1"] &]
WebExecute[session, "ClickElement" -> pb11]
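Because the original goal was the URL itself rather than navigating to it, the href of the matched anchor can presumably be read with "ElementAttribute" instead of clicking it. A self-contained sketch of the whole sequence, assuming at least one anchor's text starts with "P b 1 1"; the variable url is illustrative.

```wolfram
session = StartWebSession[];
WebExecute[session, "OpenPage" -> "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"];

(* Submit the form by clicking its last input, the "ITA Settings" button *)
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
WebExecute[session, "ClickElement" -> Last@inputs];

(* On the settings page, keep the anchors whose text starts with "P b 1 1" *)
anchors = WebExecute[session, "LocateElements" -> "Tag" -> "a"];
pb11 = Select[anchors, StringStartsQ[WebExecute[session, "ElementText" -> #], "P b 1 1"] &];

(* Read the link target instead of clicking it *)
url = If[pb11 =!= {},
   WebExecute[session, "ElementAttribute" -> {First@pb11, "href"}],
   Missing["NotFound"]];

DeleteObject[session];
url
```

Here "ElementAttribute" is applied to an anchor element returned by "LocateElements", which does carry an href, unlike the form button in the original attempt.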
POSTED BY: Rohit Namjoshi