XPath Query: get URL from href attribute.

Posted 3 years ago

I want to extract the URL stored in an href attribute, as shown below:

[Two screenshots: images not available]

The second screenshot contains the URL I want to extract (circle 2). It opens after clicking the button marked by circle 1 in the first screenshot.

I tried the following, but it failed:

In[258]:= session = StartWebSession[]; 
WebExecute["OpenPage" -> "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"]; 
elements = WebExecute["LocateElements" -> "ITA Settings"]; 
WebExecute["ElementAttribute" -> {First@elements, "href"}]


Out[261]= Failure["InvalidInput", <|"MessageTemplate" -> "`command` failed.", 
  "MessageParameters" -> <|"command" -> "ElementAttribute"|>, 
  "Element" -> "InvalidInput"|>]

How should I change my code to get this URL?

Regards,
Zhao

POSTED BY: Hongyi Zhao
5 Replies
Posted 3 years ago

I tried to set the proxy as follows, but it failed:

In[209]:= (*
https://mathematica.stackexchange.com/questions/242495/chrome-driver-error-in-startwebsession
$DefaultProxyRules
*)
(*Set a proxy specification for the HTTP protocol:*)
origUseProxy=$DefaultProxyRules["UseProxy"];
origHttpProxy=$DefaultProxyRules["HTTP"];
origHttpsProxy=$DefaultProxyRules["HTTPS"];
origSocksProxy=$DefaultProxyRules["Socks"];

$DefaultProxyRules["HTTP"] = {"127.0.0.1", 8080};
$DefaultProxyRules["HTTPS"] = {"127.0.0.1", 8080};
$DefaultProxyRules["Socks"] = {"127.0.0.1", 18889};

(*Because the "UseProxy" value acts like a master switch, that HTTP proxy specification will not take effect unless "UseProxy" is set to Manual:*)
$DefaultProxyRules["UseProxy"] = Manual;


In[224]:= (*Use Visible->False to run the browser in "headless" mode, where the browser window does not actually become visible:*)
(*session = StartWebSession["Chrome", Visible -> False];*)
session = StartWebSession[]
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=227"];
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
WebExecute[session, "ClickElement" -> Last@inputs];

anchors = WebExecute["LocateElements" -> "Tag" -> "a"];
Select[WebExecute["PageHyperlinks"], StringContainsQ["gnum="]]
DeleteObject[session];

Out[224]= Failure["StartWebSession", <|"MessageTemplate" -> "Unable to start `driver` driver process", 
  "MessageParameters" -> <|"driver" -> "Chrome"|>|>]

During evaluation of In[224]:= URLRead::invhttp: Empty reply from server.

During evaluation of In[224]:= WebExecute::argr: The arguments for WebExecute are not valid.

During evaluation of In[224]:= Last::normal: Nonatomic expression expected at position 1 in Last[$Failed].

During evaluation of In[224]:= WebExecute::argr: The arguments for WebExecute are not valid.

During evaluation of In[224]:= URLRead::invhttp: Empty reply from server.

During evaluation of In[224]:= URLRead::invhttp: Empty reply from server.

During evaluation of In[224]:= StringContainsQ::strse: String or list of strings expected at position 1 in StringContainsQ[gnum=][<|MessageTemplate->`command` failed.,MessageParameters-><|command->PageHyperlinks|>|>].

Out[229]= Failure[]

During evaluation of In[224]:= DeleteObject::nim: Cannot delete object Failure[Message: Unable to start Chrome driver process, Tag: StartWebSession].
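
Most of the messages above appear to cascade from the first one: StartWebSession returned a Failure, and the later WebExecute, Last and DeleteObject calls were then applied to that Failure. A minimal sketch of guarding against this is shown below. The helper name gnumLinks is hypothetical, and the page steps are simply the ones already used in this post:

```wolfram
(* Hypothetical helper: run the page steps only if the session actually started *)
gnumLinks[session_, url_String] :=
 If[FailureQ[session],
  (* Return the Failure unchanged instead of passing it on to WebExecute *)
  session,
  Module[{inputs, links},
   WebExecute[session, "OpenPage" -> url];
   inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
   WebExecute[session, "ClickElement" -> Last@inputs];
   links = Select[WebExecute[session, "PageHyperlinks"], StringContainsQ["gnum="]];
   (* Close the browser only when there is a real session to close *)
   DeleteObject[session];
   links]]
```

Calling gnumLinks[StartWebSession[], url] with a session that failed to start should then return the single StartWebSession failure rather than the chain of follow-on messages.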
POSTED BY: Hongyi Zhao
Posted 3 years ago

Hi Rohit Namjoshi,

Thank you for the wonderful comments and tips. They solve the problem perfectly, as shown below:

In[113]:= (*Use Visible->False to run the browser in "headless" mode, where the browser window does not actually become visible:*)
session = StartWebSession["Chrome", Visible -> False];
page = WebExecute[
  "OpenPage" -> 
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=7"];
inputs = WebExecute[session, "LocateElements" -> "Tag" -> "Input"];
WebExecute[session, "ClickElement" -> Last@inputs];

anchors = WebExecute["LocateElements" -> "Tag" -> "a"];
Select[WebExecute["PageHyperlinks"], StringContainsQ["gnum="]]

Out[118]= {"https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-getgen?gnum=007&what=gp", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=-a-c,b,a&unconv=P%201%20n%201&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=c,b,-a-c&unconv=P%201%20a%201&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=c,a,b&unconv=P%201%201%20a&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=a,-a-c,b&unconv=P%201%201%20n&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=-a-c,c,b&unconv=P%201%201%20b&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,c,a&unconv=P%20b%201%201&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,a,-a-c&unconv=P%20n%201%201&from=ita", 
 "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita"}
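
If only the query parameters are of interest (for example the trmat and unconv values), the returned links can be split further with URLParse. The following is a minimal sketch that hard-codes two of the links from the output above instead of querying the live page. The variable names links, queries and withTrmat are illustrative only:

```wolfram
(* Two links copied from the output above; in a live session this list would
   come from Select[WebExecute["PageHyperlinks"], StringContainsQ["gnum="]] *)
links = {
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-getgen?gnum=007&what=gp",
   "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=-a-c,b,a&unconv=P%201%20n%201&from=ita"};

(* URLParse[url, "Query"] gives the query string as a list of rules;
   wrapping it in Association allows lookup by parameter name *)
queries = Association[URLParse[#, "Query"]] & /@ links;

(* Keep only the links that carry a transformation matrix *)
withTrmat = Select[queries, KeyExistsQ[#, "trmat"] &];

(* Collect the setting symbol and matrix of each remaining link *)
Lookup[withTrmat, "unconv"]
Lookup[withTrmat, "trmat"]
```

Working on the parsed associations avoids hand-written string patterns for each parameter.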

Another question is: Can I set an HTTP or SOCKS5 proxy for the browser started by StartWebSession?

POSTED BY: Hongyi Zhao

This is a good question, but very different from the original one. In order for it to be useful to others in the forum it should be posted as a separate question, one that has a subject heading appropriate to the new query. That way people with similar questions will have a better chance of locating it in a search (along with any responses that might come in).

POSTED BY: Daniel Lichtblau
Posted 3 years ago

Done! See here.

POSTED BY: Hongyi Zhao
Posted 3 years ago
POSTED BY: Rohit Namjoshi