Hello, I have a CSV (or Excel) file that looks like this:
csv file with columns - path
I would like to import the first cell that contains a hyperlink (shown in blue), extract its content as plain text (just the content), and put the output into the column next to the "path" column.
I have the following code so far (the first line takes the column with the hyperlinks):
data1 = dataset[1, All, 6];
data2 = data1[3]; (* this line refers to the third row *)
data3 = StringSplit[Import[data2, "Plaintext"], ","];
data4 = StringReplace[#, (StartOfString ~~ ",") | ("," ~~
EndOfString) :> ""] & /@ data3;
data5 = StringReplace[#, (StartOfString ~~ Whitespace) | (Whitespace ~~
EndOfString) :> ""] & /@ data4;
data6 = StringSplit[ToString[data5], " "];
data7 = StringSplit[data6, ".htm "];
I have the output:
{{"Description"}, {"Document"}, {"Type"}, {"Size
"}, {"1"}, {"PROXY"}, {"2010"}, {"proxy2010.htm"}, {"DEF"}, \
{"14A"}, {"717341
"}}
I would like to extract the part that reads "xxx.htm". The xxx changes for every row, but the string should always end in ".htm".
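To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind. It runs on a shortened copy of the output above rather than on data7 itself, and the names sample, htmNames and htmFile are just placeholders I made up. I am assuming that flattening the nested list and trimming whitespace is acceptable.

```wolfram
(* shortened copy of the nested output shown above; in practice this would be data7 *)
sample = {{"Description"}, {"Document"}, {"1"}, {"PROXY"}, {"2010"},
   {"proxy2010.htm"}, {"DEF"}, {"14A"}};

(* flatten the nesting, trim stray whitespace, keep only strings ending in ".htm" *)
htmNames = Select[StringTrim /@ Flatten[sample], StringEndsQ[#, ".htm"] &];

(* store the first match in a variable, or a Missing marker if nothing matched *)
htmFile = First[htmNames, Missing["NotFound"]]
(* "proxy2010.htm" *)
```

Something along these lines would be fine, as long as it still works when the file name is different in every row.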
My question is:
From the output above, can I take the ".htm" part and store the .htm address in a variable? And can I run this entire process for rows 2 through 100 (in a loop)? Thank you.
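For the loop part, this is roughly the shape I imagine, as a sketch only. The helper htmNamesFromText is a hypothetical name of my own, and I am assuming that data1 (the hyperlink column from my code above) has at least 100 rows and that each link imports as whitespace-separated plain text.

```wolfram
(* hypothetical helper: return every whitespace-separated token ending in ".htm" *)
htmNamesFromText[text_String] := Select[StringSplit[text], StringEndsQ[#, ".htm"] &]
htmNamesFromText[_] := {}  (* fallback, e.g. when Import returns $Failed *)

(* quick check on a sample line *)
htmNamesFromText["1 PROXY 2010 proxy2010.htm DEF 14A 717341"]
(* {"proxy2010.htm"} *)

(* rows 2 through 100 of the hyperlink column; Normal turns a Dataset into a plain list *)
links = Normal[data1][[2 ;; 100]];
results = htmNamesFromText[Quiet @ Import[#, "Plaintext"]] & /@ links;

(* pair each link with what was found, i.e. the column next to "path" *)
table = Transpose[{links, results}];
```

Mapping over the rows instead of writing an explicit For loop would be fine for me, if that is the more usual way to do it.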