Gregory:
I would also advise that you persist your data in a local data store, be it your file system or a database. I would spend one pass getting all the internet data onto local storage, then another pass processing that data. This way you separate data fetching from data processing. Your HTTPClient connections will be shorter, which may reduce the possibility of connection or session timeouts.
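As a minimal sketch of that two-pass approach (assuming the *.txt filing links have already been collected into a list, here called filingLinks, and that the directory name is arbitrary):

(* Pass 1: fetch each filing once and save the raw text locally. *)
(* "filingLinks" is a placeholder for the list of *.txt links produced by the code below. *)
dir = FileNameJoin[{$HomeDirectory, "SECfilings"}];
If[! DirectoryQ[dir], CreateDirectory[dir]];
Do[
 Export[FileNameJoin[{dir, Last[StringSplit[link, "/"]]}],
  URLFetch[link, "Cookies" -> False], "Text"],
 {link, filingLinks}]

(* Pass 2: process the saved copies, with no internet connection open. *)
texts = Import[#, "Text"] & /@ FileNames["*.txt", dir];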
The following code imports the links to Form DEF 14A filings for a particular CIK from 1994 to the current year using Import.
$HTTPCookies  (* inspect the cookie store before the request *)

cik = 320193;

(* convert the two-digit year embedded in an EDGAR accession link to a four-digit year *)
foundToYear[x_] := Module[{foundstr, lyear},
  foundstr = StringCases[x, RegularExpression["\\-(\\d\\d)\\-"] -> "$1"][[1]];
  lyear = If[ToExpression[foundstr] > 49, 1900 + ToExpression[foundstr], 2000 + ToExpression[foundstr]];
  Return[lyear];];

(* collect the *.txt filing links for the CIK, newest first, duplicates removed *)
DeleteDuplicates[
 Sort[
  Select[
   Import[
    "http://www.sec.gov/cgi-bin/srch-edgar?text=CIK%3D" <>
     IntegerString[ToExpression[cik], 10, 10] <>
     "+TYPE%3DDEF&first=1994&last=" <> DateString[DateList[], "Year"],
    "Hyperlinks"],
   Function[StringMatchQ[#, "*.txt"]]],
  Function[foundToYear[#1] > foundToYear[#2]]]]

$HTTPCookies  (* inspect the cookie store again after the request *)
The following code imports the same Form DEF 14A links for a particular CIK from 1994 to the current year using URLFetch.
$HTTPCookies  (* inspect the cookie store before the request *)

cik = 320193;

(* same helper as above: convert the two-digit year in an EDGAR link to a four-digit year *)
foundToYear[x_] := Module[{foundstr, lyear},
  foundstr = StringCases[x, RegularExpression["\\-(\\d\\d)\\-"] -> "$1"][[1]];
  lyear = If[ToExpression[foundstr] > 49, 1900 + ToExpression[foundstr], 2000 + ToExpression[foundstr]];
  Return[lyear];];

(* fetch the search page with URLFetch (cookies disabled), then extract the hyperlinks from the HTML *)
DeleteDuplicates[
 Sort[
  Select[
   ImportString[
    URLFetch[
     "http://www.sec.gov/cgi-bin/srch-edgar?text=CIK%3D" <>
      IntegerString[ToExpression[cik], 10, 10] <>
      "+TYPE%3DDEF&first=1994&last=" <> DateString[DateList[], "Year"],
     "Cookies" -> False],
    {"HTML", "Hyperlinks"}],
   Function[StringMatchQ[#, "*.txt"]]],
  Function[foundToYear[#1] > foundToYear[#2]]]]

$HTTPCookies  (* still empty here, since cookies were disabled *)
The $HTTPCookies global variable remains empty in this example. A side effect, however, is that the links returned by this version are relative: the base URL is lost. You may also want to add "StoreCookies" -> False. See also "tutorial/InternetConnectivity" in the Help system.
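If you need absolute links from the URLFetch version, a small post-processing step can restore the prefix. This is only a sketch: it assumes the relative links start with a slash, and relLinks is just an illustrative name for the list returned above.

base = "http://www.sec.gov";
(* prepend the site root to any link that is not already absolute *)
(* "relLinks" is a placeholder for the list returned by the URLFetch version above *)
fullLinks = If[StringMatchQ[#, "http*"], #, base <> #] & /@ relLinks;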
The aim is to reduce reuse of the same connection (connection reuse can be a good thing in some cases) if you decide to process the data in the same run that fetches it. If it takes 90 seconds to process a particular CIK and the connection timeout of the HTTPClient, proxy server, or web server is shorter than 90 seconds, the next internet connection to the same base URL may return an error. You want the HTTPClient and the web server to treat each connection as a new connection.
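If fetching and processing do end up in the same run, each request can at least be made to stand on its own using the options mentioned above. A sketch, where fullLinks is the list from the previous sketch and processFiling is only a placeholder for the lengthy per-CIK processing step:

(* fetch each filing as its own request, with cookies neither sent nor stored *)
Do[
 raw = URLFetch[link, "Cookies" -> False, "StoreCookies" -> False];
 processFiling[raw],  (* "processFiling" stands in for your own processing code *)
 {link, fullLinks}]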
Hans