Gregory:
You provided some sample code a few months ago, and I concur that the companyInfo process, which contains pattern-dependent string functions, may be the source of some issues. (previous post)
From the number of files you are processing, I would say you are starting with about 1,400 CIK codes and processing the SEC header information in each Form DEF14A to get the company location (state) and SIC. Please note that this header information is also available as a separate SGML file, at the cost of another URLFetch: for each Form DEF14A there is a corresponding SGML header file, whose filename appears in the content of the Form DEF14A file. That file is much easier to process. However, for 1,400 CIKs, fetching ~10 Form DEF14A filings each and then ~10 SGML headers would increase the number of web connections and files to save. The other approach is to process the header data inside the Form DEF14A file itself, which is what you are doing. You are using the ":" (colon) as a record delimiter. That is what I did in the past as well, but I used simple string splits rather than ReadList. In addition, I pre-processed the text to the left of the ":" and joined any line that had only whitespace to the right of the ":" with the line below it. Here is older code with SGML header processing:
getBeneficialfromSECDEF14A[cik_]:=Module[{formDEF14A,tablestartpos,tablestartnearfunc,tableendpos,tableendnearfunc,
benpos,tablestartnearest,tableendnearest,tablestartcommon,tableendcommon,bentablestart,bentableend,bentable,
formLinks,sgmlHeaderFileName,sgmlHeaderpos,sgmlHeaderStubURL,sgmlHeaderURL,sgmlHeaderData,sgmlHeaderList,hd,
sgmlTextStr,sgmlTextStartpos,sgmlTextEndpos,SICFormValue},
(* processSECHeader: strip the SGML container tags and rewrite each field tag as a "Name|" prefix *)
processSECHeader[HeaderData_]:=Module[{HeaderDataStr,HeaderDataStream,HeaderDataList},
HeaderDataStr=StringReplace[HeaderData,{
(* container tags and fixed values: removed *)
"<SEC-HEADER>"->"","<TYPE>"->"","<PUBLIC-DOCUMENT-COUNT>"->"","<FILER>"->"",
"<COMPANY-DATA>"->"","</COMPANY-DATA>"->"","<FILING-VALUES>"->"","</FILING-VALUES>"->"",
"<BUSINESS-ADDRESS>"->"","</BUSINESS-ADDRESS>"->"","<MAIL-ADDRESS>"->"","</MAIL-ADDRESS>"->"",
"</FILER>"->"","</SEC-HEADER>"->"","<FORM-TYPE>DEF 14A"->"","<ACT>34"->"",
"<FORMER-COMPANY>"->"","</FORMER-COMPANY>"->"",
(* field tags: renamed to "Name|" so each line becomes Name|value *)
"<ACCEPTANCE-DATETIME>"->"AcceptanceDatetime|","<ACCESSION-NUMBER>"->"AccessionNumber|",
"<PERIOD>"->"ConformedPeriodOfReport|","<FILING-DATE>"->"FiledAsOfDate|",
"<DATE-OF-FILING-DATE-CHANGE>"->"DateAsOfChange|","<EFFECTIVENESS-DATE>"->"EffectivenessDate|",
"<CONFORMED-NAME>"->"CompanyConformedName|","<CIK>"->"CompanyCIK|","<ASSIGNED-SIC>"->"SICNumber|",
"<IRS-NUMBER>"->"IRSNumber|","<STATE-OF-INCORPORATION>"->"StateOfIncorporation|",
"<FISCAL-YEAR-END>"->"FiscalYearEnd|","<FILE-NUMBER>"->"FileNumber|","<FILM-NUMBER>"->"FilmNumber|",
"<STREET1>"->"AddressStreet1|","<STREET2>"->"AddressStreet2|","<CITY>"->"AddressCity|",
"<STATE>"->"AddressState|","<ZIP>"->"AddressZip|","<PHONE>"->"BusinessPhone|",
"<FORMER-CONFORMED-NAME>"->"FormerConformedName|","<DATE-CHANGED>"->"DateChanged|"}];
HeaderDataStream=StringToStream[HeaderDataStr];
HeaderDataList=ReadList[HeaderDataStream,String];
Close[HeaderDataStream];
Return[HeaderDataList];];
foundToYear[x_]:=Module[{foundstr,lyear},foundstr=StringCases[x,RegularExpression["\\-(\\d\\d)\\-"]->"$1"][[1]];
lyear=If[ToExpression[foundstr]>49,Plus[1900,ToExpression[foundstr]],Plus[2000,ToExpression[foundstr]]];
Return[lyear];];
getSICValue[formdata_]:=Module[{searchstring,SICPos,SICNumPos,SICValue=""},searchstring="STANDARD INDUSTRIAL CLASSIFICATION:";
SICPos=StringPosition[formdata,searchstring,IgnoreCase->True];
SICNumPos=StringPosition[formdata,RegularExpression["(STANDARD INDUSTRIAL CLASSIFICATION:)?(\\[\\d{4}\\])"]];
SICValue=StringTrim[StringTake[formdata,{(Last[Flatten[SICPos]]+1),(First[Flatten[SICNumPos]]-1)}]];
Return[SICValue];];
formLinks=DeleteDuplicates[Sort[Select[Import["http://www.sec.gov/cgi-bin/srch-edgar?text=CIK%3D"<>IntegerString[ToExpression[cik],10,10]<>"+TYPE%3DDEF&first=1994&last="<>DateString[DateList[],"Year"],"Hyperlinks"],Function[StringMatchQ[#,"*.txt"]==True]],Function[foundToYear[#1]>foundToYear[#2]]]];
formDEF14A=Import[formLinks[[1]],"Plaintext"];
SICFormValue=getSICValue[formDEF14A];
tablestartpos=StringPosition[formDEF14A,"<table",IgnoreCase->True];
tablestartnearfunc=Nearest[tablestartpos];
tableendpos=StringPosition[formDEF14A,"</table>",IgnoreCase->True];
tableendnearfunc=Nearest[tableendpos];
benpos=StringPosition[formDEF14A,"beneficial",IgnoreCase->True];
tablestartnearest=Flatten[Map[tablestartnearfunc,benpos],1];
tableendnearest=Flatten[Map[tableendnearfunc,benpos],1];
tablestartcommon=Commonest[tablestartnearest];
tableendcommon=Commonest[tableendnearest];
bentablestart=Min[tablestartcommon];
bentableend=Min[tableendcommon];
If[bentableend<bentablestart,(*find other table end*)bentableend=SelectFirst[tableendnearest[[All,2]],Function[Less[bentablestart,#]]];];
bentable=ImportString[StringTake[formDEF14A,{bentablestart,bentableend}],{"HTML","Data"}];
sgmlHeaderpos=StringPosition[formDEF14A,{"<SEC-HEADER>",".sgml"},2,Overlaps->False];
sgmlHeaderFileName=StringTake[formDEF14A,{Last[First[sgmlHeaderpos]]+1,Last[Last[sgmlHeaderpos]]}];
sgmlHeaderStubURL=StringReplacePart[formLinks[[1]],"",Last[StringPosition[formLinks[[1]],RegularExpression["(/).*([.]txt)"]]]];
sgmlHeaderURL=sgmlHeaderStubURL<>"/"<>sgmlHeaderFileName;sgmlHeaderData=Import[sgmlHeaderURL,"Text"];
hd=processSECHeader[sgmlHeaderData];
Return[{formLinks,hd,SICFormValue,bentable}];];
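As a rough illustration of how the hd list returned above might be consumed: processSECHeader rewrites each field tag into a "Name|" prefix, so each surviving line should have the form Name|value. The sketch below splits such lines on the first "|" and returns a list of rules. The helper name headerLinesToRules and the sample lines are made up for illustration; they are not part of the original code.

```mathematica
(* hypothetical helper: turn "Name|value" lines into a list of rules *)
headerLinesToRules[lines_List] := Module[{kept, parts},
  (* keep only lines that contain the "|" separator added by processSECHeader *)
  kept = Select[lines, ! StringFreeQ[#, "|"] &];
  (* split on the first "|" only, so a value that itself contains "|" stays intact *)
  parts = Map[StringSplit[#, "|", 2] &, kept];
  (* drop entries with nothing on one side of the separator *)
  parts = Select[parts, Length[#] == 2 &];
  Map[StringTrim[#[[1]]] -> StringTrim[#[[2]]] &, parts]
];

(* made-up sample lines in the shape processSECHeader should produce *)
sample = {"CompanyConformedName|EXAMPLE CORP", "CompanyCIK|0000000000", "DEF 14A", "SICNumber|1234"};
rules = headerLinesToRules[sample];
"SICNumber" /. rules
```

A list of rules is used rather than an Association because names such as AddressStreet1 can occur twice in one header (business address and mail address), and an Association would keep only one of them.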
The code above still uses Import, which you may have changed to URLFetch, and you may now be processing the files locally. I will have to dig up the other process or rewrite it.
I believe that your original issue was not with Import but with processing the SGML header to get the SIC. Nevertheless, I have outlined a basic solution path for processing the companyInfo header in the Form DEF14A txt file. It is also not clear from your answer whether you experienced any web issues once the files were persisted locally; did you? I do not wish to steer you wrong.
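Until I find the older process, here is a minimal sketch of the colon-delimited approach described above, using plain StringSplit instead of ReadList. It is one possible reading of that pre-processing step, not the original code: a line with only whitespace to the right of the ":" is treated as a section label and carried down onto the lines below it. The name parseColonHeader and the sample text are hypothetical, and the function assumes you pass it only the header portion of the Form DEF14A file, since any line containing a ":" would otherwise be picked up.

```mathematica
(* hypothetical helper: parse "LABEL: value" lines from the plain-text header *)
parseColonHeader[text_String] := Module[{lines, section = "", parts, key, val},
  (* only lines containing a colon are candidate records *)
  lines = Select[StringSplit[text, "\n"], ! StringFreeQ[#, ":"] &];
  Flatten[
    Reap[
      Do[
        (* split on the first ":" only, so values containing ":" stay intact *)
        parts = StringSplit[line, ":", 2];
        key = StringTrim[First[parts]];
        val = If[Length[parts] == 2, StringTrim[parts[[2]]], ""];
        If[val === "",
          section = key,              (* label with nothing to its right: remember it *)
          Sow[{section, key} -> val]  (* ordinary record: tag it with the current section *)
        ],
        {line, lines}
      ]
    ][[2]],
    1]
];

(* made-up sample text in the general shape of the header *)
sample = "FILER:\n\tCOMPANY DATA:\n\t\tCOMPANY CONFORMED NAME:\t\tEXAMPLE CORP\n\t\tSTANDARD INDUSTRIAL CLASSIFICATION:\tSERVICES [1234]\n\t\tSTATE OF INCORPORATION:\tDE";
parseColonHeader[sample]
```

Each result has the form {section, label} -> value, so the state and SIC lines can be picked out by label without relying on their position in the file.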