Message Boards Message Boards

0
|
343 Views
|
3 Replies
|
0 Total Likes
View groups...
Share
Share this post:

Problem doing a bulk import into SQL with GDELT data

Posted 1 month ago

Hi There,

This is a strange one (for me). I'm trying to import data from the GDELT project (http://gdeltproject.org/), one of the files is attached.

I have MS SQL installed on my PC. When I try to do bulk import from these files, I get errors about data truncating. On examining the files, it looks like there is some problems with commas in certain columns (37, 44, and 51).

If I open the files in Excel, Excel misinterprets these columns with a comma (even though this is supposed to be a tab delimited document).

When I try to do things like the following, it still fails to format the files in a way that SQL can read. removeCommasInColumns[rowList, colsList] := MapAt[StringReplace[#, "," -> ""] &, row, List /@ cols] dataworldevents = Import["mypath\20201114.export.CSV", "TSV"]; processedData = removeCommasInColumns[#, {37, 44, 51}] & /@ dataworldevents; Export["mypath\yourfile.csv", processedData, "TSV"];

This is the bulk upload query I'm using: dbQuery = "BULK INSERT worldevents FROM 'mypath\20201114.export.CSV' WITH ( FIRSTROW = 1, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', );";

Any help would be appreciated. I'd like to import a lot of data so this seems like it would be the fastest way to do it.

Attachments:
POSTED BY: S G
3 Replies
Posted 9 days ago

Attached is the modified query that I used to create the daily update tables for GDELT. So far it seems to work (for MS SQL). Looking at the data, it looks like the one supplied by GDELT is not correct as a number of fields appear to be the wrong size. Hopefully this is useful to someone. This seems to fix the issues above.

Attachments:
POSTED BY: S G
Posted 1 month ago

What is the exact error message from SQL Server? "data truncation" errors are generated when the column type is not wide enough to accommodate the data. E.g. the column is varchar(8) but the corresponding column in the TSV has more than 8 characters.

POSTED BY: Rohit Namjoshi
Posted 1 month ago
POSTED BY: S G
Reply to this discussion
Community posts can be styled and formatted using the Markdown syntax.
Reply Preview
Attachments
Remove
or Discard

Group Abstract Group Abstract