Improve the speed of Binary import?

Dear All,

I have the following binary file format:

  1. Every 'frame' starts with an "UnsignedInteger32" which gives the number of datapoints.

Then for each datapoint of that frame it is the following form:

  1. number of cameras as "UnsignedInteger8" (call it ncam)
  2. position x,y,z and distance d as "Real32"
  3. ncam times an "UnsignedInteger8" followed by "UnsignedInteger16" (e.g. ncam 2 => "UnsignedInteger8", "UnsignedInteger16","UnsignedInteger8", "UnsignedInteger16")

As you can see, the datapoints have to be read one by one, because the first number of each datapoint determines its length (ncam is 2 or 3 in my case, but could be higher).

So each datapoint is the following sequence of data types:

  • ncam=2: "UnsignedInteger8", "Real32", "Real32", "Real32", "Real32", "UnsignedInteger8", "UnsignedInteger16", "UnsignedInteger8", "UnsignedInteger16"
  • ncam=3: "UnsignedInteger8", "Real32", "Real32", "Real32", "Real32", "UnsignedInteger8", "UnsignedInteger16", "UnsignedInteger8", "UnsignedInteger16", "UnsignedInteger8", "UnsignedInteger16"
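The layout above fixes the size of a datapoint once ncam is known. A minimal sketch of that arithmetic, where the helper name datapointBytes is made up for illustration:

```wolfram
(* Bytes per datapoint: 1 (ncam) + 4*4 (x, y, z, d as Real32) + ncam*(1 + 2) *)
(* datapointBytes is a hypothetical helper, not part of the original code *)
datapointBytes[ncam_Integer] := 1 + 16 + 3 ncam

datapointBytes /@ {2, 3}  (* {23, 26} *)
```

Each frame additionally carries 4 bytes for its leading "UnsignedInteger32" count.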

I devised the following code to read the data:

ClearAll[ncamString]
ncamString[n_Integer]:=ncamString[n]=Join[{"Real32","Real32","Real32","Real32"},Join@@ConstantArray[{"UnsignedInteger8","UnsignedInteger16"},n]]
str=OpenRead["trial.png",BinaryFormat->True];
n=BinaryRead[str,"UnsignedInteger32"];
Print["Reading frames\[Ellipsis]"];
j=0;
start=AbsoluteTime[];
Dynamic[j]
alldata=Reap[
    While[n=!=EndOfFile,
        j++;
        data=Table[
            cams=BinaryRead[str,"UnsignedInteger8"];
            other=ncamString[cams];
            other=BinaryRead[str,other];
            Prepend[other,cams]
        ,
            {n}
        ];
        Sow[data];
        n=BinaryRead[str,"UnsignedInteger32"] (* for next frame *)
    ]
][[2,1]];
Close[str];

elapsed=AbsoluteTime[]-start;
Print["Reading ",Length[alldata]," frames took: ",elapsed," sec. Or ",elapsed/Length[alldata]," sec/frame."]

This takes roughly 40 milliseconds per frame (roughly 500 KB/s, which is very low for an SSD that can do 3 GB/s; yes, capital B!). I have attached a test file with just 5 frames; it has the extension .png only because the forum does not allow .bin files.

If I run RuntimeTools`Profile on my code, it shows that ~90% of the time is spent on the many, many BinaryRead calls. Is there a faster way to read the entire data into memory first and then interpret it from there? Something like StringToStream... Or an alternative to BinaryRead?

POSTED BY: Sander Huisman
4 Replies

I just found out that if I replace:

BinaryRead[str, listoftypes];

with

First[BinaryReadList[str, listoftypes, 1]];

this gives quite a big speed-up, roughly 5x. It is strange, to me, that BinaryRead is not immediately turned into a BinaryReadList call when the type argument is a list. This was tested with the StringToStream stream, to make sure that the many IO calls are not the bottleneck...
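As a rough sketch of that replacement applied to a single datapoint: ncamString is the memoised type list from the original post, while readDatapoint and the temporary demo file are made up here for illustration.

```wolfram
ClearAll[ncamString, readDatapoint]
ncamString[n_Integer] := ncamString[n] =
  Join[ConstantArray["Real32", 4],
   Flatten[ConstantArray[{"UnsignedInteger8", "UnsignedInteger16"}, n]]]

(* hypothetical helper: read ncam, then the rest of the datapoint in one BinaryReadList call *)
readDatapoint[s_] := Module[{cams = BinaryRead[s, "UnsignedInteger8"]},
  Prepend[First[BinaryReadList[s, ncamString[cams], 1]], cams]]

(* write one ncam = 2 datapoint to a temporary file and read it back *)
file = FileNameJoin[{$TemporaryDirectory, "datapoint-demo.bin"}];
out = OpenWrite[file, BinaryFormat -> True];
BinaryWrite[out, {2, 1., 2., 3., 0.5, 7, 300, 8, 400},
  Join[{"UnsignedInteger8"}, ncamString[2]]];
Close[out];

in = OpenRead[file, BinaryFormat -> True];
point = readDatapoint[in];
Close[in];
point
```

The 5x figure is the measurement reported above; the sketch only shows the shape of the call and makes no timing claim of its own.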

POSTED BY: Sander Huisman

Sander,

When I read this post, I thought "Sander will know how to do this"!!!

I don't know if this will help but ReadList is very fast. Can you do this:

lst = ReadList["trial.bin", Byte]

and parse the bytes??

I did something like this when we were creating Binary files for an Arbitrary WaveForm generator.

I don't have time now to find that code and see what we did but I can try later today unless you get it working sooner.

I hope this helps.

Regards,

Neil

POSTED BY: Neil Singer

Hi Neil,

Thanks for the suggestion. I've tried ReadList, but found it slower than Import:

AbsoluteTiming[str = Import[fn, "Byte"];]
AbsoluteTiming[str = ReadList[fn, Byte];]

Import is 4x faster on my machine. Moreover, Import stores the result with a 3x smaller memory footprint. I think the main problem here is the casting process that happens inside BinaryRead.

BinaryReadList has some optimisations if all the types are the same:

Needs["GeneralUtilities`"];
PrintDefinitions@BinaryReadList

However, BinaryRead, which I use, has no such optimisations (at least none visible outside the kernel code). It just reads each format one by one, checking for $Failed and so on:

Catch[Map[System`BinaryReadDump`ThrowBinaryRead[channel, #, opts] &, fmt]];

So BinaryRead with a list of formats is broken up into separate single-type reads, which might explain why it takes so long: each number is cast one by one.

POSTED BY: Sander Huisman

I already tried the following constructs:

string = FromCharacterCode[Import[fn, "Byte"]];
str = StringToStream[string];

string = Import[fn, "String"];
str = StringToStream[string];

but both are roughly the same speed, unfortunately. This indicates that BinaryRead has a lot of overhead of its own: the time does not come from the actual system calls but from auxiliary work (casting, error checking, ...).
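One way to sidestep BinaryRead entirely would be to index into the byte list directly. The following is only a sketch under stated assumptions, not a measured solution: the helper names leUInt, scanOffsets and parseFrames are made up, the file is assumed to be little-endian with at least one datapoint, TakeList needs version 11.2 or later, and whether this beats the BinaryReadList variant would have to be timed.

```wolfram
(* Little-endian unsigned integer from a list of bytes (assumes a little-endian file) *)
leUInt[b_List] := FromDigits[Reverse[b], 256]

(* Pass 1: walk the bytes once, recording {start index, ncam} for every datapoint
   and the number of datapoints in each frame. Indices are 1-based. *)
scanOffsets[bytes_List] :=
 Module[{pos = 1, len = Length[bytes], n, ncam, counts = {}, pts},
  pts = Reap[
     While[pos + 3 <= len,
      n = leUInt[bytes[[pos ;; pos + 3]]]; pos += 4;
      AppendTo[counts, n];
      Do[ncam = bytes[[pos]]; Sow[{pos, ncam}]; pos += 17 + 3 ncam, {n}]
      ]
     ][[2]];
  {counts, If[pts === {}, {}, First[pts]]}
  ]

(* Pass 2: decode all Real32 fields with a single ImportString call, the camera
   fields with integer arithmetic, then regroup the datapoints by frame. *)
parseFrames[bytes_List] := Module[{counts, pts, reals, cams},
  {counts, pts} = scanOffsets[bytes];
  reals = Partition[
    ImportString[
     FromCharacterCode[Flatten[bytes[[#1 + 1 ;; #1 + 16]] & @@@ pts]],
     "Real32", ByteOrdering -> -1], 4];
  cams = Function[{p, ncam},
      Flatten[{#[[1]], leUInt[#[[2 ;; 3]]]} & /@
        Partition[bytes[[p + 17 ;; p + 16 + 3 ncam]], 3]]] @@@ pts;
  TakeList[MapThread[Join[{#1[[2]]}, #2, #3] &, {pts, reals, cams}], counts]
  ]

(* Synthetic check: one frame holding one ncam = 2 datapoint *)
demo = {1, 0, 0, 0, 2, 0, 0, 128, 63, 0, 0, 0, 64, 0, 0, 64, 64,
   0, 0, 0, 63, 7, 44, 1, 8, 144, 1};
parseFrames[demo]
```

With the file from the question, the call would be parseFrames[Import[fn, "Byte"]], which should produce the same nested frame/datapoint structure as alldata in the original loop.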

POSTED BY: Sander Huisman