[?] What is the intended purpose of ByteArray & how can we use/convert it?

Posted 9 years ago
POSTED BY: Szabolcs Horvát
14 Replies
Posted 9 years ago
POSTED BY: Itai Seggev
POSTED BY: Szabolcs Horvát
Posted 9 years ago

The kernel used to use UCS-2 internally. It now uses a variant of UTF-8. MathLink also gained functions for sending and receiving UTF-8. These were steps 1 and 2 in the process of supporting the full Unicode character set, but several additional steps are needed to actually get there. Important ones include creating equivalents of \:wxyz for non-BMP characters; getting the kernel, MathLink, and FE successfully talking to each other using these new methods; and finding all the places where the assumption that characters lie in the range 0-65535 is hardcoded, either implicitly or explicitly, and updating the code. We have made progress on some of these internally, but as you might imagine it's an ongoing process, and we certainly can't promise the feature on any particular timeline.
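To make the 0-65535 assumption concrete, here is a minimal sketch in Python (used only as a neutral illustration, not as a description of how the kernel works). The character U+1D11E is an arbitrary example of a non-BMP character chosen for this sketch; it shows why such a character does not fit in a single 16-bit unit but is handled naturally by UTF-8.

```python
# U+1D11E (MUSICAL SYMBOL G CLEF) is an arbitrary non-BMP example character.
ch = "\U0001D11E"

code_point = ord(ch)
print(code_point)               # 119070
print(code_point > 0xFFFF)      # True: it does not fit in one 16-bit unit

# In UTF-16 the character needs a surrogate pair, i.e. two 16-bit units.
utf16 = ch.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])  # ['0xd834', '0xdd1e']

# In UTF-8 the same character is a four-byte sequence.
utf8 = ch.encode("utf-8")
print(list(utf8))               # [240, 157, 132, 158]
```

Any code that stores one character per 16-bit slot would see two separate values here, which is the kind of hardcoded assumption described above.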

POSTED BY: Itai Seggev

Thanks for the comments! It is encouraging that work is being done towards this goal.

POSTED BY: Szabolcs Horvát
Posted 8 years ago

Please do not forget about Unicode in file paths:

POSTED BY: Alexey Popkov

Hi @Itai Seggev and @Dorian Birraux,

There seem to be basically three different efficient representations of byte sequences: strings, byte arrays, and byte-type rank-1 RawArrays.

Only RawArrays can be exchanged with C code nicely. (Strings are supported in LibraryLink, but handling them is cumbersome, and they are assumed to be null-terminated.)

I found that a ByteArray can be converted to a RawArray:

In[4]:= ba = ByteArray[Range[10]];

In[5]:= RawArray["Byte", ba]
Out[5]= RawArray["UnsignedInteger8", "<" 10 ">"]

What about the reverse? How can I convert a rank-1 byte-type RawArray into a ByteArray without first unpacking it into a list of 64-bit machine integers (which would blow up the storage requirements 8-fold)?
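The 8-fold figure is just the ratio of element sizes. As a rough illustration in Python (not Wolfram Language, and assuming a platform where the `array` type code "q" is 8 bytes wide, which is the usual case), storing the same ten values as bytes and as 64-bit integers gives:

```python
from array import array

values = list(range(1, 11))     # the same ten values as Range[10]

as_bytes = array("B", values)   # unsigned 8-bit elements
as_int64 = array("q", values)   # signed 64-bit elements (assumed 8 bytes each)

bytes_size = as_bytes.itemsize * len(as_bytes)
int64_size = as_int64.itemsize * len(as_int64)

# Payload sizes in bytes, and the ratio between them.
print(bytes_size, int64_size, int64_size // bytes_size)
```

The ratio is independent of the array length, so a large byte buffer pays the same factor when unpacked into machine integers.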

POSTED BY: Szabolcs Horvát

You can't directly create a ByteArray from a RawArray. But I see no reason not to support it, since we already support creating a ByteArray from a PackedArray.

POSTED BY: Dorian Birraux

Thanks for the response, Dorian! In the meantime I also got a response on StackExchange, which pointed out that the type specification "ByteArray" can be used in LibraryFunctionLoad. In C code, it can be treated as a byte-type rank-1 RawArray. In Mathematica it will be a ByteArray. Thus one can write a simple library function that just returns the RawArray that was passed to it, but load it as LibraryFunctionLoad[..., {"RawArray"}, "ByteArray"].

I am looking forward to all this functionality becoming documented and brought to completion!

POSTED BY: Szabolcs Horvát

Do you mean the reverse, i.e. that not all Strings are a valid byte sequence? This gives True:

Let me rephrase what I wrote; I should have been clearer. You can indeed represent any Unicode code point from 0 to 65535 in a String. Internally, though, characters are encoded as bytes, which requires defining a character encoding (i.e. a consistent way of representing values from 0 to 65535 using bytes). Given a character encoding, some byte sequences may be invalid.

For example, in UTF-8, 192 is not a valid byte, and FromCharacterCode[192, "UTF-8"] returns an error.

It follows that when you build a String out of bytes, its content is encoded. That is not required with byte arrays. I hope this clarifies things.
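The same distinction can be sketched outside the kernel. The following Python snippet is only an analogy (the helper name `is_valid_utf8` is made up for this illustration): a raw byte container accepts any values from 0 to 255, while decoding those bytes as UTF-8 text can fail.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte sequence decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lone_192 = bytes([192])        # 0xC0 never occurs in well-formed UTF-8
e_acute = bytes([195, 169])    # the UTF-8 encoding of U+00E9

print(is_valid_utf8(lone_192))  # False
print(is_valid_utf8(e_acute))   # True

# As plain bytes, both sequences are perfectly fine to store.
print(list(lone_192))           # [192]
```

This mirrors the point above: a byte array carries no encoding, so it has no notion of an invalid sequence, whereas text built from bytes does.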

POSTED BY: Dorian Birraux

So this can be seen as a kind of PackedArray, but just for bytes? Semi-related: what about HDF5 import/export? It would be great to import directly into a ByteArray.

POSTED BY: Sander Huisman

It is possible to get the contents of an HDF5 file as a ByteArray; it is one of the import elements. But I do not know what sorts of conversions it goes through to get there.

POSTED BY: Szabolcs Horvát

How do I ask my int8 matrix to be converted to a ByteArray? I can't find an example...

POSTED BY: Sander Huisman