# [✓] What is the intended purpose of ByteArray & how can we use/convert it?

Szabolcs Horvát 5 Votes

I was wondering what the intended purpose of the ByteArray type is. The cryptography functionality seems to use it. And in version 11.1 we have BinarySerialize, which people are also a bit confused about (including myself, so consider that function included in this question as well).

The most straightforward guess about ByteArray is that it is a space-efficient and consistent way to store binary data. We could use a list of integers, but that is not space efficient (each integer takes at least 8 bytes) and the 0..255 range is not enforced.

If such a space-efficient data type is to be useful, it should be possible to convert/transfer it without the overhead of an inefficient integer-list intermediate representation.

How can we convert/transfer ByteArray to/from:

- Files. Is there a function like BinaryReadList to handle it?
- LibraryLink. Can I transfer a byte array efficiently to C? Can I convert it to a byte-type RawArray (which is already supported by LibraryLink)?
- Strings. Strings are sometimes used to represent the contents of files, or binary data, in a byte-perfect way. We have ImportString/ExportString for this reason. Strings are not as good for this purpose as a real byte array because each character takes 2 bytes (and hopefully this will change in the future to allow for things beyond the Basic Multilingual Plane in Unicode).
- Base64 encoded data in strings. This is how ByteArrays show up in InputForm, though the documentation suggests that they are stored more efficiently internally. Such a string can be converted to a ByteArray using Developer`DecodeBase64ToByteArray. What about the reverse conversion?

It would also be nice to have an equivalent of StringToStream for ByteArrays.

If some of the above are not possible, please consider them a feature request.

Regarding reading/writing from/to files, a lightweight function would be preferred (as opposed to the heavyweight, high-overhead Import/Export, which cannot even be used during initialization, i.e. in init.m).

What can we do with ByteArrays other than use them with the cryptography functions?

- The documentation mentions that we can use Part, First, Last, Min, Max.
- By experimentation, Take, Drop, Length, Dimensions, Rest, Most also work.
- So do BitAnd, BitOr, etc.
- HTTPRequestData and related functions support the property "BodyByteArray".

Is there anything else?
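To make the starting point of the question concrete, here is a minimal sketch of the operations the post lists as working. The variable names are made up for illustration, and the ByteCount comparison is shown without expected values because the exact numbers depend on version and platform.

```wolfram
(* Build a ByteArray from integers in the range 0..255 *)
ba = ByteArray[{72, 101, 108, 108, 111}];

Length[ba]     (* number of bytes: 5 *)
ba[[2]]        (* element extraction with Part: 101 *)
Normal[ba]     (* back to an ordinary list of integers *)

(* Compare the memory use of an integer list and the equivalent ByteArray *)
ints = RandomInteger[255, 10^6];
{ByteCount[ints], ByteCount[ByteArray[ints]]}
```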
4 months ago
10 Replies
Dorian Birraux 4 Votes

ByteArray represents bytes, internally using one byte per value, which makes it space efficient. Because the data are binary, the performance gain from using a ByteArray in place of a string of bytes is significant. Indeed, not all byte sequences are a valid String, so when one stores bytes in a string, the data needs to be validated. That is not required with ByteArray.

There is an effort to implement most if not all of the features you've listed as top-level functions. In the meantime, here are some undocumented functions that you may find interesting, as a complement to those you already mentioned:

- Developer`EncodeBase64: ByteArray to String conversion. It takes a byte array and returns a Base64 string.

    In[1]:= Developer`EncodeBase64[ByteArray[Range[5]]]
    Out[1]= "AQIDBAU="

- BinaryWrite accepts a ByteArray, as an undocumented feature. I'm not aware of a reverse function that reads a file directly into a byte array.

Generally speaking, ByteArray is recommended everywhere one uses binary data. That leads me to BinarySerialize.

BinarySerialize serializes any Wolfram Language expression to a binary representation that is platform independent and fast to deserialize. Unlike MX, which contains all the definitions of a given expression, such as DownValues, BinarySerialize is more data oriented and represents only something close to the FullForm of an expression. The format used by BinarySerialize has simple enough specifications that we may consider publishing them.

As users noticed on StackOverflow, BinarySerialize does not always produce a smaller output with respect to ByteCount. One reason is that, unlike Compress, BinarySerialize does not automatically compress its output. You pay the cost of zlib compression only if need be. Also, some expressions like arrays (packed, raw) are already stored efficiently in memory, so an output size of roughly the byte count is generally a good result (Range produces a packed array).
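The functions mentioned in this reply can be combined into round trips. The following is only a sketch: the Developer` functions and the ByteArray argument to BinaryWrite are undocumented, so their behavior may differ between versions; the file name is an arbitrary choice; and reading the file back goes through an integer list, since no direct reader is named in the thread.

```wolfram
ba = ByteArray[Range[5]];

(* ByteArray -> Base64 string -> ByteArray, using the undocumented Developer` functions *)
b64 = Developer`EncodeBase64[ba];   (* "AQIDBAU=", as in the reply above *)
Normal[Developer`DecodeBase64ToByteArray[b64]] === Normal[ba]

(* Write the bytes to a temporary file (undocumented use of BinaryWrite) *)
file = FileNameJoin[{$TemporaryDirectory, "bytearray-demo.bin"}];
stream = OpenWrite[file, BinaryFormat -> True];
BinaryWrite[stream, ba];
Close[stream];

(* Read back; this path uses an intermediate integer list *)
ByteArray[BinaryReadList[file, "Byte"]]
DeleteFile[file];

(* BinarySerialize round trip: the serialized form is itself a ByteArray *)
expr = <|"a" -> Range[3], "b" -> "text"|>;
ser = BinarySerialize[expr];
BinaryDeserialize[ser] === expr
```

If a smaller output matters more than speed, BinarySerialize[expr, PerformanceGoal -> "Size"] is the documented way to request compression.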
4 months ago
 So this can be seen as a kind of PackedArray, but just for bytes? Semi-related: what about HDF5 import/export? It would be great to import directly into a ByteArray.
3 months ago
 It is possible to get the contents of an HDF5 file as a ByteArray; it is one of the import elements. But I do not know what sorts of conversions it goes through to get there.
3 months ago
 How do I ask my int8 matrix to be converted to a ByteArray? I can't find an example...
3 months ago
Szabolcs Horvát 2 Votes

Thank you for the response, Dorian.

"... not all byte sequences are a valid String, so, when one stores bytes in a string, the data needs to be validated"

Do you mean the reverse, i.e. that not all Strings are a valid byte sequence? This gives True:

    tup = Tuples[Range[0, 255], {2}];
    tup2 = ToCharacterCode /@ FromCharacterCode /@ tup;
    tup2 === tup

All possible byte values, including 0, seem to be storable in Strings.

But either way, it is clear enough that a dedicated ByteArray is better for storing byte data than a string. That doesn't need to be explained further.

Other than the ones I mentioned, are there any operations we can perform on ByteArrays (especially things other than element extraction)?

Here are a few more suggestions, in addition to the ones I already mentioned:

- Efficiently changing elements in place through Part: a = ByteArray[...]; a[[2]] = 5; (This should also support Span, i.e. ;;)
- Append, Prepend, AppendTo, PrependTo.
- Something like Partition to break a big array into parts. My envisioned use case is processing a large ByteArray without unpacking the whole thing to an integer list. Instead, we could unpack small sections at a time, process them, then re-pack them. So perhaps other methods, such as Map, BlockMap, etc., are more appropriate. (E.g., Audio has AudioBlockMap.) If ByteCount can be trusted, there is a storage overhead of 96 bytes per ByteArray, so perhaps complete pre-partitioning is not the best approach.
- Direct creation functions: analogues of ConstantArray (large constant byte array) and RandomInteger (for random bytes).

But the most important missing functionality is conversion:

- to/from strings (like FromCharacterCode, ToCharacterCode)
- to/from files (BinaryReadList, BinaryWrite)
- and, very importantly, LibraryLink. Conversion to/from RawArray would suffice, as RawArrays have worked with LibraryLink since version 10.4. This would allow us to implement efficient functions for anything we need.

I looked into the implementation of some of the built-in functions, and I see that currently sending to/from LibraryLink is done through an inefficient conversion to a 64-bit integer list (i.e. the {Integer, 1} LibraryLink type).
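The block-wise processing idea described above can be approximated with functions the thread reports as already working (Take, Length, Normal). This is a sketch only: byteSum is a hypothetical helper written for this illustration, not a built-in, and it unpacks at most chunkSize bytes to an integer list at any one time.

```wolfram
(* Hypothetical helper: sum all bytes while unpacking only one chunk at a time *)
byteSum[ba_ByteArray, chunkSize_Integer?Positive] :=
 Module[{n = Length[ba], total = 0},
  Do[
   (* Take extracts bytes i .. i+chunkSize-1; Normal unpacks just that section *)
   total += Total[Normal[Take[ba, {i, Min[i + chunkSize - 1, n]}]]],
   {i, 1, n, chunkSize}];
  total]

byteSum[ByteArray[Range[0, 255]], 64]   (* 32640, the sum of 0..255 *)
```

The same loop shape would work for any per-chunk computation that reduces to a small result; producing a transformed ByteArray of the same size would still need an efficient way to reassemble the chunks, which is what the post asks for.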
4 months ago
Dorian Birraux 1 Vote

"Do you mean the reverse, i.e. that not all Strings are a valid byte sequence? This gives True:"

Let me rephrase what I wrote; I should have been clearer. You can indeed represent any Unicode code point from 0 to 65535 in a String. Internally, though, characters are encoded as bytes, which requires defining a character encoding (i.e. a consistent way of representing values from 0 to 65535 using bytes). Given a character encoding, some byte sequences may be invalid. For example, in UTF-8, 192 is not a valid byte, and FromCharacterCode[192, "UTF-8"] returns an error. It follows that when you build a String out of bytes, its content is encoded. That is not required with byte arrays. I hope this clarifies things.
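The distinction between code points and encoded bytes can be seen with ToCharacterCode and FromCharacterCode. This is a small illustration under the assumption that the source file or notebook stores the literal "é" correctly; the last line is the case cited in the reply, and its exact message is not reproduced here.

```wolfram
ToCharacterCode["é"]                     (* {233}: the character's code point *)
ToCharacterCode["é", "UTF-8"]            (* {195, 169}: two bytes once encoded as UTF-8 *)
FromCharacterCode[{195, 169}, "UTF-8"]   (* decodes the two bytes back to "é" *)

(* A lone byte 192 is not valid UTF-8, so this cannot be decoded *)
FromCharacterCode[192, "UTF-8"]
```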
4 months ago
Itai Seggev 4 Votes

As Dorian indicated, many of the requested features are in the works. Indeed, some of them were planned for the initial release of ByteArray, but were delayed for various reasons. We certainly hope to add them gradually over the course of the next few 11.x releases.

A bit more context and answers to questions (both explicit and implicit):

1) ByteArrays are not really arrays (at least in the sense of the Wolfram Language). They represent one-dimensional data of 8-bit unsigned integers. In that sense, ByteList or ByteRow might have been clearer. You can think of them as the closest thing WL has to a char *; from that point of view the name ByteArray makes sense.

2) I think it would be better to think of them as being like SparseArray or StructuredArray, as opposed to a packed array. They are not transparent to user-level functions, but they overload many basic language constructs like Part, Take, etc. so that they appear like a 1-dimensional list.

3) One very big difference between SparseArray/StructuredArray and ByteArray is that ByteArray is opaque to Listable functions. This is intentional, because we don't want them to be accidentally converted to something else. Consider even something as simple as Plus: what does addition mean? Do individual entries overflow? Does it get converted to a normal list? Does it get converted to some future TwoByteArray? (We certainly won't have something by that name, but the idea is clear enough.)

4) Unlike a packed array, which has the same FullForm as its normal list, ByteArray uses Base64 so that it efficiently packs its values when you Put / Get it to files, not just in memory.

5) We internally encode strings in a variant of UTF-8. Now, of course, any byte can be faithfully converted to/from ISO8859-1, but that encoding only equals UTF-8 for the lower 7 bits. For other values, you need to use multiple bytes per character. So using a string to store byte data is both less space efficient and less time efficient (since you need to ensure the correct conversion between the two encodings).
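Points 2) to 4) can be tried directly. This is a sketch of what the reply describes, not a statement of guaranteed behavior across versions: arithmetic is left to the user, who has to unpack with Normal, pick an overflow rule explicitly (wrap-around modulo 256 is an arbitrary choice made here), and repack.

```wolfram
ba = ByteArray[{1, 2, 3}];

Take[ba, 2]     (* structural functions such as Take are overloaded *)
ba + 1          (* per the reply, Plus does not thread over the bytes *)

(* Explicit alternative: unpack, compute, wrap around at 256, repack *)
ByteArray[Mod[Normal[ba] + 1, 256]]

(* Per the reply, the textual form stores the bytes as a Base64 string *)
InputForm[ba]
```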