Thank you for the response Dorian.
> ... not all byte sequences are a valid String, so, when one stores bytes in a string, the data needs to be validated
Do you mean the reverse, i.e. that not all Strings are a valid byte sequence? This gives True:
```mathematica
(* round-trip every pair of byte values 0..255 through a string *)
tup = Tuples[Range[0, 255], {2}];
tup2 = ToCharacterCode /@ FromCharacterCode /@ tup;
tup2 === tup
```
All possible byte values, including 0, seem to be storable in Strings.
But either way, it is clear enough that a dedicated ByteArray is better for storing byte data than a string. That doesn't need to be explained further.
Other than the ones I mentioned, are there any other operations we can perform on ByteArrays (especially anything beyond element extraction)?
Here are a few more suggestions, in addition to the ones I already mentioned:
Efficiently changing elements in-place through Part:
```mathematica
a = ByteArray[...];
a[[2]] = 5;
```
(This should also support Span, i.e. `;;`.)
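Until something like this exists, the only workaround I know of is to unpack, modify and re-pack, which copies the whole array on every update. A minimal sketch; `byteArraySetPart` is a hypothetical helper name, not a built-in:

```mathematica
(* Hypothetical helper: emulates a[[pos]] = val for a ByteArray by unpacking
   to an integer list, modifying the list, and re-packing it.
   This copies the entire array, which is the cost an in-place Part would avoid. *)
byteArraySetPart[ba_ByteArray, pos_, val_] :=
 Module[{list = Normal[ba]},
  list[[pos]] = val; (* pos may be an integer or a Span such as 2 ;; 3 *)
  ByteArray[list]
 ]

a = ByteArray[{1, 2, 3, 4}];
a = byteArraySetPart[a, 2, 5];
Normal[a]
```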
Append, Prepend, AppendTo, PrependTo.
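These can currently only be emulated through the same list round trip. A sketch with hypothetical helper names (`byteArrayAppend`, `byteArrayPrepend` are not built-ins); an AppendTo-style update is just reassignment:

```mathematica
(* Hypothetical helpers: emulate Append/Prepend on a ByteArray by going
   through an integer list. Every call copies the whole array. *)
byteArrayAppend[ba_ByteArray, b_Integer] := ByteArray[Append[Normal[ba], b]]
byteArrayPrepend[ba_ByteArray, b_Integer] := ByteArray[Prepend[Normal[ba], b]]

(* AppendTo-style use: reassign the variable *)
a = ByteArray[{1, 2, 3}];
a = byteArrayAppend[a, 4];
```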
Something like Partition to break a big array into parts. My envisioned use case is processing a large ByteArray without unpacking the whole thing to an integer list: instead, we could unpack small sections at a time, process them, and then re-pack them. So perhaps other functions, such as Map or BlockMap, would be more appropriate (e.g., Audio has AudioBlockMap). If ByteCount can be trusted, each ByteArray carries a storage overhead of 96 bytes, so fully pre-partitioning a large array into many small ByteArrays is probably not the best approach.
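To make the use case concrete, here is a sketch of the kind of block-wise processing I mean. `byteArrayBlockMap` is a hypothetical name, and it assumes that `Part` with a Span works on a ByteArray (wrapping the result in `Normal` means it works whether Part returns a ByteArray or a plain list). Only one input block is unpacked at a time, but the processed blocks are still joined as an integer list before the final re-packing:

```mathematica
(* Hypothetical helper, not a built-in: apply f to consecutive blocks of at
   most k bytes. f receives an integer list and must return integers in 0..255.
   Input blocks are unpacked one at a time; the joined output is re-packed once. *)
byteArrayBlockMap[f_, ba_ByteArray, k_Integer?Positive] :=
 Module[{n = Length[Normal[ba]]},
  ByteArray[
   Join @@ Table[
     f[Normal[ba[[i ;; Min[i + k - 1, n]]]]], (* last block may be shorter *)
     {i, 1, n, k}]]
 ]

(* reverse each block of 4 bytes *)
Normal[byteArrayBlockMap[Reverse, ByteArray[Range[10]], 4]]
```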
Direct creation functions: analogues of ConstantArray (for a large constant byte array) and RandomInteger (for random bytes).
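For comparison, the obvious way to do this now is to build the full integer list first and then pack it, which is exactly the intermediate allocation a direct constructor would avoid. A sketch with hypothetical helper names:

```mathematica
(* Hypothetical helpers (not built-ins): both generate a full integer list
   first, then pack it into a ByteArray. *)
constantByteArray[b_Integer, n_Integer] := ByteArray[ConstantArray[b, n]]
randomByteArray[n_Integer] := ByteArray[RandomInteger[{0, 255}, n]]

Normal[constantByteArray[7, 5]]
```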
But the most important missing functionality is conversion:
- to/from strings (like FromCharacterCode, ToCharacterCode)
- to/from files (BinaryReadList, BinaryWrite)
- and very importantly: LibraryLink. Conversion to/from RawArray would suffice, as RawArrays have worked with LibraryLink since version 10.4. This would allow us to implement efficient functions for anything we need. I looked into the implementation of some of the built-in functions, and I see that data is currently sent to/from LibraryLink through an inefficient conversion to a list of 64-bit integers (i.e. the {Integer, 1} LibraryLink type).
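For the string and file cases, the workaround today is again a detour through integer lists. A sketch with hypothetical helper names; the string version assumes every character code is in 0..255, and the file version relies on BinaryWrite leaving the file open, hence the explicit Close:

```mathematica
(* Hypothetical helpers (not built-ins). Each direction goes through a full
   integer list, which is the overhead a dedicated conversion would remove. *)

(* String <-> ByteArray; assumes all character codes of s are in 0..255 *)
stringToByteArray[s_String] := ByteArray[ToCharacterCode[s]]
byteArrayToString[ba_ByteArray] := FromCharacterCode[Normal[ba]]

(* File <-> ByteArray *)
readByteArray[file_String] := ByteArray[BinaryReadList[file, "Byte"]]
writeByteArray[file_String, ba_ByteArray] :=
 (BinaryWrite[file, Normal[ba], "Byte"]; (* opens file if not already open *)
  Close[file])                            (* BinaryWrite leaves it open *)
```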