
Transform a string in UTF-8 format into a string in ANSI format?


I have a string like a = "abcdefg"; it is a UTF-8 format string. I want to transform a into a string in ANSI format. How can I do that?

POSTED BY: gearss zhang
4 days ago

I'm not 100% sure I'm right, but I can learn by answering, so let's try:

The question is, what do you mean by a UTF-8 format string? Does it come from a UTF-8 encoded source and was it decoded during import ("Text", "JSON" format etc.)? Or was it imported as raw bytes from such a source ("String", "Byte")?

Your question suggests the latter, while the former is more likely to be the case. Anyway, for the 'raw bytes' scenario you can use

FromCharacterCode[ToCharacterCode[string,  "UTF8"], targetEncoding]

and for the more likely decoded-string scenario:

ExportString[string, "String",  CharacterEncoding -> targetEncoding]
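To make the difference between the encodings concrete, here is a minimal sketch for the decoded-string case. The sample string and the choice of "ISOLatin1" as targetEncoding are assumptions for illustration only; any name listed in $CharacterEncodings could be substituted.

```wolfram
(* Sample decoded string with one non-ASCII character: é is code point 233. *)
string = "caf" <> FromCharacterCode[233];

(* Assumed target encoding, chosen only for illustration. *)
targetEncoding = "ISOLatin1";

(* Byte values of the same text in each encoding. *)
utf8Bytes = ToCharacterCode[string, "UTF8"]             (* {99, 97, 102, 195, 169} *)
targetBytes = ToCharacterCode[string, targetEncoding]   (* {99, 97, 102, 233} *)

(* Re-encode the decoded string, as in the ExportString call above. *)
encoded = ExportString[string, "String", CharacterEncoding -> targetEncoding];
```

The text stays the same; only the byte sequence that represents the é changes, two bytes in UTF-8 versus one in ISO Latin-1.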

I haven't had a coffee yet, so sorry if I made a mistake; strings and encodings can be confusing.

POSTED BY: Kuba Podkalicki
4 days ago

Let me say what I want to do: I have a txt file whose encoding is UTF-8, and I want to transform this file into ANSI format.

POSTED BY: gearss zhang
4 days ago

Let me say what I want to do: I have a txt file whose encoding is UTF-8, and I want to transform this file into GB18030 format. But when I run the code:

FromCharacterCode[ToCharacterCode[string,  "UTF8"], targetEncoding]

The message says:

Message[Get::noopen, "/opt/Wolfram/WolframEngine/11.2/SystemFiles/CharacterEncodings/GB18030.m"]

How do I solve this problem?

POSTED BY: gearss zhang
4 days ago

"The phrase ANSI character set has no well-defined meaning..." Do you mean Windows-1252? Or do you mean some other ANSI standard such as ASCII or its successor ISO-8859?

Perhaps

 http://reference.wolfram.com/language/ref/$CharacterEncodings.html 

would help you.
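As a rough sketch of how that list could be used: check that a name is supported before converting a file. The helper convertFile and the file names in the usage line are hypothetical, and "WindowsANSI" and "CP936" are only candidate names to test against your own installation. CP936 is related to GB18030 but is not the same encoding.

```wolfram
(* Check candidate names against the encodings this installation knows. *)
candidates = {"UTF8", "WindowsANSI", "GB18030", "CP936"};
supported = Select[candidates, MemberQ[$CharacterEncodings, #] &]

(* Hypothetical helper: read a UTF-8 text file and write it in another encoding.
   Returns $Failed when the requested name is not in $CharacterEncodings. *)
convertFile[in_String, out_String, enc_String] :=
  If[MemberQ[$CharacterEncodings, enc],
    Export[out,
      Import[in, "Text", CharacterEncoding -> "UTF8"],
      "Text", CharacterEncoding -> enc],
    $Failed]
```

With hypothetical file names, a call would look like convertFile["input.txt", "output.txt", "WindowsANSI"].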

POSTED BY: Michael Rogers
4 days ago

For characters that are part of ASCII, the UTF-8 encoding is identical, so nothing changes. For characters that are not part of ASCII, the UTF-8 codes have no representation in ASCII. Therefore, what you ask is either trivial or impossible.
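A small sketch of the point above. The name asciiOnlyQ is made up for this example; it tests whether every character of a string lies in the ASCII range.

```wolfram
(* For ASCII-only text, the UTF-8 bytes equal the character codes. *)
sameBytes = ToCharacterCode["abcdefg", "UTF8"] === ToCharacterCode["abcdefg"]  (* True *)

(* Hypothetical helper: True when all character codes are below 128. *)
asciiOnlyQ[s_String] := AllTrue[ToCharacterCode[s], # < 128 &]

asciiOnlyQ["abcdefg"]                          (* True *)
asciiOnlyQ["caf" <> FromCharacterCode[233]]    (* False: é is outside ASCII *)
```

When asciiOnlyQ gives True, the file is already valid in any ASCII-compatible encoding and no conversion is needed.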

POSTED BY: John Doty
3 days ago

Maybe this MMa.SE discussion can help you:

POSTED BY: Alexey Popkov
2 days ago
