GBK to UTF-8 conversion produces garbled text

Updated on technology 2024-02-28
11 answers
  1. Anonymous user 2024-02-06

    Bytes are lost during the conversion. Have you noticed that the result is fine when you enter an even number of characters and garbled when the number is odd? I don't know the exact character lengths.

    // str stands for the input string (the variable name is assumed)
    String str1 = new String(str.getBytes("utf-8"), "gbk");

    String str2 = new String(str.getBytes("gbk"), "utf-8");

    The length of the printed string is not the same.
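
    The odd/even effect can be reproduced with a minimal sketch. It assumes the garbling comes from reading UTF-8 bytes as GBK and that the Java runtime provides the GBK charset; the class and method names are made up for illustration. A common Chinese character is 3 bytes in UTF-8 and GBK consumes those bytes in pairs, so an odd number of characters leaves one byte without a partner. That byte is replaced during decoding and cannot be recovered afterwards.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    class OddEvenRoundTrip {
        private static final Charset GBK = Charset.forName("GBK");

        // Misread UTF-8 bytes as GBK, then try to undo the mistake.
        static String utf8ViaGbk(String original) {
            String misread = new String(original.getBytes(StandardCharsets.UTF_8), GBK);
            return new String(misread.getBytes(GBK), StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // Unicode escapes keep this source file independent of its own encoding.
            String even = "\u6211\u4eec"; // two characters, 6 UTF-8 bytes
            String odd = "\u6211";        // one character, 3 UTF-8 bytes
            System.out.println(even.equals(utf8ViaGbk(even))); // true
            System.out.println(odd.equals(utf8ViaGbk(odd)));   // false
        }
    }
    ```

    An even count is not a guarantee in general: the round trip can still fail when a byte pair is not a valid GBK code.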

    UTF-8 uses 3 bytes per common Chinese character, whereas GBK uses 2, so byte and character counts differ between the two encodings. UTF-8 stores Unicode characters in a variable number of bytes: ASCII letters still take 1 byte; accented Latin letters, Greek, Cyrillic, and similar scripts take 2 bytes; commonly used Chinese characters take 3 bytes; and supplementary-plane characters take 4 bytes. The GB 18030 standard encodes characters in one, two, or four bytes.
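
    The byte counts above can be checked directly. This is a small sketch with illustrative class and method names, and it assumes the GBK charset is available in the runtime.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    class ByteLengths {
        static int utf8Length(String s) {
            return s.getBytes(StandardCharsets.UTF_8).length;
        }

        static int gbkLength(String s) {
            return s.getBytes(Charset.forName("GBK")).length;
        }

        public static void main(String[] args) {
            System.out.println(utf8Length("A"));            // 1: ASCII letter
            System.out.println(utf8Length("\u00e9"));       // 2: accented Latin letter
            System.out.println(utf8Length("\u4e2d"));       // 3: common Chinese character
            System.out.println(utf8Length("\uD83D\uDE00")); // 4: supplementary-plane character (U+1F600)
            System.out.println(gbkLength("\u4e2d"));        // 2: the same Chinese character in GBK
        }
    }
    ```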

    The single-byte part uses 0x00 to 0x7F (matching ASCII). In the double-byte part, the first byte ranges from 0x81 to 0xFE and the second byte from 0x40 to 0x7E or from 0x80 to 0xFE. The four-byte part uses the values 0x30 to 0x39, which GB/T 11383 leaves unused, as a suffix that extends the double-byte encoding, so four-byte codes range from 0x81308130 to 0xFE39FE39.

    Within a four-byte code, the first and third bytes range from 0x81 to 0xFE, and the second and fourth bytes range from 0x30 to 0x39.
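
    The byte ranges above are enough to tell how long a GB 18030 sequence is from its first two bytes. The following is a minimal sketch of that check, not a full validator or decoder; the class name, method name, and the -1 return convention are assumptions made for illustration.

    ```java
    class Gb18030Length {
        // Returns 1, 2, or 4 when the bytes at offset match the ranges, or -1 otherwise.
        static int sequenceLength(byte[] bytes, int offset) {
            int b0 = bytes[offset] & 0xFF;
            if (b0 <= 0x7F) {
                return 1; // single byte, same as ASCII
            }
            if (b0 < 0x81 || b0 > 0xFE || offset + 1 >= bytes.length) {
                return -1;
            }
            int b1 = bytes[offset + 1] & 0xFF;
            if (b1 >= 0x30 && b1 <= 0x39) {
                // Four-byte form: third byte 0x81..0xFE, fourth byte 0x30..0x39.
                if (offset + 3 >= bytes.length) {
                    return -1;
                }
                int b2 = bytes[offset + 2] & 0xFF;
                int b3 = bytes[offset + 3] & 0xFF;
                boolean ok = b2 >= 0x81 && b2 <= 0xFE && b3 >= 0x30 && b3 <= 0x39;
                return ok ? 4 : -1;
            }
            if ((b1 >= 0x40 && b1 <= 0x7E) || (b1 >= 0x80 && b1 <= 0xFE)) {
                return 2; // double-byte form
            }
            return -1;
        }

        public static void main(String[] args) {
            byte[] twoByte = {(byte) 0xD6, (byte) 0xD0};
            byte[] fourByte = {(byte) 0x81, 0x30, (byte) 0x81, 0x30};
            System.out.println(sequenceLength(twoByte, 0));  // 2
            System.out.println(sequenceLength(fourByte, 0)); // 4
        }
    }
    ```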

    From a programmer's point of view, GB2312, GBK, and GB18030 are all double-byte character sets (DBCS).

  2. Anonymous user 2024-02-05

    But have you ever thought about this? When we send a request to Tomcat, our JSP page may be encoded in UTF-8, but Tomcat's own default encoding is ISO-8859-1, so it decodes our string as ISO-8859-1. We usually fix the garbled text in the servlet or action:

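    // str stands for the request parameter as decoded by Tomcat (the variable name is assumed)
    String param = new String(str.getBytes("iso-8859-1"), "utf-8");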

    That fixes the garbled text. So how is this different from what the original poster wrote?
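
    One difference can be shown with a short sketch: ISO-8859-1 maps every byte value to exactly one character, so text misread as ISO-8859-1 keeps all of its original bytes and can be re-decoded without loss. The class and method names below are illustrative, and the "server" step only simulates the misreading; it does not call any Tomcat API.

    ```java
    import java.nio.charset.StandardCharsets;

    class Latin1Recovery {
        // Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8.
        static String recover(String misdecoded) {
            byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
            return new String(originalBytes, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            String original = "\u4e2d\u6587"; // two Chinese characters
            byte[] wire = original.getBytes(StandardCharsets.UTF_8);
            // Simulate a server that reads the request bytes as ISO-8859-1.
            String asServerSeesIt = new String(wire, StandardCharsets.ISO_8859_1);
            System.out.println(original.equals(recover(asServerSeesIt))); // true
        }
    }
    ```

    The same trick is not safe with GBK or UTF-8 as the intermediate encoding, because both reject some byte sequences and replace them while decoding.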

  3. Anonymous user 2024-02-04

    The essence of garbled text is this: the encoding used to read the binary data is not the same as the encoding originally used to turn the characters into binary.

    UTF-8 and GBK are the two encodings with the best support for Chinese, so conversions between them are common.

    UTF-8 to GBK: 鎴戜滑鏄腑鍥戒汉

    UTF-8 to GBK and then back to UTF-8: We are Chinese

    2. Encode in GBK and decode as UTF-8, then encode in UTF-8 and decode as GBK.

    The result of this run is:

    GBK to UTF-8:

    GBK to UTF-8 and then to GBK:

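
    The second direction, GBK bytes read as UTF-8, behaves differently from the first, and a minimal sketch makes that visible. It assumes the GBK charset is available; the class and method names are illustrative. GBK byte pairs are generally not valid UTF-8 sequences, so the decoder substitutes the replacement character U+FFFD and the original bytes are gone before the second conversion even starts.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    class GbkReadAsUtf8 {
        private static final Charset GBK = Charset.forName("GBK");

        // Encode as GBK, then wrongly decode those bytes as UTF-8.
        static String misread(String original) {
            return new String(original.getBytes(GBK), StandardCharsets.UTF_8);
        }

        // Try to undo the misread by reversing both steps.
        static String roundTrip(String original) {
            return new String(misread(original).getBytes(StandardCharsets.UTF_8), GBK);
        }

        public static void main(String[] args) {
            // The six-character Chinese sentence meaning "We are Chinese".
            String text = "\u6211\u4eec\u662f\u4e2d\u56fd\u4eba";
            System.out.println(misread(text).indexOf('\uFFFD') >= 0); // true: data was replaced
            System.out.println(text.equals(roundTrip(text)));         // false: not recoverable
        }
    }
    ```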

  4. Anonymous user 2024-02-03

    GBK and UTF-8 are both character encodings.

    The difference between the two: GBK is an encoding for Chinese characters that covers both Simplified and Traditional Chinese. There is also the "GB2312" character set, which can only store Simplified Chinese characters.

  5. Anonymous user 2024-02-02

    GBK and UTF-8 are character sets, each with its own way of encoding. A character set is a collection of characters. There are many character sets, and each contains a different number of characters. Common names include the ASCII, GB2312, UTF-8, GB18030, and Unicode character sets.

    For a computer to process the characters of these sets accurately, the characters must be encoded so that the computer can recognize and store them.

  6. Anonymous user 2024-02-01

    GBK and UTF-8 are both character encoding systems. GBK covers the Chinese, Japanese, and Korean ideographs and supports Simplified Chinese and English perfectly. However, if IE on a computer without Simplified Chinese support installed reads a GBK-encoded web page, the Chinese text becomes garbled; a British visitor viewing such a page, for example, sees nothing but gibberish. UTF-8 covers most of the world's scripts and can express more languages. One of its biggest benefits is that users in other regions (the United States, India, Taiwan) can read your text normally, with no garbled characters, without installing Simplified Chinese support. Network transmission usually uses UTF-8 as well.

    UTF-8 is an international encoding with good portability, so people abroad can also browse the forum. GBK is a national standard encoding and is less portable than UTF-8, but UTF-8 data takes up more database space than GBK.

    To avoid garbled characters altogether, use UTF-8; it will also make internationalization much easier later on.

  7. Anonymous user 2024-01-31

    On Windows 7, if a text file is encoded in GBK and you need to change it to UTF-8, you can do so with Save As:

    1. Open the GBK-encoded .txt file, then click "File" > "Save As".

    2. In the "Save As" window, click "Encoding", select "UTF-8", and save.
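
    The same conversion can be scripted when there are many files. This is a minimal sketch, assuming the source file really is GBK and small enough to read into memory; the class and method names are illustrative. Unlike the Save As dialog, it writes plain UTF-8 bytes and adds no byte order mark.

    ```java
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    class GbkFileToUtf8 {
        // Decode the source file as GBK and write the same text as UTF-8.
        static void convert(Path source, Path target) throws IOException {
            String text = new String(Files.readAllBytes(source), Charset.forName("GBK"));
            Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        }

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("usage: GbkFileToUtf8 <gbk-input> <utf8-output>");
                return;
            }
            convert(Paths.get(args[0]), Paths.get(args[1]));
        }
    }
    ```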

  8. Anonymous user 2024-01-30

    Go to Start > All Programs > Accessories > Command Prompt to open the Command Prompt, type chcp, and press Enter. The active code page is displayed, which indicates the encoding the system is currently using.
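
    A program can make a similar check from the inside. The sketch below prints the default charset the JVM picked up, which may or may not match the console code page that chcp reports. The helper that maps code page numbers to charset names is hypothetical and covers only the two values relevant here: 936 (GBK) and 65001 (UTF-8).

    ```java
    import java.nio.charset.Charset;

    class DefaultEncoding {
        // Hypothetical helper: translate a chcp number into a Java charset name.
        static String charsetNameFor(int codePage) {
            switch (codePage) {
                case 936:
                    return "GBK";
                case 65001:
                    return "UTF-8";
                default:
                    return null; // other code pages are outside this sketch
            }
        }

        public static void main(String[] args) {
            System.out.println("JVM default charset: " + Charset.defaultCharset());
            System.out.println("Code page 936 is " + charsetNameFor(936));
            System.out.println("Code page 65001 is " + charsetNameFor(65001));
        }
    }
    ```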

  9. Anonymous user 2024-01-29

    Change the system encoding in Control Panel, under Region and Language.

  10. Anonymous user 2024-01-28

    There is no way to change this; only on Linux can it be changed at will.

  11. Anonymous user 2024-01-27

    windows registry editor version
    [hkey_current_user\console\%systemroot%_system32_
    "codepage"=dword:0000fde9
    "fontfamily"=dword:00000036
    "fontweight"=dword:00000190
    "facename"="consolas"
    "screenbuffersize"=dword:232900d2
    "windowsize"=dword:002b00d2

    Copy the script above into Notepad, save it under a name ending in .reg, and then run it. The codepage value 0000fde9 is 65001, the UTF-8 code page, so afterwards the console defaults to UTF-8. I kept getting errors when converting data streams, and then I found this method online. Tested, and it works!
