How many Chinese characters can be processed by a computer?

2 answers

Anonymous user, 2024-02-11

The number of Chinese characters in a computer's character library is determined by the character set: GB2312 has more than 6,000 (6,763), GBK about 21,000, and GB18030 more than 70,000 (70,244 in its 2005 edition).
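One way to see that the repertoire depends on the character set is to ask Python's built-in codecs whether a character can be encoded. This is a minimal sketch; the helper name `supported_charsets` is hypothetical, and the sample characters were picked for illustration: 中 is in all three sets, the traditional form 們 is outside GB2312 but inside GBK, and an emoji is only reachable through GB18030's four-byte codes.

```python
# Check which Chinese character sets can encode a given character,
# using the codecs that ship with Python's standard library.
CHARSETS = ("gb2312", "gbk", "gb18030")


def supported_charsets(ch: str) -> list[str]:
    """Return the names of the charsets (hypothetical helper) that can encode ch."""
    result = []
    for name in CHARSETS:
        try:
            ch.encode(name)
        except UnicodeEncodeError:
            continue  # the character is outside this set's repertoire
        result.append(name)
    return result


for ch in ("中", "們", "😀"):
    print(ch, supported_charsets(ch))
```

Each larger standard is a superset of the previous one, so a character supported by GB2312 is also supported by GBK and GB18030.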

Chinese character information is stored and processed in a computer using four kinds of code: input code, national standard code, internal code and font code.

Input code: the code used to type a character, either a pinyin (phonetic) code or a shape-based code. Microsoft Pinyin ABC uses a pinyin code, while the Wubi input method uses a shape-based code.

National standard code: also known as the Chinese character interchange code, it is used to exchange information between computers. Each character is represented by two bytes, and the highest bit of each byte is 0, leaving 14 usable bits, so at most 2 to the power of 14 = 16,384 characters could be represented. In practice each byte is limited to 0x21 to 0x7E (94 values), which gives 94 × 94 = 8,836 code positions.
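The 2^14 figure is the raw capacity of two bytes whose highest bits are fixed at 0. The sketch below computes it and also counts the codes whose two bytes both fall in 0x21 to 0x7E, the printable range that GB2312 uses for character positions. The helper name `is_gb_code` is hypothetical.

```python
# Capacity of a two-byte code whose bytes both have their highest bit set to 0.
RAW_CAPACITY = 2 ** 14  # 7 usable bits per byte, two bytes


def is_gb_code(code: int) -> bool:
    """Hypothetical check: both bytes of a 16-bit national standard code lie in 0x21..0x7E."""
    high, low = code >> 8, code & 0xFF
    return 0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E


# Count every 16-bit value that passes the check.
usable = sum(1 for code in range(0x10000) if is_gb_code(code))
print(RAW_CAPACITY, usable)  # 16384 8836
```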

To obtain the national standard code, add decimal 32 (hexadecimal 20) to both the high byte (zone number) and the low byte (position number) of the character's zone-position code. For example, "中" has zone-position code 5448 (zone 54, position 48); adding 32 to each part gives 86 and 80 in decimal, so its national standard code is 8680 written byte by byte in decimal, or 5650 in hexadecimal.
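The zone-position (quwei) to national standard (guobiao) step is a per-byte addition of 0x20. A minimal sketch; the function name `quwei_to_guobiao` is hypothetical, and the range check assumes the 94 × 94 zone-position grid.

```python
def quwei_to_guobiao(zone: int, position: int) -> int:
    """Convert a zone-position code (zone 1-94, position 1-94) to a national standard code.

    The zone goes in the high byte and the position in the low byte;
    each gets decimal 32 (0x20) added.
    """
    if not (1 <= zone <= 94 and 1 <= position <= 94):
        raise ValueError("zone and position must both be in 1..94")
    high = zone + 0x20
    low = position + 0x20
    return (high << 8) | low


# "中" sits at zone 54, position 48 (zone-position code 5448).
code = quwei_to_guobiao(54, 48)
print(code >> 8, code & 0xFF)  # 86 80  (the two bytes in decimal)
print(f"{code:04X}")           # 5650   (the same code in hexadecimal)
```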

Internal code: The internal code of Chinese characters is used to store, process and transmit Chinese characters within the equipment and information processing system. Regardless of the input code used, it is converted to an internal code as soon as it enters the computer.

The rule is to add decimal 128 (hexadecimal 80) to both the high byte and the low byte of the national standard code. For example, "中" has national standard code 5650 (hexadecimal), so its internal code is D6D0 (hexadecimal). The purpose is to distinguish Chinese character codes from Western ASCII: the highest bit of every ASCII byte is 0, whereas the highest bit of each byte of a Chinese character's internal code is 1.
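The national standard to internal code step can be checked against a real codec: Python's `gb2312` codec produces exactly this two-byte internal form. The sketch assumes the national standard code 5650 (hexadecimal) for "中", derived from its zone-position code 5448; the helper names are hypothetical.

```python
def guobiao_to_internal(gb_code: int) -> int:
    """Add 0x80 (decimal 128) to each byte of a national standard code.

    Both bytes are at most 0x7E, so adding 0x8080 to the whole 16-bit value
    never carries between bytes and is the same as adding 0x80 per byte.
    """
    return gb_code + 0x8080


def internal_to_bytes(internal: int) -> bytes:
    """Split a 16-bit internal code into its high and low bytes."""
    return internal.to_bytes(2, "big")


internal = guobiao_to_internal(0x5650)  # national standard code of "中"
raw = internal_to_bytes(internal)
print(f"{internal:04X}")                # D6D0
print(raw.decode("gb2312"))             # 中
# Both bytes now have their highest bit set, unlike any ASCII byte.
print(all(b >= 0x80 for b in raw))      # True
```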

Font code: the data describing the shape of a Chinese character's glyph, so it is also called the glyph pattern code; it is the form in which the character is output (displayed or printed). It is usually represented as a dot matrix (bitmap) or with vector functions (outlines).

When a dot matrix is used, the font code is the data encoding that character's glyph dot matrix, one bit per dot. The matrix size depends on the required output quality: basic Chinese characters use 16 × 16 dots, while higher-quality characters use 24 × 24 dots, 48 × 48 dots, and so on.
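To make the dot-matrix code concrete: each row of dots is stored as bits, eight dots per byte. This is a minimal sketch; the bit order (leftmost dot in the most significant bit) is an assumption, a common convention for bitmap fonts but not something stated above, and the sample row is invented rather than taken from a real font. The name `pack_row` is hypothetical.

```python
def pack_row(dots: list[int]) -> bytes:
    """Pack one row of dots (1 = inked, 0 = blank) into bytes, 8 dots per byte.

    Assumption: the leftmost dot goes into the highest bit of the first byte,
    and a partial last byte is padded with 0 bits on the right.
    """
    out = bytearray()
    for start in range(0, len(dots), 8):
        chunk = dots[start:start + 8]
        byte = 0
        for dot in chunk:
            byte = (byte << 1) | (1 if dot else 0)
        byte <<= 8 - len(chunk)  # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)


# A made-up 16-dot row: a horizontal stroke from dot 2 to dot 13.
row = [0, 0] + [1] * 12 + [0, 0]
print(pack_row(row).hex())  # 3ffc
```

Stacking 16 such two-byte rows gives the 32 bytes of a 16 × 16 glyph.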

For a 24 × 24 dot matrix, each row has 24 dots, i.e. 24 binary bits, so storing one row takes 3 bytes, and the 24 rows take 3 × 24 = 72 bytes in total. The calculation formula is:

bytes per glyph = (dots per row ÷ 8) × number of rows

Therefore, for a 48 × 48 dot matrix, the storage space required for one Chinese glyph is 48 ÷ 8 × 48 = 6 × 48 = 288 bytes.
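The storage formula translates directly into code. A minimal sketch; `glyph_bytes` is a hypothetical name, and rounding up with `(size + 7) // 8` is an added assumption for widths that are not a multiple of 8 (each row padded to whole bytes). For the sizes discussed above it gives the same result as a plain division by 8.

```python
def glyph_bytes(size: int) -> int:
    """Bytes for one size x size dot-matrix glyph: (dots per row / 8) x rows."""
    bytes_per_row = (size + 7) // 8  # round up to whole bytes per row
    return bytes_per_row * size


for size in (16, 24, 48):
    print(f"{size}x{size}: {glyph_bytes(size)} bytes")
# 16x16: 32 bytes
# 24x24: 72 bytes
# 48x48: 288 bytes
```

Multiplying by a character count gives the size of a whole font library: the 6,763 GB2312 characters at 16 × 16 take 6,763 × 32 = 216,416 bytes.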

Anonymous user, 2024-02-10

The size of the Chinese character library varies with the standard.

GB2312 specifies a library of 6,763 commonly used Chinese characters.

By GB18030 (2005 edition), the library had been greatly expanded to 70,244 characters (in fact, not only "Chinese" characters: many ethnic minority scripts were also collected into this version of the standard).