WHAT IS THE RELATIONSHIP BETWEEN ASCII CODE AND UNICODE?

7 answers

Anonymous users2024-02-09

Unicode

A character encoding standard that assigns a number (a code point) to every character, whether or not it is an ASCII character. Early versions stored every character in two bytes, and that 16-bit form is what the Microsoft Windows NT platform supports and 32-bit ActiveX technology relies on. Unicode is kept in step with the International Organization for Standardization (ISO) character standard ISO/IEC 10646.

Unicode originally used a 16-bit (2-byte) encoding scheme that allows 65,536 different code points; the current standard extends the range up to U+10FFFF. Unicode contains representations of punctuation, mathematical symbols, modifiers, and more.

ASCII character set.

Abbreviation for American Standard Code for Information Interchange, a 7-bit character set widely used to represent letters and symbols on standard keyboards. The ASCII character set is the same as the first 128 characters (0 through 127) in the ANSI character set. Code values range from 0 to 127 and represent letters, numbers, punctuation, and other characters.

An ASCII code is a standardized code used to exchange information between computers or between computers and peripherals.

ANSI character set.

Microsoft Windows uses the American National Standards Institute (ANSI) 8-bit character set, which can represent up to 256 characters. The first 128 characters represent the letters and symbols found on a standard American keyboard. The last 128 characters represent special characters such as letters from international alphabets, accent marks, currency symbols, and fraction symbols, among others.
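
The relationship between the two halves of the 8-bit set can be checked with a short Python sketch. It assumes that "ANSI" here means the Windows-1252 code page (Python codec name cp1252); other Windows locales use other code pages, so the upper half will differ there.

```python
# Assumption: "ANSI" = Windows-1252 (Python codec "cp1252").
low = bytes(range(128))            # byte values 0..127

# The first 128 values decode to the same characters under both encodings.
same_low_half = low.decode("ascii") == low.decode("cp1252")

# Values 128..255 exist only in the 8-bit set.
accented = bytes([0xE9]).decode("cp1252")   # e with acute accent
pound = bytes([0xA3]).decode("cp1252")      # pound sign
half = bytes([0xBD]).decode("cp1252")       # fraction one half

print(same_low_half, accented, pound, half)
```
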
Anonymous users2024-02-08

Analysis: At present, the most widely used character set and encoding in computers is ASCII (American Standard Code for Information Interchange), developed by the American National Standards Institute (ANSI) and adopted by the International Organization for Standardization (ISO) as the international standard ISO 646. It applies to all Latin-script letters, and ASCII codes come in 7-bit and 8-bit forms.

Because a 1-bit binary number can represent 2^1 = 2 states and a 2-bit binary number can represent 2^2 = 4 states, by analogy a 7-bit binary number can represent 2^7 = 128 states. Each state is a unique 7-bit binary code corresponding to one character (or control code), and the codes can be numbered in decimal from 0 to 127. So 7-bit ASCII is encoded with a seven-bit binary number and can represent 128 characters.

Codes 0 to 32 and 127 (34 in total, counting the space at 32) are control or communication characters. Control characters include LF (line feed), CR (carriage return), FF (form feed), DEL (delete) and BEL (bell);

Communication-specific characters include SOH (start of heading), EOT (end of transmission) and ACK (acknowledge).

Codes 33 to 126 (94 in total) are printable characters: 48 to 57 are the ten Arabic digits 0 to 9, 65 to 90 are the 26 uppercase letters, 97 to 122 are the 26 lowercase letters, and the rest are punctuation marks, operator symbols and so on.
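
The code ranges listed above can be reproduced with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example.

```python
# Ranges as given in the answer above (end values are exclusive in range()).
digits = "".join(chr(c) for c in range(48, 58))
upper = "".join(chr(c) for c in range(65, 91))
lower = "".join(chr(c) for c in range(97, 123))
printable = [chr(c) for c in range(33, 127)]

print(digits)                                     # 0123456789
print(upper[0], upper[-1], lower[0], lower[-1])   # A Z a z
print(len(printable))                             # 94
```
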

Note: In a computer's memory, an ASCII code value occupies one byte (8 binary bits), and its highest bit (b7) can be used as a parity bit. A parity check is a method of detecting errors introduced during transmission, and it comes in two kinds: odd parity and even parity.

Odd parity requires the number of 1s in a correct byte to be odd; if the count is not odd, the highest bit b7 is set to 1. Even parity requires the number of 1s in a byte to be even; if the count is not even, the highest bit b7 is set to 1.
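
The parity rule can be sketched in Python as follows. The function names add_parity and parity_ok are hypothetical, chosen only for this example, and the sketch assumes the parity bit sits in bit 7 as described above.

```python
def add_parity(code: int, odd: bool = False) -> int:
    """Return an 8-bit value: a 7-bit ASCII code plus a parity bit in b7."""
    if not 0 <= code <= 127:
        raise ValueError("expected a 7-bit ASCII code")
    ones = bin(code).count("1")
    # Even parity: set b7 when the count of 1s is odd.
    # Odd parity: set b7 when the count of 1s is even.
    need_bit = (ones % 2 == 0) if odd else (ones % 2 == 1)
    return (code | 0x80) if need_bit else code


def parity_ok(byte: int, odd: bool = False) -> bool:
    """Check that the total number of 1 bits matches the chosen parity."""
    ones = bin(byte).count("1")
    return ones % 2 == (1 if odd else 0)


print(add_parity(ord("A")))             # 65: 'A' already has two 1 bits
print(add_parity(ord("A"), odd=True))   # 193: b7 set to make the count odd
print(add_parity(ord("C")))             # 195: 'C' has three 1 bits
```

A single flipped bit changes the count of 1s by one, so parity_ok returns False for the damaged byte; two flipped bits go undetected.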

Unicode: Unicode is also an international standard encoding. In its two-byte form it is not byte-compatible with ANSI code pages, although its first 128 code points are the same as ASCII. It is now used on the Internet, in Windows and in many large software packages.
Anonymous users2024-02-07

Because an electrical device fundamentally has only two states, power-off (0) and power-on (1), everything in a computer ultimately comes down to combinations of 0 and 1. To display text on a computer, each character therefore has to be mapped to a sequence of 0s and 1s (i.e., a binary number).

Since computers were originally invented by Americans, the earliest correspondence was: f (binary number) = English character, where English characters include the digits. The correspondence f is called an encoding table (here ASCII, 1 byte).

However, as computers spread, people in many countries began to use them, and ASCII, which can only represent English letters and digits, could not meet everyone's needs: it cannot display Chinese, Korean, Japanese or other languages. So in China we developed our own GB2312 code, which adds Chinese characters on top of ASCII (GB2312 keeps the ASCII correspondence rules and is only an extension of them). Other countries did the same, so each country had its own code, and text mixing several languages came out garbled, because the national codes are basically incompatible with one another.
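
The garbling described above is easy to reproduce. The following Python sketch writes a string with one encoding table and reads it back with another; Latin-1 is used as the "wrong" table only because it accepts every byte value, which is an assumption made for the demonstration.

```python
text = "ASCII and 中文"
stored = text.encode("gb2312")      # bytes written by a GB2312-based program

right = stored.decode("gb2312")     # reader uses the same table
wrong = stored.decode("latin-1")    # reader assumes a different table

print(right == text)    # True
print(wrong == text)    # False: the Chinese part is garbled
print(wrong[:10])       # the ASCII prefix survives unchanged
```

The ASCII part survives because GB2312 keeps the ASCII values for those characters, which is exactly the compatibility the answer mentions.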

So why not come up with a unified code that covers all the languages in the world? That is how Unicode came into being: it unifies all languages into one set of codes, so that as long as everyone uses this set of codes, there is no more garbled text.

However, as more and more languages were added to Unicode, more bytes were needed per character (up to 4 bytes). An English document that needs only 1K when stored in ASCII grows to 4K if every Unicode character is stored in a fixed 4 bytes, which wastes a great deal of bandwidth when it is sent over the network. So, in the spirit of economy, the variable-length UTF-8 encoding was introduced. It is compatible with ASCII, can still represent other languages, and does not waste much space.
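
The 1K versus 4K comparison can be checked directly. This Python sketch assumes that "4 bytes per character" corresponds to UTF-32 (here the utf-32-le codec, which writes no byte order mark).

```python
english = "Hello, Unicode! " * 64        # 1024 ASCII characters

as_ascii = english.encode("ascii")
as_utf8 = english.encode("utf-8")
as_utf32 = english.encode("utf-32-le")   # fixed 4 bytes per code point, no BOM

print(len(as_ascii), len(as_utf8), len(as_utf32))   # 1024 1024 4096
```
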

Nowadays text is generally stored as UTF-8 (because it saves space), while in memory many programs work with a fixed-width Unicode representation, because fixed-width data is simpler and faster for the CPU to process. That is, the computer reads the text from the hard disk, converts it from UTF-8 into the in-memory Unicode form for processing, and then converts the in-memory text back into UTF-8 when storing it on the hard disk.
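
A minimal Python sketch of that read, process, write cycle is shown below. Python's str type hides the actual in-memory layout, so this only illustrates the decode and encode steps, not any particular internal format; the sample text is made up.

```python
on_disk = "naïve café, 中文".encode("utf-8")   # bytes as stored in a file

in_memory = on_disk.decode("utf-8")    # str: a sequence of code points
processed = in_memory.upper()          # work on characters, not bytes
back_on_disk = processed.encode("utf-8")

# The byte count and the character count differ for non-ASCII text.
print(len(on_disk), len(in_memory))
```
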

Note: 1 byte = 8 bits.
Anonymous users2024-02-06

ASCII uses a one-byte encoding, so its range is basically limited to English letters, digits and some special symbols: 128 characters in standard 7-bit ASCII, or at most 256 in the 8-bit extended sets.

When referring to a Unicode character, it is common to write "U+" followed by a group of hexadecimal digits. In the Basic Multilingual Plane (abbreviated BMP, also called "plane zero" or plane 0), every character is written with four hexadecimal digits (e.g. U+4AE0; the plane holds more than 60,000 characters in total). Characters outside plane zero need five or six hexadecimal digits.
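
The notation can be generated from any character with a couple of lines of Python. The helper names u_notation and in_bmp are hypothetical, used only for this sketch.

```python
def u_notation(ch: str) -> str:
    """Format one character as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


def in_bmp(ch: str) -> bool:
    """True if the character lies in plane 0 (U+0000 to U+FFFF)."""
    return ord(ch) <= 0xFFFF


print(u_notation("A"))            # U+0041
print(u_notation("中"))           # U+4E2D
print(u_notation("\U0001F600"))   # U+1F600 (outside plane zero)
```
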

Older versions of the Unicode standard used a similar notation with slight differences: in Unicode 3.0, "U-" had to be followed by eight digits, while "U+" had to be followed by exactly four digits.

Unicode aims to represent every character in the world.

GBK is used mainly to encode Chinese characters. Its full name is "Chinese Character Internal Code Extension Specification", and it encodes each Chinese character in two bytes while keeping ASCII characters in one byte.

UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode (which is also known as the universal code). It was created in 1992 by Ken Thompson and Rob Pike. It is now standardized as RFC 3629, which encodes Unicode characters in 1 to 4 bytes (the original design allowed up to 6).
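
To see the variable length in practice, the following Python sketch encodes four sample characters (chosen for this example, one per length class) and prints how many bytes each takes.

```python
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]  # A, e-acute, 中, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

lengths = [len(ch.encode("utf-8")) for ch in samples]   # [1, 2, 3, 4]
```

The one-byte case is identical to the ASCII byte, which is why plain English text is valid UTF-8 without any conversion.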

This makes it possible to display Simplified Chinese, Traditional Chinese and other languages (such as English, Japanese and Korean) on the same page.
Anonymous users2024-02-05

1. ASCII code.

2. Non-ASCII coding.

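// C#: encode the string with the GB2312 code page
string s = "梁";

System.Text.Encoding gb2312 = System.Text.Encoding.GetEncoding("gb2312");

byte[] gb = gb2312.GetBytes(s);

At this point gb holds two bytes: 193 (11000001) and 186 (10111010).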

3. Unicode

Unicode is, of course, a very large collection, and it now has room for more than 1 million symbols. Each symbol has its own code. If you want to see the Unicode encoding of a certain Chinese character, you can use the following code:

string s = "梁";

byte[] unicode = System.Text.Encoding.Unicode.GetBytes(s);

At this point unicode holds two bytes: 129 (10000001) and 104 (01101000). They are in little-endian order, so the code point is U+6881.
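
The byte values quoted in this answer can be reproduced in Python. The sketch assumes the string in the snippets was the single Chinese character 梁, which is the character whose GB2312 bytes are 193, 186 and whose little-endian UTF-16 bytes are 129, 104.

```python
s = "梁"   # assumed original string, inferred from the byte values above

gb = list(s.encode("gb2312"))         # double-byte GB2312 code
utf16 = list(s.encode("utf-16-le"))   # 16-bit Unicode, low byte first

print(gb)                   # [193, 186]
print(utf16)                # [129, 104]
print(f"U+{ord(s):04X}")    # U+6881
```
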
Anonymous users2024-02-04

The machine only knows 1 and 0, so in order to communicate with it in their own languages, people defined various correspondences between binary numbers and written characters, i.e., encodings.

The earliest popular encoding was ASCII, which specifies 128 characters, each stored in one byte (8 binary bits) whose first bit is always 0, leaving 2^7 = 128 values. These are mainly English characters.

The 128 characters specified in ASCII are sufficient for English, but nowhere near enough for other languages, even if the first bit is put to use as well. So encodings appeared for various languages, such as GB2312 for Simplified Chinese and Big5 for Traditional Chinese. When parsing a file you therefore need to pay attention to its encoding, otherwise the text will be garbled, because different encodings interpret the same binary data differently.

Unicode was born in this context. It unifies the written symbols of the whole world. A character is written as U+ followed by a hexadecimal number.

Unicode on its own leaves practical problems open. For example, when a character takes several bytes, how does the machine know that they form one character rather than several? Therefore, for actual interaction with the machine, Unicode has to be converted into a concrete format, which leads to UTF (short for Unicode Transformation Format), including UTF-32, UTF-16, UTF-8 and so on.

UTF-32 rules: each code point is represented by four bytes, and the byte content corresponds directly to the Unicode code point.

Cons: wasted space. If a file is entirely in English, three bytes are wasted per character, because each English character can be represented in one byte.
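
The four-byte rule and its cost can be illustrated in Python. This sketch assumes the fixed four-byte format described above is UTF-32 and uses the big-endian codec without a byte order mark, so the bytes read like the code point itself.

```python
for ch in ("A", "\u4e2d"):                 # 'A' and the Chinese character 中
    raw = ch.encode("utf-32-be")           # 4 bytes, big-endian, no BOM
    # The four bytes, read as one integer, are the code point itself.
    assert int.from_bytes(raw, "big") == ord(ch)
    print(f"U+{ord(ch):04X} -> {raw.hex(' ')}")

# Bytes wasted per English character compared with a one-byte encoding.
wasted = len("A".encode("utf-32-be")) - len("A".encode("ascii"))
print(wasted)   # 3
```
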

Anonymous users2024-02-03

The details are as follows.

The ASCII encoding of the letter A is 65 in decimal and 01000001 in binary. Unicode unifies all the languages of the world into a single set of codes, usually two bytes per character. The Unicode encoding of the letter A is 65 in decimal and 0000000001000001 in binary (the ASCII code with eight 0 bits added in front); the Unicode encoding of the Chinese character 中 is 20013 in decimal and 0100111000101101 in binary.
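
These values can be verified in Python. The sketch assumes the Chinese character meant here is 中 (U+4E2D), which is the character whose code is 20013.

```python
a = ord("A")
zhong = ord("\u4e2d")        # the Chinese character 中

print(a, format(a, "08b"))             # 65 01000001
print(format(a, "016b"))               # 0000000001000001
print(zhong, format(zhong, "016b"))    # 20013 0100111000101101
```
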

ASCII (American Standard Code for Information Interchange) is a computer coding system based on the Latin alphabet, mainly used to display modern English and other Western European languages. It is the most common standard for information exchange and is equivalent to the international standard ISO/IEC 646. It was first published as a standard in 1963, received a major revision in 1967 and was last updated in 1986; it defines 128 characters in total.