# 1.2 Data Representation in a Computer

Computers must not only be able to carry out computations; they must also do them quickly and efficiently. Different kinds of data have different representations, typically for integers, real numbers, characters, and logical values.

### Number Representation in Various Numeral Systems

A numeral system is a collection of symbols used to represent small numbers, together with a system of rules for representing larger numbers. Each numeral system uses a set of digits. The number of unique digits, including zero, that a numeral system uses to represent numbers is called its base or radix.

#### Base – b numeral system

b basic symbols (or digits) corresponding to natural numbers between 0 and b − 1 are used in the representation of numbers.

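To generate the rest of the numerals, the position of each symbol within the numeral is used. The symbol in the last (rightmost) position of an integer keeps its own value, and each move one position to the left multiplies its value by b.

A number $N_{(b)}$ in the numeral system of base b, with n+1 digits in the integer part and m digits in the fractional part, is written in the form

$$N_{(b)} = a_n a_{n-1} \dots a_1 a_0 \,.\, a_{-1} a_{-2} \dots a_{-m}, \qquad 0 \le a_i \le b-1,$$

and represents the sum:

$$N = \sum_{i=-m}^{n} a_i b^i = a_n b^n + \dots + a_1 b^1 + a_0 b^0 + a_{-1} b^{-1} + \dots + a_{-m} b^{-m}$$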

Decimal, binary, octal and hexadecimal are commonly used numeral systems. The decimal system has ten as its base. It is the most widely used numeral system, because humans have four fingers and a thumb on each hand, giving a total of ten digits over both hands.

Switches, and their electronic successors built from vacuum tubes, have only two possible states: “open” and “closed”. Substituting open=1 and closed=0 yields the entire set of binary digits. Modern computers use transistors that represent two states with either high or low voltages. Binary digits are arranged in groups to aid in processing, and to make the binary numbers shorter and more manageable for humans. Thus base 16 (hexadecimal) is commonly used as shorthand. Base 8 (octal) has also been used for this purpose.

#### Decimal System

Decimal notation is the writing of numbers in the base-ten numeral system, which uses various symbols (called digits) for no more than ten distinct values (0, 1, 2, 3, 4, 5, 6, 7, 8 and 9) to represent any number, no matter how large. These digits are often used with a decimal separator which indicates the start of a fractional part, and with one of the sign symbols + (positive) or − (negative) in front of the numerals to indicate sign.

The decimal system is a place-value system. This means that the place or location where you put a numeral determines its corresponding numerical value. A two in the one’s place means two times one, or two. A two in the one-thousand’s place means two times one thousand, or two thousand.

The place values increase from right to left. The first place just before the decimal point is the one’s place, the second place or next place to the left is the ten’s place, the third place is the hundred’s place, and so on.

The place-value of the place immediately to the left of the “decimal” point is one in all place-value number systems. The place-value of any place to the left of the one’s place is a whole number computed from a product (multiplication) in which the base of the number system is repeated as a factor one less number of times than the position of the place.

For example, 5246 can be expanded as follows:

$$5246 = 5 \times 10^3 + 2 \times 10^2 + 4 \times 10^1 + 6 \times 10^0 = 5000 + 200 + 40 + 6$$

The place-value of any place to the right of the decimal point is a fraction computed from a product in which the reciprocal of the base—or a fraction with one in the numerator and the base in the denominator—is repeated as a factor exactly as many times as the place is to the right of the decimal point.

For example

#### Binary System

The binary number system is base 2 and therefore requires only two digits, 0 and 1. The binary system is useful for computer programmers, because it can be used to represent the digital on/off method in which computer chips and memory work.

A binary number can be represented by any sequence of bits (binary digits), which in turn may be represented by any mechanism capable of being in two mutually exclusive states.

Counting in binary is similar to counting in any other number system. Beginning with a single digit, counting proceeds through each symbol, in increasing order. Decimal counting uses the symbols 0 through 9, while binary only uses the symbols 0 and 1.

When the symbols for the first digit are exhausted, the next-higher digit (to the left) is incremented, and counting starts over at 0. A single bit can represent one of two values, 0 or 1. Binary numbers are convertible to decimal numbers.

Here’s an example of a binary number, $11101.11_{(2)}$, and its representation in decimal notation:

$$11101.11_{(2)} = 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} = 29.75_{(10)}$$
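The positional rule can be checked with a minimal Python sketch. The function name `positional_value` and the digit table `DIGITS` are illustrative choices, not part of the original text, and the sketch assumes a non-negative numeral whose digits are all valid for the given base (at most base 16).

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"  # symbol -> value lookup (assumed helper)

def positional_value(numeral: str, base: int) -> Fraction:
    """Evaluate a numeral such as '11101.11' written in the given base."""
    integer_part, _, fraction_part = numeral.upper().partition(".")
    value = Fraction(0)
    # Integer digits: the rightmost digit has weight base**0.
    for position, symbol in enumerate(reversed(integer_part)):
        value += DIGITS.index(symbol) * Fraction(base) ** position
    # Fraction digits: the first digit after the point has weight base**-1.
    for position, symbol in enumerate(fraction_part, start=1):
        value += DIGITS.index(symbol) * Fraction(1, base ** position)
    return value

print(float(positional_value("11101.11", 2)))  # 29.75
```

Exact fractions are used instead of floats so that every digit weight is represented without rounding.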

#### Hexadecimal System

The hexadecimal system is base 16. Therefore, it requires 16 digits. The digits 0 through 9 are used, along with the letters A through F, which represent the decimal values 10 through 15. Here is an example of a hexadecimal number and its decimal equivalent:

The hexadecimal system (often called the hex system) is useful in computer work because it is based on powers of 2. Each digit in the hex system is equivalent to a four-digit binary number. The table below shows some hex/decimal/binary equivalents.

| Hexadecimal Digit | Decimal Equivalent | Binary Equivalent |
|---|---|---|
| 0 | 0 | 0000 |
| 1 | 1 | 0001 |
| 2 | 2 | 0010 |
| 3 | 3 | 0011 |
| 4 | 4 | 0100 |
| 5 | 5 | 0101 |
| 6 | 6 | 0110 |
| 7 | 7 | 0111 |
| 8 | 8 | 1000 |
| 9 | 9 | 1001 |
| A | 10 | 1010 |
| B | 11 | 1011 |
| C | 12 | 1100 |
| D | 13 | 1101 |
| E | 14 | 1110 |
| F | 15 | 1111 |
| 10 | 16 | 10000 |
| F0 | 240 | 11110000 |
| FF | 255 | 11111111 |
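The four-bits-per-hex-digit correspondence can be sketched in Python as follows. The helper names `hex_to_binary` and `binary_to_hex` are hypothetical, chosen only for this illustration.

```python
def hex_to_binary(hex_text: str) -> str:
    """Expand each hexadecimal digit into its four-bit binary group."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_text)

def binary_to_hex(bits: str) -> str:
    """Group bits in fours from the right and map each group to a hex digit."""
    width = -(-len(bits) // 4) * 4          # round length up to a multiple of 4
    padded = bits.zfill(width)              # pad with leading zeros
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(hex_to_binary("F0"))        # 11110000
print(binary_to_hex("11111111"))  # FF
print(int("F0", 16))              # 240
```

Note that `hex_to_binary("10")` yields `00010000`, which is the table's `10000` with leading zeros kept so that every hex digit occupies exactly four bits.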

#### Octal System

Binary is also easily converted to the octal numeral system, since octal uses a radix of 8, which is a power of two (namely, $2^3$, so it takes exactly three binary digits to represent an octal digit). The correspondence between octal and binary numerals is the same as for the first eight digits of hexadecimal in the table above. Binary 000 is equivalent to the octal digit 0, binary 111 is equivalent to octal 7, and so forth.

Converting from octal to binary proceeds in the same fashion as it does for hexadecimal:

And from octal to decimal:
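Both octal conversions can be sketched in Python. The function names and the sample numeral 127 (octal) are illustrative choices and do not come from the original text.

```python
def octal_to_binary(octal_text: str) -> str:
    """Replace each octal digit by its three-bit binary group."""
    return "".join(format(int(digit, 8), "03b") for digit in octal_text)

def octal_to_decimal(octal_text: str) -> int:
    """Accumulate digit values from left to right: value = value * 8 + digit."""
    value = 0
    for digit in octal_text:
        value = value * 8 + int(digit, 8)
    return value

print(octal_to_binary("127"))   # 001010111
print(octal_to_decimal("127"))  # 87, i.e. 1*64 + 2*8 + 7
```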

#### Converting from decimal to base–b

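To convert the integer part, divide it by b repeatedly; the remainders, read in reverse order, are the digits of the integer part. After that, multiply the fractional part by b repeatedly to get each digit as an integer part. We continue this process until we get a zero as our fractional part or until we recognize an infinite repeating pattern.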

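For example, convert 0.0390625 to hexadecimal:

0.0390625 × 16 = 0.625 → integer part 0, so the first digit is 0

0.625 × 16 = 10.0 → integer part 10, so the second digit is A

The fractional part is now zero, so the process stops: $0.0390625_{(10)} = 0.0A_{(16)}$.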
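The procedure can be sketched in Python as below. The function `to_base`, its digit limit `max_fraction_digits`, and the use of exact fractions are assumptions made for this illustration. The sketch handles non-negative numbers only and simply truncates a non-terminating expansion instead of detecting the repeating pattern.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"

def to_base(number, base: int, max_fraction_digits: int = 20) -> str:
    """Convert a non-negative decimal number (given as str or int) to base b."""
    value = Fraction(number)
    integer_part = int(value)
    fraction_part = value - integer_part

    # Integer part: repeated division, remainders read in reverse order.
    integer_digits = ""
    while integer_part > 0:
        integer_part, remainder = divmod(integer_part, base)
        integer_digits = DIGITS[remainder] + integer_digits
    integer_digits = integer_digits or "0"

    # Fractional part: repeated multiplication, integer parts read in order.
    fraction_digits = ""
    while fraction_part != 0 and len(fraction_digits) < max_fraction_digits:
        fraction_part *= base
        digit = int(fraction_part)
        fraction_digits += DIGITS[digit]
        fraction_part -= digit

    return integer_digits + ("." + fraction_digits if fraction_digits else "")

print(to_base("0.0390625", 16))  # 0.0A
print(to_base("0.625", 16))      # 0.A
print(to_base("23.375", 2))      # 10111.011
```

Passing the number as a string keeps the decimal value exact; a float argument such as `0.1` would already carry binary rounding error before the conversion starts.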

### Data Representation in a Computer. Units of Information

#### Basic Principles

Data Representation refers to the methods used internally to represent information stored in a computer. Computers store lots of different types of information:

• numbers
• text
• graphics of many varieties (stills, video, animation)
• sound

At least, these all seem different to us. However, all types of information stored in a computer are stored internally in the same simple format: a sequence of 0’s and 1’s. How can a sequence of 0’s and 1’s represent things as diverse as your photograph, your favorite song, a recent movie, and your term paper?

• Numbers must be expressed in binary form following some specific standard.
• Character data are assigned a sequence of binary digits.
• Other types of data, such as sounds, videos or other physical signals, are converted to digital form following the schema below.

[Figure: digital signal]

Depending on the nature of its internal representation, data items are divided into:

• Basic types (simple types or primitive types): the standard scalar predefined types that one would expect to find ready for immediate use in any programming language.
• Structured types (higher-level types): types made up from such basic types or from other existing higher-level types.

#### Units of Information

The most basic unit of information in a digital computer is called a BIT, which is a contraction of Binary Digit. In the concrete sense, a bit is nothing more than a state of “on” or “off” (or “high” and “low”) within a computer circuit. In 1964, the designers of the IBM System/360 mainframe computer established a convention of using groups of 8 bits as the basic unit of addressable computer storage. They called this collection of 8 bits a byte.

Computer words consist of two or more adjacent bytes that are sometimes addressed and almost always are manipulated collectively. The word size represents the data size that is handled most efficiently by a particular architecture. Words can be 16 bits, 32 bits, 64 bits, or any other size that makes sense within the context of a computer’s organization.

Some other units of information are described in the following table :

### Representation of Integers

An integer is a number with no fractional part; it can be positive, negative or zero. In ordinary usage, one uses a minus sign to designate a negative integer. However, a computer can only store information in bits, which can only have the values zero or one. We might expect, therefore, that the storage of negative integers in a computer might require some special technique – allocating one sign bit (often the most significant bit) to represent the sign: set that bit to 0 for a positive number, and set to 1 for a negative number.

#### Unsigned Integers

Unsigned integers are represented by a fixed number of bits (typically 8, 16, 32, and/or 64)

• With 8 bits, 0…255 ($00_{16}$…$FF_{16}$) can be represented;
• With 16 bits, 0…65535 ($0000_{16}$…$FFFF_{16}$) can be represented;
• In general, an unsigned integer containing n bits can have a value between 0 and $2^n - 1$.

If an operation on bytes has a result outside this range, it will cause an ‘overflow’
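The range rule and the overflow condition can be sketched in Python. Python integers are unbounded, so the n-bit register is simulated here. The function names are hypothetical, and keeping only the low n bits on overflow is an assumption about typical hardware behaviour, not something stated in the text.

```python
def unsigned_range(n_bits: int) -> tuple[int, int]:
    """Smallest and largest value of an n-bit unsigned integer."""
    return 0, 2 ** n_bits - 1

def store_unsigned(value: int, n_bits: int) -> tuple[int, bool]:
    """Return (stored value, overflow flag) for an n-bit unsigned register."""
    low, high = unsigned_range(n_bits)
    overflow = not (low <= value <= high)
    # Assumed behaviour: only the low n bits of the result are kept.
    return value % (2 ** n_bits), overflow

print(unsigned_range(8))             # (0, 255)
print(unsigned_range(16))            # (0, 65535)
print(store_unsigned(200 + 100, 8))  # (44, True)
```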

#### Signed Integers

The binary representation discussed above is a standard code for storing unsigned integer numbers. However, most computer applications use signed integers as well; i.e. the integers that may be either positive or negative.

In binary we can use one bit within a representation (usually the most significant or leading bit) to indicate either positive (0) or negative (1), and store the unsigned binary representation of the magnitude in the remaining bits.

However, because it simplifies the design of circuits that do arithmetic on signed binary numbers (e.g. addition and subtraction), a different representation scheme, called two’s complement, is more commonly used. In this scheme, positive numbers are represented in binary, the same as for unsigned numbers. A negative number, on the other hand, is represented by taking the binary representation of its magnitude and then:

• Complement the bits : Replace all the 1’s with 0’s, and all the 0’s with 1’s;
• Add one to the complemented number.

Example

$+42_{10} = 00101010_2$
and so
$-42_{10} = 11010110_2$

• Binary number with leading 0 is positive
• Binary number with leading 1 is negative

Example

Performing two’s complement on the decimal 42 to get -42

Using an eight-bit representation

42= 00101010   Convert to binary

11010101   Complement the bits

11010101   Add 1 to the complement
+ 00000001
--------
11010110  Result is  -42 in two's complement
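The two steps (complement the bits, then add one) can be sketched in Python. The function names `twos_complement` and `from_twos_complement` are illustrative, and the default width of 8 bits follows the example above.

```python
def twos_complement(value: int, n_bits: int = 8) -> str:
    """Return the n-bit two's complement bit pattern of a signed integer."""
    if not -(1 << (n_bits - 1)) <= value < (1 << (n_bits - 1)):
        raise ValueError("value does not fit in the given number of bits")
    if value >= 0:
        return format(value, f"0{n_bits}b")
    magnitude = format(-value, f"0{n_bits}b")            # convert to binary
    complemented = "".join("1" if bit == "0" else "0"    # complement the bits
                           for bit in magnitude)
    plus_one = (int(complemented, 2) + 1) % (1 << n_bits)  # add one
    return format(plus_one, f"0{n_bits}b")

def from_twos_complement(bits: str) -> int:
    """Decode a two's complement bit pattern; a leading 1 means negative."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value

print(twos_complement(42))               # 00101010
print(twos_complement(-42))              # 11010110
print(from_twos_complement("11010110"))  # -42
```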


### Arithmetic Operations on Integers

#### Addition and Subtraction of integers

Addition and subtraction of unsigned binary numbers

Binary addition is much like normal everyday (decimal) addition, except that a carry is generated when a column sum reaches 2 instead of 10.

0 + 0 = 0

0 + 1 = 1

1 + 0 = 1

1 + 1 = 0, and carry 1 to the next more significant bit

Example

00011010 + 00001100 = 00100110
1 1	            carries
0 0 0 1 1 0 1 0	=   26(base 10)
+  0 0 0 0 1 1 0 0	=   12(base 10)
----------------
0 0 1 0 0 1 1 0	=   38(base 10)

11010001 + 01001001 = 100011010

1 1           1	      carries
1 1 0 1 0 0 0 1	 =    209 (base 10)
+ 0 1 0 0 1 0 0 1	 =    73 (base 10)
----------------
1 0 0 0 1 1 0 1 0     =    282 (base 10)



The result exceeds the magnitude which can be represented with 8 bits. This is an overflow.
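The column-by-column rule, including the final carry that signals overflow, can be sketched in Python. The function name `add_binary` is hypothetical and the sketch assumes two bit strings of equal length.

```python
def add_binary(a: str, b: str) -> tuple[str, int]:
    """Add two equal-length bit strings; return (sum bits, carry out)."""
    carry = 0
    result = []
    # Work from the least significant bit (rightmost) to the left.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))  # bit written in this column
        carry = total // 2             # carry into the next column
    return "".join(reversed(result)), carry

print(add_binary("00011010", "00001100"))  # ('00100110', 0)
print(add_binary("11010001", "01001001"))  # ('00011010', 1)
```

A carry out of 1 on unsigned operands means that the true sum needs a ninth bit, which is the overflow case described above.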

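Subtraction is carried out by using two’s complement: to compute a − b, add the two’s complement of b to a and discard any carry out of the most significant bit.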
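A minimal Python sketch of subtraction by two's complement follows. The function name `subtract_8bit` and the fixed 8-bit width are assumptions made for illustration.

```python
MASK = 0xFF  # keeps only the low 8 bits, which discards any carry out

def subtract_8bit(a: int, b: int) -> int:
    """Compute a - b on 8-bit patterns as a + (two's complement of b)."""
    negated_b = (~b + 1) & MASK  # complement the bits of b, then add one
    return (a + negated_b) & MASK

print(format(subtract_8bit(42, 12), "08b"))  # 00011110  (30)
print(format(subtract_8bit(12, 42), "08b"))  # 11100010  (-30 in two's complement)
```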

Addition and subtraction of signed binary numbers

#### Multiplication and Division of Integers

Binary Multiplication

Multiplication in the binary system works the same way as in the decimal system:

0 x 0 = 0

0 x 1 = 0

1 x 0 = 0

1 x 1 = 1, and no carry or borrow bits

Example

00101001 × 00000110 = 11110110

0  0  1  0  1  0  0  1	   =   	41(base 10)
×     0  0  0  0  0  1  1  0	   =   	6(base 10)
----------------------
0  0  0  0  0  0  0
0  1  0  1  0  0  1
0  1  0  1  0  0  1
----------------------------
0  0  1  1  1  1  0  1  1  0	   =   	246(base 10)

00010111 × 00000011 = 01000101

0  0  0  1  0  1  1  1	   =   	23(base 10)
×     0  0  0  0  0  0  1  1	   =   	3(base 10)
----------------------
1  1  1  1  1      	 	carries
0  0  1  0  1  1  1
0  0  1  0  1  1  1
0  0  1  0  0  0  1  0  1	   =   	69(base 10)
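The partial-product layout above corresponds to the shift-and-add method, sketched here in Python. The function name `multiply_binary` is illustrative and the sketch assumes non-negative operands.

```python
def multiply_binary(a: int, b: int) -> int:
    """Multiply by adding a shifted copy of a for every 1 bit in b."""
    product = 0
    shift = 0
    while b:
        if b & 1:                  # current multiplier bit is 1
            product += a << shift  # add the partial product, shifted left
        b >>= 1                    # move to the next multiplier bit
        shift += 1
    return product

print(format(multiply_binary(0b00101001, 0b00000110), "08b"))  # 11110110 (246)
print(format(multiply_binary(0b00010111, 0b00000011), "08b"))  # 01000101 (69)
```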


Binary division follows the same rules as decimal division.

### Logical operations on Binary Numbers

Logical Operation with one or two bits

NOT : Changes the value of a single bit. If it is a “1”, the result is “0”; if it is a “0”, the result is “1”.

AND: Compares 2 bits and if they are both “1”, then the result is “1”, otherwise, the result is “0”.

OR : Compares 2 bits and if either or both bits are “1”, then the result is “1”, otherwise, the result is “0”.

XOR : Compares 2 bits and if exactly one of them is “1” (i.e., if they are different values), then the result is “1”; otherwise (if the bits are the same), the result is “0”.

Logical operators between two bits have the following truth table

| x | y | x AND y | x OR y | x XOR y |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 1 |
| 0 | 0 | 0 | 0 | 0 |

Logical Operation with one or two binary numbers

A logical (bitwise) operation operates on one or two bit patterns or binary numerals at the level of their individual bits.

Example

NOT 0111
= 1000


AND operation

An AND operation takes two binary representations of equal length and performs the logical AND operation on each pair of corresponding bits. In each pair, the result is 1 if the first bit is 1 AND the second bit is 1. Otherwise, the result is 0.

Example

     0101
AND  0011
= 0001


OR operation

An OR operation takes two bit patterns of equal length, and produces another one of the same length by matching up corresponding bits (the first of each; the second of each; and so on) and performing the logical OR operation on each pair of corresponding bits.

Example

      0101
OR 0011
= 0111



XOR Operation

An exclusive or operation takes two bit patterns of equal length and performs the logical XOR operation on each pair of corresponding bits.

Example


0101
XOR 0011
= 0110
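The four examples can be reproduced with Python's bitwise operators. Python integers have no fixed width, so NOT is combined with a mask to stay within four bits. The helpers `bit_not` and `show` are hypothetical names used only for this sketch.

```python
def bit_not(value: int, width: int) -> int:
    """Invert the low `width` bits of value."""
    mask = (1 << width) - 1
    return ~value & mask

def show(value: int, width: int = 4) -> str:
    """Format value as a fixed-width binary string."""
    return format(value, f"0{width}b")

a, b = 0b0101, 0b0011
print(show(bit_not(0b0111, 4)))  # 1000
print(show(a & b))               # 0001
print(show(a | b))               # 0111
print(show(a ^ b))               # 0110
```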


### Symbol Representation

#### Basic Principles

It is important to handle character data. Character data is not just alphabetic characters, but also numeric characters, punctuation, spaces, etc. They need to be represented in binary.

There aren’t mathematical properties for character data, so assigning binary codes for characters is somewhat arbitrary.

ASCII Code Table

ASCII stands for American Standard Code for Information Interchange. The ASCII standard was developed in 1963 and permitted machines from different manufacturers to exchange data.

The ASCII code table consists of 128 binary values (0 to 127), each associated with a character or command. The non-printing characters are used to control peripherals such as printers.

The extended ASCII character set adds a further 128 characters (codes 128 to 255) representing additional special, mathematical, graphic and foreign characters.
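The mapping between characters and their 7-bit ASCII codes can be inspected in Python with the built-in `ord` and `chr` functions. The helper `ascii_codes` is a hypothetical name for this sketch.

```python
def ascii_codes(text: str) -> list[int]:
    """Return the ASCII code of every character in text."""
    return list(text.encode("ascii"))  # raises an error for non-ASCII input

for ch in "Aa0 ":
    # character, decimal code, 7-bit binary code
    print(repr(ch), ord(ch), format(ord(ch), "07b"))

print(ascii_codes("Hi"))  # [72, 105]
print(chr(65))            # A
```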

#### Unicode Code Table

There are some problems with the ASCII code table. With the ASCII character set, string data types allocate one byte per character. But logographic languages such as Chinese, Japanese, and Korean need far more than 256 characters for reasonable representation. Even Vietnamese, a language that uses mostly Latin letters, needs 61 characters for representation. Where can we find numbers for all these characters? Is using 2 bytes per character a solution?

Hundreds of different encoding systems were invented. But these encoding systems conflict with one another : two encodings can use the same number for two different characters, or use different numbers for the same character.

The Unicode standard was first published in 1991. With two bytes for each character, it can represent $2^{16}$ (65,536) different characters.
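A short Python sketch shows the number Unicode assigns to a character and its two-byte form. The text does not name an encoding, so the choice of UTF-16 (big-endian) for the two-byte form is an assumption here, as are the sample characters and the helper name `code_point`.

```python
def code_point(ch: str) -> str:
    """Format the Unicode number of a character as U+XXXX."""
    return f"U+{ord(ch):04X}"

# Latin capital A, Latin small e with acute accent, a Chinese character
for ch in ["A", "\u00e9", "\u4e2d"]:
    two_byte_form = ch.encode("utf-16-be")  # assumed two-byte encoding
    print(code_point(ch), len(two_byte_form), two_byte_form.hex())
```

Each of these three characters occupies exactly two bytes in this encoding, whereas only the first has an ASCII code.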

The Unicode standard has been adopted by such industry leaders as HP, IBM, Microsoft, Oracle, Sun, and many others. It is supported in many operating systems, all modern browsers, and many other products.

The obvious advantages of using Unicode are :

• To offer significant cost savings over the use of legacy character sets.
• To enable a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering.
• To allow data to be transported through many different systems without corruption.

### Representation of Real Numbers

#### Basic Principles

No human system of numeration can give a unique representation to every real number. If you give the first few decimal places of a real number, you are giving an approximation to it.

Mathematicians may think of one approach: a real number x can be approximated by any number in the range from x – epsilon to x + epsilon. This is the fixed-point representation. Fixed-point representations are unsatisfactory for most applications involving real numbers.

Scientists or engineers will probably use scientific notation: a number is expressed as the product of a mantissa and some power of ten.

A system of numeration for real numbers will typically store the same three items (a sign, a mantissa, and an exponent) in an allocated region of storage.

The analogues of scientific notation in computers are called floating-point representations.

In the decimal system, the decimal point indicates the start of negative powers of 10.

If we are using a system in base k (ie the radix is k), the ‘radix point’ serves the same function:

A floating point representation allows a large range of numbers to be represented in a relatively small number of digits by separating the digits used for precision from the digits used for range.

To avoid multiple representations of the same number floating point numbers are usually normalized so that there is only one nonzero digit to the left of the ‘radix’ point, called the leading digit.

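A normalized (non-zero) floating-point number will be represented using

$$(-1)^s \times d_0.d_1 d_2 \dots d_{p-1} \times b^e$$

where

• s is the sign,
• $d_0.d_1 d_2 \dots d_{p-1}$ is the significand with p digits ($d_0 \neq 0$),
• b is the base (or radix),
• e is the exponent.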

Example

If k = 10 (base 10) and p = 3, the number 0.1 is represented as $1.00 \times 10^{-1}$.

If k = 2 (base 2) and p = 24, the decimal number 0.1 cannot be represented exactly but is approximately $1.10011001100110011001101 \times 2^{-4}$.
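The base-2 approximation of 0.1 with p = 24 can be reproduced with a small Python sketch. The function `normalize_binary` is a hypothetical helper. It assumes a positive input and round-to-nearest, and it ignores the corner case where rounding pushes the significand up to the next power of two.

```python
from fractions import Fraction

def normalize_binary(x: Fraction, p: int) -> tuple[int, int]:
    """Return (significand, e) with x ~ significand / 2**(p-1) * 2**e."""
    e = 0
    while x / Fraction(2) ** e >= 2:  # scale down until below 2
        e += 1
    while x / Fraction(2) ** e < 1:   # scale up until at least 1
        e -= 1
    scaled = x / Fraction(2) ** e * 2 ** (p - 1)
    return round(scaled), e           # keep p significant bits

significand, exponent = normalize_binary(Fraction(1, 10), 24)
bits = format(significand, "b")
print(f"{bits[0]}.{bits[1:]} x 2^{exponent}")
# 1.10011001100110011001101 x 2^-4
```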

Formally,

In brief, a normalized representation of a real number is characterized by:

• The range of the number: the number of digits in the exponent (i.e. by $e_{max}$) and the base b to which it is raised.
• The precision: the number of digits p in the significand and its base b.

#### IEEE 754/85 Standard

There are many ways to represent floating point numbers. In order to improve portability most computers use the IEEE 754 floating point standard.

There are two primary formats:

• 32 bit single precision.
• 64 bit double precision.

Single precision consists of:

• A single sign bit, 0 for positive and 1 for negative;
• An 8 bit base-2 (b = 2) excess-127 exponent, with $e_{min} = -126$ (stored as $127_{(10)} - 126_{(10)} = 1_{(10)} = 00000001_{(2)}$) and $e_{max} = 127$ (stored as $127_{(10)} + 127_{(10)} = 254_{(10)} = 11111110_{(2)}$).
• A 23 bit base-2 (k = 2) significand, with a hidden bit giving a precision of 24 bits (i.e. $1.d_1 d_2 \dots d_{23}$).

Notes

• Single precision has 24 bits of precision, equivalent to about 7.2 decimal digits.
• The largest representable non-infinite number is almost $2 \times 2^{127} \approx 3.402823 \times 10^{38}$.
• The smallest representable non-zero normalized number is $1 \times 2^{-126} \approx 1.17549 \times 10^{-38}$.
• Denormalized numbers (e.g. $0.01_{(2)} \times 2^{-126}$) can be represented.
• There are two zeros, ±0.
• There are two infinities, ±∞.
• A NaN (not a number) is used for results from undefined operations.
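The three single-precision fields can be extracted in Python with the standard `struct` module. The function name `float32_fields` is an illustrative choice. The sketch relies on `struct` packing the `f` format as IEEE 754 binary32 when a byte-order prefix is given.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split the single-precision encoding of x into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 sign bit
    exponent = (bits >> 23) & 0xFF  # 8-bit excess-127 exponent
    fraction = bits & 0x7FFFFF      # 23 stored significand bits (hidden bit omitted)
    return sign, exponent, fraction

print(float32_fields(1.0))       # (0, 127, 0)       -> +1.0 x 2^(127-127)
print(float32_fields(-0.15625))  # (1, 124, 2097152) -> -1.01 (binary) x 2^(124-127)
print(float32_fields(0.1))       # (0, 123, 5033165) -> exponent 123 - 127 = -4
```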

Double precision floating point standard requires a 64 bit word

• The first bit is the sign bit
• The next eleven bits are the exponent bits
• The final 52 bits are the fraction

Range of double numbers: from $\pm 2.225 \times 10^{-308}$ to $\pm 1.7977 \times 10^{308}$
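The double-precision layout and range can be inspected in Python, whose `float` type is a 64-bit IEEE 754 double on common platforms (an assumption of this sketch). The function name `float64_fields` is hypothetical. The excess-1023 bias visible in the output is part of the IEEE 754 double format, although the text above does not state it.

```python
import struct
import sys

def float64_fields(x: float) -> tuple[int, int, int]:
    """Split the double-precision encoding of x into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # first bit: sign
    exponent = (bits >> 52) & 0x7FF    # next eleven bits: exponent
    fraction = bits & ((1 << 52) - 1)  # final 52 bits: fraction
    return sign, exponent, fraction

print(float64_fields(1.0))   # (0, 1023, 0)
print(float64_fields(-2.0))  # (1, 1024, 0)
# Smallest normalized and largest finite double on this platform:
print(sys.float_info.min, sys.float_info.max)
```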