Data types

Bit and byte

A bit (binary digit) is the smallest unit of information in a computer.

  • It is either 0 or 1.
  • Bits are used to represent the most basic form of data, such as the on/off state of an electrical signal in a computer’s hardware.

A byte is a unit of data composed of 8 bits.

  • Each byte can represent 256 different values (2^8).
  • Bytes are used to represent a wide range of data types, characters, and symbols.
  • Bytes are also the basic storage unit for files, memory, and network communication.

Integer types

  • int is the most frequently used integer type; it is normally 4 bytes (32 bits).
  • unsigned int is its unsigned counterpart; with 4 bytes it can represent the non-negative numbers from 0 to 2^32 - 1.

The traditional ways to initialize an integer variable:

int a; // not recommended
a = 0; // you should never forget to initialize a variable

int b = 0; // init b with 0
int c(0); // init c with 0

Since C++11, you can use uniform (brace) initialization for all variables:

int a {}; // init a with 0 (value initialization, which zero-initializes an int)
int b {0}; // init b with 0

Signed and unsigned

  • Both unsigned int and signed int are typically 4 bytes, so each can represent 2^32 distinct values.
    • unsigned int covers 0 to 2^32 - 1.
    • signed int covers -2^31 to 2^31 - 1.
    • In a signed int, the most significant (first) bit acts as the sign bit (two's complement, which C++20 requires).

Overflow

Here’s an example of overflow:

#include <iostream>

int main() {
    int a = 56789;
    int result = a * a;
    std::cout << "Result: " << result << std::endl;
    return 0;
}

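The mathematically correct answer is 3224990521, but on a typical system the program prints -1069976775.

Since 56789 * 56789 > 2^31 - 1, the product does not fit in a 32-bit int, so it overflows. Signed integer overflow is undefined behavior in C++. The wrap-around seen here (3224990521 - 2^32 = -1069976775) is common but not guaranteed.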

More integer types

  • short int for shorter integers
  • long int for longer integers
  • long long for even longer integers

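The C++ standard only guarantees a minimum width for each type: short and int have at least 16 bits, long at least 32 bits, and long long at least 64 bits. It also guarantees sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long). The actual sizes can differ between compilers and systems. See here.

See the Microsoft official documentation for the sizes of types on MSVC.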

sizeof operator

sizeof is an operator that returns the size of a type or a variable in bytes.

  • It is an operator, not a function, and can be applied to both types and variables.
  • The sizeof operator is evaluated at compile time.
#include <cstddef>
#include <iostream>
using namespace std;
int main() {
    int i = 0;
    short s = 0;
    cout << "sizeof(int)=" << sizeof(int) << endl;
    cout << "sizeof(i)=" << sizeof(i) << endl;
    cout << "sizeof(short)=" << sizeof(s) << endl; // sizeof(s) == sizeof(short)
    cout << "sizeof(long)=" << sizeof(long) << endl;
    cout << "sizeof(size_t)=" << sizeof(size_t) << endl;
    return 0;
}

Character types

  • char occupies a single byte and can represent a character; it is in fact an 8-bit integer type.
    • signed char is a signed 8-bit integer.
    • unsigned char is an unsigned 8-bit integer.
    • Plain char is either signed or unsigned, depending on the compiler and system (it is still a distinct type from both).

How to represent a character?

char is an 8-bit integer, so it can represent 2^8 = 256 different values. The ASCII table assigns characters to the numbers 0 to 127. Values 128 to 255 are used by various "extended ASCII" encodings. See here.

char c1 = 'C'; // 'C', ASCII code 67
char c2 = 80; // decimal 80, which is 'P'
char c3 = 0x50; // hexadecimal 0x50 == 80, also 'P'

bool

bool is a Boolean type that can hold one of two values: true or false.

  • true converts to 1, and false converts to 0.
  • bool is a distinct integral type, not int, although it converts to and from integers.
    • Its size is implementation-defined, typically 1 byte (8 bits) rather than 1 bit.
  • When a number is converted to bool, any non-zero value becomes true and zero becomes false.
bool b1 = true;
int i = b1; // value of b1 is 1

bool b2 = -256; // value of b2 is 1, not recommended
bool b = (-256 != 0); // better choice

bool b3 = 0; // value of b3 is 0

Choose appropriate integer types

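For example, consider an image with a resolution of 4226 x 2847. unsigned char (one byte per channel, values 0 to 255) is widely used for pixel values. See the RGB model.

With 3 channels, the memory usage is 4226 * 2847 * 3 = 36,094,266 bytes, which is about 36 MB (34.4 MiB).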

Byte

Since C++17, std::byte is defined in the <cstddef> header.

  • Using char suggests that you are dealing with characters (or small numbers).
  • Using std::byte makes it clear that you are dealing with raw bytes; it supports bitwise operations but not arithmetic.

Fixed width integer types

C++11 introduces fixed width integer types in the <cstdint> header. They are optional, but available on all mainstream platforms, and where provided they have exactly the stated width on every compiler and system. See here.

Types:

  • int8_t for 8-bit integer
  • int16_t for 16-bit integer
  • int32_t for 32-bit integer
  • int64_t for 64-bit integer
  • uint8_t, uint16_t, uint32_t, uint64_t for the unsigned counterparts

Some useful macros:

  • INT8_MIN, INT8_MAX
  • INT16_MIN, INT16_MAX
  • INT32_MIN, INT32_MAX
  • INT64_MIN, INT64_MAX
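  • UINT8_MAX, UINT16_MAX, UINT32_MAX, UINT64_MAX (there is no UINT8_MIN, because the minimum of an unsigned type is always 0)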

size_t

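Computer memory keeps growing:

  • In the past, a 32-bit int was enough to store data lengths,
  • but now it is not: a signed 32-bit int tops out at 2^31 - 1 (about 2 billion), while today's memory and file sizes can exceed that.

size_t is an unsigned integer type, and it is the type of the result of sizeof.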

  • size_t is defined in the <cstddef> header.
  • size_t is usually 32-bit or 64-bit, depending on the system.

Floating point types

  • float for single precision floating point numbers, 32 bits
  • double for double precision floating point numbers, 64 bits
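  • long double for extended precision floating point numbers; its size is implementation-defined (80-bit x87 extended precision with GCC/Clang on x86, the same 64 bits as double on MSVC)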

C++23 introduces fixed width floating point types in the <stdfloat> header, which have the same format across different compilers and systems. They are optional, and support is not widespread yet.

  • std::float16_t for 16-bit floating point numbers
  • std::float32_t for 32-bit floating point numbers

Range and accuracy

An example

// float.cpp
#include <iostream>
#include <iomanip>

using namespace std;

int main(){
    float f1 = 1.2f;
    float f2 = f1 * 1000000000000000; // 1.2e15
    cout << std::fixed << std::setprecision(15) << f1 << endl;
    cout << std::fixed << std::setprecision(1) << f2 << endl;
    return 0;
}
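  • How many numbers are in the range [0, 1]? Infinitely many real numbers.
  • How many numbers can 32 bits represent? At most 2^32.
  • So most real numbers can only be approximated. You want 1.2, but the closest float is 1.200000047683716 (to 15 decimal places).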

Floating-point representation

(Figure: float bit layout)

The value of a normalized float (exponent bits neither all 0s nor all 1s) is calculated as follows:

\[ (-1)^{b_{31}} \times 2^{\left(b_{30} b_{29} \ldots b_{23}\right)_2-127} \times\left(1 . b_{22} b_{21} \ldots b_0\right)_2 \]

  • float: 1 sign bit, 8 exponent bits, 23 fraction bits, see here.
  • double: 1 sign bit, 11 exponent bits, 52 fraction bits, see here.
  • long double (x87 80-bit extended format): 1 sign bit, 15 exponent bits, 64 significand bits (an explicit integer bit plus 63 fraction bits), see here.

Floating-point VS integer

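  • Can represent values between integers
  • Cover a much greater range of values
  • Floating-point operations are typically slower than integer operations
    • double can be even slower than float, depending on the hardware
  • Lose precision: most values are rounded to the nearest representable number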

Precision

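  • Will f2 be greater than f1?

    // precision.cpp
    float f1 = 23400000000;
    float f2 = f1 + 10; // no: f2 == f1, because adjacent floats near 2.34e10 are 2048 apart

  • Can we use the == operator to compare two floating point numbers?

    if (f1 == f2) // bad: exact comparison ignores rounding errors
    if (fabs(f1 - f2) < numeric_limits<float>::epsilon()) // better, but this absolute tolerance only suits values near 1.0

    See the cppreference page for numeric_limits.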

  • numeric_limits<float>::epsilon() is the machine epsilon for float: the difference between 1.0 and the next representable float.

  • You can change float to double to get the machine epsilon for double.

static_cast

static_cast is a cast operator that converts a value from one type to another.

double d = static_cast<double>(f1) + 10;
  • Without static_cast, f1 + 10 is a float addition, and only its (already rounded) result is implicitly converted to double.
  • static_cast converts f1 to double before the addition, so the addition itself is done in double precision.
  • static_cast is also used for other conversions, such as int to char or float to int.
  • It is checked at compile time; no run-time type check is performed.

Literal

A literal is a program element that directly represents a value.

Integer literal

  • Decimal: 123
  • Octal: 0123
  • Hexadecimal: 0x123
  • Binary: 0b1010 (since C++14)

  • unsigned: 123u or 123U
  • long: 123l or 123L
  • long long: 123ll or 123LL

Floating point literal

  • float: 1.23f or 1.23F
  • double: 1.23
  • long double: 1.23l or 1.23L
  • Exponential: 1.23e-4 or 1.23E-4, which is \(1.23 \times 10^{-4}\).

inf and nan

IEEE 754 floating point numbers can represent positive or negative infinity, and NaN (not a number).

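  • ±inf: infinity (exponent bits all 1s, fraction = 0; for float, exponent = 11111111)
  • nan: not a number (exponent bits all 1s, fraction != 0)

  • \(1.0/0.0 =\) inf (floating-point division; integer division by zero is undefined behavior)
  • \(\log(0) =\) -inf
  • \(\sqrt{-1} =\) nan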

Arithmetic operators

Operator   Description           Example
+          Addition              a + b
-          Subtraction           a - b
*          Multiplication        a * b
/          Division              a / b
%          Modulus (remainder)   a % b
++         Increment             a++ or ++a
--         Decrement             a-- or --a
+          Unary plus            +a
-          Unary minus           -a

Operator precedence:

  1. a++, a-- (postfix)
  2. ++a, --a, +a, -a (prefix increment/decrement and unary plus/minus share one level)
  3. *, /, %
  4. +, -
  • You can refer to this table.
  • If you are not sure about the precedence, use parentheses!

For more details, see cppreference.

Assignment operators

Operator   Description                      Example
=          Assignment                       a = b
+=         Addition assignment              a += b
-=         Subtraction assignment           a -= b
*=         Multiplication assignment        a *= b
/=         Division assignment              a /= b
%=         Modulus (remainder) assignment   a %= b

Increment and decrement operators:

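  • a++ and a-- are post-increment and post-decrement operators: the expression yields the old value.
  • ++a and --a are pre-increment and pre-decrement operators: the expression yields the new value.
// not recommended: increment inside a larger expression
int a = 0;
int b = a++; // b = 0, a = 1
int c = ++a; // c = 2, a = 2

// recommended: increment as a separate statement
int x = 0;
int y = x;
x++; // or x = x + 1;
// ...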

Implicit type conversion

Implicit type conversion is done by the compiler, without an explicit request from the programmer.

  • Widening conversion:
    • Integer types of at most 32 bits (such as int) to double; 64-bit types such as long long can lose precision.
    • bool and char to any other built-in type.
    • short to int, int to long, long to long long.
    • float to double.
  • Narrowing conversion:
    • Any floating point type to an integer type.
    • double to float.
    • A wider integer type to a narrower one, e.g. int to char.
    • Any integer type to bool.

Example:

// Implicit type conversion examples
#include <iostream>

int main() {
    // Promotion (widening conversion)
    int i = 42;
    double d = i;  // int converted to double
    std::cout << "Promotion: int to double - " << d << std::endl;

    char c = 'A';
    int ascii = c;  // char promoted to int (65 in ASCII)
    std::cout << "Promotion: char to int - " << ascii << std::endl;

    // Coercion (narrowing conversion)
    double pi = 3.14159;
    int truncated = pi;  // double coerced to int, fractional part discarded: 3
    std::cout << "Coercion: double to int - " << truncated << std::endl;

    int large = 1000;
    char narrowed = large;  // int coerced to char, 1000 does not fit: data loss
    std::cout << "Coercion: int to char - " << static_cast<int>(narrowed) << std::endl;

    // Integer division first, then conversion
    int num = 5;
    double result = num / 2;  // 5 / 2 is integer division (2); only then is 2 converted to double
    std::cout << "Integer division stored in double - " << result << std::endl;
    return 0;
}

Signed - unsigned conversions

  • A signed integer type and its unsigned counterpart always have the same bit width.
  • When a signed - unsigned conversion happens,
    • the bit pattern is the same,
    • but the interpretation is different.
#include <cstdint>
#include <iostream>
using namespace std;

int main() {
    unsigned short num = UINT16_MAX;
    short num2 = num;
    cout << "unsigned val = " << num << " signed val = " << num2 << endl;
    // Prints: "unsigned val = 65535 signed val = -1"

    // Go the other way.
    num2 = -1;
    num = num2;
    cout << "unsigned val = " << num << " signed val = " << num2 << endl;
    // Prints: "unsigned val = 65535 signed val = -1"
    return 0;
}

Explicit type conversion

Explicit type conversion is done by the programmer, using the cast operator.

C-style cast:

(int) x; // old-style cast, old-style syntax
int(x); // old-style cast, functional syntax

C-style casts are not recommended:

  • They are not type-safe: a single C-style cast may silently perform a static_cast, const_cast, or reinterpret_cast.
  • It is not clear from the code what the cast is doing.

C++ provides static_cast (available since C++98) to cast between related types.

  • Syntax: static_cast<new_type>(expression)
  • It is checked at compile time.
  • The compiler reports an error if the conversion is not allowed.
double d = 1.58947;
int i = d;  // MSVC warning C4244: possible loss of data
int j = static_cast<int>(d);       // No warning.
string s = static_cast<string>(d); // error: cannot convert from
                                   // double to std::string

const type qualifier

  • const is a type qualifier that specifies that the value of the variable cannot be changed.
    • The variable must be initialized when declared.
    • const can be applied to variables, parameters, and return types.
const int a = 10;
int b = a; // ok
a = 20; // error

const_cast

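  • const_cast<type>(expression) can
    • remove the const qualifier from a pointer or reference, or
    • add the const qualifier to a non-const one.
  • Modifying an object that was originally declared const through the result is undefined behavior.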

Arithmetic conversions

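Many binary operators cause implicit type conversion (the usual arithmetic conversions). The rules are:

  1. If either operand is of type long double, the other operand is converted to long double.
  2. Otherwise, if either operand is of type double, the other operand is converted to double.
  3. Otherwise, if either operand is of type float, the other operand is converted to float.
  4. Otherwise, both operands are integers, and integral promotion is applied first (bool, char, short, and their unsigned variants become int on typical platforms). If the types still differ, the operand with the lower rank is converted to the higher-rank type, e.g. int to long. When a signed operand meets an unsigned one, the unsigned type wins if its rank is equal or higher (int + unsigned int gives unsigned int). Otherwise, the signed type is used if it can hold every value of the unsigned type. If it cannot, both operands are converted to the unsigned version of the signed type.
  5. Because rule 4 depends on type sizes, the result can differ between compilers. For example, long + unsigned int is long on 64-bit Linux (64-bit long) but unsigned long on MSVC (32-bit long).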

Example

  • Case 1: Addition

    int a = 10;
    double b = 20.5;
    double c = a + b; // a is converted to double
  • Case 2: Integer division

    int total = 7;
    int count = 2;
    // 'total' and 'count' are integers, integer division would occur
    double average1 = total / count; // 3, wrong result
    
    // To get a floating-point result, promote one operand to 'double'
    double average2 = total / static_cast<double>(count); // 3.5, correct result
  • Case 3: Compound assignment

    int i = 10;
    double d = 3.5;
    // 'i' is converted to 'double' before addition, 
    // then the result is assigned back to 'i' after truncation
    i += d; // 10 + 3.5 = 13.5, truncated to 13

I’m lost in the type conversions

(Figure: source in footnote 1)


  • In arithmetic operations, the compiler converts the operands to the higher-ranked (more precise) type, which is also the result type.
  • In assignment, the compiler converts the right-hand side to the type of the left-hand side.

auto keyword

The auto keyword lets the compiler deduce the type of a variable from its initializer.

auto a = 10; // a is int
auto b = 10.5; // b is double
auto c = 'a'; // c is char
auto d = "hello"; // d is const char*

Be careful:

auto a = 2; // a is int
a = 2.5; // a is still int and 2.5 is truncated to 2

Summary

  • C++ is a statically typed language, every variable has a type and its type is determined at compile time.

  • After declaration, you can just use the variable, no need to write its type again.

    // eg
    int a = 10;
    a = a + 10; // no need to write int again before a
  • C++ assumes the programmer is responsible for the correctness of the types.

  • You should always be aware of the types you are using.

Footnotes

  1. Figure from scaler.com.