Overview of system program design

Lecture 1 关于C

一个函数体内支持多少个参数？31？？
在一次函数调用是支持多少个argument？ 31个
一行最多支持多少个character？80 characters is likely to be inadaquate for programming styles with large indentations and/or verbose variable names.
一个表达式里最多支持多少层内嵌插入语？
What is the maximum value for a long int? 32bits
typedef struct bar{int bar;} bar;
char *foo[] : foo is an array of pointers to char

char (*foo)[]: foo is a pointer to an array of char

Lecture 1++ C features

257 case labels for a switch statement（ANSI C standard）
static keyword
similar C symbols have multiple different meanings
some of the Operators have the wrong precedence
some routines in the standard library have unsafe semantics
- 内存不安全，

Declarations in C

int (* fun())(); // a function returning a pointer to a function int()
int (* foo())[]; // a function returning a pointer to an array of int
int (* foo[])[]; // an array of pointer to functions int()
int foo[][]; // an array of arrays int

key words:
- Qualifier: volatile, const
- type: void, char, signed, unsigned, short, int, long, float, double, struct, union, enum
- Declarator:…

1
2
3

struct date_tag{
  short dd,mm,yy;
}my_birthday, xmas;//my_birthday and xmas are instance of the struct date_dag

struct赋值，会进行拷贝，将内部的值拷贝给当前的结构体，并不会直接传引用
如果struct为const类型，那么内部成员变量全都为const类型

union中所有变量共用同一个地址，且值总是为最后一次赋给的值；

union A{
  int a;
  char c;
  double b;
};
int main(){
  union A a;
  a.a=1;
  a.b = 1.1;//这时，a = b = c = 1.1
}

union的长度为其中成员变量中最长的那部分

typedef & #define

#define peach int
unsigned peach i;
typedef int banana;
//unsinged banana i; wrong statement
//banana unsigned i; wrong
banana c = 12;//correct one

执行时间不同。typedef在编译阶段有效，然而#define为宏定义，发生在预处理阶段，也就是在编译之前，只进行字符替换，不对其进行类型检查
#define没有作用域的限制，只要是定义的宏，之后都可以使用。typedef有自己的scope

对指针的操作，特别注意typedef，不是简单的将字符替换

typedef int * pint;
#define PINT int*
int i = 1;
const pint p1 = &i;// equals pint const p1 = int* const p1
//只能修改指向的变量的内容，不能修改自己的地址 a const pointer to int

const PINT p2 = &i; //equals const int *p2=int const *p2;
//a pinter to const int

想象一个应用场景，返回一个 int（）形式的函数

Lecture 3：A Tour of Computer System

what is information
- Bits + context
program in various forms
how compilation systems work
processors
caches matter
hierarchy storage devices
OS vs. Hardware
communication between systems

Lecture 3 C Programming Model

some misunderstandings for C programmers
- integral promotion
- 内存分布：
- 内存从零开始
- 各类变量占地址位置：

Stack中参数与成员变量的内存位置

参数地址与局部变量的地址非常接近（在参数是传值的情况下）

从高地址到低地址依次为

main函数内的成员变量
被调用函数callee的形参
返回地址
旧的ebp ；新的ebp会指向这里
被调用函数内的成员变量（本地变量）
未初始化的全局变量
初始化的全局变量、静态变量和常量数据

递归函数的内存布局
activation records和stack
- 为每个被调函数创建的内存块就叫一个activation record活动记录
- stack：stack frames；push and pop
- the stack pointer栈指针保存栈结束的地址（栈顶），每一个新的活动记录需要申请的地方
- the frame pointer帧指针保存上一个活动记录结束的地址，换句话说是当前函数返回需要返回的地址
函数被调用是的顺序：
1. call之前将参数压入栈，从右到左
2. 将帧指针压入栈（call操作）
3. 让帧指针指向栈指针（这里的栈指针指向）
4. 压入函数的本地变量
5. 调用结束后使栈指针与帧指针相等
6. 弹出帧指针的值
- 总结：
  - 通过维护栈指针和帧指针，昂贵的动态分配本地变量变得简单，易于组织和快捷
  - 不仅仅本地变量，返回地址（压入的ebp的值）和传入的参数也存储在栈中
  - 所有编译器对于特定类型的硬件，遵循相同的规则。
取指令-解码-执行周期
- fetch memory from the address contained in the program counter
- 翻译 opcode来决定指令的意义
- 取出操作数（operands）执行指令并存储结果
旧的program counter存储再哪里以便从callee函数调用返回后重新使用？
- 在activation record里面都会存放一个frame pointer的备份ebp，里面存储旧的eps值，其地址可以作为基准地址使用。
- 每个函数调用也需要调用它的函数的返回地址（call操作会将地址压入栈中）

函数指针（function pointer）

int (*newvar)(char); //define a function pointer to a int(char)function
int foo(char);
newvar = &foo;
(*newvar)('a'); //use it like a function

Lecture 4 Representation of Data

Bits and bit manipulation

十进制数
二进制数
十六进制数
Alphanumeric Data Expression（字母数字并用的数据表示）
- ASCII中的 Unicode
- UTF-8每次传输8个digits（8bits，1byte）
- utf-7;utf-16,utf-32
- Base64/32/16: why use it? 在网页传输图片时，使用base64的编码使图片加载先于content，减少http请求和降低服务器的负载overload
- Base64使用大写字母集、小写字母集、数字集和{ +，/ }（共64个）
- Base16使用大写字母集W={A,B,C,D,E,F}和数字集，共16个
- 加密（现代加密）对称与非对称：
  - 对称加密：simple， unsafe， ease decrypted
  - 非对称加密

Integers

Numeric representation数值表示方法unsigned and signed
原码、补码、反码…
- 1的补码（补码）的特点：
  - 有+0和-0两种表示方法
  - 利用1的补码执行减法运算时，进位需做加法处理
- 2的补码特点：
  - 0的表示方法只有一种

Floating-Point Numbers

单精度、双精度、扩展双精度
浮点数的一些特殊值(注:指数部分的范围为**[-127,128]**)

指数	尾数	值
正常浮点数		1.f*2^e
e=-127	f != 0	0.f*2^-126
e = -127	f=0	0
e = 128	f != 0	NaN
e = 128	f=0	infinite

如何表示0，指数和维数为0，符号为可以为1或0
三种浮点数格式比较：

structure of floating number

布尔代数
- 位操作必须在两个int操作数之间，不能应用于浮点数 。

Lecture 5 Memory layout and allocation

Several Uses of Memory

static allocation：
static意味着事情发生在运行状态前，即发生在编译或者连接时间，通常变量使一个固定的长度。
总结：
- c语言使用static属性来隐藏变量和函数声明，就像在Java使用private一样。
- 任何一个使用static声明的全局（global）变量或者函数，只能在当前文件中访问。
- 尽可能使用static使一个好的编程实践来保护变量和函数。
使用static的注意事项：
- lift time：为整个程序周期
- 必须使定长
- 不是静态分配内存而是在运行前分配大小（如果知道长度的话）
- 对访问static数据有限制。

Dynamic allocation

为什么使用动态分配？
1. 静态分配时，起名是一个问题
2. 程序在运行前不是总能知道需要的storage大小
3. 静态分配会在整个程序运行期间保存内存大小
4. 动态分配对于递归函数和可重入(reentrant)函数来说使很重要的
Stack allocation
- 栈支持递归和动态分配
- 在不同函数内部的变量，可以拥有相同的名字并且有不同的值（函数之间，变量名可以相同）
- 每个递归函数的实例都有自己的私有变量（同个函数的实例拥有不同的变量值）
- 递归函数可以创建不同数量的函数实例（动态的）
Heap allocation
- 分配内存给能在函数调用外（activation record）存活的变量
- 绝不返回在栈中的本地变量
- malloc、free or new、delete
- 在栈中的变量总是使用指针访问和表示；
常见的bug：忘free、内存泄漏、野指针、c++从不尝试debug这些bug。

details in memory

Heap memory management

Malloc（）
free(ptr)
编写allocator的需求
- 解决请求内存序列的问题
- 对请求立即相应
- 仅使用堆
- 块对齐
- 不修改已分配的块
allocator需要满足的目标：
- 最大化throughput
- 最大化内存使用率（与1矛盾）
碎片Fragmentation
- Internal fragmentation
  - 已分配的块大于需要的payload（由于对齐）造成内部碎片
- external fragmentation
  - 有足够的空余内存，但是每一小块空余内存都不够分配，造成外部碎片
如何实现一个简单的allocator？
决解简单allocator内存利用率低的一些策略
1. 空闲块的组织
  - 任何的allocator需要一些数据结构来保存区别块边界和块状态（使用和free）的信息。
  - one-word header，the payload, padding
2. 如何选取合适的free block放置
  - 请求k字节(bytes)内存时，allocator搜索free list寻找够大的空余块
  - 查找空余块的方法取决于放置策略（first fit，next fit and best fit）
    - first fit：寻找第一个够大的空余块
    - next fit：和第一种方法相似，只不过从上一次分配块的位置开始搜索
    - best fit：检查每一个空余块，选择最合适的（smallest but fit）
  - 一旦寻找到了合适的内存块，需要一个使用策略
    - 一种是使用整个空余块，尽管这个方法简单、快速，但它主要的缺陷是内部碎片问题
    - 如果放置策略能够有一个好的匹配，那么内部碎片是可以接受的。
    - 如果fit不是很好，通常allocator分割空余块，第一部分为allocated block，第二部分为新的free block
  - 如果寻找不到一个合适的空余块（通常为空余块大小无法满足需要）则
    - 合并空余块生成新的较大的free block
    - 如果还是不行，则向系统or内核申请新的heap memory，通过mmap or sbrk函数
3. splitting分割
4. coalescing合并
  - 在free后，free掉的块可能和相邻的空余块引起false fragmentation
  - 很容易与后面的free block合并，但是很难与前面的free block合并，所以需要Boundary tag.
  - Boundary Tag和header一致，有block size 和状态位 a。

memory block

Memory bugs

Lecture 6 Performance Measurement and Improvement

Measurement and Profiling

Hot Spots

what is a hot spot?
- These places where the computer spends most of its time are called hot spots, inner loops and kernels
Amdahl’s Law
- 所有程序都有hot spots
- 一点点effort可以有很大的性能提升
- 如果程序中的一部分可以通过平行处理加速，那么顺序处理的部分是限制和决定加速的一部分
- 知道如何优化是重要的，知道什么时候停止优化也是重要的。
- $T_{New}=(1-\alpha)T_{Old}+(\alpha T_{Old})/k$
- $S=\frac{T_{Old}}{T_{New}}=\frac{1}{(1-\alpha)+\alpha/k}$
- α指的是可以被优化的部分，k指的是优化倍率

Practical Hints

一些在C中昂贵的操作
- printf()
- malloc()
- new()
Fast allocation and Free
- 使用一个大的内存块，一些小的块可以被分配来使用（比小的内存块allocate快很多）
- 使用完成后，整个大内存块free（比小的内存块free快很多）

Lecture 7 Memory Operation & Performance

Memory System

一个内存层次（memory hierarchy）是由不同存储设备组成的层次结构，有各种不同容量、价格、访问时间的存储设备
不同的存储设备：
1. Static Random Access Memory (SRAM)；最贵，最快，用作cache
2. Dynamic Random Access Memory (DRAM)；用作主存，frame buffer
3. Magnetic disks；可以存储远远大于RAM的数据，但是访问速度差距非常大
4. Magnetic tapes
5. Optical disks
利用不同内存技术的优点来找到速度与容量之间的tradeoff叫做memory hierarchy.

Caches

Locality of Reference(引用的局部性)
- 接下来需要访问的内存的数据很有可能是刚刚访问过的数据（访问的数据在接下来很短的未来很有可能被再次访问）
- Spatial locality: 需要同时访问的数据在内存中分配的地址很近
  
  Reference to address that are near to each other occur together in time
- 一个地址可能在一段时间内频繁访问
  
  Reference to a single address occur close together in time
Hit ratio
- the percentage of accesses for which the prediction is successful
cache层次设计
- register
- disk
- cache & main memory