DNN Architectures
Background: Deep Learning
A general formula for a deep learning task:
\[
\theta^{t+1} = f\left(\theta^{t},\ \nabla\mathcal{L}(\theta^{t}, D^{(t)})\right)
\]
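For concreteness, plain SGD is one instance of the update rule \(f\): it simply moves \(\theta\) against the gradient. Below is a minimal numpy sketch; the learning rate lr and the toy quadratic loss are illustrative assumptions.

import numpy as np

def sgd_step(theta, grad, lr=0.1):
    # f(theta, grad) for vanilla SGD: step against the gradient
    return theta - lr * grad

# toy example: minimize L(theta) = ||theta||^2
theta = np.array([1.0, -2.0])
for t in range(5):
    grad = 2 * theta               # gradient of the loss at theta^t
    theta = sgd_step(theta, grad)  # theta^{t+1} = f(theta^t, grad)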
The three most important components:
- Data: images, text, audio, tables, etc.
- Model: CNNs, RNNs, Transformers, MoEs, etc.
- Compute: CPU, GPU/TPU/LPU, M1/M2/M3/M4, FPGA, etc.
CNN
The design of CNNs makes them inherently good at finding local spatial relationships in images, because they focus on small patches of the image.
Several interesting applications:
- Classification
- Retrieval: retrieving images similar to a query image.
- Detection
- Segmentation
- Self-driving
- Synthesis: as in synthesizing and generating visual data.
The magic of CNNs is that if we stack their convolution layers many times, we start to learn more complex features.
- At the very lowest level, we learn edges and colors.
- As we stack, we begin to get higher-level features such as texture, shapes, and eventually high-level semantic representations.
Three landmark CNN models:
- AlexNet: breakthrough model for CNNs.
- ResNet: introduced skip connections, which address the vanishing gradient problem.
- U-Net: the backbone of Stable Diffusion.
Important components of CNNs:
- Convolutional operations, especially Conv2d with a \(3\times 3\) filter.
- Matrix multiplication: included because of skip connections.
- Softmax: applied at the output layer to produce a probability distribution over labels.
- Element-wise operations: ReLU, addition, subtraction, pooling, batch normalization.
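To make these components concrete, here is a minimal residual-block sketch in PyTorch; the channel count, layer arrangement, and class name are illustrative assumptions rather than any specific published architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    # two 3x3 convolutions with batch normalization and a skip connection
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # skip connection: element-wise addition

x = torch.randn(8, 64, 32, 32)   # (batch, channels, height, width)
y = ResidualBlock(64)(x)         # same spatial size thanks to padding=1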
RNN
MLPs: Dense Pattern Processing
import numpy as np

def activation(Z):
    # element-wise nonlinearity; ReLU is used here as a simple placeholder
    return np.maximum(Z, 0)

def mlp_layer_matrix(X, W, b):
    # X: input matrix (batch_size, num_inputs)
    # W: weight matrix (num_inputs, num_outputs)
    # b: bias vector (num_outputs,)
    H = activation(np.matmul(X, W) + b)
    return H
def mlp_layer_compute(X, W, b):
    # same computation as mlp_layer_matrix, written as explicit nested loops
    batch_size, num_inputs = X.shape
    num_outputs = W.shape[1]
    Z = np.zeros((batch_size, num_outputs))
    for batch in range(batch_size):
        for out in range(num_outputs):
            Z[batch, out] = b[out]
            for in_ in range(num_inputs):
                Z[batch, out] += X[batch, in_] * W[in_, out]
    H = activation(Z)
    return H
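A quick sanity check (the shapes are chosen arbitrarily) that the loop version matches the matrix version:

X = np.random.randn(4, 8)
W = np.random.randn(8, 16)
b = np.random.randn(16)
assert np.allclose(mlp_layer_matrix(X, W, b), mlp_layer_compute(X, W, b))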
Note
When analyzing how computational patterns impact computer systems, we typically examine three fundamental dimensions:
- memory requirements
- computation needs
- data movement
- Memory Requirements
    - Store and access weights, inputs, and intermediate results.
- Computation Needs
    - The core computation revolves around multiply-accumulate operations arranged in nested loops.
    - The dense matrix multiplication pattern can be parallelized across multiple processing units, with each handling different subsets of neurons.
- Data Movement
    - The all-to-all connectivity pattern in MLPs creates significant data movement requirements.
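As a rough back-of-the-envelope example (the layer sizes and 4-byte floats are assumptions), all three dimensions can be estimated directly from the layer shape:

batch_size, num_inputs, num_outputs = 32, 1024, 1024
bytes_per_float = 4

weights = num_inputs * num_outputs               # parameters to store
macs = batch_size * num_inputs * num_outputs     # multiply-accumulate count
moved = (batch_size * num_inputs                 # read inputs
         + num_inputs * num_outputs              # read weights
         + batch_size * num_outputs)             # write outputs
print(f"weights: {weights * bytes_per_float / 1e6:.1f} MB")
print(f"MACs:    {macs / 1e6:.1f} M")
print(f"moved:   {moved * bytes_per_float / 1e6:.1f} MB")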
CNNs: Spatial Pattern Processing
Task requirements:
- the ability to detect local patterns.
- the ability to recognize these patterns regardless of their position.
CNNs: each output connects only to a small spatially contiguous region of the input.
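In the same loop style as mlp_layer_compute above, a single-channel 2D convolution sketch makes this local connectivity explicit (stride 1, no padding; the function name and shapes are assumptions):

import numpy as np

def conv2d_compute(X, K):
    # X: input image (height, width); K: filter (kh, kw), e.g. 3x3
    kh, kw = K.shape
    out_h = X.shape[0] - kh + 1
    out_w = X.shape[1] - kw + 1
    Y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # each output connects only to a small kh x kw patch of the input
            Y[i, j] = np.sum(X[i:i+kh, j:j+kw] * K)
    return Y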