
Axiom 3.1 (Sets are objects). If $A$ is a set, then $A$ is also an object. In particular, given two sets $A$ and $B$, it is meaningful to ask whether $A$ is also an element of $B$.

Definition 3.1.4 (Equality of sets). Two sets $A$ and $B$ are equal, $A = B$, iff every element of $A$ is an element of $B$ and vice versa. To put it another way, $A = B$ if and only if every element $x$ of $A$ belongs also to $B$, and every element $y$ of $B$ belongs also to $A$.
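
As a rough illustration of Definition 3.1.4, the two-way membership check can be sketched in Python. The function name `sets_equal` is made up for this example, and finite Python collections stand in for sets, which is an assumption of the sketch rather than part of the definition.

```python
def sets_equal(a, b):
    """Return True iff every element of a is in b and every element of b is in a."""
    every_a_in_b = all(x in b for x in a)  # each x of A belongs also to B
    every_b_in_a = all(y in a for y in b)  # each y of B belongs also to A
    return every_a_in_b and every_b_in_a


# Order and repetition do not matter, only membership does.
same = sets_equal({1, 2, 3}, {3, 2, 1})
different = sets_equal({1, 2}, {1, 2, 3})
print(same, different)
```

The second call fails only in one direction: every element of {1, 2} lies in {1, 2, 3}, but 3 does not lie in {1, 2}.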

Axiom 3.2 (Empty set). There exists a set $\emptyset$, known as the empty set, which contains no elements, i.e., for every object $x$ we have $x \notin \emptyset$.

Lemma 3.1.6 (Single choice). Let $A$ be a non-empty set. Then there exists an object $x$ such that $x \in A$.

2.1 The Peano axioms

Axiom 2.1. $0$ is a natural number.

Axiom 2.2. If $n$ is a natural number, then $n\text{++}$ is also a natural number.

Axiom 2.3. $0$ is not the successor of any natural number; i.e., we have $n \text{++} \ne 0$ for every natural number $n$.

Axiom 2.4. Different natural numbers must have different successors; i.e., if $n$, $m$ are natural numbers and $n \ne m$, then $n\text{++} \ne m\text{++}$. Equivalently, if $n\text{++} = m\text{++}$, then we must have $n = m$.
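
To make Axioms 2.1 to 2.4 a little more concrete, here is a minimal Python sketch that models natural numbers as nested successor objects. The class names `Zero` and `Succ` and the helper `to_int` are invented for this illustration; the structural equality that Python dataclasses provide happens to mirror Axioms 2.3 and 2.4, but this is a model of the axioms, not a proof of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zero:
    """Axiom 2.1: 0 is a natural number."""


@dataclass(frozen=True)
class Succ:
    """Axiom 2.2: if n is a natural number, then n++ is one too."""
    pred: object  # the natural number n whose successor this object is


def to_int(n):
    """Count how many times the successor was applied to Zero."""
    count = 0
    while isinstance(n, Succ):
        n = n.pred
        count += 1
    return count


zero = Zero()
one = Succ(zero)
two = Succ(one)

# Axiom 2.3: a successor is never equal to 0.
print(one != zero)
# Axiom 2.4: different numbers have different successors.
print(Succ(one) != Succ(two))
```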

Today I tried building MXNet on Arch Linux again and, to my surprise, it worked. I also managed to import the MXNet project into NetBeans.

Installing Arch Linux, the Nvidia driver, CUDA, and cuDNN was covered in the previous post (Deep Learning Environment with Arch Linux and PyTorch), so this post only records how to build MXNet.

During the installation I mostly followed the Installing MXNet document on the official MXNet site. For a few problems I consulted online posts and the Arch Linux AUR.

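I finally put together a computer for studying deep learning. After a few days of tinkering, I have Arch Linux and PyTorch more or less installed, although I have not yet thoroughly verified that the installation is correct. Here is a rough record of the installation steps, for reference the next time I reinstall the system.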


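Data loading is an important part of any machine learning system. When we work with very small datasets, we can load the entire dataset into GPU memory. For larger datasets, we must keep the training examples in main memory. When a dataset is too large to fit even in main memory, data loading becomes a critical factor for performance. In designing a data loader, we aim for efficient data loading and data preparation, together with a clean and flexible interface.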


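Over the past ten years, deep learning has steadily moved toward deeper and larger networks. Although hardware has also improved quickly, cutting-edge deep learning models keep pushing GPU memory to its limit. So we always want to find ways to train larger models with as little memory as possible. This lets us train faster and use larger batch sizes, and thereby achieve higher GPU utilization.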



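We always want deep learning libraries to run faster and to scale to larger datasets. One natural approach is to see whether we can solve the problem with more hardware (processing in parallel on multiple GPUs).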



Most of what we discuss in this article comes from MXNet's dependency engine. The dependency tracking algorithm was developed primarily by Yutian Li and Mingjie Wang.



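In this article, our discussion focuses on the two most important high-level design decisions:

  1. For mathematical computation, whether to support the symbolic or the imperative paradigm.
  2. Whether to build neural networks from larger, more abstract operations or from more atomic ones.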
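
The first choice, symbolic versus imperative, can be sketched in a few lines of plain Python. The classes `Var`, `Const`, `Add`, and `Mul` below are hypothetical and exist only for this illustration; they are not MXNet's API, just a toy expression graph.

```python
# Imperative style: each statement runs immediately and yields a concrete value.
a = 2.0
b = 3.0
c = a * b + 1.0  # c already holds a number at this point


# Symbolic style: first describe the computation, then evaluate it later.
class Var:
    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]  # look up the value bound at evaluation time


class Const:
    def __init__(self, value):
        self.value = value

    def eval(self, env):
        return self.value


class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, env):
        return self.left.eval(env) + self.right.eval(env)


class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, env):
        return self.left.eval(env) * self.right.eval(env)


# Building the graph performs no arithmetic yet.
expr = Add(Mul(Var("a"), Var("b")), Const(1.0))
# Values are supplied only when the graph is evaluated.
result = expr.eval({"a": 2.0, "b": 3.0})
print(c, result)
```

Because the symbolic version has the whole expression in hand before running it, a library is free to inspect or rewrite the graph first; the imperative version is easier to step through and debug.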


Back-propagation is a key procedure in deep learning. It’s used to calculate the gradients in the layers, which are then used in optimization processes (gradient descent and other derived algorithms). So it’s important to get a solid understanding of how back-propagation works. It really took me a long time to grasp the idea and finally derive the procedure.

What I’m writing here is mostly inspired by Chapter 2 of the online book Neural Networks and Deep Learning. Honestly, I just read through Chapter 2, tried to understand it, and then rewrote the derivation.

My next step is to code back-propagation by hand, so that I can be more confident about it.
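
A hand-coded version could start from something as small as the sketch below: a single sigmoid neuron with a quadratic cost, where the chain rule gives the gradients and a finite-difference check confirms them. The single-neuron setup, the function names, and the sample values are all assumptions chosen for illustration, not the full layered derivation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Return the weighted input z and the activation a of one neuron."""
    z = w * x + b
    return z, sigmoid(z)


def cost(a, y):
    """Quadratic cost for a single training example."""
    return 0.5 * (a - y) ** 2


def backward(w, b, x, y):
    """Apply the chain rule to get (dC/dw, dC/db)."""
    _, a = forward(w, b, x)
    dC_da = a - y              # derivative of the quadratic cost
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    delta = dC_da * da_dz      # error with respect to the weighted input z
    return delta * x, delta    # dz/dw = x and dz/db = 1


def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check backward()."""
    dw = (cost(forward(w + eps, b, x)[1], y)
          - cost(forward(w - eps, b, x)[1], y)) / (2 * eps)
    db = (cost(forward(w, b + eps, x)[1], y)
          - cost(forward(w, b - eps, x)[1], y)) / (2 * eps)
    return dw, db


w, b, x, y = 0.6, -0.3, 1.5, 1.0
analytic = backward(w, b, x, y)
numeric = numeric_grad(w, b, x, y)
print(analytic, numeric)
```

If the two pairs of numbers agree to several decimal places, the analytic gradient is very likely correct; the same check scales to multi-layer networks, where it is much easier to make an indexing mistake.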

