Sections

Implementing a perceptron learning algorithm in Python
- Training a perceptron model on the Iris dataset
Adaptive linear neurons and the convergence of learning
- Implementing an adaptive linear neuron in Python
Implementing logistic regression in Python
Classification with scikit-learn
Scoring metrics for classification

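What is perceptron classification

The perceptron is the simplest form of feedforward neural network. It is a binary linear classifier that maps an input $\displaystyle x$ (a real-valued vector) to an output value $\displaystyle f(x)$ (a single binary value).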

$\displaystyle f(x)={\begin{cases}+1&{\text{if }}w\cdot x+b>0\\-1&{\text{else}}\end{cases}}$
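The decision rule above can be written in a few lines of plain Python. This is only a minimal sketch: the function name `predict` and the example weights are illustrative assumptions, not taken from the original text.

```python
def predict(w, x, b=0.0):
    """Return +1 if w . x + b > 0, otherwise -1 (hypothetical helper)."""
    activation = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if activation > 0 else -1


# Example weights and bias, chosen only for illustration.
w = [0.5, -1.0]
b = 0.1

label_a = predict(w, [2.0, 0.5], b)  # activation is positive
label_b = predict(w, [0.0, 1.0], b)  # activation is negative
print(label_a, label_b)
```

Note that an activation of exactly 0 falls into the "else" branch and is classified as -1.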

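Learning algorithm

We first define some variables:

$\displaystyle x(j)$ denotes the $j$-th component of the $n$-dimensional input vector
$\displaystyle w(j)$ denotes the $j$-th component of the weight vector
$\displaystyle f(x)$ denotes the output the neuron produces for the input $\displaystyle x$
$\displaystyle \alpha$ is a constant satisfying $\displaystyle 0<\alpha \leq 1$ (the learning rate)
Furthermore, for simplicity we assume that the bias $\displaystyle b$ equals 0. This loses no generality, because an extra dimension $\displaystyle n+1$ can be appended to the input vector in the form $\displaystyle x(n+1)=1$, so that $\displaystyle w(n+1)$ takes the place of the bias.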
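The following sketch checks the bias trick numerically: appending a constant 1 to the input and storing the bias as the last weight gives the same activation as the explicit bias term. The helper names `dot` and `augment` and the numbers are assumptions made for the example.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def augment(x):
    """Append the constant component x(n+1) = 1 (hypothetical helper)."""
    return list(x) + [1.0]


# Illustrative values only.
w, b = [0.5, -1.0], 0.25
x = [2.0, 0.5]

with_bias = dot(w, x) + b          # explicit bias term
w_aug = w + [b]                    # bias stored as w(n+1)
absorbed = dot(w_aug, augment(x))  # same activation, no separate bias

print(with_bias, absorbed)
```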

The perceptron learns by iterating over all training examples several times and updating the weights along the way.

Let $\displaystyle D_{m}=\{(x_{1},y_{1}),\dots ,(x_{m},y_{m})\}$ denote a training set of $\displaystyle m$ training examples.

In each iteration the weight vector is updated as follows: for each pair $\displaystyle (x,y)$ in $\displaystyle D_{m}$, $\displaystyle w(j):=w(j)+{\alpha (y-f(x))}{x(j)}\quad (j=1,\ldots ,n)$

Note that this means the weight vector changes only when the output $\displaystyle f(x)$ produced for a given training example $\displaystyle (x,y)$ differs from the expected output $\displaystyle y$.
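The update rule can be sketched as a short training loop in plain Python. This is an illustration under stated assumptions, not a reference implementation: the function names, the toy AND-style data set, the value `alpha=0.5` and the epoch limit are all chosen for the example. The bias is absorbed by a constant last input component of 1.

```python
def predict(w, x):
    """Perceptron output: +1 if w . x > 0, otherwise -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def train_perceptron(data, alpha=0.5, epochs=20):
    """Apply w(j) := w(j) + alpha * (y - f(x)) * x(j) over several passes."""
    n = len(data[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        errors = 0
        for x, y in data:
            # With labels in {-1, +1}, y - f(x) is 0 for a correct
            # prediction and +2 or -2 for a wrong one.
            delta = alpha * (y - predict(w, x))
            if delta != 0:
                errors += 1
                for j in range(n):
                    w[j] += delta * x[j]
        if errors == 0:  # a full pass without updates: stop early
            break
    return w


# Toy logical-AND data; the last component is the constant 1 for the bias.
data = [
    ([0.0, 0.0, 1.0], -1),
    ([0.0, 1.0, 1.0], -1),
    ([1.0, 0.0, 1.0], -1),
    ([1.0, 1.0, 1.0], 1),
]

w = train_perceptron(data)
print(w, [predict(w, x) for x, _ in data])
```

Because this toy set is linearly separable, the loop stops once a full pass produces no updates.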

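If there exist a positive constant $\displaystyle \gamma$ and a weight vector $\displaystyle w$ such that $\displaystyle y_{i}\cdot \left(\langle w,x_{i}\rangle +b\right)>\gamma$ holds for all $\displaystyle i$, the training set $\displaystyle D_{m}$ is called linearly separable. For a linearly separable training set the algorithm converges after a finite number of updates. If the training set is not linearly separable, however, the algorithm is not guaranteed to converge.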
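The separability condition can be checked for a candidate weight vector by computing the smallest value of $y_{i}\cdot(\langle w,x_{i}\rangle +b)$ over the training set. The sketch below does this for two toy data sets. The function name `margin`, the data and the candidate weights are assumptions for illustration. A negative result only shows that this particular candidate fails; it does not by itself prove that no separating vector exists.

```python
def margin(w, b, data):
    """Smallest y_i * (<w, x_i> + b) over the data set (hypothetical helper)."""
    return min(
        y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        for x, y in data
    )


# Logical AND is linearly separable; XOR is the classic set that is not.
and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
xor_data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], -1)]

and_margin = margin([2.0, 2.0], -3.0, and_data)  # positive: this w separates AND
xor_margin = margin([1.0, 1.0], -0.5, xor_data)  # negative: this w fails on XOR

print(and_margin, xor_margin)
```

A positive minimum means the candidate satisfies the condition for any $\gamma$ below that value.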

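Last five rows of the Iris dataset (columns 0 to 3: sepal length, sepal width, petal length and petal width in cm; column 4: species):
	0	1	2	3	4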
146	6.7	3	5.2	2.3	virginica
147	6.3	2.5	5	1.9	virginica
148	6.5	3	5.2	2	virginica
149	6.2	3.4	5.4	2.3	virginica
150	5.9	3	5.1	1.8	virginica

`matthews_corrcoef`(y_true, y_pred)	Compute the Matthews correlation coefficient (MCC) for binary classes
`precision_recall_curve`(y_true, probas_pred)	Compute precision-recall pairs for different probability thresholds
`roc_curve`(y_true, y_score[, pos_label, ...])	Compute Receiver operating characteristic (ROC)

`confusion_matrix`(y_true, y_pred[, labels])	Compute confusion matrix to evaluate the accuracy of a classification
`hinge_loss`(y_true, pred_decision[, labels, ...])	Average hinge loss (non-regularized)

`accuracy_score`(y_true, y_pred[, normalize, ...])	Accuracy classification score.
`classification_report`(y_true, y_pred[, ...])	Build a text report showing the main classification metrics
`f1_score`(y_true, y_pred[, labels, ...])	Compute the F1 score, also known as balanced F-score or F-measure
`fbeta_score`(y_true, y_pred, beta[, labels, ...])	Compute the F-beta score
`hamming_loss`(y_true, y_pred[, classes])	Compute the average Hamming loss.
`jaccard_similarity_score`(y_true, y_pred[, ...])	Jaccard similarity coefficient score
`log_loss`(y_true, y_pred[, eps, normalize, ...])	Log loss, aka logistic loss or cross-entropy loss.
`precision_recall_fscore_support`(y_true, y_pred)	Compute precision, recall, F-measure and support for each class
`precision_score`(y_true, y_pred[, labels, ...])	Compute the precision
`recall_score`(y_true, y_pred[, labels, ...])	Compute the recall
`zero_one_loss`(y_true, y_pred[, normalize, ...])	Zero-one classification loss.

`average_precision_score`(y_true, y_score[, ...])	Compute average precision (AP) from prediction scores
`roc_auc_score`(y_true, y_score[, average, ...])	Compute Area Under the Curve (AUC) from prediction scores
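To make the quantities behind `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score` and `f1_score` concrete, here is a plain-Python sketch for the binary case. It is not scikit-learn's implementation: the helper names and the example labels are assumptions, and returning 0.0 for an empty denominator is a choice made for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_scores(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return accuracy, precision, recall, f1


# Made-up labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
scores = binary_scores(y_true, y_pred)
print(counts, scores)
```

With these labels the counts are 2 true positives, 1 false positive, 2 false negatives and 3 true negatives, so precision (2/3) and recall (2/4) differ even though both are built from the same confusion counts.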

Exercise: try applying regularization to the credit dataset, then plot how the coefficients change and how well the final model predicts.

