The multilayer perceptron with scikit-learn

Checking the scikit-learn version

Note: the multilayer perceptron has only been implemented in scikit-learn since version 0.18 (September 2016).

The source code of this implementation is available on GitHub. The long discussion thread that preceded its integration is available at: issue #3204.

In [1]:
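import re
import sklearn

# Version >= 0.18 is required.
# Only the leading "major.minor" digits are parsed, so that suffixes such as
# "rc1" or ".dev0" in the version string do not break the check.
major, minor = map(int, re.match(r"(\d+)\.(\d+)", sklearn.__version__).groups())
assert (major, minor) >= (0, 18)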

Classification

First example

In [2]:
from sklearn.neural_network import MLPClassifier
In [3]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(solver='lbfgs',
                    alpha=1e-5,
                    hidden_layer_sizes=(5, 2),
                    random_state=1)

clf.fit(X, y)
Out[3]:
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

Once the neural network is trained, it can be used to predict the class of new samples:

In [4]:
clf.predict([[2., 2.], [-1., -2.]])
Out[4]:
array([1, 0])

clf.coefs_ contains the weights of the neural network (a list of arrays, one per layer transition):

In [5]:
clf.coefs_
Out[5]:
[array([[-0.14196276, -0.02104562, -0.85522848, -3.51355396, -0.60434709],
        [-0.69744683, -0.9347486 , -0.26422217, -3.35199017,  0.06640954]]),
 array([[ 0.29164405, -0.14147894],
        [ 2.39665167, -0.6152434 ],
        [-0.51650256,  0.51452834],
        [ 4.0186541 , -0.31920293],
        [ 0.32903482,  0.64394475]]),
 array([[-4.53025854],
        [-0.86285329]])]
In [6]:
[coef.shape for coef in clf.coefs_]
Out[6]:
[(2, 5), (5, 2), (2, 1)]
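
The shapes above follow the layer sizes: 2 inputs, hidden layers of 5 and 2 units, and 1 output unit. As a small sketch (it retrains the same toy model so that it can run on its own), the bias vectors stored in clf.intercepts_ can be listed next to the weight matrices:

```python
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(solver='lbfgs',
                    alpha=1e-5,
                    hidden_layer_sizes=(5, 2),
                    random_state=1)
clf.fit(X, y)

# One (weights, biases) pair per connection between two consecutive layers
layer_shapes = [(W.shape, b.shape) for W, b in zip(clf.coefs_, clf.intercepts_)]

for w_shape, b_shape in layer_shapes:
    print("weights:", w_shape, "biases:", b_shape)
```

Each weight matrix has shape (units in the previous layer, units in the next layer), and each bias vector has one entry per unit of the next layer.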

Vector of probability estimates $P(y|x)$ per sample $x$:

In [7]:
clf.predict_proba([[2., 2.], [-1., -2.]])
Out[7]:
array([[  1.96718015e-004,   9.99803282e-001],
       [  1.00000000e+000,   4.67017947e-144]])
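
To make the link between the weights and these probabilities explicit, here is a minimal sketch of the forward pass written by hand. It assumes the settings used above (default ReLU activation in the hidden layers, binary problem with a single logistic output unit); the variable names X_new, p1 and manual are only illustrative.

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic function
from sklearn.neural_network import MLPClassifier

X = np.array([[0., 0.], [1., 1.]])
y = [0, 1]

clf = MLPClassifier(solver='lbfgs',
                    alpha=1e-5,
                    hidden_layer_sizes=(5, 2),
                    random_state=1)
clf.fit(X, y)

X_new = np.array([[2., 2.], [-1., -2.]])

# Hidden layers: affine map followed by ReLU (the default activation)
a = X_new
for W, b in zip(clf.coefs_[:-1], clf.intercepts_[:-1]):
    a = np.maximum(a @ W + b, 0.0)

# Output layer: affine map followed by the logistic function (binary case)
z = a @ clf.coefs_[-1] + clf.intercepts_[-1]
p1 = expit(z).ravel()          # estimate of P(y=1|x)

# Same layout as predict_proba: one column per class
manual = np.column_stack([1.0 - p1, p1])

print(manual)
print(np.allclose(manual, clf.predict_proba(X_new)))
```

With more than two classes the output layer would use a softmax instead of a single logistic unit, so this sketch only covers the binary case.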

Regression

First example

In [8]:
from sklearn.neural_network import MLPRegressor
In [9]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

reg = MLPRegressor(solver='lbfgs',
                   alpha=1e-5,
                   hidden_layer_sizes=(5, 2),
                   random_state=1)

reg.fit(X, y)
Out[9]:
MLPRegressor(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

Once the neural network is trained, it can be used to predict values for new samples:

In [10]:
reg.predict([[2., 2.], [-1., -2.]])
Out[10]:
array([ 0.99992939, -1.74426841])

reg.coefs_ contains the weights of the neural network (a list of arrays, one per layer transition):

In [11]:
reg.coefs_
Out[11]:
[array([[-0.15363893,  0.34727085, -0.92556938, -0.58865324, -0.65405348],
        [-0.75481049, -0.64158244, -0.28595394, -0.41380114,  0.0718716 ]]),
 array([[ 0.31563122, -0.15311531],
        [ 0.17586972, -0.66584599],
        [-0.5589839 ,  0.55684731],
        [ 0.98388591, -0.34545676],
        [ 0.35609731,  0.69690797]]),
 array([[-1.15931301],
        [-0.93382131]])]
In [12]:
[coef.shape for coef in reg.coefs_]
Out[12]:
[(2, 5), (5, 2), (2, 1)]
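
The classifier and the regressor share the same architecture here; what differs is the output activation, which is why the regressor can return values outside [0, 1] such as the -1.74 above. The following sketch (retraining both toy models so that it is self-contained) inspects the fitted attributes out_activation_ and n_layers_:

```python
from sklearn.neural_network import MLPClassifier, MLPRegressor

X = [[0., 0.], [1., 1.]]
y = [0, 1]

params = dict(solver='lbfgs',
              alpha=1e-5,
              hidden_layer_sizes=(5, 2),
              random_state=1)

clf = MLPClassifier(**params).fit(X, y)
reg = MLPRegressor(**params).fit(X, y)

# Activation applied to the output layer of each model
print("classifier:", clf.out_activation_)
print("regressor: ", reg.out_activation_)

# Number of layers, input and output layers included
print("layers:", reg.n_layers_)
```

For a binary classifier the output activation is 'logistic'; for a regressor it is 'identity', so the output is an unbounded linear combination of the last hidden layer.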

Regularization

In [13]:
# TODO...
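
Pending the example planned for this section, here is a tentative sketch. In MLPClassifier and MLPRegressor, the alpha parameter sets the strength of the L2 penalty on the weights. The code below trains the same toy classifier with a few arbitrarily chosen alpha values and records the overall norm of the weights; the values and the weight_norms dictionary are illustrative assumptions, not a recommended setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

weight_norms = {}

# Arbitrary values, chosen only to span several orders of magnitude
for alpha in (1e-5, 1e-1, 10.0):
    clf = MLPClassifier(solver='lbfgs',
                        alpha=alpha,
                        hidden_layer_sizes=(5, 2),
                        random_state=1)
    clf.fit(X, y)

    # Euclidean norm of all the weights of the network (biases excluded)
    weight_norms[alpha] = np.sqrt(sum((W ** 2).sum() for W in clf.coefs_))

for alpha, norm in weight_norms.items():
    print("alpha = {:g}: weight norm = {:.4f}".format(alpha, norm))
```

A stronger penalty is generally expected to pull the weights toward zero, but on a two-sample problem the effect should be checked on the printed values rather than taken for granted.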

Input data normalization
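
This section has no content yet; as a placeholder, here is one possible sketch. Multilayer perceptrons are sensitive to the scale of their inputs, so a common approach is to standardize each feature before training. The example chains a StandardScaler and an MLPClassifier in a pipeline; the dataset (two features with very different scales) is hypothetical and only serves the illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the second feature is about 1000 times larger than the first
X = np.array([[0., 1000.], [1., 3000.], [2., 2000.], [3., 4000.]])
y = [0, 0, 1, 1]

model = make_pipeline(
    StandardScaler(),   # centers each feature and scales it to unit variance
    MLPClassifier(solver='lbfgs',
                  alpha=1e-5,
                  hidden_layer_sizes=(5, 2),
                  random_state=1))

model.fit(X, y)

# The scaler is fitted on the training data and reused at prediction time
scaler = model.named_steps['standardscaler']
print("feature means:", scaler.mean_)

pred = model.predict(np.array([[1.5, 2500.]]))
print("prediction:", pred)
```

Putting the scaler inside the pipeline ensures that new samples are transformed with the statistics learned on the training set.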

Iterating manually

Stepping through the training loop manually can be useful to monitor its progress or to steer it.

Here is an example that tracks the network weights over 10 iterations:

In [14]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(hidden_layer_sizes=(15,),
                    random_state=1,
                    max_iter=1,        # <- !
                    warm_start=True)   # <- !

for i in range(10):
    clf.fit(X, y)
    print(clf.coefs_)
[array([[-0.09765283,  0.26278451, -0.59495262, -0.23389012, -0.41873139,
        -0.48338682, -0.37177842, -0.18253452, -0.12170755,  0.04512124,
        -0.09701122,  0.22107355, -0.35018075,  0.45027045, -0.56055835],
       [ 0.20157586, -0.09725654,  0.06873391, -0.42629857, -0.35772621,
         0.35635519,  0.55737765, -0.22071285,  0.22754025,  0.44621704,
         0.46786258, -0.49204096, -0.54669594, -0.39129993,  0.44831413]]), array([[-0.0629073 ],
       [ 0.50142523],
       [-0.25377001],
       [-0.260921  ],
       [-0.4541206 ],
       [-0.58766348],
       [ 0.2200279 ],
       [-0.35418198],
       [-0.28616708],
       [-0.00932073],
       [-0.54601693],
       [ 0.08977517],
       [-0.43168177],
       [ 0.1103765 ],
       [ 0.24367823]])]
[array([[-0.09671366,  0.2637845 , -0.59595262, -0.23291645, -0.41774627,
        -0.48239973, -0.37077843, -0.18156801, -0.12075695,  0.04412134,
        -0.09801122,  0.22207354, -0.34919849,  0.45127044, -0.55956951],
       [ 0.20060628, -0.09625654,  0.06773391, -0.42531319, -0.35674358,
         0.35537262,  0.55837765, -0.2197407 ,  0.22656729,  0.44521714,
         0.46686259, -0.49104097, -0.54570738, -0.39029994,  0.44732804]]), array([[-0.06199866],
       [ 0.50242522],
       [-0.25477   ],
       [-0.26192094],
       [-0.45512059],
       [-0.58667413],
       [ 0.2210279 ],
       [-0.35518197],
       [-0.2851887 ],
       [-0.00832073],
       [-0.54501693],
       [ 0.08877518],
       [-0.43069621],
       [ 0.11137649],
       [ 0.24270352]])]
[array([[-0.09577504,  0.2647845 , -0.59695262, -0.23194289, -0.41676118,
        -0.48141267, -0.36977843, -0.18060167, -0.11980672,  0.04312145,
        -0.09901122,  0.22307353, -0.34821628,  0.45227044, -0.55858069],
       [ 0.19963685, -0.09525654,  0.06673392, -0.42432784, -0.355761  ,
         0.35439011,  0.55937765, -0.21876868,  0.22559445,  0.44421724,
         0.46586259, -0.49004098, -0.54471884, -0.38929994,  0.44634198]]), array([[-0.06109123],
       [ 0.50342521],
       [-0.25577   ],
       [-0.26292088],
       [-0.45612058],
       [-0.5856848 ],
       [ 0.22202789],
       [-0.35618197],
       [-0.2842104 ],
       [-0.00732073],
       [-0.54401694],
       [ 0.08777519],
       [-0.42971068],
       [ 0.11237649],
       [ 0.24172892]])]
[array([[-0.09483698,  0.2657845 , -0.59795261, -0.23096943, -0.41577613,
        -0.48042564, -0.36877843, -0.18160166, -0.11885686,  0.04212157,
        -0.10001122,  0.22407352, -0.34723411,  0.45327043, -0.55759188],
       [ 0.19866756, -0.09425654,  0.06573392, -0.42334253, -0.35477847,
         0.35340764,  0.56037764, -0.21976868,  0.22462172,  0.44321737,
         0.46486259, -0.48904099, -0.54373032, -0.38829995,  0.44535595]]), array([[-0.06018504],
       [ 0.5044252 ],
       [-0.25676999],
       [-0.26392082],
       [-0.45712057],
       [-0.58469548],
       [ 0.22302789],
       [-0.35718196],
       [-0.28323216],
       [-0.00632074],
       [-0.54301694],
       [ 0.0867752 ],
       [-0.42872519],
       [ 0.11337649],
       [ 0.24075442]])]
[array([[-0.0938995 ,  0.2667845 , -0.59895261, -0.22999608, -0.41479111,
        -0.47943863, -0.36777844, -0.18063532, -0.11790738,  0.04112171,
        -0.10101122,  0.22507351, -0.346252  ,  0.45427042, -0.5566031 ],
       [ 0.19769841, -0.09325655,  0.06473393, -0.42235725, -0.35379598,
         0.35242523,  0.56137764, -0.21879665,  0.22364911,  0.44221751,
         0.46386259, -0.488041  , -0.54274181, -0.38729996,  0.44436995]]), array([[-0.05928013],
       [ 0.50542519],
       [-0.25776999],
       [-0.26492076],
       [-0.45812055],
       [-0.58370618],
       [ 0.22402789],
       [-0.35818195],
       [-0.28225401],
       [-0.00532074],
       [-0.54201694],
       [ 0.08577521],
       [-0.42773973],
       [ 0.11437648],
       [ 0.23978002]])]
[array([[-0.09296261,  0.2677845 , -0.59995261, -0.22902285, -0.41380613,
        -0.47845165, -0.36677844, -0.18163532, -0.11695829,  0.04012189,
        -0.10201121,  0.2260735 , -0.34526994,  0.45527041, -0.55561433],
       [ 0.19672941, -0.09225655,  0.06373393, -0.421372  , -0.35281354,
         0.35144285,  0.56237763, -0.21979665,  0.22267661,  0.44121768,
         0.46286259, -0.48704101, -0.54175333, -0.38629997,  0.44338399]]), array([[-0.05837654],
       [ 0.50642518],
       [-0.25876998],
       [-0.26592071],
       [-0.45912054],
       [-0.5827169 ],
       [ 0.22502789],
       [-0.35918195],
       [-0.28127592],
       [-0.00432074],
       [-0.54101694],
       [ 0.08477523],
       [-0.4267543 ],
       [ 0.11537648],
       [ 0.23880571]])]
[array([[-0.09202631,  0.26878449, -0.6009526 , -0.22804972, -0.41282119,
        -0.4774647 , -0.36577845, -0.18066896, -0.11600959,  0.0391221 ,
        -0.10301121,  0.22707349, -0.34428793,  0.4562704 , -0.55462559],
       [ 0.19576055, -0.09125655,  0.06273393, -0.42038679, -0.35183116,
         0.35046053,  0.56337763, -0.21882462,  0.22170423,  0.44021789,
         0.46186259, -0.48604102, -0.54076487, -0.38529998,  0.44239805]]), array([[-0.05747429],
       [ 0.50742517],
       [-0.25976998],
       [-0.26692065],
       [-0.46012053],
       [-0.58172764],
       [ 0.22602788],
       [-0.36018194],
       [-0.28029791],
       [-0.00332074],
       [-0.54001694],
       [ 0.08377524],
       [-0.4257689 ],
       [ 0.11637648],
       [ 0.23783152]])]
[array([[-0.09109061,  0.26978449, -0.59996302, -0.22707671, -0.41183628,
        -0.47647777, -0.36477845, -0.18166896, -0.11506129,  0.03812237,
        -0.10401121,  0.22807348, -0.34330597,  0.4572704 , -0.55363686],
       [ 0.19479185, -0.09025655,  0.06182551, -0.41940161, -0.35084881,
         0.34947826,  0.56437763, -0.21982462,  0.22073196,  0.43921816,
         0.4608626 , -0.48504103, -0.53977643, -0.38429998,  0.44141214]]), array([[-0.05657342],
       [ 0.50842516],
       [-0.26076997],
       [-0.2679206 ],
       [-0.46112052],
       [-0.58073839],
       [ 0.22702788],
       [-0.36118194],
       [-0.27931998],
       [-0.00232074],
       [-0.53901694],
       [ 0.08277525],
       [-0.42478354],
       [ 0.11737647],
       [ 0.23685742]])]
[array([[-0.09015554,  0.27078449, -0.59897345, -0.2261038 , -0.4108514 ,
        -0.47549087, -0.36377845, -0.1807026 , -0.1141134 ,  0.03712277,
        -0.10501121,  0.22907347, -0.34232406,  0.45827039, -0.55264816],
       [ 0.1938233 , -0.08925655,  0.06091832, -0.41841647, -0.34986652,
         0.34849603,  0.56537762, -0.21885258,  0.21975982,  0.43821855,
         0.4598626 , -0.48404104, -0.53878801, -0.38329999,  0.44042627]]), array([[-0.05567397],
       [ 0.50942515],
       [-0.26176997],
       [-0.26892054],
       [-0.46212051],
       [-0.57974916],
       [ 0.22802788],
       [-0.36218193],
       [-0.27834212],
       [-0.00132074],
       [-0.53801694],
       [ 0.08177527],
       [-0.42379821],
       [ 0.11837647],
       [ 0.23588343]])]
[array([[-0.08922109,  0.27178449, -0.59997345, -0.22513101, -0.40986656,
        -0.474504  , -0.36277846, -0.1817026 , -0.11316591,  0.03612347,
        -0.10601121,  0.23007345, -0.3413422 ,  0.45927038, -0.55165947],
       [ 0.1928549 , -0.08825655,  0.05991832, -0.41743136, -0.34888428,
         0.34751386,  0.56637762, -0.21985258,  0.21878779,  0.43721921,
         0.4588626 , -0.48304105, -0.53779962, -0.3823    ,  0.43944043]]), array([[ -5.47759839e-02],
       [  5.10425136e-01],
       [ -2.62769965e-01],
       [ -2.69920493e-01],
       [ -4.63120501e-01],
       [ -5.78759956e-01],
       [  2.29027875e-01],
       [ -3.63181923e-01],
       [ -2.77364338e-01],
       [ -3.20743583e-04],
       [ -5.37016945e-01],
       [  8.07752814e-02],
       [ -4.22812913e-01],
       [  1.19376466e-01],
       [  2.34909538e-01]])]
/Users/jdecock/anaconda/lib/python3.5/site-packages/sklearn/neural_network/multilayer_perceptron.py:563: ConvergenceWarning: Stochastic Optimizer: Maximum iterations reached and the optimization hasn't converged yet.
  % (), ConvergenceWarning)
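
Printing the full weight matrices quickly becomes hard to read. A more compact variant, sketched below, keeps a copy of the weights after each call to fit() and reports only the largest absolute change between two consecutive iterations. It also silences the ConvergenceWarning, which is expected here since max_iter=1. The snapshots and deltas lists are illustrative names.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(hidden_layer_sizes=(15,),
                    random_state=1,
                    max_iter=1,        # one iteration per call to fit()
                    warm_start=True)   # restart from the previous weights

snapshots = []

with warnings.catch_warnings():
    # Stopping after a single iteration always triggers this warning
    warnings.simplefilter("ignore", category=ConvergenceWarning)

    for i in range(10):
        clf.fit(X, y)
        # Copy the arrays: the estimator may update them in place
        snapshots.append([W.copy() for W in clf.coefs_])

# Largest absolute weight change between consecutive iterations
deltas = [max(np.abs(new - old).max() for new, old in zip(after, before))
          for before, after in zip(snapshots[:-1], snapshots[1:])]

for i, delta in enumerate(deltas, start=1):
    print("iteration {} -> {}: max |dW| = {:.6f}".format(i, i + 1, delta))
```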

TODO: How does this differ from the online learning mode (open loop?) provided by partial_fit()?
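
As a starting point for this comparison, here is a sketch of the same loop written with partial_fit(), which performs a single iteration over the given data per call and is available with the stochastic solvers ('sgd' and 'adam', the default). One likely difference, stated here as an assumption to check against the installed version: fit() with warm_start=True seems to re-create the optimizer on every call, whereas partial_fit() keeps its internal state (for example Adam's moment estimates) between calls.

```python
from sklearn.neural_network import MLPClassifier

X = [[0., 0.], [1., 1.]]
y = [0, 1]

clf = MLPClassifier(hidden_layer_sizes=(15,),
                    random_state=1)

losses = []

for i in range(10):
    # The full list of classes must be known from the first call,
    # since a given batch may not contain every class
    clf.partial_fit(X, y, classes=[0, 1])
    losses.append(clf.loss_)

for i, loss in enumerate(losses, start=1):
    print("iteration {}: loss = {:.6f}".format(i, loss))

print([coef.shape for coef in clf.coefs_])
```

Comparing these weights and losses with those obtained from the warm_start loop would be one way to answer the question empirically.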