{"id":1004,"date":"2017-08-01T17:14:29","date_gmt":"2017-08-01T17:14:29","guid":{"rendered":"http:\/\/blog.tiran.info\/?p=1004"},"modified":"2017-08-01T17:14:29","modified_gmt":"2017-08-01T17:14:29","slug":"annore-2","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=1004","title":{"rendered":"ANN\/ORE #2 &#8211; Building the neural network with nnet"},"content":{"rendered":"<p style=\"text-align: justify;\">Once the <a href=\"http:\/\/blog.tiran.info\/annore-1\">MNIST dataset has been loaded into the database<\/a>, it can be used to build an artificial neural network (ANN). The problem is that the volume of this data, combined with the computing requirements of training the model, rules out my desktop PC.<\/p>\n<p style=\"text-align: justify;\">I will therefore offload the processing to the database server, whose specs are considerably better: an HP ProLiant BL460c with 2 sockets, 24 cores and 48 threads (Xeon E5-2695 v2 processors at 2.40 GHz) and 256 GB of RAM.<\/p>\n<p style=\"text-align: justify;\">To do so, I will use ORE to build the ANN in an R session spawned on the database server (via <a href=\"https:\/\/docs.oracle.com\/cd\/E83411_01\/OREAD\/configuring-extproc-for-embedded-R-execution.htm#OREAD126\" target=\"_blank\" rel=\"noopener\">extproc<\/a>).<\/p>\n<p style=\"text-align: justify;\">As in the previous posts, I will use the standard <a href=\"https:\/\/cran.r-project.org\/web\/packages\/nnet\/index.html\" target=\"_blank\" rel=\"noopener\">nnet<\/a> package to build the neural network. 
This package ships with the standard R distribution.<\/p>\n<p style=\"text-align: justify;\">To measure processing times, however, I will use the <a href=\"https:\/\/cran.r-project.org\/web\/packages\/tictoc\/index.html\" target=\"_blank\" rel=\"noopener\">tictoc<\/a> package. It is not part of the standard distribution and therefore has to be installed on the server first:<\/p>\n<pre class=\"brush: js; ruler: true;\"> \noracle@psu888: \/home\/oracle [HODBA04D1_1]# ORE CMD INSTALL \/tmp\/tictoc_1.0.tar.gz\n* installing to library \u2018\/soft\/oracle\/product\/rdbms\/12.2.0.1\/R\/library\u2019\n* installing *source* package \u2018tictoc\u2019 ...\n** package \u2018tictoc\u2019 successfully unpacked and MD5 sums checked\n** R\n** inst\n** preparing package for lazy loading\n** help\n*** installing help indices\n  converting help for package \u2018tictoc\u2019\n    finding HTML links ... done\n    Stack                                   html\n    tic                                     html\n    tictoc                                  html\n** building package indices\n** testing if installed package can be loaded\n* DONE (tictoc)\noracle@psu888: \/home\/oracle [HODBA04D1_1]#\n<\/pre>\n<p style=\"text-align: justify;\">Strictly speaking, I could work directly on the database server. I find it more convenient, however, to use RStudio on my PC and invoke the commands through the ORE API.<\/p>\n<pre class=\"brush: js; ruler: true;\"> \n&gt; ore.connect(user=&quot;c##rafa&quot;, password=&quot;Password1#&quot;, conn_string=&quot;\/\/clorai2-scan:1521\/pdb_hodba08&quot;)\n&gt; \n&gt; ore.doEval(function() {\n+   library(ORE)\n+   library(nnet)\n+   library(tictoc)\n+   set.seed(7777)\n+   ore.sync(table = &quot;MNIST_TRAINING_SET&quot;)\n+   mnist_train &lt;- ore.pull(ore.get(&quot;MNIST_TRAINING_SET&quot;))\n+   tic()\n+   nn_nnet &lt;- nnet(IMG_LBL ~ . - IMG_ID, data=mnist_train, size=100, maxit=100, MaxNWts=80000)\n+   exectime &lt;- toc()\n+   exectime &lt;- exectime$toc - exectime$tic\n+   ore.save(list=c(&quot;nn_nnet&quot;), name=&quot;DS NeuralNet&quot;, append = TRUE)\n+   print(paste(&quot; -&gt; Duree :&quot;, exectime, &quot;secondes&quot;))}, ore.connect = TRUE)\n[1] &quot; -&gt; Duree : 49865.035 secondes&quot;\n&gt;\n<\/pre>\n<p style=\"text-align: justify;\">Above, I use <a href=\"https:\/\/docs.oracle.com\/cd\/E67822_01\/OREUG\/GUID-E04D5025-049F-4AF9-8DAF-40F3874789E8.htm#OREUG505\" target=\"_blank\" rel=\"noopener\">ore.doEval<\/a> to train an ANN with a single hidden layer of 100 neurons on the Oracle server. MaxNWts has to be raised well above its default of 1000. Assuming one input per pixel (28 &times; 28 = 784 inputs), the network has (784 + 1) &times; 100 + (100 + 1) &times; 10 = 79,510 weights, biases included. Once the model has been created, I save it in an ORE datastore with <a href=\"http:\/\/docs.oracle.com\/cd\/E67822_01\/OREUG\/GUID-C157C764-A20D-4CCC-A34B-19F83743A388.htm#OREUG268\" target=\"_blank\" rel=\"noopener\">ore.save<\/a>. The IMG_LBL column is converted to a factor when it is retrieved. As a result, nnet builds a multiclass ANN with one output per class and a softmax output layer, through an implicit <a href=\"https:\/\/fr.wikipedia.org\/wiki\/Encodage_one-hot\" target=\"_blank\" rel=\"noopener\">one-hot encoding<\/a> of the label.<\/p>\n<p style=\"text-align: justify;\">Training the model took an extremely long time: <strong>almost 14 hours!<\/strong><\/p>\n<p style=\"text-align: justify;\">The resulting model takes up about 400 MB:<\/p>\n<pre class=\"brush: js; ruler: true;\">&gt; ore.datastore()\n  datastore.name object.count      size       creation.date description\n1   DS NeuralNet            1 407018410 2017-08-04 23:55:24        \n&gt; ore.datastoreSummary(&quot;DS NeuralNet&quot;)\n  object.name        class      size length row.count col.count\n1     nn_nnet nnet.formula 407018410     19        NA        NA\n&gt;\n<\/pre>\n<p style=\"text-align: justify;\">Scoring, on the other hand, will be done on my workstation. I therefore first need to bring the model and the test data back locally:<\/p>\n<pre class=\"brush: js; ruler: true;\">&gt; ore.attach()\n&gt; ore.sync(table=&quot;MNIST_TEST_SET&quot;)\n&gt; mnist_test &lt;- ore.pull(MNIST_TEST_SET)\n&gt;  \n&gt; library(nnet)\n&gt;\n&gt; ore.load(&quot;DS NeuralNet&quot;, list = c(&quot;nn_nnet&quot;))\n[1] &quot;nn_nnet&quot;\n&gt; \n&gt; pred &lt;- predict(nn_nnet, newdata=mnist_test, type=&quot;class&quot;)\n&gt; \n<\/pre>\n<p style=\"text-align: justify;\">The contingency table below (rows: actual labels, columns: predicted labels) shows that the vast majority of images are correctly classified. The diagonal adds up to 9,689 of the 10,000 test images, i.e. 96.89% accuracy. The most frequent errors are 4s predicted as 9s and 5s predicted as 3s (17 cases each).<\/p>\n<pre class=\"brush: js; ruler: true;\">&gt; table(mnist_test$IMG_LBL,pred)\n   pred\n       0    1    2    3    4    5    6    7    8    9\n  0  962    1    2    0    0    4    4    1    4    2\n  1    0 1124    3    0    0    1    2    2    2    1\n  2    3    2  992   13    4    0    4    6    8    0\n  3    0    0    9  974    0   14    0    4    3    6\n  4    0    0    5    1  950    1    3    4    1   17\n  5    4    2    1   17    3  855    4    0    3    3\n  6    4    2    2    2    9    5  929    1    4    0\n  7    1    2   12    9    2    0    0  992    3    7\n  8    6    0    4   14    2    5    2    3  935    3\n  9    1    3    0    6    7    5    2    6    3  976\n&gt; \n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Once the MNIST dataset has been loaded into the database, it can be used to build an 
ANN.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[2,3,7,9],"tags":[],"class_list":["post-1004","post","type-post","status-publish","format-standard","hentry","category-ann","category-classification","category-oracle-advanced-analytics","category-oracle-r-enterprise"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1004","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1004"}],"version-history":[{"count":0,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1004\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1004"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1004"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1004"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}