{"id":923,"date":"2017-03-06T07:25:25","date_gmt":"2017-03-06T07:25:25","guid":{"rendered":"http:\/\/blog.tiran.info\/?p=923"},"modified":"2017-03-06T07:25:25","modified_gmt":"2017-03-06T07:25:25","slug":"oracle-r-enterprise-3","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=923","title":{"rendered":"Oracle R Enterprise (#3)"},"content":{"rendered":"<p style=\"text-align: justify;\">Oracle R Enterprise \u00e9tant en place, je peux reprendre les tests de comptage la ou je les avais laiss\u00e9 dans un <a href=\"http:\/\/blog.tiran.info\/oracle-r-enterprise-1\" target=\"_blank\">article pr\u00e9c\u00e9dent<\/a>&#8230;<\/p>\n<pre class=\"brush: js; highlight: [2, 23]; ruler: true;\">\u00a0\n&gt; library(ORE)\nLoading required package: OREbase\nLoading required package: OREcommon\n\nAttaching package: \u2018OREbase\u2019\n\nThe following objects are masked from \u2018package:base\u2019:\n\n    cbind, data.frame, eval, interaction, order, paste, pmax, pmin, rbind, table\n\nLoading required package: OREembed\nLoading required package: OREstats\nLoading required package: MASS\nLoading required package: OREgraphics\nLoading required package: OREeda\nLoading required package: OREmodels\nLoading required package: OREdm\nLoading required package: lattice\nLoading required package: OREpredict\nLoading required package: ORExml\n&gt; \n&gt; ore.connect(user=&quot;c##raf&quot;, password=&quot;Password1#&quot;, conn_string=&quot;\/\/clorai2-scan:1521\/pdb_iotst04&quot;)\n<\/pre>\n<p style=\"text-align: justify;\">A ce stade, on voit une session connect\u00e9e \u00e0 la base:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\nSQL&gt; column username format a8\nSQL&gt; column sql_id format a13\nSQL&gt; column sql_text format a90\nSQL&gt;\nSQL&gt; SELECT username,\n  2         a.sql_id,\n  3         b.sql_text\n  4    FROM v$session a, v$sql b\n  5   WHERE a.program = &#039;rsession.exe&#039; AND a.sql_id = b.sql_id(+);\n\nUSERNAME SQL_ID        SQL_TEXT\n-------- ------------- ------------------------------------------------------------------------------------------\nC##RAF\n\nSQL&gt;\n<\/pre>\n<p style=\"text-align: justify;\">A l&rsquo;aide d&rsquo;<a href=\"http:\/\/docs.oracle.com\/cd\/E67822_01\/OREUG\/GUID-5AD91E5F-4143-4083-9CD5-DAEBC65A4443.htm\" target=\"_blank\">ore.sync<\/a>, on peut alors cr\u00e9er des structures \u00ab\u00a0proxy\u00a0\u00bb pour acc\u00e9der aux tables de la base (dans cet exemple, je pr\u00e9cise l&rsquo;unique table qui m&rsquo;int\u00e9resse: LOG_CONTRIBS). Ces derni\u00e8res sont de type <a href=\"https:\/\/docs.oracle.com\/cd\/E67822_01\/OREUG\/GUID-D0852F22-583A-4610-949A-C2BE52F2461D.htm\" target=\"_blank\">ore.frame<\/a> et se manipulent comme des data.frame:<\/p>\n<pre class=\"brush: js; ruler: true;\">\u00a0\n&gt; ore.attach()\n&gt; ore.sync(table=&quot;LOG_CONTRIBS&quot;)\n&gt; \n&gt; class(LOG_CONTRIBS)\n[1] &quot;ore.frame&quot;\nattr(,&quot;package&quot;)\n[1] &quot;OREbase&quot;\n&gt; \n<\/pre>\n<p style=\"text-align: justify;\">Ainsi, pour faire un comptage exact du nombre de contributeurs distincts \u00e0 Wikipedia, je peut utiliser la formule R length(unique(&lt;df&gt;)).<\/p>\n<pre class=\"brush: js; ruler: true;\">\u00a0\n&gt; system.time(NB_DISTINCT &lt;- length(unique(LOG_CONTRIBS$UNAME)))\n   user  system elapsed \n   0.00    0.00   50.39 \n&gt; NB_DISTINCT\n[1] 28839445\n&gt; \n<\/pre>\n<p style=\"text-align: justify;\">Etant donn\u00e9 que LOG_CONTRIBS est un proxy pour la table de m\u00eame nom, la formule est automatiquement convertie en code SQL. On peut facilement le v\u00e9rifier \u00e0 l&rsquo;aide de la m\u00eame requ\u00eate que celle ex\u00e9cut\u00e9e plus haut:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\nSQL&gt; SELECT username,\n  2         a.sql_id,\n  3         b.sql_text\n  4    FROM v$session a, v$sql b\n  5   WHERE a.program = &#039;rsession.exe&#039; AND a.sql_id = b.sql_id(+);\n\nUSERNAME SQL_ID        SQL_TEXT\n-------- ------------- ------------------------------------------------------------------------------------------\nC##RAF   cq264yx9bt1fp with OBJ31_1 as ( select \/*+ no_merge(t) *\/  &quot;ID&quot; VAL001,&quot;TMSTAMP&quot; VAL002,&quot;UNAME&quot; VAL003,&quot;\n                       LOGTYPE&quot; VAL004,&quot;LOGACTION&quot; VAL005 from &quot;C##RAF&quot;.&quot;LOG_CONTRIBS&quot; t  ),OBJ31_2 as ( select \/\n                       *+ no_merge(t) *\/  &quot;UNAME&quot; VAL001 from &quot;C##RAF&quot;.&quot;LOG_CONTRIBS&quot; t  ) , LEV31_2 as ( select\n                       row_number() over (order by to_char(VAL001)) id, to_char(VAL001) name from OBJ31_2 where V\n                       AL001 is not null group by to_char(VAL001) ),OBJ31_3 as ( select VAL001 NAME001, VAL001  f\n                       rom OBJ31_2 group by VAL001  ) , LEV31_3 as ( select id, name from LEV31_2 ),OBJ31_4 as (\n                       select \/*+ no_merge(t) *\/ 1 NAME001, count(*) VAL001 from OBJ31_3 t  )   select  *  from\n                       OBJ31_4 t order by NAME001\n\nSQL&gt;\n<\/pre>\n<p style=\"text-align: justify;\">Le r\u00e9sultat du comptage est restitu\u00e9 en 50 secondes, ce qui correspond \u00e0 la dur\u00e9e que nous avions obtenu directement en SQL.<\/p>\n<p style=\"text-align: justify;\">Pour le comptage approximatif, je n&rsquo;ai pas trouv\u00e9 de package R impl\u00e9mentant l&rsquo;algorithme d&rsquo;HyperLogLog. Ici, j&rsquo;utilise donc directement une requ\u00e8te SQL au sein d&rsquo;un bloc <a href=\"http:\/\/docs.oracle.com\/cd\/E67822_01\/OREUG\/GUID-E04D5025-049F-4AF9-8DAF-40F3874789E8.htm#OREUG505\" target=\"_blank\">ore.doEval<\/a>.<br \/>\nCette construction est relativement lourde mais elle permet de montrer l&rsquo;op\u00e9ration de connexion implicite \u00e0 l&rsquo;aide de l&rsquo;appel \u00ab\u00a0dbConnect(Extproc())\u00a0\u00bb.<\/p>\n<pre class=\"brush: js; ruler: true;\">\u00a0\n&gt; ore.doEval(function() {\n+   ore.disconnect()\n+   library(tictoc)\n+   tic()\n+   con1 &lt;- dbConnect(Extproc())\n+   res1 &lt;- dbGetQuery(con1, &quot;select approx_count_distinct(UNAME) from LOG_CONTRIBS&quot;)\n+   dbDisconnect(con1)\n+   exectime &lt;- toc()\n+   exectime &lt;- exectime$toc - exectime$tic\n+   print(paste(&quot;Count distinct: &quot;, res1, &quot; - Duree :&quot;, exectime, &quot;secondes&quot;))\n+ }, ore.connect = TRUE)\n[1] &quot;Count distinct:  30284800  - Duree : 12.18 secondes&quot;\n&gt; \n&gt; ore.disconnect()\n&gt; \n<\/pre>\n<p style=\"text-align: justify;\">On notera aussi la r\u00e9f\u00e9rence au package tictoc (qui offre des fonctions de timing). Pour pouvoir utiliser ce dernier dans des appels ORE, il faut que son chargement soit mentionn\u00e9 explicitement dans le bloc (\u00ab\u00a0library(tictoc)\u00a0\u00bb). De plus, le package doit avoir \u00e9t\u00e9 pr\u00e9alablement ajout\u00e9 \u00e0 la distribution R cot\u00e9 serveur:<\/p>\n<pre class=\"brush: js; ruler: true;\">\u00a0\noracle@psu888: \/home\/oracle [IOSHR88D1_2]# ORE CMD INSTALL \/home\/oracle\/ORE\/tictoc_1.0.tar.gz\n* installing to library \u2018\/soft\/oracle\/product\/rdbms\/12.2.0.1\/R\/library\u2019\n* installing *source* package \u2018tictoc\u2019 ...\n** package \u2018tictoc\u2019 successfully unpacked and MD5 sums checked\n** R\n** inst\n** preparing package for lazy loading\n** help\n*** installing help indices\n  converting help for package \u2018tictoc\u2019\n    finding HTML links ... done\n    Stack                                   html\n    tic                                     html\n    tictoc                                  html\n** building package indices\n** testing if installed package can be loaded\n* DONE (tictoc)\noracle@psu888: \/home\/oracle [IOSHR88D1_2]#\n<\/pre>\n<p style=\"text-align: justify;\">Bien s\u00fbr, ici , on aurait plut\u00f4t tout int\u00e9r\u00eat \u00e0 utiliser une ex\u00e9cution SQL directe via ROracle plut\u00f4t qu&rsquo;un bloc ore.doEval!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Oracle R Enterprise \u00e9tant en place, je peux reprendre les tests de comptage la ou je les avais laiss\u00e9 dans<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[9],"tags":[],"class_list":["post-923","post","type-post","status-publish","format-standard","hentry","category-oracle-r-enterprise"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/923","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=923"}],"version-history":[{"count":0,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/923\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=923"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=923"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=923"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}