{"id":400,"date":"2015-10-28T08:04:58","date_gmt":"2015-10-28T08:04:58","guid":{"rendered":"http:\/\/blog.tiran.info\/?p=400"},"modified":"2015-10-28T08:04:58","modified_gmt":"2015-10-28T08:04:58","slug":"arbre-de-decision-avec-oracle","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=400","title":{"rendered":"Arbre de d\u00e9cision avec Oracle"},"content":{"rendered":"<p style=\"text-align: justify;\">A la suite du <a href=\"https:\/\/fr.wikipedia.org\/wiki\/Affaire_Volkswagen\" target=\"_blank\">scandale des moteurs Volkswagen truqu\u00e9s<\/a>, on a beaucoup parl\u00e9 de l&rsquo;impact de la fraude sur le bonus-malus \u00e9cologique octroy\u00e9 par le gouvernement. Par curiosit\u00e9, je me suis int\u00e9ress\u00e9 aux donn\u00e9es officielles collect\u00e9es par <a href=\"http:\/\/www.ademe.fr\/\" target=\"_blank\">l&rsquo;ADEME<\/a> afin de savoir s&rsquo;il \u00e9tait possible d&rsquo;inf\u00e9rer le montant dudit bonus-malus en fonction des caract\u00e9ristiques techniques d&rsquo;un v\u00e9hicule.<\/p>\n<p style=\"text-align: justify;\">Pour cela, j&rsquo;ai eu recours \u00e0 un <a href=\"https:\/\/en.wikipedia.org\/wiki\/Decision_tree_learning\" target=\"_blank\">arbre de d\u00e9cision<\/a>. Il s&rsquo;agit d&rsquo;une m\u00e9thode d&rsquo;<a href=\"https:\/\/fr.wikipedia.org\/wiki\/Apprentissage_supervis%C3%A9\" target=\"_blank\">apprentissage supervis\u00e9<\/a> qui permet de formuler des r\u00e8gles explicites de classification.<\/p>\n<p style=\"text-align: justify;\">Mon id\u00e9e \u00e9tait de produire l&rsquo;arbre en utilisant un ensemble d&rsquo;apprentissage constitu\u00e9 de v\u00e9hicules des marques non-impliqu\u00e9es dans l&rsquo;affaire. Puis, dans un second temps, d&rsquo;utiliser comme jeu\u00a0de test les v\u00e9hicules des marques incrimin\u00e9es afin de voir si la performance de classification divergeait significativement&#8230;<\/p>\n<p style=\"text-align: justify;\">Oracle propose une <a href=\"http:\/\/docs.oracle.com\/database\/121\/DMCON\/algo_decisiontree.htm#DMCON019\" target=\"_blank\">impl\u00e9mentation d&rsquo;arbre de d\u00e9cision<\/a> bas\u00e9e sur l&rsquo;algorithme CART. A noter que l&rsquo;utilisation de cette fonctionnalit\u00e9 requiert l&rsquo;option Oracle Advanced Analytics.<\/p>\n<p style=\"text-align: justify;\">Pour cette analyse, j&rsquo;ai utilis\u00e9 le <a href=\"https:\/\/www.data.gouv.fr\/fr\/datasets\/emissions-de-co2-et-de-polluants-des-vehicules-commercialises-en-france\/\" target=\"_blank\">fichier de donn\u00e9es correspondant \u00e0 l&rsquo;ann\u00e9e 2014<\/a> car il pr\u00e9sentait plus d&rsquo;informations (type de carrosserie en particulier) que celui de 2015:\u00a0<a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2015\/10\/mars-2014-complete.zip\">mars-2014-complete<\/a><\/p>\n<h3 style=\"text-align: justify;\">Chargement des donn\u00e9es<\/h2>\n<p style=\"text-align: justify;\">Une table externe est utilis\u00e9e pour acc\u00e9der aux donn\u00e9es depuis la base de donn\u00e9es:<\/p>\n<pre  class=\"brush: sql; ruler: true;\">SQL&gt; CREATE TABLE exttab_autos\n  2  (\n  3     lib_mrq             VARCHAR2 (40),\n  4     lib_mod_doss        VARCHAR2 (40),\n  5     lib_mod             VARCHAR2 (40),\n  6     dscom               VARCHAR2 (200),\n  7     cnit                VARCHAR2 (40),\n  8     tvv                 VARCHAR2 (40),\n  9     cod_cbr             VARCHAR2 (6),\n 10     hybride             VARCHAR2 (5),\n 11     puiss_admin_98      NUMBER,\n 12     puiss_max           NUMBER,\n 13     typ_boite_nb_rapp   VARCHAR2 (5),\n 14     conso_urb           NUMBER,\n 15     conso_exurb         NUMBER,\n 16     conso_mixte         NUMBER,\n 17     co2                 NUMBER,\n 18     co_typ_1            NUMBER,\n 19     hc                  NUMBER,\n 20     nox                 NUMBER,\n 21     hcnox               NUMBER,\n 22     ptcl                NUMBER,\n 23     masse_ordma_min     NUMBER,\n 24     masse_ordma_max     NUMBER,\n 25     champ_v9            VARCHAR2 (40),\n 26     date_maj            VARCHAR2 (40),\n 27     Carrosserie         VARCHAR2 (40),\n 28     gamme               VARCHAR2 (40)\n 29  )\n 30  ORGANIZATION EXTERNAL\n 31  (\n 32      TYPE oracle_loader\n 33      DEFAULT DIRECTORY datadir\n 34      ACCESS PARAMETERS\n 35          (\n 36          RECORDS DELIMITED BY NEWLINE\n 37          SKIP 1\n 38          FIELDS TERMINATED BY &#039;;&#039;\n 39          MISSING FIELD VALUES ARE NULL\n 40          )\n 41      LOCATION (&#039;mars-2014-complete.csv&#039;)\n 42  )\n 43  REJECT LIMIT 0\n 44  NOPARALLEL;\n\nTable created.\n\nSQL&gt;<\/pre>\n<h3 style=\"text-align: justify;\">Cr\u00e9ation d&rsquo;une fonction de d\u00e9termination du montant du bonus-malus<\/h2>\n<p style=\"text-align: justify;\">A partir\u00a0de la quantit\u00e9 de CO2 rejet\u00e9e par un v\u00e9hicule, la fonction get_bonus_malus renvoie le bonus\/malus auquel il est \u00e9ligible. Les seuils et les montants associ\u00e9s sont publi\u00e9s sur le site du\u00a0<a href=\"http:\/\/www.developpement-durable.gouv.fr\/Bonus-Malus-definitions-et-baremes.html\" target=\"_blank\">minist\u00e8re de l&rsquo;\u00e9cologie et du d\u00e9veloppement durable<\/a>.<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; CREATE OR REPLACE FUNCTION get_bonus_malus (p_co2 NUMBER)\n  2     RETURN VARCHAR2\n  3     RESULT_CACHE\n  4  IS\n  5  BEGIN\n  6     RETURN CASE\n  7               WHEN p_co2 BETWEEN 0 AND 20 THEN &#039;A (-6300)&#039;\n  8               WHEN p_co2 BETWEEN 21 AND 60 THEN &#039;B (-4000)&#039;\n  9               WHEN p_co2 BETWEEN 131 AND 135 THEN &#039;D (150)&#039;\n 10               WHEN p_co2 BETWEEN 136 AND 134 THEN &#039;E (250)&#039;\n 11               WHEN p_co2 BETWEEN 141 AND 145 THEN &#039;F (500)&#039;\n 12               WHEN p_co2 BETWEEN 146 AND 150 THEN &#039;G (900)&#039;\n 13               WHEN p_co2 BETWEEN 151 AND 155 THEN &#039;H (1600)&#039;\n 14               WHEN p_co2 BETWEEN 156 AND 175 THEN &#039;I (2200)&#039;\n 15               WHEN p_co2 BETWEEN 176 AND 180 THEN &#039;J (3000)&#039;\n 16               WHEN p_co2 BETWEEN 181 AND 185 THEN &#039;K (3600)&#039;\n 17               WHEN p_co2 BETWEEN 186 AND 190 THEN &#039;L (4000)&#039;\n 18               WHEN p_co2 BETWEEN 191 AND 200 THEN &#039;M (6500)&#039;\n 19               WHEN p_co2 &gt; 200 THEN &#039;N (8000)&#039;\n 20               ELSE &#039;C (0)&#039;\n 21            END;\n 22  END;\n 23  \/\n\nFunction created.\n\nSQL&gt;\n<\/pre>\n<h3 style=\"text-align: justify;\">Cr\u00e9ation des vues\u00a0d&rsquo;analyse<\/h2>\n<p style=\"text-align: justify;\">La vue v_autos extrait les donn\u00e9es d&rsquo;int\u00e9r\u00eats de exttab_autos &#8211; \u00e0 savoir, on s&rsquo;int\u00e9resse aux v\u00e9hicules grand-public essence ou diesel (d&rsquo;o\u00f9 l&rsquo;exclusion des 4&#215;4 et des minibus). On applique ensuite la fonction get_bonus_malus et enfin on supprime les doublons:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; CREATE OR REPLACE VIEW v_autos\n  2  AS\n  3     SELECT DISTINCT\n  4            TRIM (lib_mrq) lib_mrq,\n  5            TRIM (lib_mod) lib_mod,\n  6            TRIM (cod_cbr) cod_cbr,\n  7            TRIM (hybride) hybride,\n  8            puiss_admin_98,\n  9            puiss_max,\n 10            SUBSTR (TYP_BOITE_NB_RAPP, 1, 1) typ_boite,\n 11            TO_NUMBER (REPLACE (SUBSTR (TYP_BOITE_NB_RAPP, 3, 1), &#039;.&#039;, &#039;0&#039;))\n 12               nb_rapp,\n 13            conso_urb,\n 14            conso_exurb,\n 15            conso_mixte,\n 16            masse_ordma_min,\n 17            masse_ordma_max,\n 18            decode(carrosserie,&#039;COMBISPCACE&#039;,&#039;COMBISPACE&#039;,carrosserie) carrosserie,\n 19            gamme,\n 20            get_bonus_malus (co2) bonus_malus\n 21       FROM exttab_autos\n 22      WHERE     carrosserie NOT IN (&#039;MINIBUS&#039;,&#039;TS TERRAINS\/CHEMINS&#039;,&#039;CABRIOLET&#039;)\n 23            AND cod_cbr IN (&#039;ES&#039;, &#039;GO&#039;);\n\nView created.\n\nSQL&gt;<\/pre>\n<p style=\"text-align: justify;\">Les vues v_autos_apprentissage et v_autos_test assurent respectivement la division du jeu de donn\u00e9es en un jeu d&rsquo;apprentissage (incluant les marques non-incrimin\u00e9es dans le scandale) et un jeu de test (incluant les marques incrimin\u00e9es):<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; CREATE OR REPLACE VIEW v_autos_apprentissage\n  2  AS\n  3     SELECT *\n  4       FROM v_autos\n  5      WHERE     lib_mrq NOT IN (&#039;VOLKSWAGEN&#039;,\n  6                                &#039;SEAT&#039;,\n  7                                &#039;SKODA&#039;,\n  8                                &#039;AUDI&#039;);\n\nView created.\n\nSQL&gt;\nSQL&gt;\nSQL&gt; CREATE OR REPLACE VIEW v_autos_test\n  2  AS\n  3     SELECT *\n  4       FROM v_autos\n  5      WHERE     lib_mrq IN (&#039;VOLKSWAGEN&#039;,\n  6                            &#039;SEAT&#039;,\n  7                            &#039;SKODA&#039;,\n  8                            &#039;AUDI&#039;);\n\nView created.\n\nSQL&gt;\nSQL&gt; SELECT COUNT (*) FROM v_autos_apprentissage;\n\n  COUNT(*)\n----------\n      2722\n\nSQL&gt; SELECT COUNT (*) FROM v_autos_test;\n\n  COUNT(*)\n----------\n       654\n\nSQL&gt;<\/pre>\n<p style=\"text-align: justify;\">On a donc une distribution 80\/20 des effectifs entre l&rsquo;ensemble d&rsquo;apprentissage et celui de test.<\/p>\n<h3 style=\"text-align: justify;\">Cr\u00e9ation de la table de param\u00e9trage du mod\u00e8le<\/h2>\n<p style=\"text-align: justify;\">A chaque \u00e9tape de croissance de l&rsquo;arbre, l&rsquo;algorithme tente de trouver l&rsquo;attribut de partage qui produit des sous-ensembles les plus homog\u00e8nes. Deux options sont disponibles pour quantifier cette homog\u00e9n\u00e9it\u00e9: le <a href=\"https:\/\/fr.wikipedia.org\/wiki\/Coefficient_de_Gini\" target=\"_blank\">crit\u00e8re de Gini<\/a> ou l&rsquo;<a href=\"https:\/\/fr.wikipedia.org\/wiki\/Entropie_de_Shannon\" target=\"_blank\">entropie<\/a>. Ici, j&rsquo;utilise l&rsquo;entropie (DBMS_DATA_MINING.TREE_IMPURITY_ENTROPY) qui a donn\u00e9 de meilleurs r\u00e9sultats lors de mes tests:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; CREATE TABLE autos_dt_settings\n  2  (\n  3     setting_name    VARCHAR2 (30),\n  4     setting_value   VARCHAR2 (30)\n  5  );\n\nTable created.\n\nSQL&gt;\nSQL&gt;\nSQL&gt; BEGIN\n  2     INSERT INTO autos_dt_settings (setting_name, setting_value)\n  3             VALUES (\n  4                       DBMS_DATA_MINING.algo_name,\n  5                       DBMS_DATA_MINING.algo_decision_tree);\n  6\n  7     INSERT INTO autos_dt_settings (setting_name, setting_value)\n  8             VALUES (\n  9                       DBMS_DATA_MINING.TREE_IMPURITY_METRIC,\n 10                       DBMS_DATA_MINING.TREE_IMPURITY_ENTROPY);\n 11     COMMIT;\n 12  END;\n 13  \/\n\nPL\/SQL procedure successfully completed.\n\nSQL&gt;<\/pre>\n<h3 style=\"text-align: justify;\">Cr\u00e9ation du mod\u00e8le<\/h2>\n<p style=\"text-align: justify;\">Les champs LIB_MRQ et LIB_MOD &#8211; respectivement, la marque et le mod\u00e8le des v\u00e9hicules &#8211; sans valeur informative, sont exclus de l&rsquo;analyse via une op\u00e9ration de transformation (<a href=\"https:\/\/docs.oracle.com\/database\/121\/ARPLS\/d_dmtran.htm#ARPLS191\" target=\"_blank\">DBMS_DATA_MINING_TRANSFORM<\/a>):<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; DECLARE\n  2     l_xform   DBMS_DATA_MINING_TRANSFORM.transform_list;\n  3  BEGIN\n  4     DBMS_DATA_MINING_TRANSFORM.set_transform (l_xform,\n  5                                               &#039;LIB_MRQ&#039;,\n  6                                               NULL,\n  7                                               NULL,\n  8                                               NULL);\n  9\n 10     DBMS_DATA_MINING_TRANSFORM.set_transform (l_xform,\n 11                                               &#039;LIB_MOD&#039;,\n 12                                               NULL,\n 13                                               NULL,\n 14                                               NULL);\n 15\n 16     DBMS_DATA_MINING.CREATE_MODEL (\n 17        model_name            =&gt; &#039;AUTOS_DT_MODEL&#039;,\n 18        mining_function       =&gt; DBMS_DATA_MINING.classification,\n 19        data_table_name       =&gt; &#039;V_AUTOS_APPRENTISSAGE&#039;,\n 20        case_id_column_name   =&gt; &#039;&#039;,\n 21        target_column_name    =&gt; &#039;BONUS_MALUS&#039;,\n 22        settings_table_name   =&gt; &#039;AUTOS_DT_SETTINGS&#039;,\n 23        xform_list            =&gt; l_xform);\n 24  END;\n 25  \/\n\nPL\/SQL procedure successfully completed.\n\nSQL&gt;<\/pre>\n<h3 style=\"text-align: justify;\">Visualisation du mod\u00e8le<\/h2>\n<p style=\"text-align: justify;\">Le mod\u00e8le produit est un document au <a href=\"https:\/\/en.wikipedia.org\/wiki\/Predictive_Model_Markup_Language\" target=\"_blank\">format PMML<\/a> qui d\u00e9taille les r\u00e8gles de classification:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; set long 300\nSQL&gt; SELECT DBMS_DATA_MINING.get_model_details_xml (&#039;AUTOS_DT_MODEL&#039;)\n  2            AS DT_DETAILS\n  3    FROM DUAL;\n\nDT_DETAILS\n--------------------------------------------------------------------------------\n&lt;PMML version=&quot;2.1&quot;&gt;\n &lt;Header copyright=&quot;Copyright (c) 2004, Oracle Corporation. All rights reserved\n.&quot;\/&gt;\n &lt;DataDictionary numberOfFields=&quot;8&quot;&gt;\n &lt;DataField name=&quot;BONUS_MALUS&quot; optype=&quot;categorical&quot;\/&gt;\n &lt;DataField name=&quot;CARROSSERIE&quot; optype=&quot;categorical&quot;\/&gt;\n &lt;DataField name=&quot;COD_CBR&quot; optype=&quot;cate\n\n\nSQL&gt;\n\n<\/pre>\n<header><\/header>\n<p style=\"text-align: justify;\">Il est possible d&rsquo;afficher sous forme textuelle les r\u00e8gles en question via un script publi\u00e9 sur le site du <a href=\"https:\/\/blogs.oracle.com\/datamining\/entry\/readable_rules_from_a_decision_tree_model\" target=\"_blank\">blog de l&rsquo;\u00e9quipe ODM<\/a>.<\/p>\n<p style=\"text-align: justify;\">J&rsquo;ai modifi\u00e9 l\u00e9g\u00e8rement ce script pour ne r\u00e9cup\u00e9rer que les terminaisons (feuilles) de l&rsquo;arbre et la r\u00e8gle associ\u00e9e:\u00a0<a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2015\/10\/dt_rules.zip\">dt_rules<\/a><\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; @C:\\RTI\\Stats\\dt_rules.sql\nEnter value for model_name: AUTOS_DT_MODEL\n\nRECORD_COUNT PREDICTION      FULL_SIMPLE_RULE\n------------ --------------- -----------------------------------------------------------------------------------------------------------------------\n         985 C (0)           (CONSO_MIXTE &lt;= 4.9500000475) 30 D (150) (CONSO_MIXTE &gt; 4.9500000475) AND (MASSE_ORDMA_MAX &lt;= 1504.5) AND (COD_CBR in (&quot;GO&quot; )) AND (CONSO_MIXTE &lt;= 5.25)\n          18 C (0)           (MASSE_ORDMA_MAX &lt;= 1504.5) AND (CONSO_MIXTE &lt;= 5.649999857) AND (COD_CBR in (&quot;GO&quot; )) AND (CONSO_MIXTE &gt; 5.25)\n         195 C (0)           (CONSO_MIXTE &gt; 4.9500000475) AND (MASSE_ORDMA_MAX &lt;= 1504.5) AND (CONSO_MIXTE &lt;= 5.649999857) AND (COD_CBR in (&quot;ES&quot; ))\n          73 D (150)         (MASSE_ORDMA_MAX &lt;= 1504.5) AND (CONSO_MIXTE &gt; 5.649999857) AND (CONSO_MIXTE &lt;= 5.850000143)\n          64 C (0)           (CONSO_MIXTE &lt;= 6.0499999525) AND (MASSE_ORDMA_MAX &lt;= 1504.5) AND (CONSO_MIXTE &gt; 5.850000143)\n          93 D (150)         (CONSO_MIXTE &gt; 4.9500000475) AND (MASSE_ORDMA_MAX &gt; 1504.5) AND (CONSO_MIXTE &lt;= 5.149999857) 86 C (0) (MASSE_ORDMA_MAX &gt; 1504.5) AND (CONSO_MIXTE &lt;= 5.350000143) AND (CONSO_MIXTE &gt; 5.149999857)\n          70 F (500)         (MASSE_ORDMA_MAX &gt; 1504.5) AND (CONSO_MIXTE &gt; 5.350000143) AND (CONSO_MIXTE &lt;= 5.5499999525) 88 G (900) (MASSE_ORDMA_MAX &gt; 1504.5) AND (CONSO_MIXTE &lt;= 5.75) AND (CONSO_MIXTE &gt; 5.5499999525)\n          78 H (1600)        (CONSO_MIXTE &lt;= 6.0499999525) AND (MASSE_ORDMA_MAX &gt; 1504.5) AND (CONSO_MIXTE &gt; 5.75) AND (COD_CBR in (&quot;GO&quot; ))\n          28 C (0)           (CONSO_MIXTE &lt;= 6.0499999525) AND (MASSE_ORDMA_MAX &gt; 1504.5) AND (CONSO_MIXTE &gt; 5.75) AND (COD_CBR in (&quot;ES&quot; ))\n          73 I (2200)        (CONSO_MIXTE &gt; 6.0499999525) AND (CONSO_MIXTE &lt;= 6.4500000475) AND (COD_CBR in (&quot;GO&quot; )) 73 F (500) (CONSO_MIXTE &gt; 6.0499999525) AND (COD_CBR in (&quot;ES&quot; )) AND (CONSO_MIXTE &lt;= 6.25)\n          91 G (900)         (CONSO_MIXTE &lt;= 6.4500000475) AND (COD_CBR in (&quot;ES&quot; )) AND (CONSO_MIXTE &gt; 6.25)\n          14 I (2200)        (CONSO_MIXTE &gt; 6.4500000475) AND (CONSO_MIXTE &lt;= 6.75) AND (COD_CBR in (&quot;GO&quot; )) 56 H (1600) (CONSO_MIXTE &gt; 6.4500000475) AND (COD_CBR in (&quot;ES&quot; )) AND (CONSO_MIXTE &lt;= 6.649999857)\n          26 I (2200)        (CONSO_MIXTE &lt;= 6.75) AND (COD_CBR in (&quot;ES&quot; )) AND (CONSO_MIXTE &gt; 6.649999857)\n          12 K (3600)        (CONSO_MIXTE &gt; 6.75) AND (COD_CBR in (&quot;GO&quot; )) AND (CONSO_MIXTE &lt;= 7.149999857)\n          10 M (6500)        (CONSO_MIXTE &lt;= 7.75) AND (COD_CBR in (&quot;GO&quot; )) AND (CONSO_MIXTE &gt; 7.149999857)\n         137 I (2200)        (CONSO_MIXTE &gt; 6.75) AND (COD_CBR in (&quot;ES&quot; )) AND (CONSO_MIXTE &lt;= 7.4500000475)\n          30 J (3000)        (CONSO_MIXTE &lt;= 7.75) AND (COD_CBR in (&quot;ES&quot; )) AND (CONSO_MIXTE &gt; 7.4500000475)\n          29 K (3600)        (CONSO_MIXTE &gt; 7.75) AND (CONSO_MIXTE &lt;= 7.9500000475)\n          28 L (4000)        (CONSO_MIXTE &lt;= 8.150000095) AND (CONSO_MIXTE &gt; 7.9500000475)\n          36 M (6500)        (CONSO_MIXTE &lt;= 8.650000095) AND (CONSO_MIXTE &gt; 8.150000095)\n         299 N (8000)        (CONSO_MIXTE &gt; 8.650000095)\n\n26 rows selected.\n\nSQL&gt;<\/pre>\n<p style=\"text-align: justify;\">Pour chaque v\u00e9hicule, on peut aussi visualiser via <a href=\"http:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions149.htm#SQLRF06211\" target=\"_blank\">PREDICTION_DETAILS<\/a> les crit\u00e8res utilis\u00e9s pour aboutir \u00e0 l&rsquo;une des feuilles de l&rsquo;arbre:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; column BONUS_MALUS format a15\nSQL&gt; column DETAIL format a95\nSQL&gt; set long 500\nSQL&gt; SELECT bonus_malus,\n  2         PREDICTION (AUTOS_DT_MODEL USING a.*) prediction,\n  3         PREDICTION_DETAILS (AUTOS_DT_MODEL USING a.*) detail\n  4    FROM v_autos_test a\n  5   WHERE lib_mod = &#039;GOLF&#039; AND puiss_max = 90 AND carrosserie=&#039;BERLINE&#039;;\n\nBONUS_MALUS     PREDICTION      DETAIL\n--------------- --------------- -----------------------------------------------------------------------------------------------\nG (900)         G (900)         &lt;Details algorithm=&quot;Decision Tree&quot; class=&quot;3&quot; node=&quot;39&quot;&gt;\n                                &lt;Attribute name=&quot;CONSO_MIXTE&quot; actualValue=&quot;6.300000191&quot; operator=&quot;between&quot; range=&quot;(6.25:6.45000\n                                00475]&quot; weight=&quot;.62&quot; rank=&quot;1&quot;\/&gt;\n                                &lt;Attribute name=&quot;COD_CBR&quot; actualValue=&quot;ES&quot; operator=&quot;in&quot; range=&quot;ES&quot; weight=&quot;.143&quot; rank=&quot;2&quot;\/&gt;\n                                &lt;\/Details&gt;\n\nH (1600)        H (1600)        &lt;Details algorithm=&quot;Decision Tree&quot; class=&quot;4&quot; node=&quot;41&quot;&gt;\n                                &lt;Attribute name=&quot;CONSO_MIXTE&quot; actualValue=&quot;6.5&quot; operator=&quot;between&quot; range=&quot;(6.4500000475:6.64999\n                                9857]&quot; weight=&quot;.726&quot; rank=&quot;1&quot;\/&gt;\n                                &lt;Attribute name=&quot;COD_CBR&quot; actualValue=&quot;ES&quot; operator=&quot;in&quot; range=&quot;ES&quot; weight=&quot;.108&quot; rank=&quot;2&quot;\/&gt;\n                                &lt;\/Details&gt;\n\n\nSQL&gt;<\/pre>\n<h3 style=\"text-align: justify;\">Performance du mod\u00e8le<\/h2>\n<p style=\"text-align: justify;\">La performance globale du classifieur est tr\u00e8s proche (sup\u00e9rieure \u00e0 90%) pour les deux \u00e9chantillons:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; WITH classification_perf\n  2       AS (  SELECT jeu, pred_correcte, COUNT (*) cnt\n  3               FROM (SELECT &#039;TEST&#039; jeu,\n  4                            CASE\n  5                               WHEN a.bonus_malus =\n  6                                       PREDICTION (AUTOS_DT_MODEL USING *)\n  7                               THEN\n  8                                  1\n  9                               ELSE\n 10                                  0\n 11                            END\n 12                               pred_correcte\n 13                       FROM v_autos_test a\n 14                     UNION ALL\n 15                     SELECT &#039;APPRENTISSAGE&#039; jeu,\n 16                            CASE\n 17                               WHEN a.bonus_malus =\n 18                                       PREDICTION (AUTOS_DT_MODEL USING *)\n 19                               THEN\n 20                                  1\n 21                               ELSE\n 22                                  0\n 23                            END\n 24                               pred_correcte\n 25                       FROM v_autos_apprentissage a)\n 26           GROUP BY jeu, pred_correcte)\n 27    SELECT jeu, ROUND (100 * SUM (pred_correcte * cnt) \/ SUM (cnt), 1) pct_global\n 28      FROM classification_perf\n 29  GROUP BY jeu;\n\nJEU           PCT_GLOBAL\n------------- ----------\nAPPRENTISSAGE       92.1\nTEST                93.4\n\nSQL&gt;<\/pre>\n<p style=\"text-align: justify;\">On peut aussi produire une <a href=\"https:\/\/fr.wikipedia.org\/wiki\/Matrice_de_confusion\" target=\"_blank\">matrice de confusion<\/a> afin de visualiser la distribution des pr\u00e9dictions correctes\/incorrectes.<br \/>\nA noter l&rsquo;utilisation de la clause\u00a0<a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/statements_10002.htm#CHDFAFIE\" target=\"_blank\">PIVOT<\/a> pour permuter les donn\u00e9es r\u00e9elles sous forme de colonnes:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQL&gt; column PRED format a10\nSQL&gt; set NUMFORMAT 9999\nSQL&gt;   SELECT *\n  2      FROM (SELECT bonus_malus, PREDICTION (AUTOS_DT_MODEL USING *) AS pred\n  3              FROM v_autos_test) PIVOT (COUNT (*)\n  4                                    FOR bonus_malus\n  5                                    IN (&#039;A (-6300)&#039; A,\n  6                                       &#039;B (-4000)&#039; B,\n  7                                       &#039;C (0)&#039; C,\n  8                                       &#039;D (150)&#039; D,\n  9                                       &#039;E (250)&#039; E,\n 10                                       &#039;F (500)&#039; F,\n 11                                       &#039;G (900)&#039; G,\n 12                                       &#039;H (1600)&#039; H,\n 13                                       &#039;I (2200)&#039; I,\n 14                                       &#039;J (3000)&#039; J,\n 15                                       &#039;K (3600)&#039; K,\n 16                                       &#039;L (4000)&#039; L,\n 17                                       &#039;M (6500)&#039; M,\n 18                                       &#039;N (8000)&#039; N))\n 19  ORDER BY 1;\n\nPRED           A     B     C     D     E     F     G     H     I     J     K     L     M     N\n---------- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----\nC (0)          0     0   369     7     0     3     0     0     0     0     0     0     0     0\nD (150)        0     0     3    47     0     0     0     0     0     0     0     0     0     0\nF (500)        0     0     0     0     0    19     1     0     0     0     0     0     0     0\nG (900)        0     0     0     0     0     1    46     0     0     0     0     0     0     0\nH (1600)       0     0     0     0     0     0     4    36     9     0     0     0     0     0\nI (2200)       0     0     0     0     0     0     0     3    39     4     0     0     0     0\nJ (3000)       0     0     0     0     0     0     0     0     1     5     0     0     0     0\nK (3600)       0     0     0     0     0     0     0     0     0     1     4     0     0     0\nL (4000)       0     0     0     0     0     0     0     0     0     1     0    10     0     0\nM (6500)       0     0     0     0     0     0     0     0     0     0     0     3     9     2\nN (8000)       0     0     0     0     0     0     0     0     0     0     0     0     0    27\n\n11 rows selected.\n\nSQL&gt;<\/pre>\n<p style=\"text-align: justify;\">En conclusion, l&rsquo;analyse des donn\u00e9es techniques des v\u00e9hicules par rapport aux cat\u00e9gories de bonus\/malus \u00e9cologique ne permet pas de mettre en \u00e9vidence d&rsquo;incoh\u00e9rences pour les v\u00e9hicules volkswagen. Cela-dit, c&rsquo;est finalement assez logique car, si j&rsquo;ai bien compris, la fraude porterait plut\u00f4t sur un autre polluant (le NOX) que le CO2&#8230; \ud83d\ude09<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A la suite du scandale des moteurs Volkswagen truqu\u00e9s, on a beaucoup parl\u00e9 de l&rsquo;impact de la fraude sur le<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[3,6,7],"tags":[],"class_list":["post-400","post","type-post","status-publish","format-standard","hentry","category-classification","category-oracle","category-oracle-advanced-analytics"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/400","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=400"}],"version-history":[{"count":0,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/400\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=400"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=400"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=400"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}