{"id":511,"date":"2016-02-05T07:30:41","date_gmt":"2016-02-05T07:30:41","guid":{"rendered":"http:\/\/blog.tiran.info\/?p=511"},"modified":"2016-02-05T07:30:41","modified_gmt":"2016-02-05T07:30:41","slug":"etude-de-distribution-avec-oracle","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=511","title":{"rendered":"Etude de distribution avec Oracle"},"content":{"rendered":"<p style=\"text-align: justify;\">L&rsquo;ONU publie via le portail <a href=\"http:\/\/faostat3.fao.org\/home\/F\" target=\"_blank\">FAOStat<\/a> une vaste collection de donn\u00e9es relatives \u00e0 l&rsquo;alimentation et \u00e0 l&rsquo;agriculture. Cela permet par exemple d&rsquo;appr\u00e9hender les contributions agricoles nationales \u00e0 l&rsquo;\u00e9chelon mondial.<\/p>\n<p style=\"text-align: justify;\">Dans ce billet, j&rsquo;utilise les donn\u00e9es \u00ab\u00a0<a href=\"http:\/\/faostat3.fao.org\/download\/Q\/QC\/E\" target=\"_blank\">Production Quantity<\/a>\u00a0\u00bb de l&rsquo;ann\u00e9e 2014 pour tout les types de cultures et tous les pays.\u00a0L&rsquo;extraction est r\u00e9alis\u00e9e depuis le site directement au format CSV.<br \/>\nA noter que j&rsquo;ai utilis\u00e9 la version anglaise du site car sinon le chargement des caract\u00e8res accentu\u00e9s me posait des probl\u00e8mes.<\/p>\n<p style=\"text-align: justify;\">Le fichier r\u00e9sultant est accessible <a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/02\/7686bc51-619f-48e0-a01b-2dbf3be7e575.csv\" rel=\"\">ici<\/a>.<\/p>\n<p style=\"text-align: justify;\">Les donn\u00e9es sont d&rsquo;abord charg\u00e9es en base. Pour cela, j&rsquo;utilise l&rsquo;utilitaire\u00a0<a href=\"http:\/\/www.oracle.com\/technetwork\/developer-tools\/sql-developer\/downloads\/index.html\" target=\"_blank\">SQLCl (extension ligne de commande SQLDeveloper)<\/a> qui dispose d&rsquo;une fonction LOAD tr\u00e8s simple:<\/p>\n<pre class=\"brush: js; ruler: true;\">RAFA@s1401037[db121]&gt; help LOAD\n\nLOAD\n-----\n\nLoads a comma separated value (csv) file into a table.\nThe first row of the file must be a header row. The columns in the header row must match the columns defined on the table.\n\nThe columns must be delimited by a comma and may optionally be enclosed in double quotes.\nLines can be terminated with standard line terminators for windows, unix or mac.\nFile must be encoded UTF8.\n\nThe load is processed with 50 rows per batch.\nIf AUTOCOMMIT is set in SQLCL, a commit is done every 10 batches.\nThe load is terminated if more than 50 errors are found.\n\nLOAD [schema.]table_name[@db_link] file_name\n\nRAFA@s1401037[db121]&gt;<\/pre>\n<p style=\"text-align: justify;\">La table de destination est cr\u00e9\u00e9e en respectant pr\u00e9cis\u00e9ment (casse et espaces) les informations de la ligne d&rsquo;ent\u00eate:<\/p>\n<pre class=\"brush: sql; ruler: true;\">SQLcl: Release 4.2.0.15.349.0706 RC on Tue Feb 02 18:53:21 2016\n\nCopyright (c) 1982, 2016, Oracle.  All rights reserved.\n\n\nUsername? (&#039;&#039;?) rafa@localhost\/STATPDB\nPassword? (**********?) ****\nConnected to:\nOracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production\nWith the Partitioning, OLAP, Advanced Analytics and Real Application Testing options\n\n\nRAFA@s1401037[db121]&gt;\nRAFA@s1401037[db121]&gt;\nRAFA@s1401037[db121]&gt; CREATE TABLE production_agricole\n  2  (\n  3     &quot;Domain Code&quot;   VARCHAR2 (10),\n  4     &quot;Domain&quot;        VARCHAR2 (20),\n  5     &quot;AreaCode&quot;      NUMBER,\n  6     &quot;AreaName&quot;      VARCHAR2 (50),\n  7     &quot;ElementCode&quot;   NUMBER,\n  8     &quot;ElementName&quot;   VARCHAR2 (30),\n  9     &quot;ItemCode&quot;      NUMBER,\n 10     &quot;ItemName&quot;      VARCHAR2 (50),\n 11     &quot;Year&quot;          NUMBER,\n 12     &quot;Value&quot;         NUMBER,\n 13     &quot;Flag&quot;          VARCHAR2 (10),\n 14     &quot;FlagD&quot;         VARCHAR2 (1000)\n 15  );\n\n\nTable PRODUCTION_AGRICOLE created.\n\nRAFA@s1401037[db121]&gt;<\/pre>\n<p style=\"text-align: justify;\">J&rsquo;invoque ensuite la commande LOAD:<\/p>\n<pre class=\"brush: sql; ruler: true;\">RAFA@s1401037[db121]&gt; load PRODUCTION_AGRICOLE C:\\RTI\\Stats\\Distrib\\7686bc51-619f-48e0-a01b-2dbf3be7e575.csv\n\n--Number of rows processed: 4,919\n--Number of rows in error: 0\n0 - SUCCESS: Load processed without errors\nRAFA@s1401037[db121]&gt;\nRAFA@s1401037[db121]&gt; commit;\n\n\nCommit complete.\n\nRAFA@s1401037[db121]&gt;<\/pre>\n<p style=\"text-align: justify;\">De mani\u00e8re \u00e0 travailler avec des champs sans contraintes de casse, une vue est cr\u00e9\u00e9e sur la table PRODUCTION_AGRICOLE de mani\u00e8re \u00e0 ne pr\u00e9senter que les champs d&rsquo;int\u00e9r\u00eat:<\/p>\n<pre class=\"brush: sql; ruler: true;\">RAFA@s1401037[db121]&gt; CREATE OR REPLACE VIEW v_production_agricole\n  2  AS\n  3     SELECT &quot;AreaName&quot; pays,\n  4            &quot;ItemName&quot; culture,\n  5            &quot;Year&quot; annee,\n  6            &quot;Value&quot; quantite\n  7       FROM production_agricole\n  8      WHERE &quot;Value&quot; IS NOT NULL;\n\n\nView V_PRODUCTION_AGRICOLE created.\n\nRAFA@s1401037[db121]&gt;\n<\/pre>\n<p>On peut ensuite analyser ces chiffres \u00e0 l&rsquo;aide de fonctions analytiques qui permettent de r\u00e9aliser des sous-groupes (partitions) au sein du dataset par ann\u00e9e et type de culture.<br \/>\nIci, j&rsquo;utilise les fonctions analytiques :<\/p>\n<ul style=\"text-align: justify;\">\n<li><a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions156.htm#SQLRF00691\" target=\"_blank\">RATIO_TO_REPORT<\/a> qui permet de connaitre au sein de chaque partition (c&rsquo;est \u00e0 dire pour un type de culture), pour un pays donn\u00e9, la quote-part que repr\u00e9sente sa production par rapport \u00e0 la production mondiale<\/li>\n<li><a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions140.htm#SQLRF00686\" target=\"_blank\">PERCENT_RANK<\/a> qui permet de connaitre au sein de chaque partition, pour un pays donn\u00e9, le pourcentage de pays dont la production est sup\u00e9rieure pour le type de culture concern\u00e9<\/li>\n<li><a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions155.htm#SQLRF00690\" target=\"_blank\">RANK<\/a> qui permet de connaitre au sein de chaque partition, pour un pays donn\u00e9, son classement en termes de quantit\u00e9 produite<\/li>\n<li><a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions197.htm#SQLRF06115\" target=\"_blank\">SUM<\/a> (dans sa version analytique) de mani\u00e8re \u00e0 r\u00e9aliser au sein de chaque partition une somme roulante des pourcentages des contributions unitaires<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">Ces derni\u00e8res sont embarqu\u00e9es dans la vue v_distribution_production:<\/p>\n<pre class=\"brush: sql; ruler: true;\">RAFA@s1401037[db121]&gt;\nRAFA@s1401037[db121]&gt; CREATE OR REPLACE VIEW v_distribution_production\n  2  AS\n  3     SELECT culture,\n  4            pays,\n  5            annee,\n  6            quantite,\n  7            pct_contrib,\n  8            pct_rank,\n  9            rk,\n 10            SUM (pct_contrib)\n 11               OVER (PARTITION BY culture, annee ORDER BY quantite DESC)\n 12               roll_pct_contrib\n 13       FROM (SELECT culture,\n 14                    pays,\n 15                    annee,\n 16                    quantite,\n 17                      100\n 18                    * RATIO_TO_REPORT (quantite)\n 19                         OVER (PARTITION BY culture, annee)\n 20                       pct_contrib,\n 21                      100\n 22                    * CUME_DIST ()\n 23                      OVER (PARTITION BY culture, annee ORDER BY quantite DESC)\n 24                       pct_rank,\n 25                    RANK ()\n 26                    OVER (PARTITION BY culture, annee ORDER BY quantite DESC)\n 27                       rk\n 28               FROM v_production_agricole);\n\n\nView V_DISTRIBUTION_PRODUCTION created.\n\nRAFA@s1401037[db121]&gt;<\/pre>\n<p style=\"text-align: justify;\">La requ\u00eate suivante permet de connaitre l&rsquo;\u00e9tat de la production Fran\u00e7aise pour chaque type de culture.<br \/>\nPour le bl\u00e9, par exemple, la France est au 5eme rang mondial. Seuls 4.03% des pays produisent davantage et la production Fran\u00e7aise correspond \u00e0 5.34% de la production mondiale:<\/p>\n<pre class=\"brush: sql; ruler: true;\">RAFA@s1401037[db121]&gt;   SELECT culture,\n  2           quantite,\n  3           TRUNC (pct_contrib, 2) pct_contrib,\n  4           TRUNC (pct_rank, 2) pct_rank,\n  5           rk\n  6      FROM v_distribution_production\n  7     WHERE pays = &#039;France&#039;\n  8  ORDER BY quantite DESC;\n\n\nCULTURE                        QUANTITE  PCT_CONTRIB  PCT_RANK  RK\nCereals,Total                  56151227  2            4.37      8\nCereals (Rice Milled Eqv)      56123455  2.19         4.37      8\nWheat                          38966600  5.34         4.03      5\nSugar beet                     37630688  14.1         1.85      1\nCoarse Grain, Total            17101227  1.28         8.33      15\nBarley                         11770680  8.15         1.96      2\nRoots and Tubers,Total         8054500   0.96         10        20\nPotatoes                       8054500   2.09         5.06      8\nRapeseed                       5522980   7.78         7.69      5\nOilcakes Equivalent            4301403   1.17         5.64      11\nOilcrops Primary               2820272   1.42         6.06      12\nTriticale                      2022500   11.84        10.52     4\nMaize                          1854180   0.18         25.14     42\nSunflower seed                 1559100   3.77         12.32     9\nPulses,Total                   842259    1.08         10.91     19\nPeas, dry                      512094    4.51         6.18      6\nOats                           443528    1.93         21.05     16\nSorghum                        397936    0.58         17.85     20\nBroad beans, horse beans, dry  278645    6.41         6.55      4\nSoybeans                       227262    0.07         22.1      21\nGrain, mixed                   221700    5.92         8.33      2\nRye                            128153    0.83         25        15\nBuckwheat                      111300    5.41         13.79     4\nCereals, nes                   111250    1.62         12.28     7\nRice, paddy                    83400     0.01         66.1      78\nHempseed                       57162     73.01        10        1\nMillet                         40000     0.14         40.47     33\nOlives                         23700     0.15         60        24\nLinseed                        23319     0.9          19.6      10\nLentils                        23000     0.47         27.45     14\nOilseeds nes                   18891     0.5          36.84     21\nLupins                         15020     1.53         37.5      9\nMustard seed                   14000     2.03         34.78     8\nPoppy seed                     8000      9.4          31.25     5\nBeans, dry                     7500      0.02         68        85\nPulses, nes                    6000      0.11         52.72     58\n\n\n 36 row selected\n\nRAFA@s1401037[db121]&gt;<\/pre>\n<p style=\"text-align: justify;\">Il est aussi possible de r\u00e9aliser des analyses sur le <a href=\"https:\/\/fr.wikipedia.org\/wiki\/Principe_de_Pareto\" target=\"_blank\">principe du 80\/20 de Pareto<\/a>.<br \/>\nDans la requ\u00eate suivante, pour chaque type de culture, je cherche le pourcentage de pays producteurs qui correspondent \u00e0 80% de la production mondiale:<\/p>\n<pre class=\"brush: sql; ruler: true;\">RAFA@s1401037[db121]&gt;      SELECT culture,\n  2              quantite,\n  3              TRUNC (pct_rank, 2) pct_rank,\n  4              TRUNC (roll_pct_contrib, 2) roll_pct_contrib\n  5         FROM (SELECT a.*,\n  6                      ROW_NUMBER ()\n  7                      OVER (PARTITION BY culture\n  8                            ORDER BY ABS (roll_pct_contrib - 80))\n  9                         r\n 10                 FROM v_distribution_production a)\n 11        WHERE r = 1\n 12     ORDER BY quantite DESC\n 13  FETCH FIRST 20 ROWS ONLY;\n\n\nCULTURE                    QUANTITE  PCT_RANK  ROLL_PCT_CONTRIB\nOil, palm fruit            96066760  4.65      81.22\nSoybeans                   53397715  3.15      80.46\nSugar cane                 32464000  7.84      79.65\nCereals,Total              26739008  11.47     80.1\nRice, paddy                26423300  5.93      79.73\nCereals (Rice Milled Eqv)  23587258  11.47     79.74\nOil, palm                  19667016  13.33     92.78\nCoarse Grain, Total        13025910  10        80.07\nWheat                      11628670  12.09     79.75\nMaize                      11486800  6.58      80.51\nOilcakes Equivalent        8057933   4.1       79.44\nYams                       7119000   3.33      76.46\nRoots and Tubers,Total     6615950   13        80.29\nCassava                    4910810   12.5      79.33\nPalm kernels               4888756   4.65      80.99\nSugar beet                 4805559   22.22     80.45\nPotatoes                   4166000   12.65     80.19\nSeed cotton                3400200   6.89      79.05\nGrain, mixed               2922436   4.16      78.13\nOilcrops Primary           2820272   6.06      80.49\n\n\n 20 row selected\n\nRAFA@s1401037[db121]&gt;<\/pre>\n<p style=\"text-align: justify;\">On peut noter que pour la plupart des productions ci-dessus, on observe une concentration de la production sur un faible pourcentage de pays:\u00a080% de la production mondiale de Soja \u00e9mane de 3% des pays producteurs, ~6% pour le riz etc&#8230;<\/p>\n<p style=\"text-align: justify;\">A l&rsquo;inverse, \u00e0 partir d&rsquo;une distribution existante, les fonctions <a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions141.htm#SQLRF00687\" target=\"_blank\">PERCENTILE_CONT<\/a> &amp; <a href=\"https:\/\/docs.oracle.com\/database\/121\/SQLRF\/functions142.htm#SQLRF00688\" target=\"_blank\">PERCENTILE_DISC<\/a>\u00a0permettent de r\u00e9aliser une extrapolation de quelle devrait \u00eatre la quantit\u00e9 produite pour \u00eatre dans un percentile donn\u00e9.<\/p>\n<p style=\"text-align: justify;\">Dans l&rsquo;exemple ci-dessous, on cherche \u00e0 savoir \u00e0 quel niveau de production de Ma\u00efs et d&rsquo;Avoine il faut se situer pour produire plus que 85% des pays?<\/p>\n<pre class=\"brush: sql; ruler: true;\">RAFA@s1401037[db121]&gt; SELECT DISTINCT\n  2         culture,\n  3         PERCENTILE_CONT (0.85)\n  4            WITHIN GROUP (ORDER BY quantite)\n  5            OVER (PARTITION BY culture)\n  6            Qty\n  7    FROM v_distribution_production a\n  8   WHERE culture IN (&#039;Maize&#039;, &#039;Oats&#039;);\n\n\nCULTURE  QTY\nMaize    4692300\nOats     611826.5\n\nRAFA@s1401037[db121]&gt;\n<\/pre>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>L&rsquo;ONU publie via le portail FAOStat une vaste collection de donn\u00e9es relatives \u00e0 l&rsquo;alimentation et \u00e0 l&rsquo;agriculture. Cela permet par<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[6,15],"tags":[],"class_list":["post-511","post","type-post","status-publish","format-standard","hentry","category-oracle","category-statistique-exploratoire"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/511","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=511"}],"version-history":[{"count":0,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/511\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=511"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=511"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=511"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}