{"id":505,"date":"2016-01-05T06:23:20","date_gmt":"2016-01-05T06:23:20","guid":{"rendered":"http:\/\/blog.tiran.info\/?p=505"},"modified":"2016-01-05T06:23:20","modified_gmt":"2016-01-05T06:23:20","slug":"pattern-matching-avec-r","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=505","title":{"rendered":"\u00ab\u00a0Pattern matching\u00a0\u00bb avec R"},"content":{"rendered":"<p style=\"text-align: justify;\">Dans la continuit\u00e9 du <a href=\"http:\/\/blog.tiran.info\/pattern-matching-avec-oracle\">pr\u00e9c\u00e9dent billet<\/a>, je tente cette fois-ci de r\u00e9aliser la d\u00e9tection des \u00e9pisodes r\u00e9cessifs survenus dans la zone Euro depuis 2000 \u00e0 l&rsquo;aide de R.<br \/>\nLes donn\u00e9es utilis\u00e9es sont identiques (et sont d&rsquo;ailleurs directement extraites de la base de donn\u00e9es).<\/p>\n<h3>Chargement des donn\u00e9es<\/h3>\n<pre class=\"brush: js; ruler: true;\">&gt; library(ROracle)\nLe chargement a n\u00e9cessit\u00e9 le package : DBI\nWarning message:\nle package \u2018DBI\u2019 a \u00e9t\u00e9 compil\u00e9 avec la version R 3.2.1\n&gt; ora = Oracle()\n&gt; cnx = dbConnect(ora, username=&quot;rafa&quot;, password=&quot;rafa&quot;, dbname=&quot;S1401037:1521\/STATPDB&quot;)\n&gt; pib_evol &lt;- dbGetQuery(cnx, &quot;select * from pib_evol&quot;)\n&gt; dbDisconnect(cnx)\n[1] TRUE\n&gt;\n&gt; summary(pib_evol)\n     PAYS               ANNEE        TRIMESTRE          PIB         \n Length:1068        Min.   :2000   Min.   :1.000   Min.   :  19646  \n Class :character   1st Qu.:2003   1st Qu.:1.000   1st Qu.:  59231  \n Mode  :character   Median :2007   Median :2.000   Median : 269247  \n                    Mean   :2007   Mean   :2.475   Mean   : 691366  \n                    3rd Qu.:2011   3rd Qu.:3.000   3rd Qu.: 751208  \n                    Max.   :2015   Max.   :4.000   Max.   :3497207\n&gt;\n&gt; pib_evol$PAYS &lt;- as.factor(pib_evol$PAYS)\n&gt;<\/pre>\n<h3>Codage du trimestre (as.yearqtr)<\/h3>\n<pre class=\"brush: js; ruler: true;\">&gt; library(zoo)\n\nAttachement du package : \u2018zoo\u2019\n\nThe following objects are masked from \u2018package:base\u2019:\n\n as.Date, as.Date.numeric\n\n&gt; pib_evol$QTR &lt;- paste(pib_evol$ANNEE,&quot;-&quot;, pib_evol$TRIMESTRE, sep=&quot;&quot;)\n&gt; pib_evol$QTR &lt;- as.yearqtr(pib_evol$QTR)\n&gt; \n&gt; pib_evol &lt;- pib_evol[,c(&quot;PAYS&quot;,&quot;QTR&quot;,&quot;PIB&quot;)]\n&gt; head(pib_evol)\n      PAYS     QTR      PIB\n1 Autriche 2000 Q1 297275.4\n2 Autriche 2000 Q2 301053.0\n3 Autriche 2000 Q3 302616.9\n4 Autriche 2000 Q4 307202.8\n5 Autriche 2001 Q1 306744.5\n6 Autriche 2001 Q2 305478.4\n&gt;<\/pre>\n<h3>Pivotement du dataset (dcast)<\/h3>\n<pre class=\"brush: js; ruler: true;\">&gt; library(reshape2)\n&gt; pib_evol_pays &lt;- dcast(data = pib_evol, formula = QTR ~ PAYS)\nUsing PIB as value column: use value.var to override.\n&gt; \n&gt; head(pib_evol_pays)\n      QTR Allemagne Autriche Belgique Espagne  Estonie Finlande  France    Gr\u00e8ce  Irlande  Italie Lettonie Lituanie Luxembourg Pays-Bas Portugal\n1 2000 Q1   2941021 297275.4 361522.4 1189010 19645.98 171950.0 2046747 265121.7 140751.9 1970825 24553.47 39927.56   32714.13 642933.5 263260.2\n2 2000 Q2   2970499 301053.0 363755.5 1203697 20152.38 171857.8 2062126 267881.0 144412.8 1989325 24574.84 40307.34   32606.38 650271.5 261729.2\n3 2000 Q3   2965640 302616.9 365060.7 1216639 20342.40 174338.8 2075392 272843.6 148590.6 2000858 25354.32 40732.99   33460.48 656277.2 265393.1\n4 2000 Q4   2968232 307202.8 368319.3 1230144 20927.21 176051.4 2092983 276631.9 152488.7 2026320 25220.49 41431.49   32370.47 663496.8 267408.2\n5 2001 Q1   3017306 306744.5 368372.6 1242406 21169.45 177720.0 2106714 279797.6 154225.7 2040414 25524.77 42630.47   34083.26 664379.8 266772.7\n6 2001 Q2   3019574 305478.4 368004.1 1252242 21398.56 177641.0 2107050 279534.8 154267.4 2030788 26857.67 42531.60   32821.82 667105.5 269042.3\n  R\u00e9publique slovaque Slov\u00e9nie\n1            81099.31 42507.91\n2            81920.08 43388.81\n3            82458.10 43865.37\n4            82991.53 44158.33\n5            83534.55 44607.08\n6            84443.95 44718.89\n&gt;<\/pre>\n<h3>Conversion en s\u00e9rie temporelle (xts)<\/h3>\n<pre class=\"brush: js; ruler: true;\">&gt; library(xts)\n&gt; pib_evol_pays_ts &lt;- xts(x = pib_evol_pays[, -1], order.by = pib_evol_pays$QTR)\n&gt; head(pib_evol_pays_ts)\n        Allemagne Autriche Belgique Espagne  Estonie Finlande  France    Gr\u00e8ce  Irlande  Italie Lettonie Lituanie Luxembourg Pays-Bas Portugal\n2000 Q1   2941021 297275.4 361522.4 1189010 19645.98 171950.0 2046747 265121.7 140751.9 1970825 24553.47 39927.56   32714.13 642933.5 263260.2\n2000 Q2   2970499 301053.0 363755.5 1203697 20152.38 171857.8 2062126 267881.0 144412.8 1989325 24574.84 40307.34   32606.38 650271.5 261729.2\n2000 Q3   2965640 302616.9 365060.7 1216639 20342.40 174338.8 2075392 272843.6 148590.6 2000858 25354.32 40732.99   33460.48 656277.2 265393.1\n2000 Q4   2968232 307202.8 368319.3 1230144 20927.21 176051.4 2092983 276631.9 152488.7 2026320 25220.49 41431.49   32370.47 663496.8 267408.2\n2001 Q1   3017306 306744.5 368372.6 1242406 21169.45 177720.0 2106714 279797.6 154225.7 2040414 25524.77 42630.47   34083.26 664379.8 266772.7\n2001 Q2   3019574 305478.4 368004.1 1252242 21398.56 177641.0 2107050 279534.8 154267.4 2030788 26857.67 42531.60   32821.82 667105.5 269042.3\n        R\u00e9publique slovaque Slov\u00e9nie\n2000 Q1            81099.31 42507.91\n2000 Q2            81920.08 43388.81\n2000 Q3            82458.10 43865.37\n2000 Q4            82991.53 44158.33\n2001 Q1            83534.55 44607.08\n2001 Q2            84443.95 44718.89\n&gt;<\/pre>\n<h3>Recherche des s\u00e9quences de contraction du PIB<\/h3>\n<p style=\"text-align: justify;\">Dans le bloc ci-dessous, la fonction diff permet de r\u00e9aliser la diff\u00e9rence de la valeur d&rsquo;une ligne avec la suivante. Ce r\u00e9sultat est pass\u00e9 \u00e0 la fonction sign (-1\/0\/1) dont la sortie est test\u00e9e (==1) afin de d\u00e9terminer si la diff\u00e9rence est positive ou pas.<br \/>\nLa conversion as.numeric du r\u00e9sultat du test transforme le bool\u00e9en (TRUE\/FALSE) en 0 ou 1. Ensuite, l&rsquo;ensemble de ces r\u00e9sultats 0\/1 sont concat\u00e9n\u00e9s via la fonction paste.<\/p>\n<p style=\"text-align: justify;\">On obtient alors pour chaque pays une chaine de caract\u00e8re repr\u00e9sentant les survenues de contractions du PIB sous la forme d&rsquo;un bitmap.<br \/>\nIl est alors ais\u00e9 de rechercher dans ce bitmap les occurrences de 3 (ou plus) contractions successives \u00e0 l&rsquo;aide de la fonction gregexpr et de l&rsquo;expression r\u00e9guli\u00e8re \u00ab\u00a0000+\u00a0\u00bb:<\/p>\n<pre class=\"brush: js; ruler: true;\">&gt; recessions &lt;-lapply(\n + pib_evol_pays_ts,\n + function(x) (\n + gregexpr(&quot;000+&quot;,\n + paste(\n + as.numeric(\n + sign(diff(x))==1\n + )\n + , collapse=&quot;&quot;)\n + )\n + )\n + )\n&gt;<\/pre>\n<p style=\"text-align: justify;\">Le r\u00e9sultat est une liste dont chaque membre correspond \u00e0 un pays et dont les \u00e9l\u00e9ments correspondent respectivement aux indices (dans le dataframe pib_evol_pays_ts) des \u00e9pisodes r\u00e9cessifs ainsi qu&rsquo;\u00e0 leur dur\u00e9e (en nombre de diminutions successives):<\/p>\n<pre class=\"brush: js; ruler: true;\">&gt; typeof(recessions)\n[1] &quot;list&quot;\n&gt;\n&gt; head(recessions,3)\n$Allemagne\n$Allemagne[[1]]\n[1] 35\nattr(,&quot;match.length&quot;)\n[1] 4\nattr(,&quot;useBytes&quot;)\n[1] TRUE\n\n\n$Autriche\n$Autriche[[1]]\n[1] 6 35\nattr(,&quot;match.length&quot;)\n[1] 3 5\nattr(,&quot;useBytes&quot;)\n[1] TRUE\n\n\n$Belgique\n$Belgique[[1]]\n[1] 7 36\nattr(,&quot;match.length&quot;)\n[1] 3 4\nattr(,&quot;useBytes&quot;)\n[1] TRUE\n\n\n&gt;<\/pre>\n<h3>Pr\u00e9sentation du r\u00e9sultat<\/h3>\n<p style=\"text-align: justify;\">On construit finalement un dataframe qui liste l&rsquo;ensemble des \u00e9pisodes r\u00e9cessifs de la zone Euro. Chaque ligne reprend une r\u00e9cession en indiquant le pays concern\u00e9, le trimestre de d\u00e9but, le nombre de trimestres de baisse et finalement le pourcentage de baisse:<\/p>\n<pre class=\"brush: js; ruler: true;\">&gt; res1 &lt;- data.frame(Pays=character(1), Trim=numeric(1), Nb_trim_rec=numeric(1), Pct_rec=numeric(1) )\n&gt; for (y in 1:length(recessions))\n+ {\n+ if ( unlist(recessions[[y]])[1] != -1 )\n+ {\n+ idx_deb_rec &lt;- unlist(recessions[[y]]) - 1;\n+ nb_trim_rec &lt;- attr(recessions[[y]][[1]],&quot;match.length&quot;);\n+ pays_rec &lt;- names(recessions)[y];\n+ pct_rec &lt;- round(100 * ((pib_evol_pays[idx_deb_rec,pays_rec] - pib_evol_pays[idx_deb_rec + nb_trim_rec -1,pays_rec]) \/ pib_evol_pays[idx_deb_rec,pays_rec] ),2);\n+ df1 &lt;- data.frame(pays_rec, pib_evol_pays[idx_deb_rec,&quot;QTR&quot;],nb_trim_rec, pct_rec)\n+ names(df1) &lt;- c(&quot;Pays&quot;,&quot;Trim&quot;,&quot;Nb_trim_rec&quot;,&quot;Pct_rec&quot;)\n+ res1 &lt;- rbind(res1, df1)\n+ }\n+ }\n&gt; res1$Trim &lt;- as.yearqtr(res1$Trim)\n&gt; res1[-1,]\n         Pays    Trim Nb_trim_rec Pct_rec\n2   Allemagne 2008 Q2           4    6.68\n3    Autriche 2001 Q1           3    0.61\n4    Autriche 2008 Q2           5    5.14\n5    Belgique 2001 Q2           3    0.29\n6    Belgique 2008 Q3           4    3.32\n7     Espagne 2008 Q3           6    3.90\n8     Espagne 2011 Q1          10    5.04\n9     Estonie 2008 Q3           5   17.90\n10   Finlande 2008 Q4           3    7.00\n11   Finlande 2012 Q2           4    1.33\n12     France 2008 Q2           5    3.49\n13      Gr\u00e8ce 2008 Q2           4    6.14\n14      Gr\u00e8ce 2009 Q3          16   23.20\n15    Irlande 2008 Q1           8    9.26\n16     Italie 2001 Q2           3    0.47\n17     Italie 2007 Q2           3    0.30\n18     Italie 2008 Q2           5    6.90\n19     Italie 2011 Q3           8    4.66\n20     Italie 2013 Q4           5    0.45\n21   Lettonie 2007 Q4           8   21.82\n22   Lettonie 2010 Q1           3    0.75\n23   Lituanie 2008 Q3           4   14.50\n24 Luxembourg 2002 Q3           3    1.77\n25 Luxembourg 2008 Q2           5    8.35\n26   Pays-Bas 2008 Q3           4    4.22\n27   Portugal 2002 Q2           3    1.23\n28   Portugal 2008 Q2           4    3.84\n29   Portugal 2010 Q4           9    7.81\n30   Slov\u00e9nie 2008 Q3           4    9.21\n31   Slov\u00e9nie 2011 Q3           7    4.27\n&gt;<\/pre>\n<p style=\"text-align: justify;\">On retrouve finalement les m\u00eame informations que celles obtenues dans le billet pr\u00e9c\u00e9dent mais il faut bien reconnaitre que l&rsquo;utilisation de MATCH_RECOGNIZE est nettement plus simple que les op\u00e9rations ci-dessus.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Dans la continuit\u00e9 du pr\u00e9c\u00e9dent billet, je tente cette fois-ci de r\u00e9aliser la d\u00e9tection des \u00e9pisodes r\u00e9cessifs survenus dans la<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[12,15],"tags":[],"class_list":["post-505","post","type-post","status-publish","format-standard","hentry","category-r","category-statistique-exploratoire"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/505","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=505"}],"version-history":[{"count":0,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/505\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=505"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=505"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=505"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}