{"id":565,"date":"2016-04-18T07:33:33","date_gmt":"2016-04-18T07:33:33","guid":{"rendered":"http:\/\/blog.tiran.info\/?p=565"},"modified":"2017-12-05T14:21:16","modified_gmt":"2017-12-05T13:21:16","slug":"analyse-de-correlation-avec-r","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=565","title":{"rendered":"Correlation analysis with R"},"content":{"rendered":"<p style=\"text-align: justify;\">Following on from the <a href=\"http:\/\/blog.tiran.info\/analyse-de-correlation-avec-oracle\" target=\"_blank\" rel=\"noopener\">previous post<\/a>, this time I carry out the correlation analysis with R.<\/p>\n<p style=\"text-align: justify;\">The data are first loaded into a data frame, debits_loire, from the Excel file used previously: <a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/03\/Debits_Loire.xlsx\" rel=\"\">Debits_Loire.xlsx<\/a><\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; require(xlsx)\r\nLe chargement a n\u00e9cessit\u00e9 le package : xlsx\r\nLe chargement a n\u00e9cessit\u00e9 le package : rJava\r\nLe chargement a n\u00e9cessit\u00e9 le package : xlsxjars\r\n&gt; setwd(&quot;C:\/RTI\/Stats\/Correlation&quot;)\r\n&gt; debits_loire &lt;- read.xlsx(&quot;Debits_Loire.xlsx&quot;, sheetIndex = 1)\r\n&gt; names(debits_loire) &lt;- c(&quot;Date&quot;,&quot;Nevers&quot;,&quot;Saint-Satur&quot;,&quot;Orleans&quot;,&quot;Blois&quot;,&quot;Tours&quot;)\r\n&gt; summary(debits_loire)\r\n                 Date          Nevers       Saint-Satur        Orleans           Blois            Tours       \r\n 01 FEV. 2016 00:00:   1   Min.   :143.0   Min.   : 281.0   Min.   : 294.0   Min.   : 311.0   Min.   : 350.0  \r\n 01 FEV. 2016 01:00:   1   1st Qu.:214.0   1st Qu.: 379.5   1st Qu.: 401.8   1st Qu.: 414.0   1st Qu.: 444.8  \r\n 01 FEV. 
2016 02:00:   1   Median :289.0   Median : 453.0   Median : 479.0   Median : 527.0   Median : 572.0  \r\n 01 FEV. 2016 03:00:   1   Mean   :335.7   Mean   : 523.3   Mean   : 532.2   Mean   : 577.3   Mean   : 619.8  \r\n 01 FEV. 2016 04:00:   1   3rd Qu.:391.0   3rd Qu.: 577.2   3rd Qu.: 571.2   3rd Qu.: 640.5   3rd Qu.: 654.0  \r\n 01 FEV. 2016 05:00:   1   Max.   :928.0   Max.   :1360.0   Max.   :1270.0   Max.   :1310.0   Max.   :1430.0  \r\n (Other)           :1266                                                                                      \r\n&gt;\r\n<\/pre>\n<p style=\"text-align: justify;\">The cor function returns the correlation matrix for every pair of measuring stations, using Pearson&rsquo;s coefficient by default and Spearman&rsquo;s with method=&quot;spearman&quot;:<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; cor(debits_loire[,-1])\r\n               Nevers Saint-Satur   Orleans     Blois     Tours\r\nNevers      1.0000000   0.9581882 0.8398289 0.7659756 0.7212110\r\nSaint-Satur 0.9581882   1.0000000 0.9431337 0.8879669 0.8468633\r\nOrleans     0.8398289   0.9431337 1.0000000 0.9852670 0.9628878\r\nBlois       0.7659756   0.8879669 0.9852670 1.0000000 0.9890436\r\nTours       0.7212110   0.8468633 0.9628878 0.9890436 1.0000000\r\n&gt; \r\n&gt; cor(debits_loire[,-1], method=&quot;spearman&quot;)\r\n               Nevers Saint-Satur   Orleans     Blois     Tours\r\nNevers      1.0000000   0.9167092 0.8043585 0.7286661 0.6889871\r\nSaint-Satur 0.9167092   1.0000000 0.9506001 0.9021153 0.8676888\r\nOrleans     0.8043585   0.9506001 1.0000000 0.9845628 0.9441848\r\nBlois       0.7286661   0.9021153 0.9845628 1.0000000 0.9752079\r\nTours       0.6889871   0.8676888 0.9441848 0.9752079 1.0000000\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\">The change in flow over time can be plotted on a single graph for three of these towns (Nevers, Orl\u00e9ans and Tours):<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; plot(debits_loire$Tours,type=&quot;l&quot;,col=&quot;red&quot;, 
ylim=c(0,1500),yaxs=&quot;i&quot;,xlab=&quot;# Mesure&quot;,ylab=&quot;D\u00e9bit m3\/s&quot;)\r\n&gt; lines(debits_loire$Nevers,col=&quot;green&quot;)\r\n&gt; lines(debits_loire$Orleans,col=&quot;blue&quot;)\r\n&gt; title(&quot;Evolution compar\u00e9e du d\u00e9bit&quot;)\r\n&gt; legend(950,1400, c(&quot;Tours&quot;,&quot;Orleans&quot;,&quot;Nevers&quot;), lty=c(1,1,1), lwd=c(2.5,2.5,2.5),col=c(&quot;red&quot;,&quot;blue&quot;,&quot;green&quot;))\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/evol_debit_lag.png\" rel=\"attachment wp-att-569\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-569\" src=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/evol_debit_lag.png\" alt=\"evol_debit_lag\" width=\"600\" height=\"503\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The pairs function produces a scatterplot matrix. Here we look at the samples from Nevers, Orl\u00e9ans &amp; Tours:<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; pairs(debits_loire[,c(&quot;Nevers&quot;,&quot;Orleans&quot;,&quot;Tours&quot;)])\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/pairs_lag.png\" rel=\"attachment wp-att-571\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-571\" src=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/pairs_lag.png\" alt=\"pairs_lag\" width=\"600\" height=\"502\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">Note the distinctive &quot;lasso&quot; shape of the plots. 
This shape is explained by the delay with which flood and recession episodes travel from one measuring point to the next.<\/p>\n<p style=\"text-align: justify;\">This delay can be estimated with the ccf function, short for <a href=\"https:\/\/en.wikipedia.org\/wiki\/Cross-correlation\" target=\"_blank\" rel=\"noopener\">cross-correlation function<\/a>.<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; nevers_tours &lt;- ccf(debits_loire$Tours,debits_loire$Nevers, lag.max=120)\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/ccf_tours.png\" rel=\"attachment wp-att-568\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-568\" src=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/ccf_tours.png\" alt=\"ccf_tours\" width=\"600\" height=\"503\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The peak of the curve corresponds to the lag at which the correlation coefficient is highest. 
The lag itself (54 hours for Tours) can be read directly from the components of the object returned by ccf:<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; nevers_tours$lag[which.max(nevers_tours$acf)]\r\n[1] 54\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\">The same for Orl\u00e9ans:<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; nevers_orleans &lt;- ccf(debits_loire$Orleans,debits_loire$Nevers, lag.max=120)\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/ccf_orleans.png\" rel=\"attachment wp-att-567\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-567\" src=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/ccf_orleans.png\" alt=\"ccf_orleans\" width=\"600\" height=\"503\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">This time the lag is 35 hours:<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; nevers_orleans$lag[which.max(nevers_orleans$acf)]\r\n[1] 35\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\">Once the propagation delays of the flood and recession episodes are known (35 and 54 hours for Orl\u00e9ans and Tours respectively), the flows can be plotted again after shifting the series accordingly.<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; plot(debits_loire$Tours[seq(55,length(debits_loire$Tours))],type=&quot;l&quot;,col=&quot;red&quot;, ylim=c(0,1500),yaxs=&quot;i&quot;,xlab=&quot;# Mesure&quot;,ylab=&quot;D\u00e9bit m3\/s&quot;)\r\n&gt; lines(debits_loire$Nevers,col=&quot;green&quot;)\r\n&gt; lines(debits_loire$Orleans[seq(36,length(debits_loire$Orleans))],col=&quot;blue&quot;)\r\n&gt; title(&quot;Evolution compar\u00e9e du d\u00e9bit (apr\u00e8s d\u00e9calage temporel)&quot;)\r\n&gt; legend(950,1400, c(&quot;Tours&quot;,&quot;Orleans&quot;,&quot;Nevers&quot;), 
lty=c(1,1,1), lwd=c(2.5,2.5,2.5),col=c(&quot;red&quot;,&quot;blue&quot;,&quot;green&quot;))\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/evol_debit_nolag.png\" rel=\"attachment wp-att-570\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-570\" src=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/evol_debit_nolag.png\" alt=\"evol_debit_nolag\" width=\"600\" height=\"503\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The three profiles now match closely.<\/p>\n<p style=\"text-align: justify;\">A new scatterplot matrix can also be produced with these lags taken into account:<\/p>\n<pre class=\"brush: sql; ruler: true;\">&gt; d\u00e9bits_d\u00e9cal\u00e9s &lt;- data.frame(debits_loire$Tours[seq(55,length(debits_loire$Tours))])\r\n&gt; names(d\u00e9bits_d\u00e9cal\u00e9s) &lt;- &quot;Tours&quot;\r\n&gt; d\u00e9bits_d\u00e9cal\u00e9s$Orleans &lt;- debits_loire$Orleans[seq(36,length(debits_loire$Orleans)-(54-35))]\r\n&gt; d\u00e9bits_d\u00e9cal\u00e9s$Nevers &lt;- debits_loire$Nevers[seq(1,length(debits_loire$Nevers)-54)]\r\n&gt; pairs(d\u00e9bits_d\u00e9cal\u00e9s)\r\n&gt; \r\n<\/pre>\n<p style=\"text-align: justify;\"><a href=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/pairs_nolag.png\" rel=\"attachment wp-att-572\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-572\" src=\"https:\/\/blog.tiran.stream\/wp-content\/uploads\/2016\/04\/pairs_nolag.png\" alt=\"pairs_nolag\" width=\"600\" height=\"502\" \/><\/a><\/p>\n<p style=\"text-align: justify;\">The relationship between the flows is now almost linear.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following on from the previous post, this time I carry out the correlation analysis with R. 
The data are<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[12,14],"tags":[],"class_list":["post-565","post","type-post","status-publish","format-standard","hentry","category-r","category-statistique"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/565","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=565"}],"version-history":[{"count":2,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/565\/revisions"}],"predecessor-version":[{"id":1162,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/565\/revisions\/1162"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=565"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=565"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=565"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}