{"id":152,"date":"2015-05-19T07:44:17","date_gmt":"2015-05-19T07:44:17","guid":{"rendered":"http:\/\/cms04397.apps-1and1.net\/?p=152"},"modified":"2020-11-25T11:56:18","modified_gmt":"2020-11-25T10:56:18","slug":"khi-2-dindependance-avec-r","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=152","title":{"rendered":"Chi-squared test of independence with R"},"content":{"rendered":"<p style=\"text-align: justify;\"><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">In the <a href=\"http:\/\/rtiran.cloudns.cl\/?p=143\">previous post<\/a>, I used the Oracle function STATS_CROSSTAB to run a chi-squared test.<\/span><br \/>\n<span style=\"font-family: tahoma, arial, helvetica, sans-serif;\"> Since this function does not accept a contingency table as an argument, the data first had to go through a (rather counter-intuitive!) &#8220;disaggregation&#8221; step.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">With R, the operation is much easier.<\/span><\/p>\n<p style=\"text-align: justify;\"><strong><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">1) Loading the data into two data frames<\/span><\/strong><\/p>\n<pre>&gt; tgv &lt;- read.csv2(\"regularite-mensuelle-tgv.csv\")\n&gt; corail &lt;- read.csv2(\"regularite-mensuelle-intercites.csv\")\n&gt;\n&gt; dim(tgv)\n[1] 4100 10\n&gt; dim(corail)\n[1] 750 9\n&gt;  \n<\/pre>\n<p style=\"text-align: justify;\"><strong><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">2) Cleaning the data<\/span><\/strong><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">A second data frame is built from tgv by keeping only the 2014 &amp; 2015 data for 
departures from or arrivals at St Pierre des Corps. In addition, only columns 6 and 8 are retained (trains that ran and trains that were delayed):<\/span><\/p>\n<pre>&gt; tgv.clean &lt;- tgv[which(substr(tgv$Date,1,4) %in% c(\"2014\",\"2015\") &amp; (tgv[,3]==\"ST PIERRE DES CORPS\" | tgv[,4]==\"ST PIERRE DES CORPS\")),c(6,8)]\n&gt;\n&gt; names(tgv.clean) &lt;- c(\"Trains.Circules\",\"Trains.Retard\")\n&gt;\n&gt; summary(tgv.clean)\n Trains.Circules Trains.Retard   \n Min.   :322.0   Min.   : 16.00  \n 1st Qu.:421.0   1st Qu.: 38.25  \n Median :434.5   Median : 46.50  \n Mean   :430.4   Mean   : 50.96  \n 3rd Qu.:448.8   3rd Qu.: 66.75  \n Max.   :476.0   Max.   :105.00  \n&gt;\n&gt; head(tgv.clean)\n     Trains.Circules Trains.Retard\n685              418            56\n1194             409            48\n2258             443            30\n2262             420            81\n2283             458            49\n2400             431            39\n&gt;  \n<\/pre>\n<p><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">A similar operation is carried out on the corail data frame:<\/span><\/p>\n<pre>&gt; corail.clean &lt;- corail[which((substr(corail[,3],1,3)==\"ORL\" | substr(corail[,4],1,3)==\"ORL\")),c(6,8)]\n&gt;\n&gt; names(corail.clean) &lt;- c(\"Trains.Circules\",\"Trains.Retard\")\n&gt;\n&gt; summary(corail.clean)\n Trains.Circules Trains.Retard   \n Min.   :154.0   Min.   : 10.00  \n 1st Qu.:336.8   1st Qu.: 40.25  \n Median :354.0   Median : 58.00  \n Mean   :337.7   Mean   : 57.83  \n 3rd Qu.:365.5   3rd Qu.: 71.00  \n Max.   :379.0   Max.   
:109.00  \n&gt; head(corail.clean)\n    Trains.Circules Trains.Retard\n79              356            91\n115             339            57\n130             364            59\n210             342            63\n238             154            10\n267             367            33\n&gt;  \n<\/pre>\n<p style=\"text-align: justify;\"><strong><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">3) Building a contingency table<\/span><\/strong><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">Using the apply function column-wise on the tgv.clean and corail.clean data frames, we sum the counts of trains that ran and of trains that were delayed. These totals are combined into a single data frame. A &#8220;Trains.Ponctuels&#8221; column is then added, whose counts are the row-by-row difference Trains.Circules-Trains.Retard:<\/span><\/p>\n<pre>&gt; cont.tab &lt;- as.data.frame( rbind( apply(corail.clean[,],2,sum), apply(tgv.clean[,],2,sum) ) )\n&gt;\n&gt; row.names(cont.tab) &lt;- c(\"Corail - Orleans\/Paris\", \"TGV - SpDC\/Paris\")\n&gt;\n&gt; cont.tab$Trains.Ponctuels &lt;- cont.tab$Trains.Circules-cont.tab$Trains.Retard\n&gt;\n&gt; cont.tab\n                       Trains.Circules Trains.Retard Trains.Ponctuels\nCorail - Orleans\/Paris           10131          1735             8396\nTGV - SpDC\/Paris                 11190          1325             9865\n&gt;  \n<\/pre>\n<p><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">Finally, the Trains.Circules column is dropped, since it is not used in the chi-squared test (the categories must be mutually exclusive):<\/span><\/p>\n<pre>&gt; cont.tab &lt;- cont.tab[,-1]\n&gt; cont.tab\n                       Trains.Retard Trains.Ponctuels\nCorail - Orleans\/Paris          1735             8396\nTGV 
- SpDC\/Paris                1325             9865\n&gt;  \n<\/pre>\n<p><strong><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">4) Running the test<\/span><\/strong><\/p>\n<pre>&gt; chi2 &lt;- chisq.test(cont.tab, correct=FALSE)\n&gt; chi2\n\n Pearson's Chi-squared test\n\ndata: cont.tab\nX-squared = 120.8061, df = 1, p-value &lt; 2.2e-16\n\n&gt; chi2$expected\n                       Trains.Retard Trains.Ponctuels\nCorail - Orleans\/Paris      1454.006         8676.994\nTGV - SpDC\/Paris            1605.994         9584.006\n&gt;  \n<\/pre>\n<p style=\"text-align: justify;\"><span style=\"font-family: tahoma, arial, helvetica, sans-serif;\">Here again the p-value is very small, and we reach the same conclusion as in the previous post! Note, however, that although the chi-squared statistic is identical for the two methods (120.8), the reported p-value differs (though it remains close to zero).<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post, I used the Oracle function STATS_CROSSTAB to run a chi-squared test. 
Since this function does not accept<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[12,14,16],"tags":[],"class_list":["post-152","post","type-post","status-publish","format-standard","hentry","category-r","category-statistique","category-tests-dhypotheses"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/152","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=152"}],"version-history":[{"count":1,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/152\/revisions"}],"predecessor-version":[{"id":1269,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/152\/revisions\/1269"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=152"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=152"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=152"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}