{"id":1258,"date":"2018-07-27T15:20:33","date_gmt":"2018-07-27T14:20:33","guid":{"rendered":"http:\/\/130.61.50.57\/?p=1258"},"modified":"2020-11-23T22:44:05","modified_gmt":"2020-11-23T21:44:05","slug":"classification-textuelle-avec-oracle-text-2","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=1258","title":{"rendered":"Text Classification with Oracle Text #2"},"content":{"rendered":"<p>Following on from the <a href=\"https:\/\/blog.tiran.stream\/classification-textuelle-avec-oracle-text-1\/\">previous article<\/a>, the classification is performed this time with a decision tree, using <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/12.2\/ccref\/oracle-text-indexing-elements.html#GUID-AFF1EFBE-CC4F-4394-9353-35542C52D6B3\" target=\"_blank\" rel=\"noopener noreferrer\">RULE_CLASSIFIER<\/a>.<\/p>\n<p>I will not repeat the first steps of the process: they are identical to those of the SVM classification.<\/p>\n<p>A table is created to hold the rules produced by the classifier, then the classifier preference is recreated and the training is run:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\r\nSQL&gt; CREATE TABLE rules\r\n  2  (\r\n  3      rule_cat_id NUMBER,\r\n  4      rule_text VARCHAR2 (4000),\r\n  5      rule_confidence NUMBER\r\n  6  );\r\n\r\nTable created.\r\n\r\nSQL&gt; EXEC ctx_ddl.drop_preference(&#039;WINE_CLASSIFIER&#039;);\r\n\r\nPL\/SQL procedure successfully completed.\r\n\r\nSQL&gt; EXEC ctx_ddl.create_preference(&#039;WINE_CLASSIFIER&#039;,&#039;RULE_CLASSIFIER&#039;);\r\n\r\nPL\/SQL procedure successfully completed.\r\n\r\nSQL&gt; BEGIN\r\n  2      ctx_cls.train (index_name   =&gt; &#039;train_set_wines_ctxidx&#039;,\r\n  3                     docid        =&gt; &#039;wine#&#039;,\r\n  4                     cattab       =&gt; &#039;train_set_variety&#039;,\r\n  5                     catdocid     =&gt; &#039;wine#&#039;,\r\n  6               
      catid        =&gt; &#039;cat#&#039;,\r\n  7                     restab       =&gt; &#039;rules&#039;,\r\n  8                     rescatid     =&gt; &#039;rule_cat_id&#039;,\r\n  9                     resquery     =&gt; &#039;rule_text&#039;,\r\n 10                     resconfid    =&gt; &#039;rule_confidence&#039;,\r\n 11                     pref_name    =&gt; &#039;wine_classifier&#039;);\r\n 12  END;\r\n 13  \/\r\n\r\nPL\/SQL procedure successfully completed.\r\n\r\nSQL&gt; \r\n<\/pre>\n<p>A routing index (CTXRULE) is then built on the rules stored in the RULES table:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\r\nSQL&gt; CREATE INDEX rules_idx\r\n  2      ON rules (rule_text)\r\n  3      INDEXTYPE IS ctxsys.ctxrule;\r\n\r\nIndex created.\r\n\r\nSQL&gt; \r\n<\/pre>\n<p>The MATCHES operator then returns the predicted category based on the rules inferred previously. Wine #6 is assigned to category 4:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\r\nSQL&gt; SELECT b.wine#, a.rule_cat_id pred_cat#\r\n  2    FROM rules a, test_set_wines b\r\n  3   WHERE matches (rule_text, description, 1) &gt; 0 AND b.wine# = 6;\r\n\r\n     WINE#  PRED_CAT#\r\n---------- ----------\r\n         6          4\r\n\r\nSQL&gt;\r\n<\/pre>\n<p>The whole test set is then scored, keeping for each wine the category of the highest-scoring rule:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\r\nSQL&gt; CREATE TABLE res\r\n  2  AS\r\n  3      SELECT *\r\n  4        FROM (SELECT wine#,\r\n  5                     pred_cat#,\r\n  6                     cat#,\r\n  7                     ROW_NUMBER () OVER (PARTITION BY wine# ORDER BY scr DESC)\r\n  8                         rn\r\n  9                FROM (SELECT b.wine#,\r\n 10                             a.rule_cat_id  pred_cat#,\r\n 11                             b.cat#,\r\n 12                             match_score (1) 
scr\r\n 13                        FROM rules a, test_set_wines b\r\n 14                       WHERE matches (rule_text, description, 1) &gt; 0))\r\n 15       WHERE rn = 1;\r\n\r\nTable created.\r\n\r\nSQL&gt; WITH\r\n  2      badpred\r\n  3      AS\r\n  4          (SELECT COUNT (*) c1\r\n  5             FROM res\r\n  6            WHERE pred_cat# != cat#),\r\n  7      totpop AS (SELECT COUNT (*) c2 FROM res)\r\n  8  SELECT ROUND (100 * (c2 - c1) \/ c2, 2) pct_good\r\n  9    FROM badpred, totpop;\r\n\r\n  PCT_GOOD\r\n----------\r\n      86.2\r\n\r\nSQL&gt;\r\n<\/pre>\n<p>The rate of correct predictions is about <strong>86%<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following on from the previous article, the classification is performed this time with a decision tree, using RULE_CLASSIFIER.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[3,6,19],"tags":[],"class_list":["post-1258","post","type-post","status-publish","format-standard","hentry","category-classification","category-oracle","category-donnees-non-structurees"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1258"}],"version-history":[{"count":5,"href":"https:\/\/blog.tiran.stream\/index.php?rest
_route=\/wp\/v2\/posts\/1258\/revisions"}],"predecessor-version":[{"id":1264,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1258\/revisions\/1264"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}