{"id":1258,"date":"2018-07-27T15:20:33","date_gmt":"2018-07-27T14:20:33","guid":{"rendered":"http:\/\/130.61.50.57\/?p=1258"},"modified":"2026-04-09T20:59:05","modified_gmt":"2026-04-09T19:59:05","slug":"classification-textuelle-avec-oracle-text-2","status":"publish","type":"post","link":"https:\/\/blog.tiran.stream\/?p=1258","title":{"rendered":"Text Classification with Oracle Text #2"},"content":{"rendered":"<p>Following on from the <a href=\"https:\/\/blog.tiran.stream\/?p=1254\">previous article<\/a>, the classification is performed this time with a decision tree: <a href=\"https:\/\/docs.oracle.com\/en\/database\/oracle\/oracle-database\/12.2\/ccref\/oracle-text-indexing-elements.html#GUID-AFF1EFBE-CC4F-4394-9353-35542C52D6B3\" target=\"_blank\" rel=\"noopener noreferrer\">RULE_CLASSIFIER<\/a>.<\/p>\n<p>I do not repeat the first steps of the process, since they are identical to those of the SVM classification.<\/p>\n<p>A table is created to hold the rules produced by the classifier:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\nSQL&gt; CREATE TABLE rules\n  2  (\n  3      rule_cat_id NUMBER,\n  4      rule_text VARCHAR2 (4000),\n  5      rule_confidence NUMBER\n  6  );\n\nTable created.\n\nSQL&gt; EXEC ctx_ddl.drop_preference('WINE_CLASSIFIER');\n\nPL\/SQL procedure successfully completed.\n\nSQL&gt; EXEC ctx_ddl.create_preference('WINE_CLASSIFIER','RULE_CLASSIFIER');\n\nPL\/SQL procedure successfully completed.\n\nSQL&gt; BEGIN\n  2      ctx_cls.train (index_name   =&gt; 'train_set_wines_ctxidx',\n  3                     docid        =&gt; 'wine#',\n  4                     cattab       =&gt; 'train_set_variety',\n  5                     catdocid     =&gt; 'wine#',\n  6                     catid        =&gt; 'cat#',\n  7                     restab       =&gt; 'rules',\n  8                     rescatid     =&gt; 'rule_cat_id',\n  9     
                resquery     =&gt; 'rule_text',\n 10                     resconfid    =&gt; 'rule_confidence',\n 11                     pref_name    =&gt; 'wine_classifier');\n 12  END;\n 13  \/\n\nPL\/SQL procedure successfully completed.\n\nSQL&gt; \n<\/pre>\n<p>A routing index (CTXRULE) is then built on the rules stored in the RULES table:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\nSQL&gt; CREATE INDEX rules_idx\n  2      ON rules (rule_text)\n  3      INDEXTYPE IS ctxsys.ctxrule;\n\nIndex created.\n\nSQL&gt; \n<\/pre>\n<p>The MATCHES operator then returns the predicted category based on the rules inferred earlier. Wine #6 is assigned to category 4:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\nSQL&gt; SELECT b.wine#, a.rule_cat_id pred_cat#\n  2    FROM rules a, test_set_wines b\n  3   WHERE matches (rule_text, description, 1) &gt; 0 AND b.wine# = 6;\n\n     WINE#  PRED_CAT#\n---------- ----------\n         6          4\n\nSQL&gt;\n<\/pre>\n<p>Scoring is then performed for all the test samples, keeping for each wine the category of its highest-scoring matching rule:<\/p>\n<pre class=\"brush: sql; ruler: true;\">\u00a0\nSQL&gt; CREATE TABLE res\n  2  AS\n  3      SELECT *\n  4        FROM (SELECT wine#,\n  5                     pred_cat#,\n  6                     cat#,\n  7                     ROW_NUMBER () OVER (PARTITION BY wine# ORDER BY scr DESC)\n  8                         rn\n  9                FROM (SELECT b.wine#,\n 10                             a.rule_cat_id  pred_cat#,\n 11                             b.cat#,\n 12                             match_score (1) scr\n 13                        FROM rules a, test_set_wines b\n 14                       WHERE matches (rule_text, description, 1) &gt; 0))\n 15       WHERE rn = 1;\n\nTable created.\n\nSQL&gt; WITH\n  2      badpred\n  3      AS\n  4          (SELECT COUNT (*) c1\n  5             FROM res\n  6       
     WHERE pred_cat# != cat#),\n  7      totpop AS (SELECT COUNT (*) c2 FROM res)\n  8  SELECT ROUND (100 * (c2 - c1) \/ c2, 2) pct_good\n  9    FROM badpred, totpop;\n\n  PCT_GOOD\n----------\n      86.2\n\nSQL&gt;\n<\/pre>\n<p>The rate of correct predictions is about <strong>86%<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following on from the previous article, the classification is performed this time with a decision tree: RULE_CLASSIFIER<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"colormag_page_container_layout":"default_layout","colormag_page_sidebar_layout":"default_layout","footnotes":""},"categories":[3,6,19],"tags":[],"class_list":["post-1258","post","type-post","status-publish","format-standard","hentry","category-classification","category-oracle","category-donnees-non-structurees"],"_links":{"self":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1258","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1258"}],"version-history":[{"count":7,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1258\/revisions"}],"predecessor-version":[{"id":1277,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=\/wp\/v2\/posts\/1258\/revisions\/1277"}],"wp:attachment":[{"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1258"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"
https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1258"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.tiran.stream\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1258"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}