ACL2003 WS on Patent Corpus Processing
Patent Claim Processing for Readability - Structure Analysis and Term Explanation
July 12, 2003
Akihiro Shinmori†, Manabu Okumura‡, Yuzo Marukawa‡, Makoto Iwayama*
† Tokyo Institute of Technology & INTEC Web and Genome Informatics
‡ Japan Science and Technology & National Institute of Informatics
* Tokyo Institute of Technology & Hitachi

Problem & Approach
  Problem = Improve patent claim readability
    Structural difficulty
    Term difficulty
  Approach
    Analyze the structure and present it visually
      Apply RST and utilize tools for RST
      Cue-phrase-based approach
    Give explanations for terms
      Utilize the “detailed explanation” part of the specification

Structure of Patent Document
  Patent Specification
    Invention Title
    Claim
    Detailed Explanation
    Brief Explanation of Drawings
    Drawings
    Summary
  “The claims specify the boundaries of the legal monopoly created by the patent.” (Burgunder 1995)

Sample Japanese Patent Claim
  操作手段によりアクチュエータを駆動して所望の作業を行なう作業機において、前記作業機の作業機構に作用する負荷を検出する負荷検出手段と、この負荷検出手段の検出値に応じた周波数の信号を出力する第1の周波数変換器と、当該負荷検出手段の検出値に応じた周波数のパルスを出力する第2の周波数変換器と、前記第1の周波数変換器から出力される信号を前記第2の周波数変換器からのパルスの出力期間だけ間欠的に出力する変調手段と、この変調手段の出力信号に応じて振動を発生する振動発生手段とを設けたことを特徴とする作業機の操作用仮想振動生成装置。
  (Publication Number=10-011111, a patent on a virtual oscillation generator for construction)
  One sentence (noun phrase) with 259 characters!!

Characteristics of Patent Claim Description
  1. Sentences are long. The average is 242 characters (cf. 55.4 characters for newspaper articles).
  2. The structure is complex. Even native speakers cannot understand the claims on first reading!
  3. Difficult terms are often used. Abstract terms are preferred.
  4. Description styles are established. Patent specifications are usually written by professionals (such as patent attorneys and IP specialists).

Description Styles of Japanese Patent Claims [Kasai 1999]
  Process Sequence Style
    “・・・し[shi](does)、・・・し[shi](does)、・・・した[shita](and does)、・・・”
  Element Enumeration Style
    “・・・と[to](and)、・・・と[to](and)、・・・とからなる[to karanaru](comprising)・・・”
  Jepson-like Style
    “・・・において[ni-oite](in)、・・・を特徴とする[wo tokuchou tosuru](be characterized by)、・・・”
    First describe the known or precondition part, then describe the new or main part.

Structure Analysis of Patent Claims
  Our position: To improve the readability of Japanese patent claims, the structure of the description needs to be presented in a readable way.
  Japanese patent claims:
    Are composed of multiple clauses that are related to each other
    Have cue phrases around clause boundaries
  Apply RST (Rhetorical Structure Theory).
  Use a cue-phrase-based approach.

Result of Structure Analysis of Japanese Patent Claim
  [Figure: graphical view of the analysis result, produced by RSTTool [O'Donnell 1997]]

Relations for Patent Claim
  Multi-nuclear relations
    PROCEDURE: [~し、][~し、][~する]XXX (XXX which does ~, and does ~, and does ~)
    COMPONENT: [~と、][~と、][~と]を備えたXXX (~, ~, and ~)
  Mono-nuclear relations
    ELABORATION: [XXXした][YYY] (YYY which does XXX)
    FEATURE: [YYY][を特徴とする] (characterized by YYY)
    PRECONDITION: [XXXであって、][YYY] (In XXX, YYY)
    COMPOSE: [~と、~と、~と][を備えた] (comprising ~, ~, and ~)

Collection of Cue Phrases
  1. From description pattern analysis
     に(お|於)いて (in), であって (in), ... を特徴とした (be characterized by)
  2. From the description patterns of claims that contain explicitly inserted newlines

Example of claims in which newlines are explicitly inserted
  原稿が載置される原稿台と、<NL>
  この原稿台に対して主走査方向に移動する走査光学手段と、<NL>
  この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、を備えた画像読取装置において、<NL>
  前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に回動自在に取付けられることを特徴とする画像読取装置。
  (Publication Number=8-182670, an image reading device)

Description pattern just before the newlines of newline-inserted claims
  (Rows are listed in the order and with the numbering recovered from the slide.)
  No. 1: (Noun|Symbol)と(、|,)  Ratio 46.1%, cumulative ratio 46.1%
         [Note: “と” is a postpositional particle and means “and”.]
  No. 3: (Verb-Renyoukei|Adverb-Renyoukei)(、|,)  Ratio 17.5%, cumulative ratio 63.6%
  No. 2: (Noun|Symbol)において(、|,)  Ratio 16.4%, cumulative ratio 80.0%
         [Note: “において” plays the role of a postpositional particle and means “in”.]
  No. 4: (Noun|Symbol)であって(、|,)  Ratio 7.2%, cumulative ratio 87.2%
         [Note: “であって” plays the role of a postpositional particle and means “in”.]

Cue phrases which can be used to analyze patent claims
  JEPSON_CUE (gloss: in)
    に(お|於)いて(、|,)
    であって(、|,)
    に(当|あ)(た)?り(、|,)
  FEATURE_CUE (gloss: characterized by)
    を特徴と(した|する)(、|,)
  COMPOSE_CUE (gloss: comprising)
    を搭載して構成され(た|る|ている)(、|,)?
    を(、|,)?(具|備|そな)え(た|る|ている)(、|,)?
    を(、|,)?具備(する|した|している|してなる)(、|,)?
    (で|から)構成され(た|る|ている)(、|,)?
    を(、|,)?有(する|した|している)(、|,)?
    を(、|,)?包含(する|した|している)(、|,)?
    を(、|,)?含(む|んだ|んでいる)(、|,)?
    から(、|,)?(なる|なった|なっている)(、|,)?
    から(、|,)?(成る|成った|成っている)(、|,)?
    を(、|,)?設け(た|ている)(、|,)?
    を(、|,)?装備(する|した|している)(、|,)?

Cue phrases which can be used to analyze patent claims (cont.)
  NOUN, POSTP_TO, PUNCT_TOUTEN (gloss: and)
    Sequence of “(Noun|Symbol)と(、|,)”
  VERB_RENYOU, PUNCT_TOUTEN, VERB_KIHON (gloss: does)
    Sequence of “(Verb-Renyoukei|Adverb-Renyoukei)(、|,)” before “(Verb-Kihonkei|Adverb-Kihonkei)+(Noun|Symbol)”

Algorithm
  1. Morphological Analysis
     Using ChaSen (with the -j option, specifying the sentence delimiter as “。:;”)
  2. Lexical Analysis
     Context-dependent output of token and string value
     Judge whether the claim is in Jepson-like style or not
     Judge whether the claim is in process sequence style or element enumeration style

Algorithm (cont.)
  3. Syntax Analysis (= Structure Analysis)
     Parser generated from a context-free grammar (CFG) using a Bison-compatible parser generator
     CFG: 57 rules, 11 terminals, 19 non-terminals
     Actions
       Build up the RS-tree
       Newline insertion and indentation
       Paraphrase

Evaluation Data for Structure Analysis
  59,956 claims (from 1999) extracted from the “NTCIR3 patent data collection”
  Analysis was done using the “Sample data” (59,968 claims from 1998)
  The IPC (International Patent Classification) code distribution was almost the same as that of the total 1999 data published by the Japan Patent Office.
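
The cue-phrase tables and the lexical-analysis step above can be illustrated with a small sketch. It is not the authors' lexer: it skips morphological analysis entirely and applies a few of the listed patterns directly to the raw claim string. The names `TOUTEN`, `CUES`, `find_cues`, and `is_jepson_like` are hypothetical, only three of the COMPOSE_CUE alternatives are included, and the comma after FEATURE_CUE is treated as optional (an assumption, since the sample claim has no comma there).

```python
import re

# Japanese comma, full-width comma, or ASCII comma.
TOUTEN = r"[、,,]"

# Patterns transcribed from the cue-phrase table (COMPOSE_CUE is a subset).
CUES = {
    "JEPSON_CUE": re.compile(
        rf"(?:に(?:お|於)いて|であって|に(?:当|あ)(?:た)?り){TOUTEN}"
    ),
    # Assumption: trailing comma made optional for this sketch.
    "FEATURE_CUE": re.compile(rf"を特徴と(?:した|する){TOUTEN}?"),
    "COMPOSE_CUE": re.compile(
        rf"を{TOUTEN}?(?:具|備|そな)え(?:た|る|ている){TOUTEN}?"
        rf"|を{TOUTEN}?設け(?:た|ている){TOUTEN}?"
        rf"|から{TOUTEN}?(?:なる|なった|なっている){TOUTEN}?"
    ),
}


def find_cues(claim):
    """Return (start, token_name, matched_text) tuples sorted by position."""
    hits = []
    for name, pattern in CUES.items():
        for match in pattern.finditer(claim):
            hits.append((match.start(), name, match.group()))
    return sorted(hits)


def is_jepson_like(claim):
    """A claim is treated as Jepson-like if any JEPSON_CUE occurs in it."""
    return any(name == "JEPSON_CUE" for _, name, _ in find_cues(claim))


# Abbreviated from the sample claim (Publication Number=10-011111).
SAMPLE = (
    "操作手段によりアクチュエータを駆動して所望の作業を行なう作業機において、"
    "前記作業機の作業機構に作用する負荷を検出する負荷検出手段と、"
    "この変調手段の出力信号に応じて振動を発生する振動発生手段とを設けた"
    "ことを特徴とする作業機の操作用仮想振動生成装置。"
)

for start, name, text in find_cues(SAMPLE):
    print(start, name, text)
```

On the abbreviated sample this reports the cues in the order JEPSON_CUE, COMPOSE_CUE, FEATURE_CUE. Working on raw strings is a simplification: the part-of-speech-based tokens (such as NOUN followed by POSTP_TO) cannot be recognised without a morphological analyser.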

Evaluation and Result
  Accept Ratio
    Ratio of the claims accepted by the CFG grammar: 99.77%
  Processing Speed
    0.30 sec/claim (on a Linux PC with a 1 GHz Pentium and 512 MB of memory)

Accuracy Evaluation
  Indirect Evaluation
    Newline insertion using the result of RS analysis
    Baseline: mechanically insert newlines at the end of every sequence of “(NOUN|SYMBOL)(、|,)” and “(Verb-Renyoukei|Adverb-Renyoukei)(、|,)”.
  Direct Evaluation
    Evaluation of the results for 100 randomly selected claims

Accuracy Evaluation Result: Indirect Evaluation
  Recall (R):    Baseline 0.478; Newline insertion utilizing structure analysis 0.674; Upper limit 0.873
  Precision (P): Baseline 0.374; Newline insertion utilizing structure analysis 0.663; Upper limit -
  F-measure:     Baseline 0.420; Newline insertion utilizing structure analysis 0.669; Upper limit -

Accuracy Evaluation Result: Direct Evaluation
  (Percentages exclude “No Judgment”.)
  Correct:           76 (80.85%)
  Partially Correct: 11 (11.70%)
  Incorrect:          7 (7.45%)
  No Judgment:        6 (-)

Term Explanation
  Difficult terms used in patent claims:
    Terms specific to the invention
    Terms specific to the domain
  Approach
    Use the result of structure analysis
    Give explanations for terms by utilizing the “detailed explanation” part, because what is claimed must be explained in detail there.

Structure of Patent Document
  Patent Specification
    Invention Title
    Claim
    Detailed Explanation
      Technical field
      Prior art
      Problem to be resolved by the invention
      Means of solving the problems
      Embodiments of the invention
      Effects of the invention

Preliminary Survey
  For the Jepson-like claims, the words used in the first part (the known or precondition part) appear more often in the technical field and the prior art than the words used in the last part: 76.3% (cf. 55.5% for the words in the last part).
  “Terms specific to the domain” are often explained in the prior art by using the following cue phrases: so-called, or, ()
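
The figures in the accuracy evaluation results above can be cross-checked with a few lines of arithmetic. This is a sketch that assumes the reported F-measure is the balanced harmonic mean of precision and recall, which the slides do not state explicitly; the function and variable names are illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Indirect evaluation, using the rounded P and R values from the slide.
baseline_f = f_measure(precision=0.374, recall=0.478)
structure_f = f_measure(precision=0.663, recall=0.674)

# Direct evaluation: percentages are taken over judged claims only.
counts = {"Correct": 76, "Partially Correct": 11, "Incorrect": 7}
judged = sum(counts.values())  # the 6 "No Judgment" claims are excluded
shares = {label: round(100 * n / judged, 2) for label, n in counts.items()}

print(round(baseline_f, 3), round(structure_f, 3))
print(judged, shares)
```

Under that assumption the baseline value reproduces the reported 0.420. The structure-analysis value comes out within 0.001 of the reported 0.669, a gap that is plausibly due to P and R being rounded on the slide. The direct-evaluation percentages match when computed over the 94 judged claims.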

Words usage in Jepson-like claims
  Patent Specification
    Invention Title
    Claim (Jepson-like type)
      First part (known things or the precondition): 76.3% of its words appear in the technical field and prior art
      Last part (new things or the body): 55.5% of its words appear in the technical field and prior art
    Detailed Explanation
      Technical field
      Prior art
      Problem to be resolved by the invention
      Means of solving the problems
      Embodiments of the invention
      Effects of the invention

“Terms specific to the domain” that can be extracted from “prior art”
  For the 132 patent specifications in the field of ink-jet printers:
    29 terms can be extracted by the cue phrase “いわゆる” (“so-called”) from the “prior art” part.
    9 of 27 terms are used in the claim description.
    For 3 terms, a useful explanation can be extracted from the “prior art” part.

Conclusion
  NLP technologies can contribute toward improving the readability.
  Structure can be analyzed by a cue-phrase-based approach and CFG-based parsing.
  Explanations for some terms can be given by utilizing the expressions in the detailed explanation.
  This can be a step toward the more challenging task of automatic “patent map” generation.
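
As an illustration of the “いわゆる” (so-called) cue-phrase extraction mentioned above, here is a minimal sketch. The slides do not say how term boundaries were determined, so this version assumes a candidate term is the run of kanji and katakana immediately after the cue, optionally wrapped in Japanese quotation brackets. The example sentence is made up, and `SO_CALLED` and `extract_so_called_terms` are hypothetical names.

```python
import re

# Assumption: a term is a run of kanji/katakana (including the long-vowel
# mark) right after the cue, optionally preceded by an opening bracket.
SO_CALLED = re.compile(r"いわゆる[「『]?([一-龥ァ-ヶー]+)")


def extract_so_called_terms(prior_art_text):
    """Return candidate terms introduced by いわゆる, in order, without repeats."""
    terms = []
    for match in SO_CALLED.finditer(prior_art_text):
        term = match.group(1)
        if term not in terms:
            terms.append(term)
    return terms


# Made-up prior-art text, for illustration only.
EXAMPLE = (
    "従来、いわゆるインクジェットプリンタが知られている。"
    "また、いわゆる「記録ヘッド」を用いる。"
)

print(extract_so_called_terms(EXAMPLE))
```

A character-class boundary like this is crude: it would cut off terms containing hiragana or Latin letters, so a morphological analyser would likely be needed for real data.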