ACL2003 WS on Patent Corpus Processing
Patent Claim Processing for Readability
- Structure Analysis and Term Explanation
July 12, 2003
Akihiro Shinmori†, Manabu Okumura‡, Yuzo Marukawa‡, Makoto Iwayama*
† Tokyo Institute of Technology & INTEC Web and Genome Informatics
‡ Japan Science and Technology & National Institute of Informatics
* Tokyo Institute of Technology & Hitachi
Problem & Approach
Problem = Improve patent claim readability
- Structural difficulty
- Term difficulty
Approach
- Analyze the structure and present it visually
  - Apply RST and utilize tools for RST
  - Cue-phrase-based approach
- Give explanations for terms
  - Utilize the “detailed explanation” part of the specification
Structure of Patent Document
- Patent Specification
  - Invention Title
  - Claim
  - Detailed Explanation
  - Brief Explanation of Drawings
- Drawings
- Summary
“The claims specify the boundaries of the legal monopoly created by the patent.” (Burgunder 1995)
Sample Japanese Patent Claim
操作手段によりアクチュエータを駆動して所望の作業を行な
う作業機において、前記作業機の作業機構に作用する負荷
を検出する負荷検出手段と、この負荷検出手段の検出値に
応じた周波数の信号を出力する第1 の周波数変換器と、当
該負荷検出手段の検出値に応じた周波数のパルスを出力す
る第2 の周波数変換器と、前記第1 の周波数変換器から出
力される信号を前記第2 の周波数変換器からのパルスの出
力期間だけ間欠的に出力する変調手段と、この変調手段の
出力信号に応じて振動を発生する振動発生手段とを設けた
ことを特徴とする作業機の操作用仮想振動生成装置。
(Publication Number=10-011111, a patent on virtual
oscillation generator for construction)
One sentence (noun phrase) with 259 characters!!
Characteristics of Patent Claim Description
1. Sentences are long.
   - The average is 242 characters (cf. 55.4 characters for newspaper articles).
2. The structure is complex.
   - Even native speakers cannot understand them on the first reading!
3. Difficult terms are often used.
   - Abstract terms are preferred.
4. Description styles are established.
   - Patent specifications are usually written by professionals (such as patent attorneys and IP specialists).
Description Styles of Japanese Patent Claims [Kasai 1999]
Process Sequence Style
- “・・・し[shi](does)、・・・し[shi](does)、・・・した[shita](and does)、・・・”
Element Enumeration Style
- “・・・と[to](and)、・・・と[to](and)、・・・とからなる[to karanaru](comprising)・・・”
Jepson-like Style
- “・・・において[ni-oite](in)、・・・を特徴とする[wo tokuchou to suru](be characterized by)、・・・”
  - First describe the known or precondition part, then describe the new or main part.
Structure Analysis of Patent Claims
Our Position:
- To improve the readability of Japanese patent claims, the structure of the description needs to be presented in a readable way.
Japanese Patent Claims are:
- Composed of multiple clauses that are related to each other
- Marked by cue phrases around the clause boundaries
Therefore:
- Apply RST (Rhetorical Structure Theory).
- Use a cue-phrase-based approach.
Result of Structure Analysis of Japanese Patent Claim
Graphical view by RSTTool [O'Donnell 1997]
Relations for Patent Claim
Type           Relation      Description
Multi-Nuclear  PROCEDURE     [~し、][~し、][~する]XXX (XXX which does ~, and does ~, and does ~)
Multi-Nuclear  COMPONENT     [~と、][~と、][~と]を備えたXXX (~, ~, and ~)
Mono-Nuclear   ELABORATION   [XXXした][YYY] (YYY which does XXX)
Mono-Nuclear   FEATURE       [YYY][を特徴とする] (characterized by YYY)
Mono-Nuclear   PRECONDITION  [XXXであって、][YYY] (In XXX, YYY)
Mono-Nuclear   COMPOSE       [~と、~と、~と][を備えた] (comprising ~, ~, and ~)
Collection of Cue Phrases
1. From description pattern analysis
   - に(お|於)いて (in), であって (in), ...
   - を特徴とした (be characterized by)
2. From the description patterns of the claims which contain explicitly inserted newlines
Example of claims in which newlines are explicitly inserted
原稿が載置される原稿台と、<NL>
この原稿台に対して主走査方向に移動する走査光学手段と、<NL>
この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、を備えた画像読取装置において、<NL>
前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に回動自在に取付けられることを特徴とする画像読取装置。
(Publication Number=8-182670, an image reading device)
Description pattern just before the newlines of newline-inserted claims
No  Pattern                                  Ratio   Cumulative Ratio
1   (Noun|Symbol)と(、|,)                      46.1%   46.1%
3   (Verb-Renyoukei|Adverb-Renyoukei)(、|,)   17.5%   63.6%
2   (Noun|Symbol)において(、|,)                16.4%   80.0%
4   (Noun|Symbol)であって(、|,)                7.2%    87.2%
Notes:
- “と” is a postpositional particle and means “and”.
- “において” plays the role of a postpositional particle and means “in”.
- “であって” plays the role of a postpositional particle and means “in”.
Cue phrases which can be used to analyze patent claims
Token Name    Cue Phrase                                      Gloss
JEPSON_CUE    に(お|於)いて(、|,)                               in
              であって(、|,)
              に(当|あ)(た)?り(、|,)
FEATURE_CUE   を特徴と(した|する)(、|,)                         characterized by
COMPOSE_CUE   を搭載して構成され(た|る|ている)(、|,)?            comprising
              を(、|,)?(具|備|そな)え(た|る|ている)(、|,)?
              を(、|,)?具備(する|した|している|してなる)(、|,)?
              (で|から)構成され(た|る|ている)(、|,)?
              を(、|,)?有(する|した|している)(、|,)?
              を(、|,)?包含(する|した|している)(、|,)?
              を(、|,)?含(む|んだ|んでいる)(、|,)?
              から(、|,)?(なる|なった|なっている)(、|,)?
              から(、|,)?(成る|成った|成っている)(、|,)?
              を(、|,)?設け(た|ている)(、|,)?
              を(、|,)?装備(する|した|している)(、|,)?
Cue phrases which can be used to analyze patent claims (cont.)
Token Name: NOUN, POSTP_TO, PUNCT_TOUTEN
  Cue Phrase: Sequence of “(Noun|Symbol)と(、|,)”
  Gloss: and
Token Name: VERB_RENYOU, PUNCT_TOUTEN, VERB_KIHON
  Cue Phrase: Sequence of “(Verb-Renyoukei|Adverb-Renyoukei)(、|,)”, before “(Verb-Kihonkei|Adverb-Kihonkei)+(Noun|Symbol)”
  Gloss: does
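The cue phrases in the two tables above are already written as regular expressions, so they can be tried directly on a claim. The following is a minimal Python sketch, not the authors' lexer: the names `CUE_PATTERNS` and `find_cues` are made up for illustration, capturing groups are rewritten as non-capturing ones, and the comma after FEATURE_CUE is treated as optional (an assumption, since the sample claims write を特徴とする without a following comma). The NOUN/POSTP_TO and VERB_RENYOU tokens are left out because they need part-of-speech tags.

```python
import re

C = "(?:、|,)"       # Japanese comma or full-width comma
OBJ = f"を{C}?"       # object particle, optionally followed by a comma

# COMPOSE_CUE alternatives, transcribed from the table above.
COMPOSE = [
    "を搭載して構成され(?:た|る|ている)",
    OBJ + "(?:具|備|そな)え(?:た|る|ている)",
    OBJ + "具備(?:する|した|している|してなる)",
    "(?:で|から)構成され(?:た|る|ている)",
    OBJ + "有(?:する|した|している)",
    OBJ + "包含(?:する|した|している)",
    OBJ + "含(?:む|んだ|んでいる)",
    f"から{C}?(?:なる|なった|なっている)",
    f"から{C}?(?:成る|成った|成っている)",
    OBJ + "設け(?:た|ている)",
    OBJ + "装備(?:する|した|している)",
]

CUE_PATTERNS = {
    "JEPSON_CUE": f"(?:に(?:お|於)いて|であって|に(?:当|あ)(?:た)?り){C}",
    # Assumption: the trailing comma is optional here.
    "FEATURE_CUE": f"を特徴と(?:した|する){C}?",
    "COMPOSE_CUE": "(?:" + "|".join(COMPOSE) + f"){C}?",
}

# One named group per token, so m.lastgroup gives the token name.
CUE_RE = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in CUE_PATTERNS.items())
)

def find_cues(claim):
    """Return (token name, matched text, start offset) for each cue phrase."""
    return [(m.lastgroup, m.group(), m.start()) for m in CUE_RE.finditer(claim)]

# The image reading device claim (Publication Number=8-182670), joined into one line.
CLAIM = (
    "原稿が載置される原稿台と、この原稿台に対して主走査方向に移動する走査光学手段と、"
    "この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、を備えた画像読取装置において、"
    "前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に回動自在に取付けられること"
    "を特徴とする画像読取装置。"
)

for name, text, start in find_cues(CLAIM):
    print(name, text, start)
```

On this claim the sketch finds を備えた (COMPOSE_CUE), において、 (JEPSON_CUE) and を特徴とする (FEATURE_CUE), in that order.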
Algorithm
1. Morphological Analysis
   - Using ChaSen (with the -j option, specifying the sentence delimiters as “。:;”)
2. Lexical Analysis
   - Context-dependent output tokens and string values
     - Judge whether the claim is in the Jepson-like style
     - Judge whether it is in the process sequence style or the element enumeration style
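The two judgments made during lexical analysis can be illustrated on raw text. This is a rough sketch under strong assumptions, not the authors' method: it skips ChaSen entirely, so the surface string と、 stands in for “(Noun|Symbol)と(、|,)” and し、 stands in for a Verb-Renyoukei followed by a comma. The function name `judge_style`, the result keys, and the counting rule are hypothetical.

```python
import re

COMMA = "(?:、|,)"
JEPSON_CUE = re.compile(f"(?:に(?:お|於)いて|であって|に(?:当|あ)(?:た)?り){COMMA}")
# Surface stand-ins for tokens that really need part-of-speech tags (assumption).
ENUM_CUE = re.compile(f"と{COMMA}")   # approximates (Noun|Symbol)と(、|,)
PROC_CUE = re.compile(f"し{COMMA}")   # approximates Verb-Renyoukei + comma

def judge_style(claim):
    """Guess the description style of a claim from surface cues only."""
    n_enum = len(ENUM_CUE.findall(claim))
    n_proc = len(PROC_CUE.findall(claim))
    if n_enum > n_proc:
        listing = "element enumeration"
    elif n_proc > n_enum:
        listing = "process sequence"
    else:
        listing = "undetermined"
    return {
        "jepson_like": bool(JEPSON_CUE.search(claim)),
        "listing_style": listing,
    }

# The image reading device claim (Publication Number=8-182670), joined into one line.
IMAGE_CLAIM = (
    "原稿が載置される原稿台と、この原稿台に対して主走査方向に移動する走査光学手段と、"
    "この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、を備えた画像読取装置において、"
    "前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に回動自在に取付けられること"
    "を特徴とする画像読取装置。"
)
# A made-up process-style string, used only to exercise the other branch.
MADE_UP_PROCESS = "AをBし、CをDし、EをFする方法。"

print(judge_style(IMAGE_CLAIM))
print(judge_style(MADE_UP_PROCESS))
```

A real implementation would make these decisions on the ChaSen output, where conjugation forms are known, rather than on surface strings.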
Algorithm (cont.)
3. Syntax Analysis (= Structure Analysis)
   - Parser generated from a context-free grammar (CFG)
     - Using a Bison-compatible parser generator
     - CFG: 57 rules, 11 terminals, 19 non-terminals
   - Actions
     - Build up the RS-tree
     - Newline insertion and indentation
     - Paraphrase
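Once an RS-tree exists, the “newline insertion and indentation” action amounts to a depth-first walk that prints each clause on its own line, indented by its depth. The sketch below assumes a hypothetical `Node` type and a hand-built tree for the image reading device claim; the tree is an illustration only, not the output of the authors' parser, and the choice of relations in it is a guess.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    relation: str = ""        # relation name for an inner node, "" for a leaf
    text: str = ""            # clause text for a leaf
    children: List["Node"] = field(default_factory=list)

def render(node, depth=0, indent="  "):
    """Return one line per node, indented by depth in the tree."""
    if not node.children:
        return [indent * depth + node.text]
    lines = [indent * depth + f"[{node.relation}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1, indent))
    return lines

# Hand-built, hypothetical tree (not produced by the CFG parser).
tree = Node("PRECONDITION", children=[
    Node("COMPONENT", children=[
        Node(text="原稿が載置される原稿台と、"),
        Node(text="この原稿台に対して主走査方向に移動する走査光学手段と、"),
        Node(text="この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、"),
    ]),
    Node(text="を備えた画像読取装置において、"),
    Node(text="前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に"
              "回動自在に取付けられることを特徴とする画像読取装置。"),
])

print("\n".join(render(tree)))
```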
Evaluation Data for Structure Analysis
- 59,956 claims (in 1999) extracted from the “NTCIR3 patent data collection”
  - Analysis was done using the “Sample data” (59,968 claims in 1998).
  - The IPC (International Patent Classification) code distribution was almost the same as that of the total data for 1999 published by the Japan Patent Office.
Evaluation and Result
Accept Ratio
- Ratio of the claims accepted by the CFG
- 99.77%
Processing Speed
- 0.30 sec/claim (on a Linux PC with a 1 GHz Pentium and 512 MB of memory)
Accuracy Evaluation
Indirect Evaluation
- Newline insertion using the result of RS analysis
- Baseline:
  - Mechanically insert newlines at the end of every sequence of “(NOUN|SYMBOL)(、|,)” and “(Verb-Renyoukei|Adverb-Renyoukei)(、|,)”.
Direct Evaluation
- Evaluation of the results for 100 randomly selected claims
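The baseline can be sketched as a single pass over morphologically analyzed tokens. The code below is an illustration under assumptions: the input is a list of (surface, part-of-speech) pairs, the tag names are hypothetical stand-ins for ChaSen output, and the two baseline patterns are taken literally as written on the slide.

```python
# Part-of-speech tags after which a comma triggers a newline (hypothetical names).
BREAK_POS = {"Noun", "Symbol", "Verb-Renyoukei", "Adverb-Renyoukei"}
COMMAS = {"、", ","}

def baseline_newlines(tokens):
    """Insert a newline after every comma that directly follows a BREAK_POS token.

    tokens: list of (surface, pos) pairs from a morphological analyzer.
    """
    out = []
    for i, (surface, _pos) in enumerate(tokens):
        out.append(surface)
        if surface in COMMAS and i > 0 and tokens[i - 1][1] in BREAK_POS:
            out.append("\n")
    return "".join(out)

# Hand-tagged toy input, made up for illustration.
tokens = [
    ("原稿台", "Noun"), ("と", "Particle"), ("、", "Punct"),
    ("信号", "Noun"), ("、", "Punct"),
    ("出力", "Noun"), ("し", "Verb-Renyoukei"), ("、", "Punct"),
    ("送る", "Verb-Kihonkei"),
]
print(baseline_newlines(tokens))
```

Read literally, the noun pattern does not fire after 原稿台と、 because the particle と sits between the noun and the comma.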
Accuracy Evaluation Result
Indirect Evaluation
               Baseline  Newline Insertion utilizing Structure Analysis  Upper Limit
Recall (R)     0.478     0.674                                           0.873
Precision (P)  0.374     0.663                                           -
F-measure      0.420     0.669                                           -
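The F-measure row can be checked against the recall and precision rows. The slides do not state the formula, so this sketch assumes the balanced F-measure, F = 2PR / (P + R). With the rounded values above, the baseline comes out at 0.420 as reported. The structure-analysis column comes out at about 0.668, which is within rounding of the reported 0.669.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    return 2 * precision * recall / (precision + recall)

baseline = f_measure(precision=0.374, recall=0.478)
structure = f_measure(precision=0.663, recall=0.674)
print(f"baseline={baseline:.3f} structure={structure:.3f}")
```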
Accuracy Evaluation Result
Direct Evaluation
Category           Count  Percentage (Excluding “No Judgment”)
Correct            76     80.85%
Partially Correct  11     11.70%
Incorrect          7      7.45%
No Judgment        6      -
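The percentages are taken over the 94 judged claims, that is, the 100 sampled claims minus the 6 with no judgment. A short check, with variable names chosen here for illustration:

```python
counts = {"Correct": 76, "Partially Correct": 11, "Incorrect": 7, "No Judgment": 6}

# Denominator excludes the "No Judgment" claims.
judged = sum(n for label, n in counts.items() if label != "No Judgment")
percentages = {
    label: 100 * n / judged
    for label, n in counts.items()
    if label != "No Judgment"
}
for label, pct in percentages.items():
    print(f"{label}: {pct:.2f}%")
```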
Term Explanation
Difficult terms used in patent claims:
- Terms specific to the invention
- Terms specific to the domain
Approach
- Use the result of structure analysis
- Give explanations for terms by utilizing the “detailed explanation” part
  - Because what is claimed must be explained in detail in the “detailed explanation” part.
Structure of Patent Document
- Patent Specification
  - Invention Title
  - Claim
  - Detailed Explanation
    - Technical field
    - Prior art
    - Problem to be resolved by the invention
    - Means of solving the problems
    - Embodiments of the invention
    - Effects of the invention
Preliminary Survey
- For Jepson-like claims, the words used in the first part (the known or precondition part) appear more often in the technical field and prior art sections than the words used in the last part.
  - 76.3% (cf. 55.5% for the words in the last part)
- “Terms specific to the domain” are often explained in the prior art section by using the following cue phrases.
  - so-called, or, ()
Word usage in Jepson-like claims
- Patent Specification
  - Invention Title
  - Claim (Jepson-like type)
    - First part (known things or the precondition)
    - Last part (new things or the body)
  - Detailed Explanation
    - Technical field
    - Prior art
    - Problem to be resolved by the invention
    - Means of solving the problems
    - Embodiments of the invention
    - Effects of the invention
[Diagram: 76.3% of the words in the first part and 55.5% of the words in the last part appear in the technical field and prior art sections.]
“Terms specific to the domain” that can be extracted from “prior art”
For the 132 patent specifications in the field of ink-jet printers:
- 29 terms can be extracted by the cue phrase “いわゆる” (“so-called”) from the “prior art” part.
- 9 of 27 terms are used in the claim description.
- For 3 terms, a useful explanation can be extracted from the “prior art” part.
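Extraction with the cue phrase “いわゆる” can be approximated with a regular expression. The sketch below is an assumption about how such extraction might work, not the procedure used for the figures above: it treats the run of kanji and katakana immediately after いわゆる as the term, where a real system would more likely take the noun phrase boundary from morphological analysis. The example sentence is made up.

```python
import re

# Assumption: a term is the run of kanji/katakana right after いわゆる.
SO_CALLED = re.compile(r"いわゆる([一-龥ァ-ヶー]+)")

def extract_so_called_terms(text):
    """Return the candidate terms introduced by いわゆる ("so-called")."""
    return SO_CALLED.findall(text)

# Made-up sentence in the style of a "prior art" section.
sentence = "従来、いわゆるインクジェットプリンタが知られている。"
print(extract_so_called_terms(sentence))
```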
Conclusion
NLP technologies can contribute toward improving the readability of patent claims.
- Structure can be analyzed by a cue-phrase-based approach and CFG-based parsing.
- Explanations for some terms can be given by utilizing the expressions in the detailed explanation.
This can be a step toward the more challenging task of automatic “patent map” generation.