Market Analysis Using a Combination of Bayesian Networks and Description Logics Phillip M. Yelland © 1999 Sun Microsystems, Inc. All rights reserved. The SML Technical Report Series is published by Sun Microsystems Laboratories, of Sun Microsystems, Inc. Printed in U.S.A. Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner. TRADEMARKS Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. For information regarding the SML Technical Report Series, contact Jeanie Treichel, Editor-in-Chief <[email protected]>. Market Analysis Using a Combination of Bayesian Networks and Description Logics Phillip M. Yelland, Sun Microsystems Laboratories SMLI TR-99-78 August 1999 Abstract: The work described in this paper was inspired by a problem increasingly vexatious to many businesses confronting the ever-diminishing life cycles of modern products—viz., that of predicting characteristics (such as overall demand, segmentation, etc.) of markets facing new product introductions. A framework is proposed that allows the market parameters of new products to be derived by analogy with those of old ones. To do so, the framework combines the capabilities of Bayesian networks [14] and description logics [16]. The paper commences with an exposition of the problem that motivates the work. There follow brief descriptions of Bayesian networks and description logics in their unalloyed state, and a discussion of issues surrounding their combination. The combined system is presented, along with an account of its formal details and an inference procedure for entailment. A sample application of the framework is given. The paper concludes by comparing the proposed framework with related existing systems and by suggesting possible courses for future development. M/S MTV29-01 901 San Antonio Road Palo Alto, CA 94303-4900 email address: [email protected] home page URL: http://www.sunlabs.com/~yelland 1 Introduction The introduction of new products has been an enduring cause of difficulty for many companies. To design, produce, market, and distribute a new product most effectively, a company is required to predict characteristics of its market, such as the most propitious market segments, the ideal positioning of the product with respect to those segments, the demand in those segments over time, and so on. The problems arising from such requirements are particularly acute in modern high technology markets, where very short product life cycles are the norm. Unfortunately, by virtue of its novelty, a new product is usually a poor candidate for the sort of time-series forecasting methods applied to established products [17]. On the other hand, in many instances, the market parameters of a new product can be obtained by analogy. By distinguishing the features of a new product embodied (albeit in slightly different form) in established products, it is often possible to make some assessment of likely markets for the new one.1 The work described in this paper attempts to place such analogy-based forecasting methods on a formal footing. To do so, it combines two popular approaches to knowledge representation for expert systems and decision support: Bayesian networks (BN’s) and description logics (DL’s). In this context, BN’s and DL’s offer complementary capabilities. On the one hand, Bayesian networks allow the uncertainties implicit in the forecasting process to be expressed in a concise and intuitive manner. Description logics, on the other hand, furnish an effective means of describing and organizing the characteristics of products. Conversely, BN’s are lacking as regards the representation of complex entities (like products), and DL’s have very limited capabilities for modeling uncertainty. By combining both formalisms, the framework described here seeks to bring the strengths of both to bear, while ameliorating their weaknesses. 1.1 Overview of the Paper The paper begins with a fuller description of the problem that motivates the work. Brief descriptions are given of Bayesian networks and description logics separately, and issues surrounding their combination are explored. The combined system is presented, along with an account of its 1 An informal description of such procedures is also given in [17]. —1— formal characteristics and an inference procedure for entailment in the system. An example shows how the combined system may be applied to a sample problem of the kind envisaged at the outset. Related existing work is outlined, and compared with that described here. The paper concludes by suggesting possible courses for future development. 2 Background 2.1 Motivation As was indicated in the introduction, a manifest feature of modern high technology markets is the rapidity with which products are introduced, mature and approach obsolescence. For many classes of product, life cycles of a mere six months’ duration are not unusual. Needless to say, breakneck product turnover can impose great burdens on a company involved in such a market, given the necessity of preparations for the design, production, marketing and distribution of new products. Under different circumstances, reasonably accurate forecasts of demand might be produced by extrapolating prior demands using time-series analyses like those described in [17]. Unfortunately, with so many products commanding only evanescent life spans, such analyses are rarely applicable. A major vendor of technical workstation and server computers, Sun Microsystems, Inc., must contend with the torrid pace of change in high technology markets and the problems it causes. In response to these exigencies of new product introductions, Sun recently instituted a project entitled KAPTUR (Knowledge Acquisition for Planning Product Transitions through Understanding Risk). The rubric of the KAPTUR project is to formulate forecasts for new products based not on time-series analysis, but by drawing analogies with those products’ forebears. To achieve this, attributes of existing products are identified, and the effects of those attributes on market parameters for the products are ascertained. The intent of the work described in this paper is to supplement the KAPTUR process with a formal framework for expressing the relationships between product attributes and markets and for —2— combining those relationships to arrive at forecasts for new products. The next section introduces the constituents of this formal framework. 2.2 Bayesian Networks Bayesian networks are graphic representations of relationships between sets of random variables, where for the purposes of this discussion, a random variable may be thought of as a variable that adopts values from a given set according to some probability distribution. In more precise terms, a Bayesian network (or BN) is a directed acyclic graph (DAG) with nodes representing random variables and arcs portraying direct correlation between variables. Each node in a BN is also associated with a conditional probability distribution, which gives the probabilities associated with values of the variable represented by the node for different configurations of its parents’ variables. A common practice—followed here—is to restrict attention to BN’s containing discrete random variables only (i.e., variables ranging over finite sets), with conditional probability distributions specified using tables. A BN encapsulates a set of independence assertions between the variables it contains, in the sense that each variable in the network is independent of its non-descendants given its parents. To state this formally, a little notation is necessary. The probability that random variable x takes on the value c is denoted p(x = c). Similarly, p(x) represents the probability distribution comprising all possible values of x. If y is another random variable, the corresponding joint and conditional probabilities are denoted p( x = c, y = c ′ ), p( x , y ), p( x = c | y = c ′) and p( x | y ) . Now let x1, x2, …, xn enumerate the nodes in a BN, such that each node appears after its parents. For each i, let πi be the set of parents of node xi in the graph. Then according to the network, variable xi is conditionally independent of the variables {x1, x2, …, xn} given values for its parents πi, which is to say that !, x p( xi | x1 , x2 , i −1 ) = p( xi |π i ). Thus a BN represents an encoding of a joint probability distribution for the variables it contains. For by the chain rule: !, x ) = ∏ p( x | x , x ,!, x n p( x1 , x 2 , n i i =1 —3— 1 2 i −1 ) And we also have that: p( xi | x1 , x2 , !, x i −1 ) = p( xi | π i ), for i = 1, !, n Whence: ! n p( x1 , x 2 , , x n ) = ∏ p( xi | π i ) i =1 Observe that the factors p(xi | πi) are precisely the conditional probability tables attached to each node in the network. Having computed the joint probability encoded by a BN, calculation of the probability distributions associated with sets of nodes is straightforward, at least in concept. To illustrate, consider ; !, c @. Then random variables x1 and x2, such that the entire range of x2 comprises the values c1 , n the probability distribution of x1 may be derived from the joint probability distribution p( x1 , x2 ) by marginalizing out the variable x2: p( x1 ) = n ∑ p( x1 , x2 = c) < ! A c ∈ c1 , ,cn More generally, marginalization can be applied to eliminate selections of variables from the full joint probability distribution, allowing the computation of arbitrary joint probability distributions. For BN’s of any significant size, computing probabilities directly by multiplication and marginalization is impractical; it requires the enumeration of sets of values whose sizes are potentially exponential in the number of variables in the network. Fortunately, in the past decade or so, researchers have devised a number of algorithms for computing probabilities in Bayesian networks that offer acceptable performance in many cases by exploiting the encoded independence assertions delineated above. Introductions to such algorithms may be found in [7], [22] or [14]. The framework presented in this paper does not rely on the particular characteristics of any of these algorithms, and any of them may be selected to perform the required calculations. 2.3 Description Logics Description logics (also known as terminological description systems) grew out of attempts to formalize the semantic networks and frame systems used in many early knowledge representation efforts [4]. The past twenty years have seen the development of an impressive array of description —4— logics, ranging in expressive power from those equivalent to first-order predicate logic to far more restricted systems like those described below [16]. All description logics comprise a means for describing sets of objects in a domain (such sets are termed concepts) and relations (known as roles) between the objects in those sets. By way of an example, consider the simple description logic )/ introduced in [3]. The syntax of )/ is as follows: A − Atomic concept name R − Role name C :: = A | − − C1 | | C2 | − ∀R . C | − ∃R − Atomic concept Conjunction Universal role quantification Existential role quantification Informally, atomic concepts are sets of objects that are assumed to be given. Also assumed given are roles, relating objects in the domain. Three operations are provided for defining complex concepts: • Conjunction: The concept C1 |−| C2 comprises those objects that are members of both C1 and C2. • Universal role quantification: All objects in the concept ∀R . C are related by role R only to objects in C. The objects related to a given object by role R are known as the R-fillers of that object; thus the concept ∀R . C may be described as the set of object with R-fillers all in C. • Existential role quantification: All objects in ∃R have at least one R-filler. These informal specifications can be formalized by providing a model-theoretic semantics for concept expressions. To wit, a model, D, c, r ,[[_]] , for )/ specifies a domain of objects D, for each primitive concept P a set c( P ) ⊆ D , for each role R a relation r ( R) ⊆ D × D , and an interpretation function, [[ _ ]]: C →℘( D) , mapping concept expressions to subsets of the domain, satisfying: [[ P ]] = c( P) ]] = [[ C1 ]] ∩ [[ C2 ]] [[ C1 |−| C2 < [[ ∃R ]] = <o ∈ D | ∃o′ ∈ D. o, o′ ∈ r( R)A A [[ ∀R . C ]] = o ∈ D | ∀o ′ ∈ D. o, o ′ ∈ r ( R) ⇒ o′ ∈[[ C2 ]] —5— 3 Preliminary Considerations Recall that the notational framework described in this paper was devised with a view to forecasting demand for proposed new product introductions. In principle, addressing such a problem with the two formalisms described above is straightforward enough: Descriptions of features and products are furnished using a description logic, and Bayesian networks specify the relationships between product characteristics and market demand. Combining the two formalisms requires some means of adapting BN’s (that normally express relationships between random variables) to the specification of relationships between sets (the denotations of DL expressions). To illustrate how this might be done, consider a node in a net- ; !, v @ . work specifying the probability distribution of the random variable, x, with values v1 , n Now in formal terms (see [8], for example), x defines a function fx from a sample space (the set of ; !, v @. By extension, it also defines a collection of sets events under consideration) to the set v1 , f x−1 (v1 ), !, f −1 x ( vn ) n — the pre-images of each value. Note that since each point in the sample space is associated with precisely one value of x, this collection of pre-images partitions the space—all are disjoint, and their union equals the space itself. An equivalent formulation of the node in the network, therefore, is as an association of the pre-images f x−1 (v1 ), !, f −1 x ( vn ) with a set of probabilities (the original probability distribution function of x). These pre-image sets may be specified directly using expressions in a description logic, rather than indirectly using values of a random variable. Accordingly, each event in the sample space would be interpreted as a random selection from the domain of objects, and the probability distribution defined by the network would define the probabilities that such a random selection resulted in an object in the sets defined by the DL expressions. Assigning sets of DL expressions to the nodes of a BN, however, is fraught with difficulties. Consider, for example, ascribing the expression C1 |−| C2 to a node as part of its label set. In lieu of any further information, this would only specify the probability of the combined expression, and would fail to determine precisely the probability of either component. Such imprecision seems contrary to the spirit of the Bayesian network paradigm, which seeks to produce a complete specification of probabilities in a problem. Such an expression would also interact with labels on other —6— nodes; for example, if C1 occurs in the label set of another node, the drafter of the network would need to ensure that p(C1 |−| C2 , C1 ) = p(C1 , C2 ). A possible remedy for such difficulties would involve restricting the labeling of nodes to some class of “atomic” concept descriptions that do not feature conjunctions. Candidates from the language )/ would be primitive concepts, universal role quantification of primitive concepts (of the ! form ∀R1 . ∀Rn . P , say), and existential role qualification ( ∃R ). Even with such restrictions, however, difficulties abound. For example, if the expressions ∀R. C1 and ∀R. C2 appear in the label set of any node, then the network must assert (directly or indirectly) that p(∃R) = 1; otherwise it follows that there are objects with no role fillers for R, implying that the sets ∀R. C1 and ∀R. C2 are not disjoint—a requirement if they are to label the same node. A proliferation of such implicit consistency requirements can make the design of labeled networks inordinately onerous. As these examples illustrate, the interactions between different parts of such a network can be extremely subtle, and apparently minor adjustments in one node can have ramifications throughout the network. It would be desirable, therefore, to contrive a framework within which a complete probability distribution can be defined for the concepts of interest, while minimizing the possibility for inconsistencies like those detailed above. The next section lays out one such framework. 4 TDL: A Tiny Description Logic The first step in constructing the combined framework is to adapt the description logic )/. The resulting notation actually resembles the logics of feature structures [18] employed in linguistics, in as far as it conflates universal and existential role quantification in a single construct. It also invokes some of the flavor of type systems that incorporate record types and intersection [20]. —7— 4.1 Syntax Formally, the syntax of this “Tiny Description Logic” is as follows: P − Primitive concept name R − Role name C :: = P | − Primitive concept − | | C1 C2 | Conjunction − R: C | − Role quantification − Some notation: If S is a set2 with members C1, C2, …, Cn, the expression | | S denotes the concept C1 |−| C2 |−| ! |−| C . (That the latter is unambiguous follows from the commutativity and associativn ity of conjunction in the semantics below.) Correspondingly, the expression C ∈C ′ is true iff C´ can be rearranged (again using commutativity and associativity) into the form C1 |−| ; !, C @. where C ∈ C1 , ! |−| C , n n The atomic expressions, A, of TDL are characterized by the following (recursive) definition: A :: = P | R : A In the sequel, it will be convenient to assume that the set of atomic expressions is finite.3 While this restriction could be removed in principle, the technicalities required to deal with infinite sets and probability measures over them would complicate the presentation. 4.2 Semantics The semantics of TDL resembles that of )/, except that role quantification implies the existence of at least one role filler. More explicitly, a model, D, c, r ,[[_]] , for TDL specifies a domain of objects D, for each primitive concept P a set c( P ) ⊆ D , for each role R a relation r ( R) ⊆ D × D , and an interpretation function, [[ _ ]]: C →℘( D) , mapping concept expressions to subsets of the domain, satisfying: [[ P ]] = c( P) [[ C1 |−| C2 ]] = [[ C1 ]] ∩ [[ C2 ]] %K &K ' [[ R: C ]] = o ∈ D ∀o′ ∈ D. o, o′ ∈ r( R) ⇒ o′ ∈[[ C2 ]] and ∃o ′ ∈ D. o, o ′ ∈ r ( R) 2 (K )K * Henceforth, all variables referring to sets appear in bold font. This could be arranged by using only a finite number of primitive concepts and roles, and placing an upper bound on the nesting of role quantification. 3 —8— A useful property of atomic expressions in TDL is encapsulated in the following theorem: Lemma 1. Every expression in TDL is equivalent to a conjunction of atomic expressions. Proof : Define a function ν on expressions: ν ( P) = P ν (C1 |−| C2 ) = ν (C1 ) |−| ν (C2 ) − ν ( R: C) = | | R: A A ∈ν (C) < A It is routine to show that for all expressions C, ν (C ) is a conjunction of atomic expressions, and that the denotations of both expressions are equal. Q.E.D. 5 Labeled Bayesian Networks The second element of the combined modeling framework provides a means of expressing probabilistic relationships between TDL concepts using Bayesian networks. The syntactic structure of these networks is specified below, followed by a guide to their interpretation. 5.1 Syntax Definition 1. A labeled Bayesian network (LBN) n, π , L , p comprises: 1. A DAG with nodes 1, …, n, such that each node i has parents ; ! @ ;! @ π i = π i1 , , π im ⊆ 1, , i − 1 . 2. An indexed collection of disjoint label sets L1 , !, L , such that each L ⊆ A , where A is n i the set of atomic TDL expressions. For convenience’s sake, let C ∈L abbreviate ; !, n@. C ∈ L . ∃i ∈ 1, i 3. For each node i, L ∈ Li and ( L1 , p( L | L1 , !, L ) . !, L ) ∈ L × ! × L m π i1 m —9— π im , a conditional probability !, L ∈ L , j ∈;1,!, n@, i ∈;1,!, j@, define: p( L ,!, L ) = ∏ p( L | L ,!, L ) p( L ,!, L , L ,!, L ) = ∑ p( L ,!, L , L, L ,!, L ) For all L1 ∈ L1 , n n n 1 i −1 1 n i +1 n i =1 L ∈Li i 1 π i1 π im i −1 i +1 n The nodes of a labeled Bayesian network represent collections of disjoint sets of objects denoted by atomic expressions in TDL. The conditional probabilities in such a network assert the probability that an object selected at random will be a member of a given set, given that it is a member of certain other sets. That LBN’s express correlations between atomic concepts only is less of a restriction than it might appear, given that, as shown in Lemma 1, every concept in TDL can be represented as a conjunction of atomic concepts. The definition below specifies how such a conjunction can be construed as a “joint event”—intuitively, the event that a randomly selected object is a member of all the sets in the conjunction. Full details of the treatment of concepts in general are given in section 6. Definition 2. A non-empty set S of atomic concepts is consistent wrt. an LBN n, π , L , p if for ; !, n@, | S ∩ L | ≤ 1. If S is consistent wrt. n,π , L, p , and S ⊆ L , let ( L ,!, L ) all i ∈ 1, 1 i m enumerate the members of S in the order of the label sets of which they are members. Then − p( | | S) = p( L1 , !, L ) . m 5.2 Semantics The models for TDL presented in section 4.2 provide a basis for interpreting the meaning of labeled Bayesian networks, in as far as they represent the atomic expressions that occur in the label sets of such networks. The notion of a model needs to be extended, however, if it is to capture the probabilistic attributes of LBN’s. The definition below encapsulates such an extension in the form of a probability measure, which for the purposes of this paper can be thought of simply as a function that assigns a probability to sets of objects (see [8] for more technical details). Intuitively, the value assigned a set by the probability measure of such a model represents the probability with which a randomly-picked object is a member of that set. — 10 — Definition 3. A probabilistic TDL (PTDL) model D, c, r ,[[_]], µ is a TDL model D, c, r ,[[_]] , together with a probability measure µ mapping subsets4 of D to the interval [0,1], such that the denotation [[ C ]] of every concept expression C in TDL is in the domain of µ. The relationship between a network and its models derives from the probabilities they ascribe to sets of atomic concepts (as was hinted above, such probabilities form the basis for the assessment of concepts in general). Specifically, for a probabilistic TDL model to satisfy a labeled Bayesian network, it must: • Assign a probability to a conjunction of the atomic concepts in a network that agrees with the probability assigned to that conjunction by the network itself. • Ensure that an object’s membership of concepts in the network is independent of its membership of concepts outside of the network. This means that the probability that an object is a member of a conjunction with components inside and outside the network can be computed simply by multiplying the probability that it’s a member of all the components inside the network by the probability of membership of those components outside. Definition 4. A PTDL model D, c, r ,[[_]], µ satisfies an LBN n, π , L , p (written D, c, r ,[[_]], µ |= n, π , L , p ) iff the following hold: − − 1. For all non-empty, consistent S ⊆ L , µ ([[ | | S ]]) = p( | | S) − 2. For all consistent S ⊆ L such that S ∩ L is non-empty, µ ([[ | | ( S \ L) ]]) ≠ 0 and − − − µ ([[ | | S ]]) = p( | | ( S ∩ L)) × µ ([[ | | ( S \ L) ]]) 5.3 Consistency Thus far, a framework has been proposed for labeling Bayesian networks with description logic expressions. It was pointed out in section 3, however, that it is all too easy in general to produce inconsistent specifications in this manner. While the definition of TDL and the restriction to atomic concepts reduces the opportunities for producing inconsistencies, they are not eliminated. The following lays down a condition sufficient to rule out inconsistency entirely. 4 Strictly speaking, the domain of µ should constitute a σ-algebra — again, see [8]. — 11 — K Definition 5. A labeled Bayesian network n, π , L , p is consistent iff p( L) = 0 for all K !× L such that K L = !, ( R : ! : R : P), !, ( R : ! : R : P ′ ), ! , for P ≠ P ′ and P , P ′ ∈L , for some i ∈;1,!, n@ . L ∈ L1 × 1. 2. n 1 1 m m i Thus, if an LBN meets the above requirement, inference can proceed from it ensured of wellgroundedness. Formally: Lemma 2. All consistent LBN’s have at least one non-trivial model. Proof: Consider the construction of a model D, c, r ,[[_]], µ for an LBN n, π , L , p . To construct D, first assume an enumeration B1 , > C !, B m of those atomic concepts not in L.5 For each concept Bi, form a set N i = Bi , Bi , where the expression Bi is guaranteed distinct from any atomic expression. Then D is simply the product L1 × L2 × = !, A ) ∈ D Notation: Let a ( A) = ( A1 , k !× L n × N1 × !× N m. ; !, k @. A = A B , for atomic expression A. ∃i ∈ 1, i For each primitive concept P, put c( P ) = a ( P ). For each role R, put r ( R) = %& o, o′ ' () * ∃A. o ∈a ( R: A), and . ∀B. o ∈ a ( R: B ) ⇒ o ′ ∈a ( B ) Define the function [[_]] by induction using the equations given in section 4.2. Associate arbitrarily a value pBi in the open interval (0,1) with each atomic concept Bi in the enumeration B1 , !, B , and let p m Bi = 1 − pBi . Define µ as follows: • µ ( ∅) = 0 • For all o = ( A1 , • For disjoint S1 , S2 ∈ D, let µ ( S1 ∪ S2 ) = µ ( S1 ) + µ ( S2 ) 5 !, A , N ,!, N n 1 m ) ∈ D, ;@ let µ ( o ) = p( A1 , Recall that it is assumed that the number of atomic concepts is finite. — 12 — !, A ) × p ×!× p n N1 Nm . It is fairly straightforward to verify that if the network n, π , L , p is consistent, the above postulates are non-contradictory and define a PTDL model that satisfies n, π , L , p . Q.E.D. 6 Making Inferences Recall the original intent stated in section 3: The determination of correlation between product features and likely demand. In the context of the framework described in the previous sections, “correlation” between TDL concepts can be construed as a conditional probability—the probability that a randomly-chosen object is a member of one concept given that it is a member of another. Given such an understanding, known correlation between product features and demand can be specified using a labeled Bayesian network. Such a network circumscribes a class of models— those PTDL models that satisfy it. Correlative relationships not appearing directly in the network are derived as conditional probabilities that hold in all the network’s models. The conditional probability of the (general) concept C1 given (general) C2 is denoted λ (C1 | C2 ) to keep it distinct from the conditional probabilities (denoted by the symbol p) involving atomic concepts asserted in the network. With respect to a given PTDL model D, c, r ,[[_]], µ , λ (C1 |C2 ) is defined as the ratio of the joint probability of C1 and C2 to the probability of C2. In symbols: D, c, r ,[[_]], µ |= λ (C1 |C2 ) = λ iff µ ([[ C1 ]] ∩ [[ C2 ]]) = λ × µ ([[ C2 ]]) A network entails a conditional probability only if that probability holds in all models of the network: n, π , L, p |− λ (C1 | C2 ) = λ ⇒ 0. 0 |= n,π , L, p ⇒ 0 |= λ(C |C ) = λ ∀ 1 2 Combining the two definitions above gives: n, π , L , p |− λ (C1 |C2 ) = λ ⇒ ∀ D, c, r ,[[ _]], µ |= n, π , L , p . µ ([[ C1 ]] ∩ [[ C2 ]]) = λ × µ ([[ C2 ]]) There is a particular class of conditional probability assertion for which a simple inference algorithm can be readily formulated. Assume we are given a consistent LBN % = n, π , L, p and two − − TDL concepts C1 and C2. Further, assume that | | S1 = ν (C1 ) and | | S2 = ν (C2 ), where S1 ⊆ L, and — 13 — both S1 and S2 are consistent wrt. %. Define S2′ = S2 ∩ L. Then, for any model D, c, r ,[[_]], µ of %, provided µ ([[ C2 ]]) ≠ 0: − − µ ([[ C1 ]] ∩ [[ C2 ]]) µ ([[ | | S1 ]] ∩ [[ | | S2 ]]) = − µ ([[ C2 ]]) µ ([[ | | S2 ]]) = = = − µ ([[ | | ( S1∪ S2 )]]) − µ ([[ | | S2 ]]) − − p( | | (( S1∪ S2 ) ∩ L)) × µ ([[ | | (( S1∪ S2 ) \ L)]]) − − p( | | ( S2 ∩ L )) × µ ([[ | | ( S2 \ L) ]]) − − p( | | ( S1∪ ( S2 ∩ L))) × µ ([[ | | ( S2 \ L)]]) − − p( | | ( S2 ∩ L)) × µ ([[ | | ( S2 \ L) ]]) since S1 ⊆ L − p( | | ( S1∪ S2′ )) = − p( | | S2′ ) − − Finally, note that by the definition of a PTDL, µ ([[ C2 ]]) ≠ 0 ⇔ p( | | S2 ) ≠ 0 ⇔ p( | | S2′ ) ≠ 0. − So for any LBN % = n, π , L, p and any two TDL concepts C1 and C2, such that | | S1 = ν (C1 ) and |−| S2 = ν (C2 ), establish that all of the following are true: 1. % is consistent, 2. S1 and S2 are consistent wrt. %, 3. S1 ⊆ L, 4. − p( | | S2′ ) ≠ 0, where S2′ = S2 ∩ L, Then the following inference is valid: % |− λ (C1 | C2 ) = − p( | | ( S1∪ S2′ )) − p( | | S2′ ) Note that in computing the probabilities on the right hand side of the above equation, any of the algorithms for inference in Bayesian networks mentioned in section 2.2 may be employed. Such inferences may be set in the context of the forecasting problem posed at the outset: Provided an LBN that isolates all the significant relationships between product characteristics and market parameters in the form of conditional probabilities involving atomic concepts, formulate descriptions of a new product (C2) and a hypothetical market for that product (C1) as TDL expressions involving (some of) those atomic concepts. Then the conditional probability λ (C1 | C2 ) provides — 14 — an indication that the new product will indeed face the market hypothesized. A concrete demonstration of this procedure is given in the next section. 7 Worked Example This section demonstrates the application of the framework described above to an elementary sample problem representative of (but far simpler than) those that arise in the context of Sun’s KAPTUR program. The problem relates to the marketing of a new line of server computer products, each offering a particular combination of price, performance and storage capability. The intent is to determine target market segments for different members of the product line-up. The products themselves are described by concept expressions in TDL, composed from the primitive concepts and roles described informally in Figure 1. Figure 2 lists expressions that formally characterize the defined concepts in the previous figure. Figure 3 illustrates these concepts, and the hierarchical relationships between them (a concept fits above another in the hierarchy if the latter is a specialization of it). In Figure 4, a labeled Bayesian network (in the sense of Definition 1) is displayed that relates certain properties of products to their ideal market segments.6 For example, the network suggests that an expensive product with high storage capacity is most likely a product for large businesses; conversely, an expensive product with low storage capacity is probably a dud. Given the information above, the inference procedure in the previous section may be employed to estimate various aspects of the market for the new server products. For instance, the suitability of the Silver model for sale as an EnterpriseProduct is given by the expression: λ (EnterpriseProduct | Silver ) 6 The LBN also specifies unconditional probabilities for storage capacities and prices. — 15 — Primitive Concepts StorageUnit Server High, Medium, Low LargeBusiness, SmallBusiness Dud — The set of all storage products — The set of all server products — Quantity ranges — Market segments — A product that’s not likely to sell anywhere Roles Capacity Speed Tech Price Storage MktSegment — Capacity of a storage unit — Speed of a storage unit — Quality of innovation in a server product — Price of a server product — The type of storage unit in a server — Ideal market segment for a server product Defined Concepts S10 S100 Gemstone Metal Opal Sapphire Silver Gold EnterpriseProduct DepartmentalProduct — Low-end storage unit — High-end storage unit — A high-end server line (not necessarily expensive) — A low-end server line (cheap) — A server in the Gemstone line with an S10 storage unit — An expensive server in the Gemstone line with an S100 storage unit — A server in the Metal line with an S10 storage unit — A server in the Metal line with an S100 storage unit — A product best suited for sale into large businesses — A product best suited for sale into medium business Figure 1: Informal descriptions S10 = StorageUnit |−| Capacity:Low |−| Speed: Medium S100 = StorageUnit |−| Capacity:High |−| Speed: High Gemstone = Server |−| Tech: High Metal = Server |−| Tech: Low |−| Price: Low Opal = Gemstone |−| Storage: S10 Sapphire = Gemstone |−| Storage: S100 |−| Price: High Silver = Metal |−| Storage: S10 Gold = Metal |−| Storage: S100 EnterpriseProduct = MktSegment: LargeBusiness DepartmentalProduct = MktSegment: MediumBusiness Figure 2: Formal concept definitions — 16 — Server Metal Tech: Low Price: Low Gemstone Tech: High Gold Storage: S100 Silver Storage:S10 Storage Unit Sapphire Storage: S100 Price: High Opal Storage: S10 S100 Capacity: High Speed: High Figure 3: Product taxonomies — 17 — S10 Capacity: Low Speed: Medium Storage:Capacity:High Storage:Capacity:Low 0.4 0.6 Storage:Capacity:High Storage:Capacity:High Storage:Capacity:Low Storage:Capacity:Low Price:High Price:Low Price:High Price:Low Price:High Price:Low MktSegment: MktSegment: LargeBusiness SmallBusiness 0.7 0.1 0.8 0.2 0.2 0.1 0.2 0.4 0.6 0.4 Dud 0.2 0 0.7 0.4 Figure 4: Labeled Bayesian network To compute the value of this expression, first express each concept in terms of its atomic constituents, as show in Lemma 1: ν (EnterpriseProduct) = MktSegment:LargeBusiness ν (Gold) = Server |−| Tech:Low |−| Price:Low |−| Storage:StorageUnit |−| Storage:Capacity:High |−| Storage:Speed:High This gives: ; @ S = ;Server , Tech:Low , Price:Low , Storage:StorageUnit , Storage:Capacity:High , Storage:Speed:High@ S ′ = ;Price:Low , Storage:Capacity:High@ S1 = MktSegment:LargeBusiness 2 2 It is not difficult to verify that all the preconditions set out in section 6 are satisfied in this case. Hence by the inference rule: : ? − p( | | MktSegment:LargeBusiness, Price:Low , Storage:Capacity:High ) − p( | | Price:Low, Storage:Capacity:High ) 0.8 × 0.4 × 0.4 7 = 0.4 × 0.4 = 0.8 7 λ (EnterpriseProduct | Silver ) = : 7 ? It so happens in this case that the appropriate probability can be retrieved directly from the graph; this is not true in general, but see section 9. — 18 — Other estimates of interest can be computed in similar fashion; for example: λ (EnterpriseProduct | Opal) = 0.2 λ (EnterpriseProduct | Sapphire) = 0.7 λ (DepartmentalProduct | Sapphire) = 01 . 8 Related Work Naturally, the work described here is by no means the first to combine elements of formal logic and probability. Some of the later research of Carnap [6], for example, seeks to construe inferential reasoning in just this manner. More recently, Bacchus [2] and Halpern [9] have sought to fashion modal logics encompassing probabilistic reasoning. While such general frameworks are useful in elucidating judgements regarding probabilities, they give rise to inference procedures too complex to mechanize effectively [1]. A number of researchers have turned to description logics as more tractable basis for probabilistic inference. The work of five such researchers is most germane to that recounted here, and is outlined in the following sections. 8.1 Heinsohn In [11], Heinsohn highlights a probabilistic extension to description logics developed in full in his 1993 Ph.D. thesis [12]. Heinsohn’s system extends a simple description logic (a minor generalization of the language )/ defined in section 2.3) with p-conditionings. A p-conditioning involving concepts C1 and C2 is an assertion the form: p ,p ] C1 [ → C2 , l where 0 ≤ pl ≤ pu ≤ 1. u Such a p-conditioning imposes a constraint on the extensions of C1 and C2 and their cardinalities in valid models such that: [[ C1 ]] ∩ [[ C2 ]] [[ C1 ]] ∈ pl , pu (Clearly, Heinsohn’s framework requires that the domain of all objects is finite.) Heinsohn provides inference rules that allow new p-conditionings to be derived in the presence of concept definitions. — 19 — 8.2 Jaeger Jaeger’s work, summarized in [13], is closely allied with Heinsohn’s, though Jaeger places rather greater emphasis on the derivation of beliefs about individual objects in the presence of statistical assertions. (The work detailed in this paper focuses exclusively on statistical assertions; for an introduction to the wider context, see [9].) In Jaeger’s framework, general probabilistic assertions of the form p(C | D) , for (possibly non-atomic) concept expressions C and D, impose linear equalities on probability measures defined over the Lindenbaum algebra of concept expressions.8 Jaeger shows how such equalities determine a convex polytope (in the space of discrete probability distributions), which may be used to derive bounds on the conditional probabilities of other concept expressions. Both Heinsohn and Jaeger’s approach to probabilistic inference (viz., the derivation of probability intervals from partial descriptions of conditional probabilities) resemble the rule-based approaches of some early expert systems [19]. While this sort of framework is very general, and makes it easy to establish the basics of a knowledge base, it suffers in certain regards in comparison with the Bayesian network approach espoused here. For example (as is shown in [7]), it can be very difficult to establish a priori that a set of conditional probability assertions of this nature is actually consistent; in contrast, any labeled Bayesian network that satisfies the fairly simple constraints in Definition 5 is consistent by construction. Furthermore, the efficiency of inferences based on Bayesian networks can often be carried out much more efficiently than in the more general Heinsohn/Jaeger framework, as Koller, Levy and Pfeffer [15] point out. 8.3 Koller, Levy and Pfeffer Introduced in [15], P-CLASSIC is likely the system most closely allied to that described here, in that it features elements of Bayesian networks and description logics. However, in contrast to this work, which uses a very simple description logic to extend the capabilities of Bayesian networks, the intent of P-CLASSIC is to bolster an existing description logic (the CLASSIC system described 8 The Lindenbaum algebra is formed by taking the quotient of the set of all concept expressions modulo logical equivalence. — 20 — in [5]) with the facilities of Bayesian networks. These divergent motivations give rise to some important differences in technical character. The central contribution of P-CLASSIC is the probabilistic class, or p-class. A p-class resembles a labeled Bayesian network (defined in section 5), in that it comprises a Bayesian network whose nodes are associated with concept expressions. Again, the intent is to assert a degree of correlation (or “overlap,” as it is termed in [15]) between certain concepts by specifying a joint probability distribution involving those concepts. Primitive concepts may label the nodes of a p-class, as with LBN’s. In contrast to LBN’s, however (where they may appear on a node in the company of arbitrary atomic expressions), a primitive concept may appear in a p-class on a node whose only other label is the negation of that concept.9 The most profound difference between the approach taken in this paper and that of P-CLASSIC occurs in the treatment of role quantification. Recall that in an LBN, it is possible to specify explicitly dependencies between arbitrary atomic concept expressions, including concepts involving role quantification, such as R: P. P-CLASSIC handles such dependencies much more indirectly. In P-CLASSIC, a p-class ρ must contain for each role R, a node PC ( R ) whose labels are p-classes. For any configuration of its parents, this node determines a probability Prρ , R ( ρ ′ ) for each pclass ρ ′. 10 P-class ρ ′ in turn defines a probability Prρ ′ (C ) for concept C. Let Φ denote the set of all p-classes in the system. Then according to the original p-class ρ, the probability that an object has n R-fillers in concept C is cept ∀R: C is calculated as ∑ρ ∑ ∑ρ bR n =1 n Prρ , R ( ρ ′) .Prρ ′ (C) . The probability assigned the con′∈Φ n Prρ , R ( ρ ′) .Prρ ′ (C) , where bR is an arbitrary finite upper ′∈Φ bound that P-CLASSIC insists is imposed on the number of R-fillers of any object. This approach has some intuitive appeal, but it makes it very difficult to express relationships involving concepts involving role quantification—particularly since (at least as described in [15]) P-CLASSIC allows no node PC ( R ) to be the parent of any other node. In fact, the semantics of P- 9 Koller et al., extend CLASSIC with negations of primitive concepts. Strictly speaking, this probability depends on the number of R-fillers; to simplify the explanation here, however, this dependency is ignored. 10 — 21 — CLASSIC require that the properties of an object and its fillers are independent given the filler’s pclass, making it impossible to express the sort of conditional independence assertions entailed by Figure 4. While there is no doubt that the mechanisms encapsulated in P-CLASSIC are useful adjuncts to the capabilities of the underlying CLASSIC description logic, the limitations in its approach regarding role quantification render it ill-suited to the particular type of application the framework described in this paper was designed to address. 9 Conclusion Though the framework proposed in this paper goes some way towards meeting the goals set out for it, there are doubtless innumerable ways in which it might be extended and improved. It is incontestable, for example, that the description logic TDL provides only the barest essentials required of such a notation. It would be interesting to explore the possibility of incorporating features found in more expressive logics, such as the description logic CLASSIC, for example, that underlies P-CLASSIC. Whether the introduction of such features in the framework are compatible with the fully determinate probabilities and simple consistency conditions offered by the current configuration is an open question, however. It may be that a more complex, interval-based approach to probabilities like that of Heinsohn and Jaeger is required. From a more pragmatic point of view, for a practitioner using the framework to provide a complete specification of the probabilities in an LBN such as that in section 7 can constitute a major challenge—such information is normally only extracted through hours of rather tedious interviewing of the chief participants in product design and marketing. Certain short-cuts can be made to ease the process: For example, in making inferences of the form λ (C1 | C2 ), it is not difficult to show that the probability tables ascribed the root nodes in an LBN (that is, those nodes with no parents) are immaterial, and need not be estimated. Nonetheless, for reasonably large networks, fully estimating all of the required probabilities may still be burdensome. One interesting alternative would be to attempt to learn (or refine by learn- — 22 — ing) the probabilities or indeed the structure itself of the appropriate LBN by examination of historical data; see [10] for more details. The framework, as it currently stands, requires that a single LBN capture at once all product features that are relevant to the determination of market parameters. Implicitly, any product features not appearing in the network are considered immaterial. Truly innovative products (that sport features not considered in the original formulation) and unanticipated competitive developments require that the network be updated. Thus some means of structuring the networks—so as to allow such updates to occur in an incremental and well-defined fashion—would be desirable; the multiply-sectioned Bayesian networks of [21] might provide a basis for such an extension. 10 Acknowledgements I’d like to thank the members of the Advanced Planning Tools Group of the Supply Planning and Management Department at Sun Microsystems—in particular, Jim Mullin and Eunice Lee—for their kind cooperation and patient tuition concerning the intent and conduct of Sun’s KAPTUR process. I’m also greatly obliged to my manager at Sun Microsystems Laboratories, Mario Wolzcko, and to other members of the Sun Labs Application Technologies Center. 11 References [1] Abadi, M., Halpern, J., Decidability and expressiveness for first-order logics of probability. Information and Computation 112:1, 1994, pp. 1-36. [2] Bacchus, F., Representing and Reasoning with Probabilistic Knowledge. MIT Press, Cambridge, MA, 1990. [3] Brachman, R. J., Levesque, H. J., The tractability of subsumption in frame-based description languages. Proceedings of the 4th National Conference on Artificial Intelligence (AAAI-84), 1984. [4] Brachman, R. J., Schmolze, J. G., An overview of the KL-ONE knowledge representation system. Cognitive Science, Vol. 9, No. 2, 1985. — 23 — [5] Brachman, R., Borgida, A., McGuinness, D., Patel-Schneider, P., Resnick, L., Living with CLASSIC: When and how to use a KL-ONE-like language. In Sowa, J., (ed.), Principles of Semantics Networks, Morgan Kaufmann, 1991. [6] Carnap, R., Logical Foundations of Probability. University of Chicago Press, Chicago IL, 2 edition, 1962. [7] Castillo, E., Gutiérrez, J., Hadi, A., Expert Systems and Probabilistic Network Models. Monographs in Computer Science, Springer-Verlag, New York, NY, 1997. [8] Halmos, P., Measure Theory. Graduate Texts in Mathematics, 18, Springer-Verlag, New York, NY, 1974. [9] Halpern, J., An analysis of first-order logics of probability, Artificial Intelligence 46, 1990, pp. 311-350. [10] Heckerman, D., Geiger, D., Chickering, D. M., Learning Bayesian networks: The combination of knowledge and statistical data. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, R. Lopez de Mantaras and D. Poole (editors), Morgan Kaufmann Publishers, San Francisco, CA, 1994. [11] Heinsohn, J., A Hybrid Approach for Modeling Uncertainty in Terminological Logics, Deutsche Forschungszentrum für Künstliche Intelligenz Research Report-1991-24, 1991. [12] Heinsohn, J., ALCP—Ein hybrider Ansatz zur Modellierung von Unsicherheit in terminologischen Logiken. Ph.D. thesis, Universität des Saarlandes, Saarbrücken, 1993. [13] Jaeger, M., A logic for default reasoning about probabilities, in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, R. Lopez de Mantaras and D. Poole (editors), Morgan Kaufmann Publishers, San Francisco, CA, 1994. [14] Jensen, F. V., An Introduction to Bayesian Networks. Springer, New York, 1996. [15] Koller, D., Levy, A., Pfeffer, A., P-CLASSIC: A tractable probabilistic description logic. Proceedings of 14th National Conference on Artificial Intelligence (AAAI-97), pp. 360-397, 1997. — 24 — [16] MacGregor, R., The evolving technology of classification-based knowledge representation systems. in Sowa, J. F., (ed.) Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, San Mateo, 1991. [17] Makridakis, S., Wheelwright, S., Hyndman, R., Forecasting: Methods and Applications (3rd ed.). John Wiley & Sons, New York, NY, 1998. [18] Moss, L. S., Completeness theorems for logics of feature structures, in Y.N. Moschovakis (ed.), Logic From Computer Science, MSRI Publications Vol. 21, Springer-Verlag, 1991, pp. 387-401. [19] Neapolitan, R., E., Probabilistic Reasoning in Expert Systems: Theory and Algorithms. Wiley-Interscience, New York, NY, 1990. [20] Pierce B. C., Programming with Intersection Types and Bounded Polymorphism. Technical Report CMU-CS-91-205, Carnegie-Mellon University, 1991. [21] Xiang, Y., Poole, D., Beddoes, M., Multiply sectioned Bayesian networks and junction forests for large knowledge-based systems. Computational Intelligence, Vol.9, No.2, 1993. [22] Zhang, N. L., Poole, D., Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5, 1996. — 25 — About the Author Phillip M. Yelland is a staff engineer at Sun Microsystems Laboratories. Since receiving his M.A. and Ph.D. in Computer Science from the University of Cambridge in England, he has worked in a wide variety of settings, ranging from software development management in a number of high-tech startups to industrial research for major corporations. He has made a number of contributions to academic and industry publications. Throughout, his activities have reflected a preoccupation with the application of theoretical knowledge to the practical conduct of business. In addition to his current research, which encompasses formal methods for the analysis of object-oriented programming systems and the use of statistical techniques in supply-chain management, he is also completing an M.B.A. at the University of California at Berkeley. — 26 —
© Copyright 2025 ExpyDoc