Introduction to Probability and Statistics
Probability & Statistics for Engineers & Scientists, 9th Ed., 2009
Handout #5
Lingzhou Xue

The pdf file for this class is available on the class web page.

Chapter 6: Some Continuous Probability Distributions

Name          Notation            P.D.F. f(x)
Uniform       U[a, b]             1/(b − a),  a ≤ x ≤ b
Normal        N(µ, σ²)            (1/(σ√(2π))) exp{−(x − µ)²/(2σ²)},  −∞ < x < ∞
Exponential   Exp(β)              (1/β) e^{−x/β},  x > 0
Gamma         Γ(α, β)             (1/(β^α Γ(α))) x^{α−1} e^{−x/β},  x > 0
Chi-Squared   χ²_υ = Γ(υ/2, 2)    (1/(2^{υ/2} Γ(υ/2))) x^{υ/2−1} e^{−x/2},  x > 0
Lognormal     LogN(µ, σ²)         (1/(xσ√(2π))) exp{−[ln(x) − µ]²/(2σ²)},  x > 0

Continuous Uniform Distribution (Rectangular Distribution)

The density function of the continuous uniform random variable X on the interval [a, b] is

    f(x; a, b) = 1/(b − a),  a ≤ x ≤ b;   0, elsewhere.

E(X) = (a + b)/2  and  Var(X) = (b − a)²/12.

Figure: Probability density function for a continuous uniform random variable.

Example 1
Suppose that a large conference room for a certain company can be reserved for no more than 4 hours. However, the use of the conference room is such that both long and short conferences occur quite often. In fact, it can be assumed that the length X of a conference has a uniform distribution on the interval [0, 4].
1. What is the probability density function?
2. What is the probability that any given conference lasts at least 3 hours?

Solution:
• The appropriate density function for the uniformly distributed random variable X in this situation is

    f(x) = 1/4,  0 ≤ x ≤ 4;   0, elsewhere.

• P(X ≥ 3) = ∫_3^4 (1/4) dx = 1/4.

Normal Distribution

The most important continuous probability distribution in the entire field of statistics is the normal distribution. Its graph, called the normal curve, is the bell-shaped curve that describes approximately many phenomena occurring in nature, industry, and research.
• Physical measurements in areas such as meteorological experiments, rainfall studies, and measurements of manufactured parts are often more than adequately explained with a normal distribution.
• Errors in scientific measurements are extremely well approximated by a normal distribution.

Figure: Probability density function for a normal random variable.

The density function of the normal random variable X, with mean µ and variance σ², is

    f(x; µ, σ²) = (1/(σ√(2π))) exp{−(x − µ)²/(2σ²)},  −∞ < x < ∞,

where π = 3.14159... and e = 2.71828....

E(X) = µ  and  Var(X) = σ².

Standard Normal Distribution
The distribution of a normal random variable with mean 0 and variance 1 is called a standard normal distribution.

Example 2
Let the probability density function of a random variable X be

    f(x) = (1/√(10π)) exp{−x²/10},  −∞ < x < ∞.

1. Find the mean and variance of the random variable X.
2. Use Chebyshev's inequality to show that P(|X| ≥ √10) ≤ 1/2.

Solution:
1. Matching f with the normal density gives E(X) = 0 and Var(X) = 5.
2. Since σ = √5 and √10 = √2 · σ, Chebyshev's inequality gives

    P(|X| ≥ √10) = P(|X − 0| ≥ √2 σ) ≤ 1/(√2)² = 1/2.
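As an illustrative aside (not part of the original handout), the short sketch below compares the Chebyshev bound of Example 2 with the exact normal probability; it assumes SciPy is available.

```python
# Example 2 check: X ~ N(0, 5), so sigma = sqrt(5).
# Chebyshev gives P(|X| >= sqrt(10)) <= Var(X)/10 = 1/2; the exact
# normal probability is much smaller.  (Illustrative sketch only.)
from math import sqrt
from scipy.stats import norm

var_x = 5.0
sigma = sqrt(var_x)
threshold = sqrt(10.0)

exact = 2 * norm.cdf(-threshold, loc=0.0, scale=sigma)  # P(|X| >= sqrt(10))
chebyshev_bound = var_x / threshold**2                  # = 0.5

print(f"exact probability = {exact:.4f}")        # about 0.157
print(f"Chebyshev bound   = {chebyshev_bound}")  # 0.5
```

The gap illustrates that Chebyshev's inequality is a conservative, distribution-free bound, while the exact normal tail is much thinner.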
Figure: Normal curves with µ1 < µ2 and σ1 = σ2.
Figure: Normal curves with µ1 = µ2 and σ1 < σ2.
Figure: Normal curves with µ1 < µ2 and σ1 < σ2.

Symmetry
A function f(x) is said to be symmetric about 0 if f(−x) = f(x). More generally, a function f(x) is said to be symmetric about µ if f(µ − x) = f(µ + x).

Properties of the Normal Distribution
1. The mode (the value that occurs most frequently, i.e., the point at which the density is largest) occurs at x = µ.
2. The curve is symmetric about a vertical axis through the mean µ.
3. The curve has its points of inflection at x = µ ± σ; it is concave downward for µ − σ < x < µ + σ and concave upward otherwise.
4. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.
5. The total area under the curve and above the horizontal axis is equal to 1.

Mean of a Normal Random Variable
With f(x; µ, σ²) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}, −∞ < x < ∞,

    E(X) = ∫_{−∞}^{∞} x f(x; µ, σ²) dx
         = (1/(σ√(2π))) ∫_{−∞}^{∞} x e^{−(x−µ)²/(2σ²)} dx.

Setting z = (x − µ)/σ, so that dx = σ dz,

    E(X) = (1/√(2π)) ∫_{−∞}^{∞} (µ + σz) e^{−z²/2} dz
         = (µ/√(2π)) ∫_{−∞}^{∞} e^{−z²/2} dz + (σ/√(2π)) ∫_{−∞}^{∞} z e^{−z²/2} dz
         = µ · 1 + σ · 0 = µ.

Variance of a Normal Random Variable

    Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x; µ, σ²) dx.

Setting z = (x − µ)/σ, so that dx = σ dz,

    Var(X) = (σ²/√(2π)) ∫_{−∞}^{∞} z² e^{−z²/2} dz.

Integrating by parts with u = z and dv = z e^{−z²/2} dz,

    Var(X) = (σ²/√(2π)) ( −z e^{−z²/2} |_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−z²/2} dz ) = σ²(0 + 1) = σ².

Areas Under the Normal Curve

The curve of any continuous probability distribution or density function is constructed so that the area under the curve bounded by the two ordinates x = x1 and x = x2 equals the probability that the random variable X assumes a value between x1 and x2. Thus, for the normal curve,

    P(x1 < X < x2) = ∫_{x1}^{x2} (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx

is represented by the area of the shaded region.

Figure: P(x1 < X < x2) = area of the shaded region.

The difficulty encountered in solving integrals of normal density functions necessitates the tabulation of normal curve areas for quick reference. However, it would be a hopeless task to attempt to set up separate tables for every conceivable value of µ and σ. Fortunately, we are able to transform all the observations of any normal random variable X into a new set of observations of a normal random variable Z with mean 0 and variance 1. This is done by means of the transformation

    Z = (X − µ)/σ.

Whenever X assumes a value x, the corresponding value of Z is z = (x − µ)/σ. Therefore, if X falls between the values x = x1 and x = x2, the random variable Z will fall between the corresponding values z1 = (x1 − µ)/σ and z2 = (x2 − µ)/σ. Consequently,

    P(x1 < X < x2) = ∫_{x1}^{x2} n(x; µ, σ) dx
                   = P((x1 − µ)/σ < (X − µ)/σ < (x2 − µ)/σ)
                   = P(z1 < Z < z2)
                   = ∫_{z1}^{z2} n(z; 0, 1) dz,

where Z is seen to be a normal random variable with mean 0 and variance 1.

Example 3
Given a standard normal distribution, find the area under the curve that lies
1. to the right of z = 1.84, and
2. between z = −1.97 and z = 0.86.

Solution:
1. The area in Figure (a) to the right of z = 1.84 is equal to 1 minus the area in the table to the left of z = 1.84, namely 1 − 0.9671 = 0.0329.
2. The area in Figure (b) between z = −1.97 and z = 0.86 is equal to the area to the left of z = 0.86 minus the area to the left of z = −1.97. From the table we find the desired area to be 0.8051 − 0.0244 = 0.7807.

Figure: Areas of Example 3.

Example 4
Given a standard normal distribution, find the value of k such that
1. P(Z > k) = 0.3015, and
2. P(k < Z < −0.18) = 0.4197.

Solution:
1. In Figure (a) we see that the k value leaving an area of 0.3015 to the right must leave an area of 0.6985 to the left. From the table it follows that k = 0.52.
2. From the table we note that the total area to the left of −0.18 is 0.4286. In Figure (b) we see that the area between k and −0.18 is 0.4197, so the area to the left of k must be 0.4286 − 0.4197 = 0.0089. Hence, from the table, k = −2.37.

Figure: Areas of Example 4.
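As a hedged aside (not part of the original handout), the table lookups of Examples 3 and 4 can be reproduced numerically; the sketch below assumes SciPy is available and uses its standard normal CDF and its inverse (`ppf`).

```python
# Reproducing Examples 3 and 4 without Table A.3 (illustrative sketch).
from scipy.stats import norm

# Example 3
print(1 - norm.cdf(1.84))                  # area right of z = 1.84      -> ~0.0329
print(norm.cdf(0.86) - norm.cdf(-1.97))    # area between -1.97 and 0.86 -> ~0.7807

# Example 4: invert the CDF with ppf (the "percent point function")
print(norm.ppf(1 - 0.3015))                # k with P(Z > k) = 0.3015          -> ~0.52
print(norm.ppf(norm.cdf(-0.18) - 0.4197))  # k with P(k < Z < -0.18) = 0.4197  -> ~ -2.37
```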
Example 5
Given a random variable X having a normal distribution with µ = 50 and σ = 10, find the probability that X assumes a value between 45 and 62.

Solution:

    P(45 < X < 62) = P((45 − 50)/10 < (X − 50)/10 < (62 − 50)/10)
                   = P(−0.5 < Z < 1.2)
                   = 0.8849 − 0.3085 = 0.5764.

Using the Normal Curve in Reverse

Occasionally, we are required to find the value of z corresponding to a specified probability that falls between tabulated values. For convenience, we shall always choose the z value corresponding to the tabular probability that comes closest to the specified probability. We reverse the process: begin with a known area or probability, find the z value, and then determine x by rearranging z = (x − µ)/σ to give

    x = σz + µ.

Example 6
Given a normal distribution with µ = 40 and σ = 6, find the value of x that has
1. 45% of the area to the left, and
2. 14% of the area to the right.

Solution:
1. P(X < x) = 0.45
   ⇒ P((X − 40)/6 < (x − 40)/6) = 0.45
   ⇒ P(Z < (x − 40)/6) = 0.45
   ⇒ (x − 40)/6 ≈ −0.13, since P(Z < −0.13) ≈ 0.45
   ⇒ x = 6 · (−0.13) + 40 = 39.22.

2. P(X > x) = 0.14 ⇒ P(X < x) = 0.86
   ⇒ P((X − 40)/6 < (x − 40)/6) = 0.86
   ⇒ P(Z < (x − 40)/6) = 0.86
   ⇒ (x − 40)/6 ≈ 1.08, since P(Z < 1.08) ≈ 0.86
   ⇒ x = 6 · (1.08) + 40 = 46.48.

Figure: Areas of Example 6.

Applications of the Normal Distribution

Example 7
A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 year. Assuming that the battery lives are normally distributed, find the probability that a given battery will last less than 2.3 years.

Solution: Let X represent the life of the storage battery. To find P(X < 2.3), we need to evaluate the area under the normal curve to the left of 2.3. This is accomplished by finding the area to the left of the corresponding z value. Hence we find that

    P(X < 2.3) = P((X − 3)/0.5 < (2.3 − 3)/0.5) = P(Z < −1.4) = 0.0808.

Example 8
An electrical firm manufactures light bulbs that have a life, before burn-out, that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

Solution: Let X be the life of the light bulb, so X ∼ N(µ = 800, σ² = 40²).

    P(778 < X < 834) = P((778 − 800)/40 < (X − 800)/40 < (834 − 800)/40)
                     = P(−0.55 < Z < 0.85)
                     = 0.8023 − 0.2912 = 0.5111.

Example 9
In an industrial process the diameter of a ball bearing is an important component part. The buyer sets specifications on the diameter to be 3.0 ± 0.01 cm. The implication is that no part falling outside these specifications will be accepted. It is known that in the process the diameter of a ball bearing has a normal distribution with mean µ = 3.0 and standard deviation σ = 0.005. On average, how many manufactured ball bearings will be scrapped?

Solution: The values corresponding to the specification limits are x1 = 2.99 and x2 = 3.01. Hence

    P(2.99 < X < 3.01) = P((2.99 − 3)/0.005 < (X − 3)/0.005 < (3.01 − 3)/0.005)
                       = P(−2.0 < Z < 2.0).

From the table, P(Z < −2.0) = 0.0228. By the symmetry of the normal distribution,

    1 − P(−2.0 < Z < 2.0) = P(Z > 2.0) + P(Z < −2.0) = 2(0.0228) = 0.0456.

As a result, it is anticipated that, on average, 4.56% of manufactured ball bearings will be scrapped.
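The three applications above all reduce to standardization followed by a table lookup; as an illustrative check (not part of the original handout, and assuming SciPy is available), they can be evaluated directly by passing the mean and standard deviation to the normal CDF.

```python
# Checks of Examples 7-9 (illustrative sketch): loc is the mean, scale the std. dev.
from scipy.stats import norm

# Example 7: battery life, mu = 3.0, sigma = 0.5
print(norm.cdf(2.3, loc=3.0, scale=0.5))                         # ~0.0808

# Example 8: bulb life, mu = 800, sigma = 40
print(norm.cdf(834, 800, 40) - norm.cdf(778, 800, 40))           # ~0.5111

# Example 9: fraction of ball bearings scrapped
inside = norm.cdf(3.01, 3.0, 0.005) - norm.cdf(2.99, 3.0, 0.005)
print(1 - inside)                                                # ~0.0455
```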
Example 10
The average grade for an exam is 74, and the standard deviation is 7. If 25% of the class is given A's, and the grades are curved to follow a normal distribution, what is the lowest possible A and the highest possible B?

Solution: In this example we begin with a known area or probability, find the z value, and then determine x from the formula x = σz + µ. We require a z value that leaves 0.25 of the area to the right and hence an area of 0.75 to the left. From the table, P(Z < 0.67) = 0.7486 is closest to 0.75. Hence

    x = 7 × 0.67 + 74 = 78.69.

Therefore, the lowest A is 79 and the highest B is 78.

Normal Approximation to the Binomial

Theorem. If X is a binomial random variable with mean µ = np and variance σ² = npq = np(1 − p), then the limiting form of the distribution of

    T = (X − np)/√(npq),

as n → ∞, is the standard normal distribution.

Figure: Normal approximation of b(x; 15, 0.4).

Normal Approximation to the Binomial — I
Let X be a binomial random variable with parameters n and p. Then X has approximately a normal distribution with µ = np and σ² = npq = np(1 − p), and

    P(X ≤ x) = Σ_{k=0}^{x} b(k; n, p)
             ≈ area under the normal curve to the left of x + 0.5
             ≈ P(Z ≤ (x + 0.5 − np)/√(npq)),

where Z ∼ N(0, 1). The approximation will be good if np and n(1 − p) are both greater than or equal to 5.

Normal Approximation to the Binomial — II
Let X be a binomial random variable with parameters n and p. Then X has approximately a normal distribution with µ = np and σ² = npq = np(1 − p), and

    P(x1 ≤ X ≤ x2) = Σ_{k=x1}^{x2} b(k; n, p)
                   ≈ area under the normal curve to the right of x1 − 0.5 and to the left of x2 + 0.5
                   ≈ P((x1 − 0.5 − np)/√(npq) ≤ Z ≤ (x2 + 0.5 − np)/√(npq)),

where Z ∼ N(0, 1). The approximation will be good if np and n(1 − p) are both greater than or equal to 5.

Example 11
The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known to have contracted this disease, what is the probability that fewer than 30 survive?

Solution: Let the binomial variable X represent the number of patients who survive. Since n = 100, we should obtain fairly accurate results using the normal-curve approximation with µ = np = 40 and σ = √(npq) = 4.899.

    P(X < 30) ≈ P(Z < (29.5 − 40)/4.899) = P(Z < −2.14) = 0.0162.

Example 12
A multiple-choice quiz has 200 questions, each with 4 possible answers of which only 1 is correct. What is the probability that sheer guesswork yields from 25 to 30 correct answers for the 80 of the 200 problems about which the student has no knowledge?

Solution: The probability of a correct answer for each of the 80 questions is p = 1/4. If X represents the number of correct answers due to guesswork, then

    P(25 ≤ X ≤ 30) = Σ_{x=25}^{30} b(x; 80, 1/4).

Using the normal-curve approximation with µ = np = 20 and σ = √(npq) = 3.873,

    P(25 ≤ X ≤ 30) ≈ P((24.5 − 20)/3.873 < Z < (30.5 − 20)/3.873) = P(1.16 < Z < 2.71) = 0.1196.
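To see how good the continuity-corrected approximation is in practice, the sketch below (an illustrative check, not part of the handout, assuming SciPy is available) compares the exact binomial probability of Example 11 with the normal approximation.

```python
# Example 11: exact binomial tail vs. continuity-corrected normal approximation.
from math import sqrt
from scipy.stats import binom, norm

n, p = 100, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 40 and ~4.899

exact = binom.cdf(29, n, p)                 # P(X < 30) = P(X <= 29)
approx = norm.cdf((29.5 - mu) / sigma)      # continuity correction -> ~0.016

print(exact, approx)                        # the two values agree closely
```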
Gamma and Exponential Distributions

Exponential distribution: describes the time until the occurrence of a Poisson event (or the time between Poisson events).
Gamma distribution: describes the time (or space) elapsing until a specified number of Poisson events occur.

Although the normal distribution can be used to solve many problems in engineering and science, there are still numerous situations that require different types of density functions. Two such density functions are the gamma and exponential distributions. The exponential and gamma distributions play an important role in both queuing theory and reliability problems: time between arrivals at service facilities, and time to failure of component parts and electrical systems.

Gamma Function
The gamma function is defined by

    Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,  for α > 0.

Prove: Γ(α) = (α − 1)Γ(α − 1) for α > 1.

Integrating by parts with u = x^{α−1} and dv = e^{−x} dx, we obtain

    Γ(α) = [−x^{α−1} e^{−x}]_0^∞ + (α − 1) ∫_0^∞ x^{α−2} e^{−x} dx = (α − 1) ∫_0^∞ x^{α−2} e^{−x} dx

for α > 1, which yields the recursion formula Γ(α) = (α − 1)Γ(α − 1).

Property: Γ(1/2) = √π.

Gamma Distribution
The continuous random variable X has a gamma distribution, with parameters α and β, if its density function is given by

    f(x; α, β) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β},  x > 0;   0, elsewhere,

where α > 0 and β > 0.

E(X) = αβ  and  Var(X) = αβ².

Figure: Gamma distributions.

Exponential Distribution
The continuous random variable X has an exponential distribution, with parameter β, if its density function is given by

    f(x; β) = (1/β) e^{−x/β},  x > 0;   0, elsewhere,

where β > 0.

E(X) = β  and  Var(X) = β².

Example 13
What is the probability of an item surviving until t = 100 units if the item's lifetime is exponentially distributed with a mean time between failures of 80 units? Given that the item survived to 200 units, what is the probability of survival until t = 300 units?

Solution: Let X denote the number of time units that the item survives, so X ∼ Exp(β = 80).
1. P(X > 100) = e^{−100/80} ≈ 0.29.
2. P(X > 300 | X > 200) = P(X > 100) = e^{−100/80} ≈ 0.29.

Relationship to the Poisson Process
Let X be the amount of time until the first arrival in a Poisson process with rate λ. Then X has an exponential distribution with mean β = 1/λ.

Proof: Note that the number of arrivals in [0, t] is Poi(λt). Hence

    F(t) = P(X ≤ t) = 1 − P(no arrivals in [0, t]) = 1 − e^{−λt}(λt)^0/0! = 1 − e^{−λt}.

The Poisson distribution is used to compute the probability of specific numbers of "events" during a particular period of time or span of space. In many applications, however, the time period or span of space is the random variable. For example, an industrial engineer may be interested in modeling the time T between arrivals at a congested intersection during rush hour in a large city; an arrival represents the Poisson event. The Poisson distribution was developed as a single-parameter distribution with parameter λ, where λ may be interpreted as the mean number of events per unit time. Consider now the random variable described by the time required for the first event to occur. Using the Poisson distribution, the probability of no events occurring in the span up to time t is

    p(0; λt) = e^{−λt}(λt)^0/0! = e^{−λt}.

We can now let X be the time to the first Poisson event. The probability that the length of time until the first event exceeds x is the same as the probability that no Poisson events occur in [0, x], which is e^{−λx}. As a result,

    P(X > x) = e^{−λx}.

Thus the cumulative distribution function for X is

    F(x) = P(0 ≤ X ≤ x) = 1 − e^{−λx}.

To recognize the presence of the exponential distribution, we may differentiate the cumulative distribution function above to obtain the density function

    f(x) = dF(x)/dx = λ e^{−λx},

which is the density function of the exponential distribution with λ = 1/β.
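The same relationship can be seen by simulation: if interarrival gaps are drawn as Exp(β = 1/λ), the number of arrivals in a unit interval behaves like a Poisson(λ) count. The sketch below is illustrative only (not part of the handout) and assumes NumPy is available.

```python
# Illustrative simulation: exponential gaps with mean 1/lam produce a
# Poisson(lam) number of arrivals in [0, 1].
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0                       # mean number of events per unit time
n_paths = 10_000

gaps = rng.exponential(scale=1.0 / lam, size=(n_paths, 50))   # Exp(beta = 1/lam)
arrival_times = np.cumsum(gaps, axis=1)                       # arrival epochs per path
counts = (arrival_times <= 1.0).sum(axis=1)                   # arrivals in [0, 1]

print(counts.mean(), counts.var())   # both should be close to lam = 5
```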
Example 14
Let X equal the number of bad records in each 50 feet of a used computer tape. Assume that X has a Poisson distribution with mean 1. Let W equal the number of feet before the first bad record is found.
1. Give the mean and variance of the number of flaws per foot.
2. How is W distributed?
3. Give the mean and variance of W.
4. Find P(W > 1001 | W > 951).

Solution:
1. Let Y be the number of bad records per foot. Then E(Y) = Var(Y) = 1/50.
2. W ∼ Exp(β = 50); that is, W has probability density function

    f(w) = (1/50) e^{−w/50},  w > 0.

3. E(W) = 50 and Var(W) = 2500.
4. Using the memoryless property of the exponential distribution,

    P(W > 1001 | W > 951) = P(W > 50) = e^{−1}.

Applications of the Gamma and Exponential Distributions

Applications of the exponential distribution arise in "time to arrival" or time-to-Poisson-event problems. Notice that the mean of the exponential distribution is the parameter β, the reciprocal of the rate parameter in the Poisson distribution. Recall that the Poisson distribution has no memory, implying that occurrences in successive time periods are independent. The important parameter β is the mean time between events. In reliability theory, where equipment failure often conforms to this Poisson process, β is called the mean time between failures. Many equipment breakdowns do follow the Poisson process, and thus the exponential distribution does apply. Other applications include survival times in biomedical experiments and computer response time.

Example 15
Suppose that a system contains a certain type of component whose time to failure, in years, is given by T. The random variable T is modeled nicely by the exponential distribution with mean time to failure β = 5. If 5 of these components are installed in different systems, what is the probability that at least 2 are still functioning at the end of 8 years?

Solution: The probability that a given component is still functioning after 8 years is

    P(T > 8) = (1/5) ∫_8^∞ e^{−t/5} dt = e^{−8/5} ≈ 0.2.

Let X represent the number of components functioning after 8 years. Then, using the binomial distribution,

    P(X ≥ 2) = Σ_{x=2}^{5} b(x; 5, 0.2) = 0.2627.

Memoryless Property of the Exponential Distribution
A nonnegative random variable X is called memoryless if, for all s, t ≥ 0,

    P(X ≥ s + t | X ≥ s) = P(X ≥ t).

The probability of lasting an additional t time units is the same as the probability of lasting t time units; the fact that the event hasn't happened yet tells us nothing about how much longer it will take before it does happen.

To show that an exponential distribution is memoryless:

    P(X ≥ s + t | X ≥ s) = P(X ≥ s + t, X ≥ s)/P(X ≥ s)
                         = P(X ≥ s + t)/P(X ≥ s)
                         = e^{−(s+t)λ}/e^{−sλ}
                         = e^{−tλ}
                         = P(X ≥ t).

Example 16
The lifetime of a TV tube (in years) is an exponential random variable with mean 10. If Jim bought his TV set 10 years ago, what is the probability that its tube will last another 10 years?

Solution: Let X be the lifetime of the TV tube, so X ∼ Exp(β = 10). By the memoryless property,

    P(X > 20 | X > 10) = P(X > 10) = e^{−1}.

Example 17
Suppose that telephone calls arriving at a particular switchboard follow a Poisson process with an average of 5 calls coming in per minute. What is the probability that up to a minute will elapse until 2 calls have come in to the switchboard?

Solution: The Poisson process applies, with the time until 2 Poisson events following a gamma distribution with β = 1/5 and α = 2. Denote by X the time in minutes that transpires before 2 calls come in; in other words, X ∼ Γ(2, 1/5). The required probability is

    P(X ≤ 1) = ∫_0^1 (1/β²) x e^{−x/β} dx = 25 ∫_0^1 x e^{−5x} dx = 1 − e^{−5}(1 + 5) ≈ 0.96.
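As an illustrative check of Example 17 (not part of the handout, assuming SciPy is available): SciPy's gamma distribution uses `a` for the shape α and `scale` for β, so the waiting time until the second call can be evaluated directly, and the waiting time until the first call is the exponential special case.

```python
# Example 17: waiting time until the 2nd call in a Poisson process with rate 5/min.
from scipy.stats import gamma, expon

print(gamma.cdf(1, a=2, scale=1/5))   # P(X <= 1) for Gamma(alpha=2, beta=1/5) -> ~0.96

# For comparison, the waiting time until the *first* call is exponential:
print(expon.cdf(1, scale=1/5))        # 1 - e^{-5} -> ~0.993
```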
Example 18
Based on extensive testing, it is determined that the time Y in years before a major repair is required for a certain washing machine is characterized by the density function

    f(y) = (1/4) e^{−y/4},  y ≥ 0;   0, elsewhere.

Note that this is an exponential distribution with µ = 4 years. The machine is considered a bargain if it is unlikely to require a major repair before the sixth year. Thus, what is the probability P(Y > 6)? Also, what is the probability that a major repair occurs in the first year?

Solution: Consider the cumulative distribution function F(y) of the exponential distribution,

    F(y) = (1/β) ∫_0^y e^{−t/β} dt = 1 − e^{−y/β}.

So

    P(Y > 6) = 1 − F(6) = e^{−6/4} = e^{−3/2} = 0.2231.

Thus, the probability that the machine will require a major repair only after year six is 0.223; of course, it will require a major repair before year six with probability 0.777, so one might conclude the machine is not really a bargain. The probability that a major repair occurs in the first year is

    P(Y < 1) = 1 − e^{−1/4} = 0.221.

Chi-Squared Distribution

The continuous random variable X has a chi-squared distribution, with υ degrees of freedom, if its density function is given by

    f(x; υ) = (1/(2^{υ/2} Γ(υ/2))) x^{υ/2−1} e^{−x/2},  x > 0;   0, elsewhere,

where υ is a positive integer. This very important special case of the gamma distribution is obtained by letting α = υ/2 and β = 2.

E(X) = υ  and  Var(X) = 2υ.

Lognormal Distribution

The continuous random variable X has a lognormal distribution if the random variable Y = ln(X) has a normal distribution with mean µ and standard deviation σ. The resulting density function of X is

    f(x; µ, σ) = (1/(σx√(2π))) exp{−[ln(x) − µ]²/(2σ²)},  x > 0;   0, elsewhere.

E(X) = e^{µ + σ²/2}  and  Var(X) = e^{2µ + σ²}(e^{σ²} − 1).

Example 19
Concentrations of pollutants produced by chemical plants historically are known to exhibit behavior that resembles a lognormal distribution. This is important when one considers issues regarding compliance with government regulations. Suppose it is assumed that the concentration of a certain pollutant, in parts per million, has a lognormal distribution with parameters µ = 3.2 and σ = 1. What is the probability that the concentration exceeds 8 parts per million?

Solution: Let the random variable X be the pollutant concentration in parts per million. Then X ∼ LogN(µ = 3.2, σ² = 1), so the probability of interest is P(X > 8) = 1 − P(X ≤ 8). Since ln(X) has a normal distribution with mean µ = 3.2 and standard deviation σ = 1,

    P(X ≤ 8) = P((ln(X) − 3.2)/1 ≤ (ln(8) − 3.2)/1) = P(Z ≤ −1.12) = Φ(−1.12) = 0.1314.

Here, we use the Φ notation to denote the cumulative distribution function of the standard normal distribution. As a result, the probability that the pollutant concentration exceeds 8 parts per million is 1 − 0.1314 = 0.8686.

Example 20
The life, in thousands of miles, of a certain type of electronic control for locomotives has an approximate lognormal distribution with µ = 5.149 and σ = 0.737. Find the 5th percentile of the life of such an electronic control.

Solution: Since P(Z < −1.645) = 0.05, denote by X the life of such a control. Since ln(X) has a normal distribution with mean µ = 5.149 and standard deviation σ = 0.737, the 5th percentile of X satisfies

    ln(x) = 5.149 + 0.737 · (−1.645) = 3.937.

Hence x = e^{3.937} = 51.265. This means that only 5% of the controls will have a lifetime of less than 51,265 miles.
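As a final illustrative check (not part of the original handout, assuming SciPy is available): SciPy's lognormal uses the shape `s` for σ and `scale = exp(µ)`, so both Example 19's tail probability and Example 20's percentile can be computed directly.

```python
# Examples 19 and 20 with SciPy's lognormal (illustrative sketch).
from math import exp
from scipy.stats import lognorm

# Example 19: P(X > 8) when ln(X) ~ N(3.2, 1)
print(1 - lognorm.cdf(8, s=1.0, scale=exp(3.2)))      # ~0.869

# Example 20: 5th percentile of the life, in thousands of miles
print(lognorm.ppf(0.05, s=0.737, scale=exp(5.149)))   # ~51.3, i.e. about 51,300 miles
```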