Tail negative dependence and its applications for aggregate loss modeling

Lei Hua∗

October 19, 2014

Abstract. The tail order of a copula describes the strength of dependence in the tails of a joint distribution. When the tail order is larger than the dimension, the copula may exhibit tail negative dependence. We first prove conditions that lead to tail negative dependence for Archimedean copulas. We then construct new copulas that possess upper tail negative dependence. In particular, a copula based on a scale mixture with a generalized gamma random variable (GGS copula) is useful for modeling asymmetric tail negative dependence structures. Finally, we apply mixed copula regression based on the GGS copula to aggregate loss modeling for a Medical Expenditure Panel Survey dataset. We find that there is upper tail negative dependence between loss frequency and loss severity for this dataset, and that introducing tail negative dependence structures significantly improves the aggregate loss modeling.

Key words: Tail order, scale mixture, loss frequency, loss severity, MEPS data, Archimedean copula, GGS copula.

∗ [email protected], Division of Statistics, Northern Illinois University, DeKalb, IL, 60115, United States.

1 Introduction

As more data become available, many new and meaningful dependence patterns emerge, and existing statistical models may no longer capture them well. Copulas provide a very flexible tool for this purpose. The challenge is then often to create a new copula family that is not only suitable for describing the new dependence patterns, but also computable and easy to implement. Motivated by a unique asymmetric tail negative dependence structure that appears in a medical expenditure dataset, we find it necessary to develop new statistical models to account for such dependence structures.
In actuarial science, aggregate loss modeling has long been an important problem. Modeling aggregate losses appropriately is essential for insurers and governments to assess and predict the associated costs. During a policy term, say one year, there may be several loss events associated with an insurance policy, and each loss event incurs an amount of expense. The number of events is the loss frequency, and the amount per event is the loss severity. There are several ways to model aggregate losses by treating loss frequency and loss severity separately. We list two of them below, and we refer to Frees (2010) for more details about regression analysis for insurance applications. Let the random variable Y be the loss severity and the random variable N be the loss frequency. One way is to model the loss frequency N and the conditional loss severity Y | N > 0 separately by regression models such as generalized linear models, and then to treat their product as the aggregate loss. Another way is to model the frequency and severity simultaneously by a mixture model, such as the Tweedie model, if data on aggregate losses are available. The first way assumes that N and Y | N > 0 are independent, and the second way assumes that N and Y are independent. Gschlößl and Czado (2007) find that the independence assumption between loss severity and frequency, which is common in the literature, may not hold. Copula models have since been employed to account for the dependence structure between loss frequency and severity in Czado et al. (2012) and Krämer et al. (2013). The latter two papers incorporate some commonly used parametric copula families into the respective regression models for loss frequency and severity, and report a moderate positive dependence between loss severity and frequency in a German auto insurance claim dataset.
The aforementioned method can also be applied to other datasets, as long as a suitable parametric copula family is available to capture the dependence structure. Based on a Medical Expenditure Panel Survey (MEPS) dataset¹ from the Agency for Healthcare Research and Quality (AHRQ), we find a unique dependence pattern between loss severity and frequency. In general, the average expense per visit is independent of the number of visits during each year. However, when patients use medical services more frequently, the average cost per visit and the number of visits tend to be more negatively dependent. To the best of our knowledge, none of the commonly used copulas can capture this dependence pattern. Therefore, we need to develop new copulas that can capture different degrees of upper tail negative dependence while keeping the remaining parts approximately independent. Once a suitable copula has been constructed, we can incorporate it into the mixed copula regression model developed in Czado et al. (2012) and Krämer et al. (2013) for aggregate loss modeling. We refer the interested reader to the first panel of Figure 4 for an impression of the upper tail negative dependence pattern in the MEPS dataset.

¹ http://meps.ahrq.gov/mepsweb/index.jsp

After considering the families of extreme value, elliptical and Archimedean copulas, we find that only Archimedean copulas are suitable for our purpose. Moreover, in order to obtain tail negative dependence for an Archimedean copula, we use the scale mixture approach studied in McNeil and Nešlehová (2009) to construct the copula. The tail behavior of Archimedean copulas has been studied in Charpentier and Segers (2009), Hua and Joe (2011, 2013) and Larsson and Nešlehová (2011), but none of these papers gives conditions for tail negative dependence.
Our main contributions in this paper are the following. First, we prove general conditions that lead to upper tail negative dependence for an Archimedean copula; these conditions also generalize some results in Hua and Joe (2011, 2013). Second, we construct some new Archimedean copulas and study their properties; one of these copulas is very useful for modeling the unique asymmetric tail negative dependence pattern that appears in the MEPS dataset. Finally, we implement the new copula in the mixed copula regression analysis and conduct a data analysis for the medical expenditure dataset, and we find that the new copula significantly improves the aggregate loss modeling.

In what follows, we first briefly introduce some basic concepts and notation in Section 2. In Section 3, we explore how to construct a desirable asymmetric tail negative dependence structure based on the notion of tail order, and we construct some parametric copulas. In particular, we study a new two-parameter Archimedean copula family based on generalized gamma simplex mixtures, which is useful for modeling such an asymmetric upper tail negative dependence structure. In Section 4, we implement the new copula in the mixed copula regression model and conduct aggregate loss modeling for a medical expenditure dataset from the United States. With the new tail negative dependent copula, the methodology significantly improves the aggregate loss modeling. Finally, we conclude the paper in Section 5.

2 Preliminaries

Due to its growing popularity in the last decade and its flexibility in modeling non-Gaussian dependence structures, the notion of copula has been used widely in the actuarial literature. A copula $C : [0,1]^d \to [0,1]$ for a $d$-dimensional random vector can be defined as
\[ C(u_1, \dots, u_d) = F\big(F_1^{-1}(u_1), \dots, F_d^{-1}(u_d)\big), \]
where $F$ is the joint cumulative distribution function (cdf), $F_i$ is the univariate cdf of the $i$th marginal, and $F_i^{-1}$ is the generalized inverse function defined as $F_i^{-1}(u) = \inf\{x : F_i(x) \ge u\}$. We refer to Joe (1997) and Nelsen (2006) for references on copulas.

For a copula $C$, the lower tail order of $C$ is defined as a constant $\kappa_L$ such that $C(u, \dots, u) \sim u^{\kappa_L} \ell(u)$ as $u \to 0^+$, where the notation $g(x) \sim h(x)$ as $x \to x_0$ means that $\lim_{x \to x_0} g(x)/h(x) = 1$, and $\ell(x)$ is a slowly varying function as $x \to 0^+$. For a measurable function $g : \mathbb{R}_+ \to \mathbb{R}_+$, if for any constant $r > 0$, $\lim_{x \to 0^+} g(rx)/g(x) = 1$, then $g$ is said to be slowly varying at $0^+$, denoted as $g \in RV_0(0^+)$; if for any constant $r > 0$, $\lim_{x \to \infty} g(rx)/g(x) = r^{\alpha}$, $\alpha \in \mathbb{R}$, then $g$ is said to be regularly varying at $\infty$ with variation exponent $\alpha$, denoted as $g \in RV_{\alpha}$. For a random variable $X$, we usually use $F_X$ to represent the cdf of $X$. When we say that $X$ is regularly varying at $\infty$, we mean that the survival function $\bar F_X \in RV_{\alpha}$ with some variation exponent $\alpha < 0$. Similarly, the upper tail order of $C$ is defined as a constant $\kappa_U$ such that $\bar C(1-u, \dots, 1-u) \sim u^{\kappa_U} \ell(u)$ as $u \to 0^+$, where $\bar C$ is the survival function of $C$.

Tail order is a flexible quantity for capturing the degree of dependence in the tails. It can be used for the upper and lower tails separately, and it quantifies strengths of dependence ranging from tail negative dependence to tail positive dependence. The range of the tail order $\kappa$ is $1 \le \kappa \le \infty$ and, generally speaking, a smaller $\kappa$ implies stronger dependence in the tail. For the bivariate case, if the tail order $\kappa > 2$, then there is negative dependence in the tail. We refer to Hua and Joe (2011) for more details about the notion of tail order.
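To make the definition of tail order concrete, the following minimal sketch approximates the lower tail order by the ratio $\log C(u,u) / \log u$ at a small $u$, for three bivariate copulas whose diagonals are known in closed form. The function names and the choice $u = 10^{-8}$ are illustrative assumptions and do not come from the paper. Because of the slowly varying factor $\ell(u)$, convergence is slow, so the printed values are only rough approximations of $\kappa_L$.

```python
import math

def tail_order_ratio(diag, u=1e-8):
    """Approximate the lower tail order by log C(u, u) / log u at a small u."""
    return math.log(diag(u)) / math.log(u)

def independence_diag(u):
    # Independence copula: C(u, u) = u^2, so the tail order is 2.
    return u * u

def comonotone_diag(u):
    # Comonotonic copula: C(u, u) = min(u, u) = u, so the tail order is 1.
    return u

def clayton_diag(u, theta=1.0):
    # Clayton copula: C(u, u) = (2 u^(-theta) - 1)^(-1/theta), lower tail order 1.
    return (2.0 * u ** (-theta) - 1.0) ** (-1.0 / theta)

ratios = {
    "independence": tail_order_ratio(independence_diag),
    "comonotone": tail_order_ratio(comonotone_diag),
    "clayton": tail_order_ratio(clayton_diag),
}
for name, value in ratios.items():
    print(name, round(value, 4))
```

A ratio near 2 corresponds to tail independence in the bivariate case, a ratio near 1 to the strongest tail dependence, and a ratio above 2 would indicate tail negative dependence.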
In order to capture the reflection-asymmetric tail dependence pattern (i.e., the upper and lower tails are different) that appears in the left panel of Figure 4, we need to construct a bivariate copula whose upper tail order is greater than 2 and whose lower tail order is allowed to be close to 2. For elliptical copulas, the upper and lower tails are symmetric, so they are not suitable. For a bivariate extreme value copula, the upper tail order is either 1 or 2, so it cannot be an upper tail negative dependent copula. An Archimedean copula can be written in the form
\[ C(u_1, \dots, u_d) = \psi\big(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\big), \tag{1} \]
where $\psi^{-1}$ is the inverse of the generator $\psi$.

There are mainly two ways of constructing an Archimedean copula. One way is based on the Laplace transform (LT) of a positive random variable. Namely, let the generator $\psi$ in (1) be the LT of a positive random variable. That is,
\[ \psi(s) = \int_0^{\infty} \exp\{-sy\}\, F_Y(dy), \quad s \ge 0, \]
where $Y$ is a positive random variable. It is well known that such a generator $\psi$ is completely monotone and can be used to construct an Archimedean copula for any dimension. The other way is based on the survival copula for a scale mixture with a uniform distribution on a simplex (see McNeil and Nešlehová (2009)). More specifically, if a random vector $X := (X_1, \dots, X_d) \stackrel{d}{=} R \times (S_1, \dots, S_d)$ satisfies some regularity conditions, where $\stackrel{d}{=}$ means "equality in distribution", then the survival copula of $X$ is an Archimedean copula of dimension $d$. We will use this representation throughout the paper, and a more formal introduction to it is given at the beginning of Section 3. In Section 3, we will prove that the tail behavior of the above random variable $R$ affects the strength of dependence in the tails of the corresponding Archimedean copula.
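The scale mixture representation can be simulated directly. The sketch below draws from $X = R \times (S_1, S_2)$ for $d = 2$, using $(S_1, S_2) = (U, 1-U)$ with $U$ uniform on $(0,1)$. The choice $R \sim \text{Gamma}(2, 1)$ is an assumption made purely for illustration: in that case $X_1$ and $X_2$ are independent unit exponentials, so the survival copula is the independence copula, which gives a simple sanity check. The function names and the seed are hypothetical.

```python
import random

def sample_scale_mixture(n, draw_r, rng):
    """Draw n points of X = R * (S1, S2), with (S1, S2) uniform on the unit simplex (d = 2)."""
    points = []
    for _ in range(n):
        r = draw_r(rng)
        u = rng.random()  # S1 = U and S2 = 1 - U
        points.append((r * u, r * (1.0 - u)))
    return points

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2014)
# Illustrative assumption: R ~ Gamma(shape=2, scale=1).
sample = sample_scale_mixture(20000, lambda g: g.gammavariate(2.0, 1.0), rng)
x1 = [p[0] for p in sample]
x2 = [p[1] for p in sample]
mean_x1 = sum(x1) / len(x1)
corr = pearson(x1, x2)
print("mean of X1:", round(mean_x1, 3), "correlation:", round(corr, 3))
```

Replacing the lambda with another positive distribution for $R$ changes the tail behavior of the resulting copula, which is the mechanism studied in Section 3.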
In order to characterize the tail behavior of a univariate random variable such as $R$, a well-developed and mathematically tractable way is to consider the maximum domain of attraction (MDA) to which the random variable belongs. For example, a Gamma random variable belongs to the MDA of Gumbel, and a Pareto random variable belongs to the MDA of Fréchet. More formally, a random variable $X$ is said to belong to the MDA of an extreme value distribution $H$ if there exist normalizing constants $\sigma_n > 0$ and $\mu_n \in \mathbb{R}$ such that
\[ (M_n - \mu_n)/\sigma_n \stackrel{d}{\to} H, \quad n \to \infty, \]
where $M_n$ is the maximum of a random sample of $X$ of size $n$, and $\stackrel{d}{\to}$ means "convergence in distribution". This is written as $X \in \mathrm{MDA}(H)$. It is well known that there are only three non-degenerate univariate extreme value distributions: Fréchet, Gumbel and Weibull. Roughly speaking, the MDA of Fréchet (denoted as $\mathrm{MDA}(\Phi_{\alpha})$, where $\alpha$ is the shape parameter of the Fréchet distribution) includes univariate distributions that have heavier right tails, the MDA of Gumbel (denoted as $\mathrm{MDA}(\Lambda)$) consists of univariate distributions that have lighter right tails, and the MDA of Weibull corresponds to bounded random variables, which are often irrelevant to actuarial applications. We refer to Embrechts et al. (1997) for a classical reference on the concepts of MDA and general extreme value theory, and for relevant applications in insurance and finance.

3 Tail negative dependence

It is clear from the brief discussion in Section 2 that extreme value copulas and elliptical copulas are not suitable for constructing a dependence structure that has asymmetric tail dependence and tail negative dependence simultaneously. So, we focus on Archimedean copulas in this section. To provide a parametric Archimedean copula that has a simple form, one often takes $\psi$ to be the LT of a positive random variable.
We refer to Joe and Hu (1996) and Joe (1997) for many implementable parametric Archimedean copulas. However, Archimedean copulas generated by such LTs do not provide tail negative dependence. For a bivariate random vector $(X_1, X_2)$, if $P[X_1 \le x_1, X_2 \le x_2] \ge P[X_1 \le x_1] P[X_2 \le x_2]$ for any $x_1, x_2 \in \mathbb{R}$, then $(X_1, X_2)$ is said to be positive quadrant dependent (PQD). If a bivariate Archimedean copula is constructed by the LT of a positive random variable, then the copula is PQD and thus positive upper quadrant dependent (PUQD) and positive lower quadrant dependent (PLQD) (see Section 2.1.1 of Joe (1997)); moreover, the tail orders satisfy $\kappa \le 2$ for both the upper and lower tails (Proposition 2 of Hua and Joe (2011)). So Archimedean copulas based on the LT of a positive random variable cannot be used to construct such an asymmetric upper tail negative dependence structure. However, if an Archimedean copula is derived from the survival copula of a scale mixture with a uniform distribution on the simplex, then we will show that conditions on the mixing random variable can lead to a very flexible tail for the corresponding copula, which can be tail dependent, intermediate tail dependent (Hua and Joe, 2011) and even tail negative dependent.

Instead of using the LT of a positive random variable, one can also construct an Archimedean copula in other ways, as long as the generator $\psi$ satisfies certain regularity conditions (see Malov (2001) or McNeil and Nešlehová (2009)). In McNeil and Nešlehová (2009), an Archimedean copula can be the survival copula of a random vector
\[ X := (X_1, \dots, X_d) \stackrel{d}{=} R \times (S_1, \dots, S_d), \tag{2} \]
where $R$ and $(S_1, \dots, S_d)$ are independent, $R$ is a positive random variable and $(S_1, \dots, S_d)$ is uniformly distributed on the simplex $\{x \in \mathbb{R}^d_+ : \sum_i x_i = 1\}$. In this case, $\psi$ can be the Williamson $d$-transform of the cdf $F_R$ with $F_R(0) = 0$. That is,
\[ \psi(s) = \int_s^{\infty} (1 - s/r)^{d-1}\, F_R(dr), \quad s \in [0, \infty). \tag{3} \]
In Section 3.1, we will prove that the tail behavior of $R$ affects the strength of dependence in the tails of the Archimedean copula, and that upper tail negative dependence can be derived from the representation (2).

3.1 Conditions

In Hua and Joe (2013), we find that when the right tail of $1/R$ follows a power law, a lighter right tail of $1/R$ tends to increase the upper tail order of the associated Archimedean copula, thus decreasing the degree of positive dependence in the upper tail. In what follows, unless otherwise specified, the tail of a univariate random variable or distribution always refers to the right distributional tail. From Hua and Joe (2011), we know that if the tail order $\kappa > d$, where $d$ is the dimension, then the copula may have tail negative dependence. So, in order to obtain tail negative dependence, we shall decrease the tail heaviness of the random variable $1/R$. However, as Example 4 of Hua and Joe (2013) shows, even a very light tail of $1/R$ may fail to provide tail negative dependence. The reason is not that the tail of $1/R$ is not sufficiently light, but that the Archimedean copula is constructed by the LT of a positive random variable. So, in the following Proposition 1, we instead use the scale mixture method to construct Archimedean copulas so that tail negative dependence can be obtained.

In this section, all distribution functions and density functions are assumed to be ultimately monotone to the left and right endpoints; this condition is very mild and is satisfied by all the commonly used distributions. Since the theoretical results developed in this section concern distributional tails, without loss of generality, we further assume that the cdfs of the marginal distributions are all continuous, so that the copula is uniquely determined and cumbersome arguments are avoided.

Proposition 1 Suppose a random vector $X := (X_1, \dots, X_d)$ is defined as in (2).
If $1/R \in \mathrm{MDA}(\Phi_{\alpha})$ and $E[1/R] < \infty$, then the lower tail order of $X$ is $\kappa = \alpha$; that is, the upper tail order of the corresponding Archimedean copula is $\alpha$.

Proof: Let $F$ be the identical univariate cdf of the $X_i$'s, and $C$ be the copula of $X$. Since the survival copula of $X$ is an Archimedean copula, in order to study the upper tail of the Archimedean copula, it suffices to study the lower tail of $X$. Due to Equation (1) of Hua and Joe (2013), the upper tail order $\kappa$ of the Archimedean copula can be derived as
\[ \kappa = \lim_{u \to 0^+} \frac{\log C(u, \dots, u)}{\log(u)} = \lim_{x \to 0^+} \frac{\log(C(F(x), \dots, F(x)))}{\log(F(x))} = \lim_{x \to 0^+} \frac{\log(P[X_1 \le x, \dots, X_d \le x])}{\log(P[X_1 \le x])}. \]
Letting $T := 1/R$, $y = 1/x$ and $s_* = \max\{s_1, \dots, s_d\}$, then
\[ P[X_1 \le x, \dots, X_d \le x] = P[RS_1 \le x, \dots, RS_d \le x] = \int_{s \ge 0, \|s\|_1 = 1} P[T \ge s_*/x]\, F_S(ds_1, \dots, ds_d). \]
Since $T \in \mathrm{MDA}(\Phi_{\alpha})$, $P[T \ge \cdot] \in RV_{-\alpha}$ and there exists a slowly varying function $\ell(\cdot)$ such that $P[T \ge t] = t^{-\alpha} \ell(t)$. Therefore,
\begin{align}
\kappa &= \lim_{x \to 0^+} \frac{\log(P[X_1 \le x, \dots, X_d \le x])}{\log(P[X_1 \le x])} \notag \\
&= \lim_{y \to \infty} \frac{\log\left( P[T \ge y] \times \int_{s \ge 0, \|s\|_1 = 1} P[T \ge s_* y]/P[T \ge y]\, F_S(ds_1, \dots, ds_d) \right)}{\log \int_0^1 P[T \ge s_1 y]\, F_{S_1}(ds_1)} \notag \\
&= \lim_{y \to \infty} \frac{\log(P[T \ge y]) + \log \int_{s \ge 0, \|s\|_1 = 1} P[T \ge s_* y]/P[T \ge y]\, F_S(ds_1, \dots, ds_d)}{-\log(y) - \log(B(1, d-1)) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\, dx} \tag{4} \\
&= \lim_{y \to \infty} \frac{-\alpha \log y + \log(\ell(y)) + \log \int_{s \ge 0, \|s\|_1 = 1} P[T \ge s_* y]/P[T \ge y]\, F_S(ds_1, \dots, ds_d)}{-\log(y) - \log(B(1, d-1)) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\, dx} \notag \\
&= \alpha, \tag{5}
\end{align}
where $B(\cdot, \cdot)$ is the Beta function. Equation (4) is implied by the fact that the univariate marginals of $S$ are distributed as Beta$(1, d-1)$ (Ferguson, 1973). Equation (5) holds due to the following:

(a) $0 < \lim_{y \to \infty} \int_0^y P[T \ge x](1 - x/y)^{d-2}\, dx \le \lim_{y \to \infty} \int_0^y P[T \ge x]\, dx = E[T] < \infty$ from the condition;

(b) by Proposition 1.3.6 (i) of Bingham et al. (1987), $\lim_{y \to \infty} \log(\ell(y))/\log(y) = 0$;

(c) since $1/d \le s_* \le 1$, and as $y \to \infty$, $P[T \ge s_* y]/P[T \ge y] \to s_*^{-\alpha}$ uniformly in $s_* \in [1/d, 1]$,
\[ \lim_{y \to \infty} \log \int_{s \ge 0, \|s\|_1 = 1} \frac{P[T \ge s_* y]}{P[T \ge y]}\, F_S(ds_1, \dots, ds_d) = \log \int_{s \ge 0, \|s\|_1 = 1} s_*^{-\alpha}\, F_S(ds_1, \dots, ds_d) \le \alpha \log(d) < \infty. \]
That is, $\kappa = \alpha$, which completes the proof.

Remark 1 In Proposition 1, the condition on the random variable $R$ has the following equivalent forms:
\[ 1/R \in \mathrm{MDA}(\Phi_{\alpha}) \iff \bar F_{1/R} \in RV_{-\alpha} \iff F_R \in RV_{\alpha}(0^+). \]

Remark 2 Proposition 1 generalizes Proposition 6 in Hua and Joe (2013), where only intermediate tail dependence has been studied. Moreover, we use a different method to prove Proposition 1 in this paper, and the proof is shorter than that in Hua and Joe (2013).

From Example 4 in Hua and Joe (2013), we notice that, for a $d$-dimensional Archimedean copula constructed by the LT of an inverse Gamma random variable with shape parameter $\alpha$, the corresponding $R$ in the sense of (2) cannot satisfy $F_R \in RV_{\alpha}(0^+)$ for $\alpha > d$. This is one reason why this copula does not have tail negative dependence. More generally, we want to know whether a multivariate Archimedean copula constructed by the LT of a positive random variable can have an upper tail order that is larger than the dimension of the copula. As discussed at the beginning of Section 3, this is not possible for a bivariate Archimedean copula constructed by the LT of a positive random variable, as such a copula is PQD and thus PUQD and PLQD. So, for the bivariate case, neither the upper nor the lower tail of such an Archimedean copula can have tail negative dependence. For the multivariate case with dimension $d \ge 2$, we know that an Archimedean copula constructed by the LT of a positive random variable is positive lower orthant dependent (PLOD) (see Corollary 4.6.3 of Nelsen (2006)); therefore, by Proposition 2 of Hua and Joe (2011), the lower tail order of such a copula must be less than or equal to the dimension $d$. However, for the upper tail, we do not know whether such a multivariate Archimedean copula is positive upper orthant dependent (PUOD).
The following result implies that the upper tail order of a $d$-dimensional Archimedean copula constructed by the LT of a positive random variable must be less than or equal to $d$.

Corollary 2 Let $C$ be a $d$-dimensional Archimedean copula constructed as in (1) with $\psi$ being the LT of a positive random variable $Y$.

1. If $\bar F_Y = o(\bar F_W)$, where $W \sim$ Inverse-Gamma$(d, 1)$, then the upper tail order $\kappa_U$ of the copula $C$ exists, and $\kappa_U = d$.

2. If $\bar F_Y \in RV_{-\alpha}$ for an $\alpha$ such that $d \ge \alpha > 1$, then the upper tail order $\kappa_U$ of $C$ exists, and $\kappa_U = \alpha$.

3. If $\bar F_Y \in RV_{-1}$ and $E[Y] < \infty$, then the upper tail order $\kappa_U$ of $C$ exists, and $\kappa_U = 1$.

Proof: From McNeil and Nešlehová (2009), we know that an Archimedean copula constructed by the LT of a positive random variable can also be written as the survival copula of the random vector in (2) with a scaling random variable $R$. The relationship between $R$ and $Y$ is $R \stackrel{d}{=} \text{Gamma}(d, 1)/Y$, where the Gamma$(d, 1)$ random variable and $Y$ are independent, or equivalently
\[ \frac{1}{R} \stackrel{d}{=} Y \times \frac{1}{\text{Gamma}(d, 1)} =: Y W, \]
where $W$ is distributed as Inverse-Gamma$(d, 1)$. Since
\[ \bar F_W(w) = \int_w^{\infty} \frac{1}{\Gamma(d)} x^{-d-1} \exp\{-1/x\}\, dx = \frac{1}{\Gamma(d)} \int_0^{1/w} t^{d-1} \exp\{-t\}\, dt = \frac{1}{\Gamma(d)} \gamma(d, 1/w) \sim \frac{1}{d\,\Gamma(d)} w^{-d}, \quad w \to \infty, \tag{6} \]
where $\Gamma(\cdot)$ is the Gamma function, $\gamma(\cdot, \cdot)$ is the lower incomplete Gamma function, and the asymptotic equivalence can be found in Abramowitz and Stegun (1964), we have $\bar F_W \in RV_{-d}$.

To prove 1, since $\bar F_Y = o(\bar F_W)$ and $\bar F_W \in RV_{-d}$, by the corollary of Theorem 3 in Embrechts and Goldie (1980), $\bar F_{1/R} \in RV_{-d}$ and thus $E[1/R] < \infty$ as $d = 2, 3, \dots$. Then, by Proposition 1, $\kappa_U = d$.

To prove 2, if $\bar F_Y \in RV_{-d}$, then by the corollary of Theorem 3 in Embrechts and Goldie (1980), $\bar F_{1/R} \in RV_{-d}$, so the claim is proved. If $\bar F_Y \in RV_{-\alpha}$ with $d > \alpha > 1$, then clearly $\bar F_W = o(\bar F_Y)$ and thus $\bar F_{1/R} \in RV_{-\alpha}$ and $E[1/R] < \infty$. Proposition 1 leads to the claim.

The proof of 3 is similar to that of the second case, but we need the extra condition $E[Y] < \infty$ so that $E[1/R] = E[Y]E[W] < \infty$, which completes the proof.
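The asymptotic equivalence in (6) can be checked numerically. The following minimal sketch evaluates the lower incomplete Gamma function by its power series $\gamma(d, x) = \sum_{n \ge 0} (-1)^n x^{d+n} / (n!\,(d+n))$, which is adequate for the small arguments $x = 1/w$ used here, and compares the exact survival function of $W$ with the leading term $w^{-d}/(d\,\Gamma(d))$. The function names, the truncation at 40 terms and the choice $d = 3$ are illustrative assumptions.

```python
import math

def lower_inc_gamma(d, x, terms=40):
    """Lower incomplete gamma function via its power series (intended for small x > 0)."""
    total = 0.0
    for n in range(terms):
        total += (-1.0) ** n * x ** (d + n) / (math.factorial(n) * (d + n))
    return total

def inv_gamma_survival(w, d):
    """P[W > w] for W ~ Inverse-Gamma(d, 1), i.e. gamma(d, 1/w) / Gamma(d)."""
    return lower_inc_gamma(d, 1.0 / w) / math.gamma(d)

def power_law_term(w, d):
    """Leading term w^(-d) / (d * Gamma(d)) in (6)."""
    return w ** (-d) / (d * math.gamma(d))

d = 3
ratios = {w: inv_gamma_survival(w, d) / power_law_term(w, d) for w in (10.0, 100.0, 1000.0)}
for w, ratio in ratios.items():
    print(w, ratio)
```

The ratios approach 1 from below as $w$ grows, in line with $\bar F_W \in RV_{-d}$.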
Remark 3 Corollary 2 supplements Proposition 4 in Hua and Joe (2011), where the conditions are imposed on the LT of $Y$ instead of the survival function $\bar F_Y$ of $Y$. However, we note that the condition in case 3 of Corollary 2 is only a sufficient, not a necessary, condition for the upper tail order to be equal to 1. Based on Proposition 4 in Hua and Joe (2011) and Proposition 3 of Hua and Joe (2012), even when the right tail of $F_Y$ is so heavy that $E[Y]$ does not exist, one may still get $\kappa_U = 1$.

From Proposition 1, we find that when the right tail of $1/R$ becomes lighter, the upper tail order of the corresponding Archimedean copula becomes larger, and thus the dependence in the upper tail becomes weaker and may even become negative. When $1/R \in \mathrm{MDA}(\Lambda)$, the right tail of $1/R$ is even lighter. In this case, we may naturally expect the upper tail order of the corresponding Archimedean copula to be even larger, or infinite. The following result confirms this.

Proposition 3 Suppose a random vector $X := (X_1, \dots, X_d)$ is defined as in (2). If $1/R \in \mathrm{MDA}(\Lambda)$, then the lower tail order of $X$ is $\kappa = \infty$; that is, the upper tail order of the corresponding Archimedean copula is $\kappa_U = \infty$.

Proof: Let $T = 1/R$, and thus $E[T] < \infty$. From the proof of Proposition 1, we can write
\begin{align*}
\kappa &= \lim_{y \to \infty} \frac{\log(P[T \ge y]) + \log \int_{s \ge 0, \|s\|_1 = 1} P[T \ge s_* y]/P[T \ge y]\, F_S(ds_1, \dots, ds_d)}{-\log(y) - \log(B(1, d-1)) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\, dx} \\
&\ge \lim_{y \to \infty} \frac{\log(P[T \ge y/d]) + \log \int_{s \ge 0, \|s\|_1 = 1} 1\, F_S(ds_1, \dots, ds_d)}{-\log(y) - \log(B(1, d-1)) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\, dx} \\
&= \lim_{y \to \infty} \frac{\log(P[T \ge y/d])}{-\log(y)} = \infty,
\end{align*}
which completes the proof. Here the inequality uses $s_* \ge 1/d$, so that $P[T \ge s_* y] \le P[T \ge y/d]$, together with the fact that the denominator is negative for large $y$; the last limit is infinite because the right tail of a distribution in $\mathrm{MDA}(\Lambda)$ decays faster than any power function.

3.2 Examples

With the above propositions, we have many choices of parametric distributions for the random variable $R$, because MDA(Fréchet) and MDA(Gumbel) are very large classes of distributions (Embrechts et al., 1997).
In this section, we give some examples of parametric copulas that have upper tail negative dependence.

Example 1 (Inverse-Pareto - Simplex copula, aka IPS copula) Let $X_i \stackrel{d}{=} R S_i$, $i = 1, 2$, let $(S_1, S_2)$ be uniformly distributed on $\{x \ge 0 : x_1 + x_2 = 1\}$, and let $T := 1/R$ follow a Pareto distribution with cdf $F(x) = 1 - (1 + x)^{-\alpha}$, $x \ge 0$, $\alpha > 1$. Then the generator as defined in (3) for the Archimedean copula is
\[ \psi(s) = \frac{s\left[1 - (1 + 1/s)^{-\alpha+1}\right]}{1 - \alpha} + 1, \quad s \ge 0, \ \alpha > 1. \]
Clearly, $\bar F_{1/R} \in RV_{-\alpha}$, $\alpha > 1$. Therefore, the upper tail order of the survival copula of $(X_1, X_2)$ is $\kappa_U = \alpha$. Depending on the value of $\alpha > 1$, the upper tail ranges from intermediate tail dependence to tail negative dependence as the dependence parameter $\alpha$ becomes larger. Figure 1 shows some contour plots for the IPS copula. It is clear that: (1) when $1 < \alpha < 2$, the IPS copula has intermediate tail dependence in the upper tail; (2) when $\alpha = 2$, the upper tail looks like independence; (3) when $\alpha > 2$, the upper tail appears to be negatively dependent, and a larger $\alpha$ indicates stronger negative dependence. For the lower tail, since $\psi \in RV_{-1}$, by Proposition 6 of Hua and Joe (2011), the IPS copula always has lower tail order $\kappa_L = 1$, which is also consistent with the contour plots in Figure 1.

Figure 1: Normalized contour plots of the IPS copula. [Six panels: α = 1.1, α = 1.5, α = 2, α = 5, α = 10, α = 20.]

However, in order to be useful for analyzing the expenditure dataset, the candidate copulas should possess not only upper tail negative dependence but also a lower tail that is close to independence.
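As a numerical aside on Example 1, the sketch below evaluates the IPS generator, inverts it by bisection, and assembles the bivariate copula via (1). It then approximates the upper tail order by $\log \bar C(1-u, 1-u) / \log u$ at $u = 10^{-3}$, using $\bar C(1-u, 1-u) = 2u - 1 + C(1-u, 1-u)$. The function names, the bisection tolerance and the value of $u$ are assumptions for illustration; at such a finite $u$ the ratio only roughly approximates $\kappa_U = \alpha$.

```python
import math

def ips_generator(s, alpha):
    """IPS generator: psi(s) = s * (1 - (1 + 1/s)^(1 - alpha)) / (1 - alpha) + 1, alpha > 1."""
    if s == 0.0:
        return 1.0
    return s * (1.0 - (1.0 + 1.0 / s) ** (1.0 - alpha)) / (1.0 - alpha) + 1.0

def ips_generator_inverse(u, alpha, tol=1e-12):
    """Invert psi on (0, 1] by bisection; psi decreases from 1 at s = 0 towards 0."""
    if u >= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while ips_generator(hi, alpha) > u:   # expand the bracket until psi(hi) <= u
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if ips_generator(mid, alpha) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ips_copula(u, v, alpha):
    """Bivariate Archimedean copula C(u, v) = psi(psi^{-1}(u) + psi^{-1}(v))."""
    total = ips_generator_inverse(u, alpha) + ips_generator_inverse(v, alpha)
    return ips_generator(total, alpha)

def upper_tail_ratio(alpha, u=1e-3):
    """Finite-u approximation of the upper tail order."""
    survival = 2.0 * u - 1.0 + ips_copula(1.0 - u, 1.0 - u, alpha)
    return math.log(survival) / math.log(u)

for a in (1.5, 3.0):
    print(a, round(upper_tail_ratio(a), 3))
```

A ratio below 2 (as for α = 1.5) points to positive upper tail dependence of intermediate strength, while a ratio above 2 (as for α = 3) points to upper tail negative dependence.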
Further investigation will be conducted to seek simple forms of the corresponding Williamson $d$-transform $\psi$ that can lead to such an Archimedean copula. For the effect of $R$ on the lower tail of an Archimedean copula, we refer to Larsson and Nešlehová (2011). After considering the upper and lower tails together, we construct a copula in Example 2.

Example 2 (Generalized-Gamma - Simplex mixture, aka GGS copula) Let $X_i \stackrel{d}{=} R S_i$, $i = 1, 2$, let $(S_1, S_2)$ be uniformly distributed on $\{x \ge 0 : x_1 + x_2 = 1\}$, and let $R^{1/\beta}$ follow a Gamma distribution with shape parameter $\alpha$ so that
\[ F_R(x) = \frac{1}{\beta\,\Gamma(\alpha)} \int_0^x s^{\alpha/\beta - 1} \exp\{-s^{1/\beta}\}\, ds, \quad \alpha > 0, \ \beta > 0, \tag{7} \]
and the Archimedean generator is
\[ \psi(s) = \int_s^{\infty} (1 - s/r)\, F_R(dr) = \frac{1}{\Gamma(\alpha)} \left[ \Gamma(\alpha, s^{1/\beta}) - s\,\Gamma(\alpha - \beta, s^{1/\beta}) \right], \tag{8} \]
where $\Gamma(\cdot, \cdot)$ is the upper incomplete gamma function. Note that, although $\Gamma(0) = \infty$, the case with $\alpha = \beta$ is also implementable. Some contour plots of the GGS copula are illustrated in Figure 2.

Clearly, as $x \to 0^+$, $F_R(x) \sim \frac{1}{\alpha\,\Gamma(\alpha)} x^{\alpha/\beta}$. Then by Proposition 1, the upper tail order of the corresponding Archimedean copula is $\kappa_U = \max\{\alpha/\beta, 1\}$; note that Corollary 2 of Larsson and Nešlehová (2011) shows that there is upper tail dependence if $\alpha < \beta$. If $R \in \mathrm{MDA}(\Lambda)$ with an auxiliary function $a(\cdot)$, then by Proposition 7 of Larsson and Nešlehová (2011), the lower tail order is $\kappa_L = 2^{1-\gamma}$, where $\gamma$ is the index such that the auxiliary function $a \in RV_{\gamma}$. A Weibull distribution with cdf $1 - \exp\{-x^{1/\beta}\}$, $\beta > 0$, belongs to the MDA of Gumbel, an auxiliary function for it is $a_*(x) = \beta x^{1 - 1/\beta}$ (Embrechts et al., 1997), and $a_* \in RV_{1 - 1/\beta}$. Then by Lemma 1 of Larsson and Nešlehová (2011), $a_*$ can also be an auxiliary function for the survival function of $R$. Therefore, $\gamma = 1 - 1/\beta$ and $\kappa_L = 2^{1/\beta}$. This copula thus provides very flexible upper and lower tails, ranging from positive to negative dependence.
Note that, when $\alpha = 2$ and $\beta = 1$, $\kappa_L = \kappa_U = 2$ and, moreover, $\psi(s) = \Gamma(2, s) - s\,\Gamma(1, s) = \exp(-s)$, which is the generator of the independence copula. So the independence copula is a special case of the GGS copula. We can also re-parameterize the GGS copula by the upper and lower tail orders; that is, $\alpha = \kappa_U \ln(2)/\ln(\kappa_L)$ and $\beta = \ln(2)/\ln(\kappa_L)$.

Figure 2: Normalized contour plots of the GGS copula. [Eight panels: (α, β) = (30, 1), (30, 2), (30, 4), (30, 10), (2, 1), (2, 2), (2, 3), (2, 6).]

Example 3 (Inverse-Gamma-simplex, Example 2 of McNeil and Nešlehová (2010)) Let $X_i \stackrel{d}{=} R S_i$, $i = 1, 2$, let $(S_1, S_2)$ be uniformly distributed on $\{x \ge 0 : x_1 + x_2 = 1\}$, and let $1/R$ follow a Gamma distribution with shape parameter $\alpha$ and scale parameter 1, so that the generator of the Archimedean copula is
\[ \psi(s) = \frac{\gamma(\alpha, 1/s)}{\Gamma(\alpha)} - \frac{s\,\gamma(\alpha + 1, 1/s)}{\Gamma(\alpha)}. \]
By Proposition 3, since $1/R \in$ MDA(Gumbel), the upper tail order is $\kappa_U = \infty$, which implies that the upper tail is always negatively dependent. For the lower tail, since $R$ follows an inverse Gamma distribution with shape parameter $\alpha$ and scale parameter 1, due to (6), $\bar F_R \in RV_{-\alpha}$. Therefore, by Theorem 1 (a) of Larsson and Nešlehová (2011), the corresponding generator $\psi \in RV_{-\alpha}$, which implies lower tail dependence (see Hua and Joe (2011); Larsson and Nešlehová (2011)). The limitation of this copula is that it has no parameter to control the upper tail order: the upper tail always has tail negative dependence with $\kappa_U = \infty$.
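Returning to the GGS copula of Example 2, the following sketch evaluates the generator (8) and converts between $(\alpha, \beta)$ and the tail orders $(\kappa_U, \kappa_L)$. The upper incomplete gamma function is approximated here by composite Simpson quadrature on a truncated interval, which is a stand-in chosen so that the example needs only the standard library; it requires a strictly positive lower limit and is not the implementation used in the paper. All function names, the truncation width and the grid size are assumptions.

```python
import math

def upper_inc_gamma(a, x, width=50.0, n=4000):
    """Approximate Gamma(a, x), the integral of t^(a-1) e^(-t) over [x, inf), for x > 0.

    Composite Simpson rule on [x, x + width]; n must be even.
    """
    h = width / n
    f = lambda t: t ** (a - 1.0) * math.exp(-t)
    total = f(x) + f(x + width)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(x + i * h)
    return total * h / 3.0

def ggs_generator(s, alpha, beta):
    """GGS generator (8): [Gamma(alpha, x) - s * Gamma(alpha - beta, x)] / Gamma(alpha), x = s^(1/beta)."""
    if s == 0.0:
        return 1.0
    x = s ** (1.0 / beta)
    return (upper_inc_gamma(alpha, x) - s * upper_inc_gamma(alpha - beta, x)) / math.gamma(alpha)

def ggs_tail_orders(alpha, beta):
    """Return (kappa_U, kappa_L) = (max(alpha / beta, 1), 2^(1 / beta))."""
    return max(alpha / beta, 1.0), 2.0 ** (1.0 / beta)

def ggs_params_from_tail_orders(kappa_u, kappa_l):
    """Return (alpha, beta) from tail orders; requires kappa_l > 1."""
    beta = math.log(2.0) / math.log(kappa_l)
    return kappa_u * beta, beta

# With alpha = 2 and beta = 1 the generator should reduce to exp(-s).
for s in (0.5, 1.0, 2.0):
    print(s, ggs_generator(s, 2.0, 1.0), math.exp(-s))

alpha, beta = ggs_params_from_tail_orders(3.0, 1.5)
print(alpha, beta, ggs_tail_orders(alpha, beta))
```

Parameterizing by $(\kappa_U, \kappa_L)$ can be convenient when the tail orders themselves are the quantities of interest, since each then maps to the shape of one tail.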
4 Aggregate loss - data analysis

4.1 Introduction

The dataset we analyze is based on Panel 14 and Panel 15 for the calendar year 2010 from the 2010 Full Year Consolidated Data File. The dataset was collected on a nationally representative sample of the civilian noninstitutionalized population of the United States. To illustrate the empirical observation of upper tail negative dependence, we consider the number of outpatient department visits to physicians in 2010 (OPDRV10) and the associated facility expenses (OPVEXP10). The average expense per visit used in the data analysis is calculated as the ratio of OPVEXP10 to OPDRV10. We use the average expense as the loss severity and the number of visits as the loss frequency. There are 32,846 individuals in total, and 2,263 of them have a positive number of outpatient visits to physicians and positive facility expenses. Descriptive statistics of the variables are given in Table 1. The scatter plot of average expense against number of visits on the original scale is shown in Figure 3, which may suggest an independence structure between the two variables. However, the dependence pattern is not clear on the original scale.
Table 1: Summary of the variables

                  Min   1st quartile   Median   Mean    3rd quartile   Max
OPVEXP10          3     187            704      2373    2356           68370
Average Expense   3     132            406      1460    1573           36680
Age               0     29.5           50       46.44   64             85

Number of Visits   1     2    3    4   5   6   7   8   9   10
OPDRV10 (#obs)     1461  394  144  73  58  26  29  9   13  9
Number of Visits   11    12   13   14  15  16  17  18  19  20
OPDRV10 (#obs)     5     5    5    2   4   1   1   2   2   1
Number of Visits   21    22   23   25  28  29  31  32  33  35
OPDRV10 (#obs)     1     1    1    1   2   1   1   1   1   1
Number of Visits   38    40   42   46  48  65  98
OPDRV10 (#obs)     1     1    2    1   1   1   1

Insurance Coverage   Any Private (1)   Public Only (2)   Uninsured (3)
INSCOV10 (#obs)      1356              757               150

Race               Hispanic (1)   Black (2)   Asian (3)   Other (4)
RACETHNX (#obs)    433            440         91          1299

In order to visualize the dependence pattern more intuitively, we add tiny random noise (Normal(0, 1)/1000) to the numbers of visits to make them continuous, and then transform the expenses and the continuitized numbers of visits into normal scores that follow a standard Normal distribution. One can also use other techniques, such as jittering or the technique used to obtain the normalized QQ plot for discrete variables in Figure 5. The resulting dependence pattern is illustrated in the left panel of Figure 4. Although the plot is not based on the original data, the pattern of upper tail negative dependence suggests that there may be such a pattern in the original data as well. It seems that there is tail negative dependence in the upper tail, and independence in the other parts. The reason could be that there are some flat fees or overhead charges, so that the more frequent the visits, the lower the average cost per visit.

Following the approach proposed in Czado et al. (2012) and Krämer et al. (2013), we now use a mixed copula model to conduct a regression analysis for the aggregate losses. We use Zipf's distribution (Zipf, 1932) to model the loss frequency, a lognormal model for the loss severity, and the GGS copula to model the dependence structure between loss severity and loss frequency.
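The continuitization and normal score transformation described above can be sketched as follows. The paper does not spell out how the normal scores are computed; the sketch assumes the common rank-based definition $z_i = \Phi^{-1}(\text{rank}_i/(n+1))$. The function names, the seed and the small input vectors are hypothetical and are not MEPS data.

```python
import random
from statistics import NormalDist

def continuitize(counts, rng, scale=1e-3):
    """Add Normal(0, 1) * scale noise to integer counts to break ties."""
    return [c + rng.gauss(0.0, 1.0) * scale for c in counts]

def normal_scores(values):
    """Rank-based normal scores: z_i = Phi^{-1}(rank_i / (n + 1)) (assumed definition)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores

rng = random.Random(0)
# Made-up toy inputs for illustration only.
visits = [1, 1, 2, 1, 5, 3, 1, 2, 10, 1]
avg_expense = [420.0, 95.5, 310.0, 1500.0, 80.0, 260.0, 33.0, 700.0, 41.0, 2200.0]

z_visits = normal_scores(continuitize(visits, rng))
z_expense = normal_scores(avg_expense)
for pair in zip(z_visits, z_expense):
    print(round(pair[0], 3), round(pair[1], 3))
```

Plotting the pairs of scores against each other gives the kind of normalized scatter plot on which tail patterns are easier to see than on the original scale.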
Figure 3: Scatter plot on the original scale. [Plot omitted; horizontal axis: number of visits per year; vertical axis: average expense per visit.]

Figure 4: Asymmetric upper tail negative dependence between loss frequency and severity. In the left plot, the marginals are transformed to standard normals; in the right plot, the pseudo data are fitted by the GGS copula. When the number of visits is larger, the relation between number of visits and average expense becomes more negatively dependent, while the remaining parts appear to be independent. That is, there is upper tail negative dependence between the loss frequency and severity data. [Plots omitted; left panel: "Normalized scatter plot"; right panel: "GGS copula fitted", with density contour levels from 0.02 to 0.14; horizontal axis: normalized number of visits per year; vertical axis: normalized average expense.]

Since our proposed copula model is able to account for the unique upper tail negative dependence pattern, where larger losses occur, the regression analysis based on the GGS copula is expected to outperform the analysis based on existing copulas. For both loss frequency (N) and loss severity (Y), we considered age, income, sex, education, insurance coverage and race as covariates. We tried several marginal regression models, such as gamma and lognormal distributions for Y, and zero-truncated Poisson and Zipf's distributions for N.
The right tail of the gamma distribution is too light for the average losses Y, and the right tail of the zero-truncated Poisson is also too light to capture the heavy tails of the loss frequencies. Based on preliminary data analysis, we finally chose age, insurance coverage and race as the covariates for both N and Y. The other covariates were either non-significant or led to relatively worse AICs. Note that, for the mixed copula method, the covariates for frequency and severity are not necessarily the same, although we chose the same set of covariates for our data analysis. We have not compared all possible distributions for the marginal regression models, but the lognormal and Zipf models are good enough to capture the main features of the dataset. In the following, we discuss the two marginal regression analyses in turn.

4.2 Marginal regression

Now, denote the covariate vector of individual i by $x_i$, $i = 1, \dots, 2263$. The marginal regression model for the loss frequency using Zipf's distribution can be written as

\[
f_N(n \mid s, m) = P[N = n \mid s, m] = \frac{n^{-s}}{\sum_{i=1}^{m} i^{-s}}, \qquad n = 1, 2, \dots, m, \quad s > 0, \tag{9}
\]

where $s > 0$ and $m \in \{1, 2, 3, \dots\}$ are the parameters of Zipf's distribution, and $m$ is the maximum value of N; for this dataset, we chose $m = 98$, the maximum number of visits in the dataset. Zipf's distribution follows a power law, which means that its right tail is heavier than that of the commonly used Poisson distribution. Zipf's distribution can be viewed as a discretized Pareto distribution, and the value of $s$ determines the degree of tail heaviness: a larger $s$ corresponds to a lighter right tail, and vice versa. The covariates are introduced as follows:

\[
\ln(s_i) = x_i^T \eta, \qquad i = 1, \dots, 2263,
\]

where $\eta$ is the vector of regression coefficients for the loss frequency (including the intercept term). We used a linear form in $x_i$ to demonstrate how to apply the regression model.
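As an illustration of Equation (9) with the log link, the Zipf regression can be sketched as follows. This is a sketch under stated assumptions, not the code used for the analysis: the function names are hypothetical, the covariate vector contains only an intercept and age (which corresponds to an individual in the baseline insurance and race categories, assuming dummy coding with category 1 as the baseline), and the two coefficients are the marginal frequency estimates for the intercept and age reported in Table 2.

```python
import math


def zipf_pmf(n, s, m):
    """P[N = n | s, m] = n^(-s) / sum_{i=1}^{m} i^(-s) for n = 1, ..., m."""
    if not 1 <= n <= m:
        return 0.0
    normalizer = sum(i ** (-s) for i in range(1, m + 1))
    return n ** (-s) / normalizer


def zipf_exponent(x, eta):
    """Log link: ln(s) = x^T eta, so that s = exp(x^T eta) > 0."""
    return math.exp(sum(xj * ej for xj, ej in zip(x, eta)))


def zipf_loglik(counts, covariates, eta, m):
    """Log-likelihood of the Zipf regression for observed counts."""
    return sum(
        math.log(zipf_pmf(n, zipf_exponent(x, eta), m))
        for n, x in zip(counts, covariates)
    )


M = 98                  # maximum number of visits, as chosen in the text
eta = [0.855, -0.002]   # intercept and age only (illustrative subset)
x = [1.0, 50.0]         # hypothetical individual: intercept, age 50

s = zipf_exponent(x, eta)
probs = [zipf_pmf(n, s, M) for n in range(1, M + 1)]

# Hypothetical counts and covariates, used only to exercise the likelihood.
loglik = zipf_loglik([1, 2, 1], [[1.0, 30.0], [1.0, 50.0], [1.0, 70.0]], eta, M)
```

In a full fit, `zipf_loglik` would be maximized over `eta` with a numerical optimizer; recomputing the normalizer for every observation is kept here for clarity rather than speed.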
However, a nonlinear relation between $\ln(s_i)$ and $x_i$ could lead to a better fit, in which case a transformation can first be applied to $x_i$; for example, polynomials or splines may be considered.

The lognormal model for the loss severity can be written as

\[
f_Y(y \mid \mu, \sigma) = \frac{1}{\sigma y \sqrt{2\pi}} \exp\left\{ -\frac{(\ln y - \mu)^2}{2\sigma^2} \right\},
\]

where $\mu$ is the location parameter and $\sigma$ is the scale parameter, and the covariates are introduced through

\[
\mu_i = x_i^T \gamma, \qquad i = 1, \dots, 2263,
\]

where $\gamma$ is the corresponding vector of regression coefficients (including the intercept term). We assume that the scale parameter is the same for all values of the covariates. The maximum likelihood estimates (MLEs) of the regression coefficients and of the scale parameter (reported as $\ln(\sigma)$) are given in Table 2. In order to diagnose how well the two regression models fit the dataset, we used normalized QQ plots of quantile residuals (Dunn and Smyth, 1996), shown in Figure 5. From Figure 5, we find that the lognormal regression and Zipf regression models fit the dataset quite well. So, in the copula regression model, we keep using these two distributions for the marginals, while incorporating the GGS copula to capture the dependence structure. Here we have both continuous and discrete response variables, and the procedures for obtaining normalized QQ plots differ between the two. We refer to Dunn and Smyth (1996) for more details, and here we only briefly introduce the main steps as follows.
The basic procedure for deriving the normalized QQ plot for the average loss expense is: (1) derive the cumulative probability of each observed response from the fitted regression model; (2) transform the cumulative probabilities obtained in step (1) by $\Phi^{-1}$, the quantile function of the standard normal distribution, to get the sample quantiles; (3) plot the sample quantiles obtained in step (2) against the theoretical quantiles of a standard normal distribution, which are calculated based on the ranks of the sample quantiles.

For the normalized QQ plot for the discrete variable (number of visits), the procedure is: (1) derive the cumulative probabilities at each observed response $y$ and at $y - 1$, respectively; (2) randomly sample a probability from a uniform distribution whose endpoints are the cumulative probabilities at $y - 1$ and $y$; (3) transform the probability obtained in step (2) by $\Phi^{-1}$ to get the sample quantiles; (4) plot the sample quantiles against the theoretical quantiles, which are obtained in the same way as for continuous variables.

Figure 5: Normalized QQ plots of quantile residuals for regression on marginals: the left panel is the plot for lognormal regression on average expenses, and the right panel is the plot for Zipf regression on numbers of visits per year.
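The two quantile-residual procedures described above can be sketched as follows, for a lognormal severity and a Zipf frequency. This is a minimal sketch and not the author's implementation: the function names, the clipping constant `EPS`, and the plotting positions (rank - 0.5)/k used for the theoretical quantiles are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # assumption: keeps probabilities strictly inside (0, 1)


def _clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def lognormal_cdf(y, mu, sigma):
    """Cumulative probability of a lognormal response y > 0."""
    return STD_NORMAL.cdf((math.log(y) - mu) / sigma)


def zipf_cdf(n, s, m):
    """Cumulative probability P[N <= n] for Zipf on {1, ..., m}."""
    if n < 1:
        return 0.0
    normalizer = sum(i ** (-s) for i in range(1, m + 1))
    return sum(i ** (-s) for i in range(1, min(n, m) + 1)) / normalizer


def continuous_residual(y, mu, sigma):
    """Steps (1)-(2) for a continuous response: Phi^{-1}(F(y))."""
    return STD_NORMAL.inv_cdf(_clip(lognormal_cdf(y, mu, sigma)))


def discrete_residual(n, s, m, rng):
    """Steps (1)-(3) for a count: draw u uniformly between F(n - 1) and F(n)."""
    lower = zipf_cdf(n - 1, s, m)
    upper = zipf_cdf(n, s, m)
    u = rng.uniform(lower, upper)
    return STD_NORMAL.inv_cdf(_clip(u))


def theoretical_quantiles(k):
    """Standard normal quantiles at the assumed positions (rank - 0.5) / k."""
    return [STD_NORMAL.inv_cdf((r - 0.5) / k) for r in range(1, k + 1)]


# Hypothetical counts and a hypothetical common exponent s = 2.
rng = random.Random(5)
counts = [1, 1, 2, 3, 1, 8]
residuals = sorted(discrete_residual(n, 2.0, 98, rng) for n in counts)
qq_pairs = list(zip(theoretical_quantiles(len(residuals)), residuals))
```

Each pair in `qq_pairs` is one point of a normalized QQ plot; if the fitted model is adequate, the points should lie close to the 45-degree line. The randomization in `discrete_residual` means that repeated runs with different seeds give slightly different plots for the count response.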
[Figure 5 panels (plots omitted): "Normal QQ plot for average costs" (left) and "Normal QQ plot for number of visits" (right); horizontal axes: theoretical quantiles; vertical axes: sample quantiles.]

Table 2: Estimates of marginal and copula regression models

                          Marginal   s.e.     GGS      s.e.
Frequency    Intercept     0.855     0.039    0.869    0.038
             age          -0.002     0.001   -0.002    0.001
             ins(2)       -0.114     0.029   -0.108    0.028
             ins(3)       -0.085     0.053   -0.118    0.052
             race(2)       0.044     0.041    0.026    0.040
             race(3)       0.138     0.075    0.133    0.072
             race(4)       0.155     0.036    0.133    0.035
Severity     Intercept     5.911     0.096    5.889    0.094
             age           0.005     0.001    0.005    0.001
             ins(2)       -0.707     0.070   -0.685    0.069
             ins(3)       -0.541     0.132   -0.447    0.130
             race(2)       0.115     0.104    0.168    0.101
             race(3)       0.025     0.177    0.064    0.172
             race(4)       0.387     0.089    0.412    0.087
             ln(σ)         0.418     0.015    0.421    0.015
Dependence   ln(α)         -         -        5.476    0.011
             ln(β)         -         -        2.430    0.022

4.3 Copula regression

For a bivariate Archimedean copula $C(u, v) = \psi(\psi^{-1}(u) + \psi^{-1}(v))$, the density function can be written as

\[
c(u, v) = \frac{\psi''(\psi^{-1}(u) + \psi^{-1}(v))}{\psi'(\psi^{-1}(u))\, \psi'(\psi^{-1}(v))}. \tag{10}
\]

For the GGS copula, based on (8),

\[
\psi'(x) = -\frac{\Gamma(\alpha - \beta, x^{1/\beta})}{\Gamma(\alpha)}; \qquad \psi''(x) = \frac{x^{\alpha/\beta - 2} \exp\{-x^{1/\beta}\}}{\beta\, \Gamma(\alpha)}.
\]

Then we can derive the copula density in (10), where the inverse $\psi^{-1}$ can be obtained numerically. Since $\psi$ is strictly monotone, a numerical method usually works very well for computing $\psi^{-1}$.

In order to avoid calculating $\Gamma(\alpha)$ for a large $\alpha$ (e.g., $\Gamma(\alpha)$ cannot be calculated in the R software (R Development Core Team, 2012) when $\alpha \geq 172$ on a Windows 7 32-bit operating system), we use Equation (11) below to calculate $\Gamma(\alpha, x)/\Gamma(\alpha)$, so that $\alpha$ can be $\geq 172$. The calculation can in fact be done for $\alpha \in \mathbb{R}$. Note that

\[
\Gamma(\alpha, x) = (\alpha - 1)\Gamma(\alpha - 1, x) + x^{\alpha - 1} e^{-x}; \qquad \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1); \qquad \gamma(\alpha, x) = (\alpha - 1)\gamma(\alpha - 1, x) - x^{\alpha - 1} e^{-x},
\]

where $\alpha \in \mathbb{R}$ and $\alpha \notin \{-1, -2, \dots\}$; we refer to Fisher et al. (2003) for a relevant discussion of the cases where $\alpha$ can be a negative integer. We deal with numerical issues such as large parameter values as follows. Letting $[\alpha] := \max\{z \text{ integer} : z < \alpha\}$ and $\xi := \alpha - [\alpha]$,

\[
\frac{\Gamma(\alpha, x)}{\Gamma(\alpha)} = \frac{\Gamma(\xi, x)}{\Gamma(\xi)} + \frac{e^{-x} x^{\xi}}{\Gamma(\xi)} \times \frac{1}{\xi} \times \left[ 1 + \frac{x}{1 + \xi} \left( 1 + \frac{x}{2 + \xi} \left( 1 + \cdots + \frac{x}{[\alpha] - 2 + \xi} \left( 1 + \frac{x}{[\alpha] - 1 + \xi} \right) \right) \right) \right]. \tag{11}
\]

To calculate $\Gamma(\alpha - \beta, x)/\Gamma(\alpha)$, we can write

\[
\frac{\Gamma(\alpha - \beta, x)}{\Gamma(\alpha)} = \frac{\Gamma(\alpha - \beta, x)}{\Gamma(\alpha - \beta)} \times \frac{\Gamma(\alpha - \beta)}{\Gamma(\alpha)} = \frac{\Gamma(\alpha - \beta, x)}{\Gamma(\alpha - \beta)} \times \frac{(\alpha - \beta - 1) \times \cdots \times (\alpha - \beta - [\alpha - \beta]) \times \Gamma(\alpha - \beta - [\alpha - \beta])}{(\alpha - 1) \times \cdots \times (\alpha - [\alpha]) \times \Gamma(\alpha - [\alpha])}, \tag{12}
\]

and then the first factor in (12) can be calculated using Equation (11) again, with $\alpha$ replaced by $\alpha - \beta$. When calculating $\psi''(x)$ for very large $x$, there could be numerical errors in calculating $x^{\alpha/\beta - 2}$. So, we compute $\ln(\psi''(x))$ first; that is,

\[
\ln(\psi''(x)) = (\alpha/\beta - 2) \ln(x) - x^{1/\beta} - \ln(\beta) - \ln(\Gamma(\alpha)).
\]

Due to limitations of space, we refer to Krämer et al.
(2013) for a reference on how to incorporate a copula into the two marginal regression models so that we can obtain the MLEs of the regression coefficients ($\eta$, $\gamma$) and of the dependence parameters ($\alpha$, $\beta$). The MLEs are reported in Table 2, where the likelihood was based on the overall model, with marginals and dependence fitted simultaneously. Based on the MLEs of the dependence parameters, the estimated upper and lower tail orders are, respectively, $\hat{\kappa}_U = 21.02$ and $\hat{\kappa}_L = 1.06$. The upper tail order is clearly greater than 2, the dimension. So, there is upper tail negative dependence, which is consistent with the preliminary plot shown in the left panel of Figure 4. The estimated lower tail order, in contrast, suggests positive dependence in the lower tail.

In order to estimate the aggregate loss, we have two ways based on two different assumptions. Firstly, we can use the mixed copula approach, assuming that loss frequency and loss severity are dependent (Krämer et al., 2013) according to the dependence parameters obtained from the GGS copula. Secondly, we may assume that loss frequency and loss severity are independent, conduct the marginal regressions separately, and use the sum over individuals of the products of expected number of visits and expected cost per visit as the aggregate loss. In Table 3, we report the estimated aggregate losses based on the two approaches, with the actual aggregate loss presented for comparison. The likelihood used to calculate the AIC for the mixed copula approach is the likelihood obtained when fitting the regression coefficients and the dependence parameters simultaneously. The likelihood used to calculate the AIC under the independence assumption is the product of the likelihoods of the two marginal regressions.
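The independence-based estimate described above, together with the AIC formula, can be sketched as follows. This is a sketch under stated assumptions rather than the computation behind Table 3: the function names and the two individuals are hypothetical, the covariate order (intercept, age, ins(2), ins(3), race(2), race(3), race(4)) assumes dummy coding with category 1 as the baseline, and the coefficients are the marginal estimates from Table 2. The resulting total therefore does not reproduce the figures in Table 3.

```python
import math


def zipf_mean(s, m):
    """E[N] for Zipf on {1, ..., m}: sum n^(1-s) / sum n^(-s)."""
    numerator = sum(n ** (1.0 - s) for n in range(1, m + 1))
    denominator = sum(n ** (-s) for n in range(1, m + 1))
    return numerator / denominator


def lognormal_mean(mu, sigma):
    """E[Y] for a lognormal with location mu and scale sigma."""
    return math.exp(mu + 0.5 * sigma ** 2)


def linear_predictor(x, coef):
    return sum(xj * cj for xj, cj in zip(x, coef))


def aggregate_loss_independent(covariates, eta, gamma, sigma, m):
    """Sum over individuals of E[N_i] * E[Y_i], assuming N and Y independent."""
    total = 0.0
    for x in covariates:
        s = math.exp(linear_predictor(x, eta))   # ln(s_i) = x_i^T eta
        mu = linear_predictor(x, gamma)          # mu_i = x_i^T gamma
        total += zipf_mean(s, m) * lognormal_mean(mu, sigma)
    return total


def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Marginal estimates from Table 2; covariate order is an assumption.
ETA = [0.855, -0.002, -0.114, -0.085, 0.044, 0.138, 0.155]
GAMMA = [5.911, 0.005, -0.707, -0.541, 0.115, 0.025, 0.387]
SIGMA = math.exp(0.418)
M = 98

# Two hypothetical individuals (not records from the dataset).
people = [
    [1.0, 50.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # age 50, private, Hispanic
    [1.0, 30.0, 1.0, 0.0, 0.0, 0.0, 1.0],  # age 30, public only, other
]
total = aggregate_loss_independent(people, ETA, GAMMA, SIGMA, M)
```

Under the independence assumption, the log-likelihood passed to `aic` would be the sum of the two marginal log-likelihoods, which corresponds to the product of likelihoods mentioned above.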
Since the GGS copula includes the independence copula as a special case, we can also set the two dependence parameters of the GGS copula to $\alpha = 2$ and $\beta = 1$ to calculate the expected aggregate loss and the likelihood of the model under the independence assumption, which leads to the same results as reported in Table 3.

Table 3: Aggregate loss comparisons, where AIC = −2 × log-likelihood + 2 × number of parameters.

                        GGS copula   Independence   Data
Aggregate Loss (USD)    5,733,236    8,153,765      5,371,218
AIC                     41,812       41,869         -

Based on Table 3, the mixed copula model based on the GGS copula fits the overall model better, with a smaller AIC, and the GGS copula regression clearly outperforms the independence model in assessing the aggregate loss: its estimate is about 7% above the actual aggregate loss, compared with about 52% for the independence model. The empirical analysis suggests that, when the upper tail appears to be negatively dependent, a misspecified independence model of the kind often used in aggregate loss modeling may overestimate the aggregate loss. Our results also supplement a finding from the auto insurance claim data analyzed in Krämer et al. (2013), where a mild positive dependence between loss frequency and loss severity appears, and the independence model underestimates the aggregate loss.

The MEPS dataset used for the empirical analysis actually contains the total expense per year for each individual. For this particular dataset, one can choose to model the aggregate loss and the number of visits jointly as a bivariate response variable; a copula that can explain positive upper tail dependence can then be used. As suggested by the above empirical analysis, the dependence between yearly total expense and the number of visits per year is not linear, and there may be different degrees of positive dependence conditional on different values of the covariates. To this end, one can also consider a full-range tail dependence copula and allow the dependence parameter to change with the values of the covariates.
We refer to Hua and Xia (2014) for a more detailed discussion of regression on the dependence parameter with full-range tail dependence copulas.

5 Concluding remark

From insurance data, one often observes two unique non-Gaussian phenomena. Firstly, the univariate marginals are often skewed, and the right distributional tails can be light or heavy. Secondly, the dependence structure between univariate marginals, or between their transformed forms, often cannot be well captured by a covariance matrix. Copulas have proved to be a very useful tool in dealing with these situations. Moreover, statistical inference on high-risk scenarios often plays a critical role and, as in the case we studied in Section 4, it may influence the overall assessment dramatically. Therefore, when choosing a copula for modeling the dependence structure, we have to be particularly careful about the tail behavior of the candidate copula models. An ideal candidate copula has fewer parameters but a wider range of dependence in both upper and lower tails, and the range of dependence of the copula should cover the actual range of dependence suggested by the observed data and beyond; moreover, implementation of the copula should be achievable.

Bearing those criteria in mind, we first narrow down the families of copulas and consider only Archimedean copulas, because they are suitable for constructing asymmetric dependence structures between upper and lower tails and for incorporating tail negative dependence as well. Some sufficient conditions for upper tail negative dependence have been derived from a scale mixture representation of Archimedean copulas. Through theoretical study of copulas, we have constructed new parametric copulas that have simple forms and desirable properties. We incorporate the GGS copula into a mixed copula regression model.
An upper tail negative dependence pattern between loss severity and loss frequency has been detected and modeled for a medical expenditure dataset. The mixed copula regression with the GGS copula provides a significantly better assessment of total losses. On the other hand, assuming that loss severity and frequency are independent overestimates the total losses.

Since the yearly total expenditure is actually available for the dataset, one may consider using a Tweedie model (Tweedie, 1984) to model the aggregate loss directly. A comparison between the Tweedie regression and the mixed copula regression based on the GGS copula would be interesting. Moreover, a more challenging and interesting question is whether one or two dependence parameters can be added to the Tweedie model so that loss frequency, loss severity and their dependence structure can be taken into consideration simultaneously.

Acknowledgment

This research was partially supported by a grant from the Casualty Actuarial Society through an Individual Grants Competition, and partially supported by the Research and Artistry Grant from Northern Illinois University. The author thanks the reviewers of the above grants for their valuable time, and Professor Edward W. Frees for helpful comments on a preliminary manuscript; the comments led to improved presentation of the paper. All remaining errors are the author's own.

References

Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Number 1972. Dover Publications.

Bingham, N. H., Goldie, C. M., and Teugels, J. L. (1987). Regular Variation, volume 27 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge.

Charpentier, A. and Segers, J. (2009). Tails of multivariate Archimedean copulas. Journal of Multivariate Analysis, 100(7):1521–1537.

Czado, C., Kastenmeier, R., Brechmann, E. C., and Min, A. (2012). A mixed copula model for insurance claims and claim sizes. Scandinavian Actuarial Journal, 2012(4):278–305.

Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5(3):236–244.

Embrechts, P. and Goldie, C. M. (1980). On closure and factorisation properties of subexponential and related distributions. Journal of the Australian Mathematical Society, Series A, 29:243–256.

Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events, volume 33. Springer-Verlag, Berlin.

Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pages 209–230.

Fisher, B., Jolevska-Tuneska, B., and Kılıçman, A. (2003). On defining the incomplete gamma function. Integral Transforms and Special Functions, 14(4):293–299.

Frees, E. W. (2010). Regression Modeling with Actuarial and Financial Applications. Cambridge University Press.

Gschlößl, S. and Czado, C. (2007). Spatial modelling of claim frequency and claim size in non-life insurance. Scandinavian Actuarial Journal, 2007(3):202–225.

Hua, L. and Joe, H. (2011). Tail order and intermediate tail dependence of multivariate copulas. Journal of Multivariate Analysis, 102:1454–1471.

Hua, L. and Joe, H. (2012). Tail comonotonicity: properties, constructions, and asymptotic additivity of risk measures. Insurance: Mathematics and Economics, 51:492–503.

Hua, L. and Joe, H. (2013). Intermediate tail dependence: a review and some new results. In Li, H. and Li, X., editors, Stochastic Orders in Reliability and Risk: In Honor of Professor Moshe Shaked, chapter 15, pages 291–311. Springer.

Hua, L. and Xia, M. (2014). Assessing high-risk scenarios by full-range tail dependence copulas. North American Actuarial Journal, 18(3):363–378.

Joe, H. (1997). Multivariate Models and Dependence Concepts, volume 73 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures of max-infinitely divisible distributions. Journal of Multivariate Analysis, 57(2):240–265.

Krämer, N., Brechmann, E. C., Silvestrini, D., and Czado, C. (2013). Total loss estimation using copula-based regression models. Insurance: Mathematics and Economics, 53(3):829–839.

Larsson, M. and Nešlehová, J. (2011). Extremal behavior of Archimedean copulas. Advances in Applied Probability, 43(1):195–216.

Malov, S. (2001). On finite-dimensional Archimedean copulas. In Asymptotic Methods in Probability and Statistics with Applications, pages 19–35. Birkhäuser.

McNeil, A. J. and Nešlehová, J. (2009). Multivariate Archimedean copulas, d-monotone functions and l1-norm symmetric distributions. Annals of Statistics, 37(5B):3059–3097.

McNeil, A. J. and Nešlehová, J. (2010). From Archimedean to Liouville copulas. Journal of Multivariate Analysis, 101(8):1772–1790.

Nelsen, R. B. (2006). An Introduction to Copulas. Springer Series in Statistics. Springer, New York, second edition.

R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Tweedie, M. (1984). An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions: Proc. Indian Statistical Institute Golden Jubilee International Conference, pages 579–604.

Zipf, G. K. (1932). Selected Studies of the Principle of Relative Frequency in Language. Harvard Univ. Press.