
Tail negative dependence and its applications
for aggregate loss modeling
Lei Hua∗
October 19, 2014
Abstract. The tail order of a copula can be used to describe the strength of dependence in the tails of a joint distribution. When the tail order is larger than the dimension, the copula may exhibit tail negative dependence. We first prove conditions that lead to tail negative dependence for Archimedean copulas. We then construct new copulas that possess upper tail negative dependence. In particular, a copula based on a scale mixture with a generalized gamma random variable (GGS copula) is useful for modeling asymmetric tail negative dependence structures. Finally, we apply mixed copula regression based on the GGS copula to aggregate loss modeling for a Medical Expenditure Panel Survey (MEPS) dataset. We find that there is upper tail negative dependence between loss frequency and loss severity in this dataset, and that introducing tail negative dependence structures significantly improves the aggregate loss modeling.
Key words: Tail order, scale mixture, loss frequency, loss severity, MEPS data,
Archimedean copula, GGS copula.
1 Introduction
As more data become available, many new and meaningful dependence patterns emerge, and existing statistical models may no longer capture these dependence structures well. Copulas provide a very flexible tool for this purpose. The challenge is often to create a new copula family that is not only suitable for describing the new dependence patterns but also computationally tractable and easy to implement.
Motivated by a unique asymmetric tail negative dependence structure that appears in a medical expenditure dataset, we find it necessary to develop new statistical models that account for such dependence structures. Aggregate loss modeling is an important question in actuarial science: modeling aggregate losses appropriately is essential for insurers and governments to assess and predict the associated costs. During a policy term, say one year, there may be several loss events associated with an insurance policy, and each loss event incurs an amount of expense. The former corresponds to loss frequency and the latter to loss severity.

∗ [email protected], Division of Statistics, Northern Illinois University, DeKalb, IL, 60115, United States.
There are several ways to model aggregate losses by considering loss frequency and loss severity separately. We describe two methods below and refer to Frees (2010) for more details on regression analysis for insurance applications. Let a random variable Y denote loss severity and a random variable N denote loss frequency. One way is to model the loss frequency N and the conditional loss severity $Y \mid N > 0$ separately by regression models such as generalized linear models, and then to treat their product as the aggregate loss. Another way is to model frequency and severity simultaneously through a mixture model, such as the Tweedie model, if data on aggregate losses are available. The first way assumes that N and $Y \mid N > 0$ are independent, and the second assumes that N and Y are independent.
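To make the role of the independence assumption in the first approach concrete, here is a minimal Monte Carlo sketch. The frequency N = 1 + Poisson(2) and the lognormal average severity Y are hypothetical choices, not taken from the data analysis below. When N and Y are independent, E[NY] = E[N]E[Y]; when the average severity decreases with N, the product of the means overstates the expected aggregate loss.

```python
import numpy as np

rng = np.random.default_rng(2014)
n_sim = 200_000

# Hypothetical loss frequency: at least one visit, N = 1 + Poisson(2).
N = 1 + rng.poisson(2.0, size=n_sim)

# Hypothetical average severity drawn independently of N.
Y_indep = rng.lognormal(mean=6.0, sigma=1.0, size=n_sim)

# Hypothetical dependent case: the average severity shrinks as N grows
# (e.g. a flat fee spread over more visits).
Y_dep = Y_indep * N ** -0.5

results = {}
for label, Y in [("independent", Y_indep), ("dependent", Y_dep)]:
    S = N * Y                                   # aggregate loss = frequency x average severity
    results[label] = (S.mean(), N.mean() * Y.mean())
    print(f"{label:12s} E[S] = {S.mean():9.1f}   E[N]E[Y] = {N.mean() * Y.mean():9.1f}")
```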
Recently, Gschlößl and Czado (2007) found that the independence assumption between loss severity and frequency, which is commonly made in the literature, may not hold. Copula models were subsequently employed to account for the dependence structure between loss frequency and severity in Czado et al. (2012) and Krämer et al. (2013). The latter two papers incorporate some commonly used parametric copula families into regression models for loss frequency and severity, and they report a moderate positive dependence between loss severity and frequency in a German auto insurance claim dataset.
The aforementioned method can also be applied to other datasets, as long as a suitable parametric copula family is available to capture the dependence structure. Based on a Medical Expenditure Panel Survey (MEPS) dataset¹ from the Agency for Healthcare Research and Quality (AHRQ), we find a unique dependence pattern between loss severity and frequency. In general, the average expense per visit is independent of the number of visits during each year. However, when patients use medical services more frequently, the average cost per visit and the number of visits tend to be more negatively dependent. To the best of our knowledge, none of the commonly used copulas can capture this dependence pattern. Therefore, we need to develop new copulas that can capture different degrees of upper tail negative dependence while remaining approximately independent elsewhere. Once a suitable copula is constructed, we can incorporate it into the mixed copula regression model developed in Czado et al. (2012) and Krämer et al. (2013) for aggregate loss modeling. We refer the interested reader to the first panel of Figure 4 for an impression of the upper tail negative dependence pattern in the MEPS dataset.
¹ http://meps.ahrq.gov/mepsweb/index.jsp
After considering the extreme value, elliptical, and Archimedean copula families, we find that only Archimedean copulas are suitable for our purpose. Moreover, to obtain tail negative dependence for an Archimedean copula, we use the scale mixture approach studied in McNeil and Nešlehová (2009) to construct the copula. Tail behavior of Archimedean copulas has been studied in Charpentier and Segers (2009), Hua and Joe (2011, 2013), and Larsson and Nešlehová (2011), but none of these works gives conditions for tail negative dependence.
Our main contributions in this paper are as follows. First, we prove general conditions that lead to upper tail negative dependence for an Archimedean copula, which also generalize some results in Hua and Joe (2011, 2013). Second, we construct new Archimedean copulas and study their properties; one of these copulas is very useful for modeling the asymmetric tail negative dependence pattern that appears in the MEPS dataset. Finally, we incorporate the new copula into mixed copula regression, analyze the medical expenditure dataset, and find that the new copula significantly improves the aggregate loss modeling.
In what follows, we first briefly introduce basic concepts and notation in Section 2. In Section 3, we explore how to construct a desirable asymmetric tail negative dependence structure based on the notion of tail order, and we construct several parametric copulas. In particular, we study a new two-parameter Archimedean copula family based on generalized gamma simplex mixtures, which is useful for modeling such an asymmetric upper tail negative dependence structure. In Section 4, we incorporate the new copula into the mixed copula regression model and conduct aggregate loss modeling for a medical expenditure dataset from the United States. By introducing the new tail negative dependent copula, the methodology significantly improves the aggregate loss modeling. Finally, we conclude the paper in Section 5.
2 Preliminaries
Owing to its growing popularity in the last decade and its flexibility in modeling non-Gaussian dependence structures, the notion of copula has been widely used in the actuarial literature. A copula $C : [0,1]^d \to [0,1]$ for a d-dimensional random vector can be defined as $C(u_1,\dots,u_d) = F(F_1^{-1}(u_1),\dots,F_d^{-1}(u_d))$, where F is the joint cumulative distribution function (cdf), $F_i$ is the univariate cdf of the ith marginal, and $F_i^{-1}$ is the generalized inverse function defined as $F_i^{-1}(u) = \inf\{x : F_i(x) \ge u\}$. We refer to Joe (1997) and Nelsen (2006) for general references on copulas.
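For a discrete marginal, such as a count of visits, the generalized inverse $F_i^{-1}(u) = \inf\{x : F_i(x) \ge u\}$ can be evaluated by a binary search over the cdf values. The following is a minimal sketch with numpy; the function name and the three-point count distribution in the usage lines are hypothetical illustrations.

```python
import numpy as np

def generalized_inverse(support, cdf_values, u):
    """Evaluate F^{-1}(u) = inf{x : F(x) >= u} for a discrete cdf.

    support    -- sorted support points x_1 < x_2 < ... < x_m
    cdf_values -- F(x_1) <= F(x_2) <= ... <= F(x_m) = 1
    u          -- scalar or array of levels in (0, 1]
    """
    support = np.asarray(support)
    cdf_values = np.asarray(cdf_values, dtype=float)
    # side="left" gives the first index i with cdf_values[i] >= u.
    idx = np.searchsorted(cdf_values, u, side="left")
    # Guard against u slightly above the last cdf value because of rounding.
    idx = np.minimum(idx, len(support) - 1)
    return support[idx]

# Hypothetical count distribution: P(N=1)=0.5, P(N=2)=0.3, P(N=3)=0.2.
support = [1, 2, 3]
cdf = np.cumsum([0.5, 0.3, 0.2])
print(generalized_inverse(support, cdf, [0.5, 0.51, 0.79, 0.95, 1.0]))
```

Using `side="left"` is what turns the search into an infimum: a level that exactly equals a cdf value maps to that support point rather than the next one.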
For a copula C, the lower tail order of C is defined as a constant $\kappa_L$ such that $C(u,\dots,u) \sim u^{\kappa_L}\ell(u)$ as $u \to 0^+$, where the notation $g(x) \sim h(x)$ as $x \to x_0$ means that $\lim_{x\to x_0} g(x)/h(x) = 1$, and $\ell(x)$ is a slowly varying function as $x \to 0^+$. For a measurable function $g : \mathbb{R}_+ \to \mathbb{R}_+$, if $\lim_{x\to 0^+} g(rx)/g(x) = 1$ for any constant $r > 0$, then g is said to be slowly varying at $0^+$, denoted as $g \in \mathrm{RV}_0(0^+)$; more generally, if $\lim_{x\to 0^+} g(rx)/g(x) = r^{\alpha}$ for any $r > 0$, we write $g \in \mathrm{RV}_\alpha(0^+)$. If $\lim_{x\to\infty} g(rx)/g(x) = r^{\alpha}$, $\alpha \in \mathbb{R}$, for any constant $r > 0$, then g is said to be regularly varying at ∞ with variation exponent α, denoted as $g \in \mathrm{RV}_\alpha$. For a random variable X, we usually use $F_X$ to denote the cdf of X and $\overline{F}_X$ its survival function. When we say that X is regularly varying at ∞, we mean that $\overline{F}_X \in \mathrm{RV}_\alpha$ for some variation exponent $\alpha < 0$. Similarly, the upper tail order of C is defined as a constant $\kappa_U$ such that $\overline{C}(1-u,\dots,1-u) \sim u^{\kappa_U}\ell(u)$ as $u \to 0^+$, where $\overline{C}$ is the survival function of C. Tail order is a flexible quantity for capturing the degree of dependence in the tails: it applies to the upper and lower tails separately, and it quantifies the strength of dependence ranging from tail negative dependence to tail positive dependence. Tail order takes values in $1 \le \kappa \le \infty$, and, generally speaking, a smaller κ implies stronger dependence in the tail. For comparison, the d-dimensional independence copula has $C(u,\dots,u) = u^d$, so that $\kappa_L = \kappa_U = d$; hence, in the bivariate case, a tail order κ > 2 indicates negative dependence in the tail. We refer to Hua and Joe (2011) for more details about the notion of tail order.
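The definition suggests a simple finite-u diagnostic, $\log C(u,\dots,u)/\log u$, which converges to $\kappa_L$ as $u \to 0^+$. Below is a minimal sketch assuming diagonal sections given in closed form. The last diagonal, $u^2(1 - \log u)$, is a hypothetical example with slowly varying factor $\ell(u) = 1 - \log u$; it is included only to show how ℓ slows the convergence.

```python
import numpy as np

def tail_order_proxy(diagonal, u):
    """Finite-u proxy log C(u,...,u) / log u for the lower tail order kappa_L."""
    return np.log(diagonal(u)) / np.log(u)

independence_2d = lambda u: u ** 2          # C(u, u) = u^2, so kappa_L = 2 = d
independence_3d = lambda u: u ** 3          # C(u, u, u) = u^3, so kappa_L = 3 = d
comonotone = lambda u: u                    # C(u, u) = min(u, u), so kappa_L = 1
slowly_varying = lambda u: u ** 2 * (1 - np.log(u))   # hypothetical u^2 * l(u)

for u in [1e-4, 1e-12, 1e-100]:
    print(u,
          tail_order_proxy(independence_2d, u),
          tail_order_proxy(independence_3d, u),
          tail_order_proxy(comonotone, u),
          tail_order_proxy(slowly_varying, u))
```

The slow drift of the last column toward 2 illustrates why tail order is defined up to a slowly varying factor rather than read off at a single small u.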
To capture the reflection-asymmetric tail dependence pattern (i.e., the upper and lower tails differ) that appears in the left panel of Figure 4, we need to construct a bivariate copula whose upper tail order is greater than 2 and whose lower tail order may be close to 2. Elliptical copulas have symmetric upper and lower tails, so they are not suitable. For a bivariate extreme value copula, the upper tail order is either 1 or 2, so it cannot be an upper tail negative dependent copula. An Archimedean copula can be written in the form
$$C(u_1,\dots,u_d) = \psi\big(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\big), \tag{1}$$
where $\psi^{-1}$ is the inverse of the generator ψ. There are mainly two ways of constructing an Archimedean copula. One way is based on the Laplace transform (LT) of a positive random variable; namely, the generator ψ in (1) is taken to be
$$\psi(s) = \int_0^\infty \exp\{-sy\}\,F_Y(dy), \quad s \ge 0,$$
where Y is a positive random variable. It is well known that such a generator ψ is completely monotone and can be used to construct an Archimedean copula of any dimension. The other way is based on the survival copula of a scale mixture with a uniform distribution on a simplex (see McNeil and Nešlehová (2009)). More specifically, if a random vector $X := (X_1,\dots,X_d) \overset{d}{=} R \times (S_1,\dots,S_d)$ satisfies some regularity conditions, where $\overset{d}{=}$ means "equality in distribution", then the survival copula of X is an Archimedean copula of dimension d. We use this representation throughout the paper; a more formal introduction is given at the beginning of Section 3.
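As an illustration of the LT construction, here is a minimal sketch assuming Y ~ Gamma(1/θ, 1) with the hypothetical value θ = 0.5. Its LT is ψ(s) = (1 + s)^{−1/θ}, and plugging this generator into (1) with d = 2 gives the well-known Clayton copula $C(u,v) = (u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$. The code checks the LT by quadrature with scipy and evaluates (1) directly.

```python
import numpy as np
from scipy import integrate, stats

theta = 0.5
Y = stats.gamma(a=1.0 / theta)               # positive mixing variable Y ~ Gamma(1/theta, 1)

def psi_numeric(s):
    """LT of Y computed by quadrature: E[exp(-s Y)]."""
    value, _ = integrate.quad(lambda y: np.exp(-s * y) * Y.pdf(y), 0, np.inf)
    return value

def psi(s):
    """Closed-form LT of Gamma(1/theta, 1)."""
    return (1.0 + s) ** (-1.0 / theta)

def psi_inv(u):
    return u ** (-theta) - 1.0

def archimedean_copula(u, v):
    """Equation (1) with d = 2."""
    return psi(psi_inv(u) + psi_inv(v))

def clayton(u, v):
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

for s in [0.1, 1.0, 5.0]:
    print(s, psi_numeric(s), psi(s))
print(archimedean_copula(0.3, 0.6), clayton(0.3, 0.6))
```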
In Section 3, we prove that the tail behavior of the random variable R above affects the strength of dependence in the tails of the corresponding Archimedean copula. To characterize the tail behavior of a univariate random variable such as R, a well-developed and mathematically tractable approach is to determine the maximum domain of attraction (MDA) to which the random variable belongs. For example, a Gamma random variable belongs to the MDA of Gumbel, and a Pareto random variable belongs to the MDA of Fréchet. More formally, a random variable X is said to belong to the MDA of an extreme value distribution H if there exist normalizing constants $\sigma_n > 0$ and $\mu_n \in \mathbb{R}$ such that
$$(M_n - \mu_n)/\sigma_n \xrightarrow{d} H, \quad n \to \infty,$$
where $M_n$ is the maximum (i.e., the largest order statistic) of a random sample of X with sample size n, and $\xrightarrow{d}$ means "convergence in distribution". This is written as $X \in \mathrm{MDA}(H)$. It is well known that there are only three types of non-degenerate univariate extreme value distributions: Fréchet, Gumbel and Weibull. Roughly speaking, the MDA of Fréchet (denoted as $\mathrm{MDA}(\Phi_\alpha)$, where α is the shape parameter of the Fréchet distribution) contains univariate distributions with heavy, regularly varying right tails; the MDA of Gumbel (denoted as $\mathrm{MDA}(\Lambda)$) consists of univariate distributions with lighter right tails; and the MDA of Weibull corresponds to distributions with a finite right endpoint, which are often irrelevant to actuarial applications. We refer to Embrechts et al. (1997) for a classical reference on the concepts of MDA and general extreme value theory, and for relevant applications in insurance and finance.
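A quick numerical illustration of the two domains of attraction used below, assuming a Pareto (Lomax) tail $(1+x)^{-\alpha}$ with the hypothetical value α = 3 for $\mathrm{MDA}(\Phi_\alpha)$ and a Gamma(2, 1) tail for $\mathrm{MDA}(\Lambda)$, both from scipy.stats. The ratio $\overline{F}(2x)/\overline{F}(x)$ tends to $2^{-\alpha}$ for the regularly varying tail and to 0 for the Gamma tail, and $\log \overline{F}(x)/\log x$ tends to −α and −∞, respectively.

```python
import numpy as np
from scipy import stats

alpha = 3.0
pareto = stats.lomax(c=alpha)      # survival function (1 + x)^(-alpha): MDA of Frechet
gamma = stats.gamma(a=2.0)         # Gamma(2, 1) tail: MDA of Gumbel

for x in [10.0, 50.0, 100.0]:
    pareto_ratio = pareto.sf(2 * x) / pareto.sf(x)
    gamma_ratio = gamma.sf(2 * x) / gamma.sf(x)
    # log survival / log x tends to -alpha for the Pareto tail and to -infinity for the Gamma tail
    print(x, pareto_ratio, 2 ** -alpha, gamma_ratio,
          np.log(pareto.sf(x)) / np.log(x), np.log(gamma.sf(x)) / np.log(x))
```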
3 Tail negative dependence
It is clear from the brief discussion in Section 2 that extreme value copulas and elliptical copulas are not suitable for constructing a dependence structure with asymmetric tail dependence and tail negative dependence simultaneously. We therefore focus on Archimedean copulas in this section.
To obtain a parametric Archimedean copula with a simple form, one often takes ψ to be the LT of a positive random variable. We refer to Joe and Hu (1996) and Joe (1997) for many implementable parametric Archimedean copulas. However, Archimedean copulas generated by such LTs do not provide tail negative dependence. For a bivariate random vector $(X_1, X_2)$, if $P[X_1 \le x_1, X_2 \le x_2] \ge P[X_1 \le x_1]P[X_2 \le x_2]$ for any $x_1, x_2 \in \mathbb{R}$, then $(X_1, X_2)$ is said to be positive quadrant dependent (PQD). If a bivariate Archimedean copula is constructed by the LT of a positive random variable, then the copula is PQD, and thus positive upper quadrant dependent (PUQD) and positive lower quadrant dependent (PLQD) (see Chapter 2.1.1 of Joe (1997)); moreover, the tail orders satisfy κ ≤ 2 for both the upper and lower tails (Proposition 2 of Hua and Joe (2011)). Therefore, Archimedean copulas based on the LT of a positive random variable cannot be used to construct such an asymmetric upper tail negative dependence structure. However, if an Archimedean copula is derived from the survival copula of a scale mixture with a uniform distribution on the simplex, then we will show that conditions on the mixing random variable can lead to very flexible tails for the corresponding copula, ranging from tail dependence through intermediate tail dependence (Hua and Joe, 2011) to tail negative dependence.
Instead of using the LT of a positive random variable, one can also construct an Archimedean copula in other ways, as long as the generator ψ satisfies certain regularity conditions (see Malov (2001) or McNeil and Nešlehová (2009)). In McNeil and Nešlehová (2009), an Archimedean copula can be the survival copula of a random vector
$$X := (X_1,\dots,X_d) \overset{d}{=} R \times (S_1,\dots,S_d), \tag{2}$$
where R and $(S_1,\dots,S_d)$ are independent, R is a positive random variable, and $(S_1,\dots,S_d)$ is uniformly distributed on the simplex $\{x \in \mathbb{R}^d_+ : \sum_i x_i = 1\}$. In this case, ψ can be the Williamson d-transform of the cdf $F_R$ with $F_R(0) = 0$. That is,
$$\psi(s) = \int_s^\infty (1 - s/r)^{d-1}\,F_R(dr), \quad s \in [0,\infty). \tag{3}$$
In Section 3.1, we prove that the tail behavior of R affects the strength of dependence in the tails of the Archimedean copula, and that upper tail negative dependence can be obtained from the representation (2).
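A minimal sketch of representation (2) and the Williamson d-transform (3), assuming R ~ Gamma(d, 1). In that case (3) reduces to ψ(s) = exp(−s), the independence generator, so the output can be checked. The sampler draws S uniformly on the simplex by normalizing independent standard exponentials (a standard construction), and the last lines compare the empirical joint survival probability of X with ψ(x₁ + x₂), which is the joint survival function of such a scale mixture. The function names are hypothetical.

```python
import numpy as np
from scipy import integrate, stats

def williamson_transform(radial_pdf, s, d):
    """Equation (3): psi(s) = int_s^inf (1 - s/r)^(d-1) f_R(r) dr, for a radial density f_R."""
    value, _ = integrate.quad(lambda r: (1.0 - s / r) ** (d - 1) * radial_pdf(r), s, np.inf)
    return value

def sample_simplex_mixture(radial_sampler, d, n, rng):
    """Draw n copies of X = R * (S_1, ..., S_d) as in (2), with S uniform on the simplex."""
    e = rng.exponential(size=(n, d))
    s = e / e.sum(axis=1, keepdims=True)   # normalized exponentials are uniform on the simplex
    r = radial_sampler(n, rng)             # R drawn independently of S
    return r[:, None] * s

d = 2
radial = stats.gamma(a=d)                  # R ~ Gamma(d, 1): gives psi(s) = exp(-s)
rng = np.random.default_rng(7)
x = sample_simplex_mixture(lambda n, g: radial.rvs(size=n, random_state=g), d, 200_000, rng)

for s in [0.5, 1.0, 2.0]:
    print(s, williamson_transform(radial.pdf, s, d), np.exp(-s))

# The joint survival function of X equals psi(x1 + x2).
x1, x2 = 0.3, 0.7
empirical = np.mean((x[:, 0] > x1) & (x[:, 1] > x2))
print(empirical, williamson_transform(radial.pdf, x1 + x2, d))
```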
3.1 Conditions
In Hua and Joe (2013), we found that when the right tail of 1/R follows a power law, a lighter right tail of 1/R tends to increase the upper tail order of the associated Archimedean copula, thus decreasing the degree of positive dependence in the upper tail. In what follows, unless otherwise specified, the tail of a univariate random variable or distribution always refers to the right distributional tail. From Hua and Joe (2011), we know that if the tail order κ > d, where d is the dimension, then the copula may have tail negative dependence. So, to obtain tail negative dependence, we shall decrease the tail heaviness of the random variable 1/R. However, as Example 4 of Hua and Joe (2013) shows, even if 1/R has a very light tail, it cannot provide tail negative dependence. The reason is not that the tail of 1/R is insufficiently light, but that the Archimedean copula is constructed by the LT of a positive random variable. So, in Proposition 1 below, we instead use the scale mixture method to construct Archimedean copulas so that tail negative dependence can be obtained.
In this section, all distribution functions and density functions are assumed to be ultimately monotone at the left and right endpoints; this condition is very mild and is satisfied by all commonly used distributions. Since the theoretical results developed in this section concern distributional tails, we further assume, without loss of generality, that the marginal cdfs are continuous, so that the copula is uniquely determined and cumbersome arguments are avoided.
Proposition 1 Suppose a random vector $X := (X_1,\dots,X_d)$ is defined as in (2). If $1/R \in \mathrm{MDA}(\Phi_\alpha)$ and $E[1/R] < \infty$, then the lower tail order of X is κ = α; that is, the upper tail order of the corresponding Archimedean copula is α.
Proof: Let F be the common univariate cdf of the $X_i$'s, and let C be the copula of X. Since the survival copula of X is an Archimedean copula, to study the upper tail of the Archimedean copula it suffices to study the lower tail of X. By Equation (1) of Hua and Joe (2013), the upper tail order κ of the Archimedean copula can be derived as
$$\kappa = \lim_{u\to 0^+} \frac{\log C(u,\dots,u)}{\log u} = \lim_{x\to 0^+} \frac{\log C(F(x),\dots,F(x))}{\log F(x)} = \lim_{x\to 0^+} \frac{\log P[X_1 \le x,\dots,X_d \le x]}{\log P[X_1 \le x]}.$$
Letting $T := 1/R$, $y = 1/x$ and $s^* = \max\{s_1,\dots,s_d\}$, we have
$$P[X_1 \le x,\dots,X_d \le x] = P[RS_1 \le x,\dots,RS_d \le x] = \int_{s \ge 0,\,\|s\|_1 = 1} P[T \ge s^*/x]\,F_S(ds_1,\dots,ds_d).$$
Since $T \in \mathrm{MDA}(\Phi_\alpha)$, $P[T \ge \cdot\,] \in \mathrm{RV}_{-\alpha}$, and there exists a slowly varying function ℓ(·) such that $P[T \ge t] = t^{-\alpha}\ell(t)$. Therefore,
$$\kappa = \lim_{y\to\infty} \frac{\log\Big(P[T \ge y] \times \int_{s\ge 0,\,\|s\|_1=1} \frac{P[T \ge s^* y]}{P[T \ge y]}\,F_S(ds_1,\dots,ds_d)\Big)}{\log \int_0^1 P[T \ge s_1 y]\,F_{S_1}(ds_1)}$$
$$= \lim_{y\to\infty} \frac{\log P[T \ge y] + \log \int_{s\ge 0,\,\|s\|_1=1} \frac{P[T \ge s^* y]}{P[T \ge y]}\,F_S(ds_1,\dots,ds_d)}{-\log y - \log B(1,d-1) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\,dx} \tag{4}$$
$$= \lim_{y\to\infty} \frac{-\alpha \log y + \log \ell(y) + \log \int_{s\ge 0,\,\|s\|_1=1} \frac{P[T \ge s^* y]}{P[T \ge y]}\,F_S(ds_1,\dots,ds_d)}{-\log y - \log B(1,d-1) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\,dx} = \alpha, \tag{5}$$
where B(·,·) is the Beta function. Equation (4) follows from the fact that each univariate marginal of S is distributed as Beta(1, d − 1) (Ferguson, 1973), together with the substitution $x = s_1 y$ in the denominator. Equation (5) holds because of the following:
(a) $0 < \lim_{y\to\infty} \int_0^y P[T \ge x](1 - x/y)^{d-2}\,dx \le \lim_{y\to\infty} \int_0^y P[T \ge x]\,dx = E[T] < \infty$ by the condition $E[1/R] < \infty$;
(b) by Proposition 1.3.6 (i) of Bingham et al. (1987), $\lim_{y\to\infty} \log \ell(y)/\log y = 0$;
(c) since $1/d \le s^* \le 1$, and as $y \to \infty$, $P[T \ge s^* y]/P[T \ge y] \to (s^*)^{-\alpha}$ uniformly in $s^* \in [1/d, 1]$,
$$\lim_{y\to\infty} \log \int_{s\ge 0,\,\|s\|_1=1} \frac{P[T \ge s^* y]}{P[T \ge y]}\,F_S(ds_1,\dots,ds_d) = \log \int_{s\ge 0,\,\|s\|_1=1} (s^*)^{-\alpha}\,F_S(ds_1,\dots,ds_d) \le \alpha \log d < \infty.$$
That is, κ = α, which completes the proof.
Remark 1 In Proposition 1, the condition on the random variable R has the following equivalent forms: $1/R \in \mathrm{MDA}(\Phi_\alpha) \iff \overline{F}_{1/R} \in \mathrm{RV}_{-\alpha} \iff F_R \in \mathrm{RV}_\alpha(0^+)$.
Remark 2 Proposition 1 generalizes Proposition 6 in Hua and Joe (2013), where only intermediate tail dependence has been studied. Moreover, we use a different method to prove Proposition
1 in this paper and the proof is shorter than that in Hua and Joe (2013).
From Example 4 in Hua and Joe (2013), we notice that, for a d-dimensional Archimedean copula constructed by the LT of an inverse Gamma random variable with shape parameter α, the corresponding R in the sense of (2) cannot satisfy $F_R \in \mathrm{RV}_\alpha(0^+)$ with α > d. This is one reason why this copula does not have tail negative dependence.
More generally, we want to know whether a multivariate Archimedean copula constructed by the LT of a positive random variable can have an upper tail order larger than the dimension of the copula. As discussed at the beginning of this section, this is impossible for a bivariate Archimedean copula constructed by the LT of a positive random variable, since such a copula is PQD and thus PUQD and positive lower quadrant dependent (PLQD). So, in the bivariate case, neither the upper nor the lower tail of such an Archimedean copula can have tail negative dependence. For the multivariate case with dimension d ≥ 2, an Archimedean copula constructed by the LT of a positive random variable is positive lower orthant dependent (PLOD) (see Corollary 4.6.3 of Nelsen (2006)); therefore, by Proposition 2 of Hua and Joe (2011), its lower tail order must be less than or equal to the dimension d. For the upper tail, however, we do not know whether such a multivariate Archimedean copula is positive upper orthant dependent (PUOD). The following result implies that the upper tail order of a d-dimensional Archimedean copula constructed by the LT of a positive random variable must be less than or equal to d.
Corollary 2 Let C be a d-dimensional Archimedean copula constructed as in (1) with ψ being the LT of a positive random variable Y.
1. If $\overline{F}_Y = o(\overline{F}_W)$, where W ∼ Inverse-Gamma(d, 1), then the upper tail order $\kappa_U$ of the copula C exists, and $\kappa_U = d$.
2. If $\overline{F}_Y \in \mathrm{RV}_{-\alpha}$ for some α such that d ≥ α > 1, then the upper tail order $\kappa_U$ of C exists, and $\kappa_U = \alpha$.
3. If $\overline{F}_Y \in \mathrm{RV}_{-1}$ and E[Y] < ∞, then the upper tail order $\kappa_U$ of C exists, and $\kappa_U = 1$.
Proof: From McNeil and Nešlehová (2009), an Archimedean copula constructed by the LT of a positive random variable can also be written as the survival copula of the random vector in (2) with a scaling random variable R, where $R \overset{d}{=} \mathrm{Gamma}(d,1)/Y$ with the Gamma(d, 1) random variable independent of Y, or equivalently
$$\frac{1}{R} \overset{d}{=} Y \times \frac{1}{\mathrm{Gamma}(d,1)} =: YW,$$
where W is distributed as Inverse-Gamma(d, 1). Since
$$\overline{F}_W(w) = \int_w^\infty \frac{1}{\Gamma(d)}\,x^{-d-1}\exp\{-1/x\}\,dx = \frac{1}{\Gamma(d)}\int_0^{1/w} t^{d-1}\exp\{-t\}\,dt = \frac{\gamma(d, 1/w)}{\Gamma(d)} \sim \frac{1}{d\,\Gamma(d)}\,w^{-d}, \quad w \to \infty, \tag{6}$$
where Γ(·) is the Gamma function, γ(·,·) is the lower incomplete Gamma function, and the asymptotic equivalence follows from Abramowitz and Stegun (1964), we have $\overline{F}_W \in \mathrm{RV}_{-d}$.
To prove 1: since $\overline{F}_Y = o(\overline{F}_W)$ and $\overline{F}_W \in \mathrm{RV}_{-d}$, by the corollary of Theorem 3 in Embrechts and Goldie (1980), $\overline{F}_{1/R} \in \mathrm{RV}_{-d}$, and thus E[1/R] < ∞ because d ∈ {2, 3, ...}. Then, by Proposition 1, $\kappa_U = d$.
To prove 2: if $\overline{F}_Y \in \mathrm{RV}_{-d}$, then by the corollary of Theorem 3 in Embrechts and Goldie (1980), $\overline{F}_{1/R} \in \mathrm{RV}_{-d}$, and the claim follows from Proposition 1 as in case 1. If $\overline{F}_Y \in \mathrm{RV}_{-\alpha}$ with d > α > 1, then clearly $\overline{F}_W = o(\overline{F}_Y)$, and thus $\overline{F}_{1/R} \in \mathrm{RV}_{-\alpha}$ and E[1/R] < ∞. Proposition 1 then leads to the claim.
To prove 3: the argument is similar to the second case, but we need the extra condition E[Y] < ∞ so that E[1/R] = E[Y]E[W] < ∞, which completes the proof.
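Equation (6) can be checked numerically. The following is a minimal sketch with scipy, assuming d = 2, 3. The first identity is verified by quadrature of the Inverse-Gamma(d, 1) density; `scipy.special.gammainc` gives the regularized lower incomplete gamma function γ(d, 1/w)/Γ(d); and the last column shows the ratio to the asymptote $w^{-d}/(d\,\Gamma(d))$ approaching 1.

```python
import numpy as np
from scipy import integrate, special

def invgamma_survival_quad(w, d):
    """First expression in (6): int_w^inf x^(-d-1) exp(-1/x) dx / Gamma(d)."""
    # epsabs=0 forces a relative-accuracy stopping rule, since the values can be tiny.
    value, _ = integrate.quad(lambda x: x ** (-d - 1) * np.exp(-1.0 / x), w, np.inf,
                              epsabs=0.0, epsrel=1e-10)
    return value / special.gamma(d)

def invgamma_survival_closed(w, d):
    """Third expression in (6): gamma(d, 1/w) / Gamma(d), the regularized lower incomplete gamma."""
    return special.gammainc(d, 1.0 / w)

def invgamma_survival_asymptote(w, d):
    """Right-hand side of (6): w^(-d) / (d Gamma(d))."""
    return w ** (-d) / (d * special.gamma(d))

for d in [2, 3]:
    for w in [10.0, 100.0, 1e4]:
        closed = invgamma_survival_closed(w, d)
        print(d, w, invgamma_survival_quad(w, d), closed,
              closed / invgamma_survival_asymptote(w, d))
```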
Remark 3 Corollary 2 supplements Proposition 4 in Hua and Joe (2011), where the conditions are imposed on the LT of Y instead of the survival function $\overline{F}_Y$ of Y. Note, however, that the condition in case 3 of Corollary 2 is only sufficient, not necessary, for the upper tail order to equal 1. Based on Proposition 4 in Hua and Joe (2011) and Proposition 3 of Hua and Joe (2012), even when the right tail of $F_Y$ is so heavy that E[Y] does not exist, one may still have $\kappa_U = 1$.
From Proposition 1, we find that as the right tail of 1/R becomes lighter, the upper tail order of the corresponding Archimedean copula becomes larger, and thus the dependence in the upper tail becomes weaker and may even become negative. When $1/R \in \mathrm{MDA}(\Lambda)$, the right tail of 1/R is even lighter. In this case, we may naturally expect the upper tail order of the corresponding Archimedean copula to be even larger, or infinite. The following result confirms this.
Proposition 3 Suppose a random vector $X := (X_1,\dots,X_d)$ is defined as in (2). If $1/R \in \mathrm{MDA}(\Lambda)$, then the lower tail order of X is κ = ∞; that is, the upper tail order of the corresponding Archimedean copula is $\kappa_U = \infty$.
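Proof: Let T = 1/R. Since $T \in \mathrm{MDA}(\Lambda)$, all moments of T are finite; in particular, E[T] < ∞. From the proof of Proposition 1, we can write
$$\kappa = \lim_{y\to\infty} \frac{\log \int_{s\ge 0,\,\|s\|_1=1} P[T \ge s^* y]\,F_S(ds_1,\dots,ds_d)}{-\log y - \log B(1,d-1) + \log \int_0^y P[T \ge x](1 - x/y)^{d-2}\,dx}.$$
For large y, both the numerator and the denominator are negative, and by (a) in the proof of Proposition 1 the denominator is asymptotically equivalent to −log y. Since $s^* \ge 1/d$, we have $P[T \ge s^* y] \le P[T \ge y/d]$, so the numerator is at most $\log P[T \ge y/d]$. Hence
$$\kappa \ge \lim_{y\to\infty} \frac{\log P[T \ge y/d]}{-\log y} = \infty,$$
where the last equality holds because $T \in \mathrm{MDA}(\Lambda)$ implies that $P[T \ge \cdot\,]$ is rapidly varying (or vanishes beyond a finite right endpoint), so that $P[T \ge y/d]$ decays faster than any power of y. This completes the proof.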
3.2 Examples
The above propositions leave many choices of parametric distributions for the random variable R (through 1/R), because MDA(Fréchet) and MDA(Gumbel) are very large classes of distributions (Embrechts et al., 1997). In this section, we give some examples of parametric copulas that have upper tail negative dependence.
Example 1 (Inverse-Pareto-Simplex copula, aka IPS copula) Let $X_i \overset{d}{=} RS_i$, i = 1, 2, where $(S_1, S_2)$ is uniformly distributed on $\{x \ge 0 : x_1 + x_2 = 1\}$, and let T := 1/R follow a Pareto distribution with cdf $F(x) = 1 - (1+x)^{-\alpha}$, x ≥ 0, α > 1. Then the generator of the Archimedean copula, as defined in (3), is
$$\psi(s) = \frac{s}{1-\alpha}\Big[1 - (1 + 1/s)^{-\alpha+1}\Big] + 1, \quad s \ge 0,\ \alpha > 1.$$
Clearly, $\overline{F}_{1/R} \in \mathrm{RV}_{-\alpha}$, α > 1. Therefore, the upper tail order of the survival copula of $(X_1, X_2)$ is $\kappa_U = \alpha$. Depending on the value of α > 1, the upper tail ranges from intermediate tail dependence to tail negative dependence as the dependence parameter α increases. Figure 1 shows contour plots for the IPS copula. They show that: (1) when 1 < α < 2, the IPS copula has intermediate tail dependence in the upper tail; (2) when α = 2, the upper tail looks like independence; (3) when α > 2, the upper tail appears to be negatively dependent, and a larger α indicates stronger negative dependence. For the lower tail, since $\psi \in \mathrm{RV}_{-1}$, by Proposition 6 of Hua and Joe (2011), the IPS copula always has lower tail order $\kappa_L = 1$, which is also consistent with the contour plots in Figure 1.
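A numerical check of Example 1, assuming the hypothetical value α = 3 and d = 2. The first part compares the closed-form IPS generator with the Williamson transform (3) evaluated through T = 1/R, using $\psi(s) = E[(1 - sT)\,\mathbf{1}\{T < 1/s\}]$, and checks that $s\psi(s)$ levels off, in line with $\psi \in \mathrm{RV}_{-1}$. The second part evaluates the ratio of logarithms from the proof of Proposition 1 in closed form; for this Pareto T, $P[X_1 \le x, X_2 \le x] = \frac{2}{y(\alpha-1)}[(1+y/2)^{1-\alpha} - (1+y)^{1-\alpha}]$ and $P[X_1 \le x] = \frac{1}{y(\alpha-1)}[1 - (1+y)^{1-\alpha}]$ with y = 1/x. These two expressions are our own elementary integrations, not formulas from the paper. The ratio approaches $\kappa_U = \alpha$ only logarithmically slowly.

```python
import numpy as np
from scipy import integrate, stats

alpha = 3.0
T = stats.lomax(c=alpha)                   # T = 1/R with cdf 1 - (1 + x)^(-alpha)

def psi_ips(s):
    """Closed-form IPS generator from Example 1."""
    if s == 0:
        return 1.0
    return s / (1.0 - alpha) * (1.0 - (1.0 + 1.0 / s) ** (1.0 - alpha)) + 1.0

def psi_numeric(s):
    """Williamson 2-transform written in terms of T: E[(1 - s T) 1{T < 1/s}]."""
    value, _ = integrate.quad(lambda t: (1.0 - s * t) * T.pdf(t), 0.0, 1.0 / s)
    return value

def log_joint(y):
    """log P[X1 <= 1/y, X2 <= 1/y], computed on the log scale to avoid underflow."""
    ratio = np.exp((1.0 - alpha) * (np.log1p(y) - np.log1p(y / 2.0)))
    return (np.log(2.0) - np.log(y) - np.log(alpha - 1.0)
            + (1.0 - alpha) * np.log1p(y / 2.0) + np.log1p(-ratio))

def log_marginal(y):
    """log P[X1 <= 1/y]."""
    return (-np.log(y) - np.log(alpha - 1.0)
            + np.log1p(-np.exp((1.0 - alpha) * np.log1p(y))))

for s in [0.5, 1.0, 3.0]:
    print(s, psi_ips(s), psi_numeric(s))
print(1e6 * psi_ips(1e6), alpha / 2)       # s * psi(s) levels off: psi is in RV_{-1}

for y in [1e10, 1e100, 1e300]:
    print(y, log_joint(y) / log_marginal(y))   # tends to kappa_U = alpha
```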
Figure 1: Normalized contour plots of the IPS copula (panels: α = 1.1, 1.5, 2, 5, 10, 20).
However, to be useful for analyzing the expenditure dataset, the candidate copulas should possess not only upper tail negative dependence but also a lower tail that is close to independence. We therefore seek simple forms of the Williamson d-transform ψ that lead to such an Archimedean copula. For the effect of R on the lower tail of an Archimedean copula, we refer to Larsson and Nešlehová (2011). Considering the upper and lower tails together, we construct the copula in Example 2.
Example 2 (Generalized-Gamma-Simplex mixture, aka GGS copula) Let $X_i \overset{d}{=} RS_i$, i = 1, 2, where $(S_1, S_2)$ is uniformly distributed on $\{x \ge 0 : x_1 + x_2 = 1\}$, and let $R^{1/\beta}$ follow a Gamma distribution with shape parameter α, so that
$$F_R(x) = \frac{1}{\beta\,\Gamma(\alpha)} \int_0^x s^{\alpha/\beta - 1} \exp\{-s^{1/\beta}\}\,ds, \quad \alpha > 0,\ \beta > 0, \tag{7}$$
and the Archimedean generator is
$$\psi(s) = \int_s^\infty (1 - s/r)\,F_R(dr) = \frac{1}{\Gamma(\alpha)}\Big[\Gamma(\alpha, s^{1/\beta}) - s\,\Gamma(\alpha - \beta, s^{1/\beta})\Big], \tag{8}$$
where Γ(·,·) is the upper incomplete gamma function. Note that, although Γ(0) = ∞, the case α = β is also implementable, since Γ(0, z) is finite for z > 0. Some contour plots of the GGS copula are illustrated in Figure 2.
Clearly, as $x \to 0^+$, $F_R(x) \sim \frac{1}{\alpha\Gamma(\alpha)}\,x^{\alpha/\beta}$. Then, by Proposition 1, the upper tail order of the corresponding Archimedean copula is $\kappa_U = \max\{\alpha/\beta, 1\}$; note that Corollary 2 of Larsson and Nešlehová (2011) shows that there is upper tail dependence if α < β. If $R \in \mathrm{MDA}(\Lambda)$ with an auxiliary function a(·), then by Proposition 7 of Larsson and Nešlehová (2011), the lower tail order is $\kappa_L = 2^{1-\gamma}$, where γ is the index such that the auxiliary function $a \in \mathrm{RV}_\gamma$. A Weibull distribution with cdf $1 - \exp\{-x^{1/\beta}\}$, β > 0, belongs to the MDA of Gumbel with auxiliary function $a^*(x) = \beta x^{1-1/\beta}$ (Embrechts et al., 1997), and $a^* \in \mathrm{RV}_{1-1/\beta}$. By Lemma 1 of Larsson and Nešlehová (2011), $a^*$ can also serve as an auxiliary function for the survival function of R. Therefore, γ = 1 − 1/β and $\kappa_L = 2^{1/\beta}$. Hence this copula provides very flexible upper and lower tails, ranging from positive to negative dependence. Note that when α = 2 and β = 1, $\kappa_L = \kappa_U = 2$, and moreover $\psi(s) = \Gamma(2, s) - s\,\Gamma(1, s) = \exp(-s)$, which is the generator of the independence copula. So the independence copula is a special case of the GGS copula.
We can also re-parameterize the GGS copula by the upper and lower tail orders; that is, $\alpha = \kappa_U \ln(2)/\ln(\kappa_L)$ and $\beta = \ln(2)/\ln(\kappa_L)$.
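A minimal sketch of the GGS generator (8), assuming scipy for the incomplete gamma functions. `scipy.special.gammaincc` only accepts a positive first argument, so Γ(α − β, z) with α ≤ β is computed here by direct quadrature; this is an implementation choice of the sketch, not of the paper. The sketch also inverts ψ numerically with a bracketing root finder to evaluate C(u, v) from (1), cross-checks (8) against a direct quadrature of (3) with R = G^β, and implements the tail-order re-parameterization. All function names are hypothetical.

```python
import numpy as np
from scipy import integrate, optimize, special, stats

def upper_gamma(a, z):
    """Upper incomplete gamma Gamma(a, z) = int_z^inf t^(a-1) e^(-t) dt, for any real a and z > 0."""
    if a > 0:
        return special.gammaincc(a, z) * special.gamma(a)
    value, _ = integrate.quad(lambda t: t ** (a - 1.0) * np.exp(-t), z, np.inf)
    return value

def psi_ggs(s, alpha, beta):
    """GGS generator, equation (8)."""
    if s == 0:
        return 1.0
    z = s ** (1.0 / beta)
    return (upper_gamma(alpha, z) - s * upper_gamma(alpha - beta, z)) / special.gamma(alpha)

def psi_ggs_quadrature(s, alpha, beta):
    """Direct quadrature of (3) with d = 2 and R = G^beta, G ~ Gamma(alpha, 1)."""
    z = s ** (1.0 / beta)
    integrand = lambda g: (1.0 - s * g ** (-beta)) * stats.gamma.pdf(g, alpha)
    value, _ = integrate.quad(integrand, z, np.inf)
    return value

def psi_ggs_inverse(u, alpha, beta):
    """Solve psi(s) = u by bracketing; psi decreases from 1 at s = 0 toward 0."""
    hi = 1.0
    while psi_ggs(hi, alpha, beta) > u:
        hi *= 2.0
    return optimize.brentq(lambda s: psi_ggs(s, alpha, beta) - u, 0.0, hi)

def ggs_copula(u, v, alpha, beta):
    """Bivariate GGS copula via equation (1)."""
    s = psi_ggs_inverse(u, alpha, beta) + psi_ggs_inverse(v, alpha, beta)
    return psi_ggs(s, alpha, beta)

def tail_orders(alpha, beta):
    """(kappa_L, kappa_U) = (2^(1/beta), max(alpha/beta, 1))."""
    return 2.0 ** (1.0 / beta), max(alpha / beta, 1.0)

def from_tail_orders(kappa_l, kappa_u):
    """Inverse map: beta = ln 2 / ln kappa_L, alpha = kappa_U * beta (valid when alpha >= beta)."""
    beta = np.log(2.0) / np.log(kappa_l)
    return kappa_u * beta, beta

print(psi_ggs(1.3, 2.0, 1.0), np.exp(-1.3))         # independence special case
print(ggs_copula(0.3, 0.6, 2.0, 1.0), 0.3 * 0.6)     # C(u, v) = uv when alpha = 2, beta = 1
print(tail_orders(30.0, 4.0), from_tail_orders(*tail_orders(30.0, 4.0)))
```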
Example 3 (Inverse-Gamma-Simplex, Example 2 of McNeil and Nešlehová (2010)) Let $X_i \overset{d}{=} RS_i$, i = 1, 2, where $(S_1, S_2)$ is uniformly distributed on $\{x \ge 0 : x_1 + x_2 = 1\}$, and let 1/R follow a Gamma distribution with shape parameter α and scale parameter 1, so that the generator of the Archimedean copula is
$$\psi(s) = \frac{\gamma(\alpha, 1/s)}{\Gamma(\alpha)} - \frac{s\,\gamma(\alpha + 1, 1/s)}{\Gamma(\alpha)}.$$
Figure 2: Normalized contour plots of the GGS copula (panels: α = 2 with β = 1, 2, 3, 6; α = 30 with β = 1, 2, 4, 10).
By Proposition 3, since 1/R ∈ MDA(Gumbel), the upper tail order is $\kappa_U = \infty$, which implies that the upper tail is always negatively dependent. For the lower tail, since R follows an inverse Gamma distribution with shape parameter α and scale parameter 1, the same argument as in (6) gives $\overline{F}_R \in \mathrm{RV}_{-\alpha}$. Therefore, by Theorem 1 (a) of Larsson and Nešlehová (2011), the corresponding generator satisfies $\psi \in \mathrm{RV}_{-\alpha}$, which implies lower tail dependence (see Hua and Joe (2011); Larsson and Nešlehová (2011)). A limitation of this copula is that no parameter controls the upper tail order: the upper tail always has tail negative dependence with $\kappa_U = \infty$.
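A minimal sketch of the Example 3 generator, assuming `scipy.special.gammainc`, which is the regularized lower incomplete gamma function, so that $\gamma(\alpha+1, x)/\Gamma(\alpha) = \alpha \cdot$ `gammainc(α + 1, x)`. It compares the closed form with the Williamson transform computed from T = 1/R ~ Gamma(α, 1), using the hypothetical value α = 2, and checks numerically that $\psi(s)\,s^{\alpha}$ approaches a constant, in line with $\psi \in \mathrm{RV}_{-\alpha}$. The limit $1/\Gamma(\alpha+2)$ used in the check comes from our own expansion of the integral near zero, not from the paper.

```python
import numpy as np
from scipy import integrate, special, stats

def psi_igs(s, alpha):
    """Inverse-Gamma-Simplex generator: [gamma(alpha, 1/s) - s * gamma(alpha + 1, 1/s)] / Gamma(alpha)."""
    if s == 0:
        return 1.0
    x = 1.0 / s
    # gammainc is regularized, gamma(a, x) / Gamma(a); Gamma(alpha + 1) / Gamma(alpha) = alpha.
    return special.gammainc(alpha, x) - s * alpha * special.gammainc(alpha + 1.0, x)

def psi_igs_quadrature(s, alpha):
    """Williamson 2-transform via T = 1/R ~ Gamma(alpha, 1): E[(1 - s T) 1{T < 1/s}]."""
    value, _ = integrate.quad(lambda t: (1.0 - s * t) * stats.gamma.pdf(t, alpha), 0.0, 1.0 / s)
    return value

alpha = 2.0
for s in [0.5, 1.0, 4.0]:
    print(s, psi_igs(s, alpha), psi_igs_quadrature(s, alpha))

# psi(s) * s^alpha * Gamma(alpha + 2) should approach 1 as s grows.
for s in [1e2, 1e3, 1e4]:
    print(s, psi_igs(s, alpha) * s ** alpha * special.gamma(alpha + 2.0))
```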
4 Aggregate loss - data analysis
4.1 Introduction
The dataset we analyze is based on Panels 14 and 15 for calendar year 2010 from the 2010 Full Year Consolidated Data File. The data were collected on a nationally representative sample of the civilian noninstitutionalized population of the United States. To illustrate the empirical observation of upper tail negative dependence, we consider the number of outpatient department visits to physicians in 2010 (OPDRV10) and the associated facility expenses (OPVEXP10). The average expense per visit used in the data analysis is calculated as the ratio of OPVEXP10 to OPDRV10. We use the average expense as loss severity and the number of visits as loss frequency. There are 32,846 individuals in total, 2,263 of whom have a positive number of outpatient visits to physicians and positive facility expenses. Descriptive statistics of the variables are given in Table 1. Figure 3 shows the scatter plot of average expense and number of visits on the original scale, which may suggest an independence structure between the two variables. However, the dependence pattern is not clear on the original scale.
Table 1: Summary of the variables

                    Min   1st quartile   Median   Mean    3rd quartile   Max
OPVEXP10            3     187            704      2373    2356           68370
Average Expense     3     132            406      1460    1573           36680
Age                 0     29.5           50       46.44   64             85

Number of Visits (OPDRV10): number of observations for each observed count
Visits    1     2     3     4     5     6     7     8     9     10
#obs      1461  394   144   73    58    26    29    9     13    9
Visits    11    12    13    14    15    16    17    18    19    20
#obs      5     5     5     2     4     1     1     2     2     1
Visits    21    22    23    25    28    29    31    32    33    35
#obs      1     1     1     1     2     1     1     1     1     1
Visits    38    40    42    46    48    65    98
#obs      1     1     2     1     1     1     1

Insurance Coverage (INSCOV10)   Any Private (1)   Public Only (2)   Uninsured (3)
#obs                            1356              757               150

Race (RACETHNX)   Hispanic (1)   Black (2)   Asian (3)   Other (4)
#obs              433            440         91          1299
To visualize the dependence pattern more intuitively, we add a small amount of random noise (Normal(0, 1)/1000) to the numbers of visits to make them continuous, and then transform the average expenses and the jittered numbers of visits separately into normal scores, which follow a standard normal distribution. Other techniques could also be used, such as other forms of jittering or the technique used to obtain the normalized QQ plot for discrete variables in Figure 5. The resulting dependence pattern is shown in the left panel of Figure 4. Although the plot is not based on the original data, the upper tail negative dependence it displays suggests that such a pattern may also be present in the original data.
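The jitter-and-normal-score transform described above can be sketched as follows. This is a minimal illustration on hypothetical toy data (not the MEPS sample); the function names are hypothetical, and the plotting position rank/(n + 1) is one common choice that the text does not specify.

```python
import random
from statistics import NormalDist

def jitter_counts(counts, scale=1e-3, seed=0):
    """Add Normal(0, 1) * scale noise to integer counts so that ties are broken."""
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, 1.0) * scale for c in counts]

def normal_scores(values):
    """Map values to standard normal scores via ranks: z = Phi^{-1}(rank / (n + 1))."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]

# Hypothetical toy data (not the MEPS sample)
visits = [1, 1, 2, 3, 1, 5, 2, 10]
avg_expense = [120.0, 560.0, 300.0, 90.0, 1500.0, 40.0, 410.0, 25.0]

z_visits = normal_scores(jitter_counts(visits))
z_expense = normal_scores(avg_expense)
```

Because the noise is tiny relative to the unit spacing of the counts, jittering only breaks ties among equal counts and never reorders distinct counts.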
The plot suggests negative dependence in the upper tail and independence elsewhere. One possible reason is the presence of flat fees or overhead charges, so that the more frequent the visits, the lower the average cost per visit.
Following the approach proposed in Czado et al. (2012) and Krämer et al. (2013), we now use a mixed copula model to conduct a regression analysis of the aggregate losses. We use Zipf's distribution (Zipf, 1932) to model the loss frequency, a lognormal model for the loss severity, and the GGS copula to model the dependence structure between loss severity and loss frequency. Since the proposed copula model can account for the distinctive upper tail negative dependence pattern, where larger losses occur, the regression analysis based on the GGS copula is expected to outperform analyses based on existing copulas.
Figure 3: Scatter plot on the original scale (horizontal axis: number of visits per year; vertical axis: average expense per visit).
Figure 4: Asymmetric upper tail negative dependence between loss frequency and severity. In the left panel ("Normalized scatter plot"; axes: normalized number of visits per year and normalized average expense), the marginals are transformed to standard normals; in the right panel ("GGS copula fitted"), the pseudo data are fitted by the GGS copula. When the number of visits is large, the relation between the number of visits and the average expense becomes more negatively dependent, while the rest of the data appear independent. That is, there is upper tail negative dependence between the loss frequency and severity data.
For both the loss frequency (N) and the loss severity (Y), we considered age, income, sex, education, insurance coverage, and race as covariates. We tried several marginal regression models, such as the gamma and lognormal distributions for Y and the zero-truncated Poisson and Zipf distributions for N. The right tail of the gamma distribution is too light for the average losses Y, and the right tail of the zero-truncated Poisson distribution is also too light to capture the heavy tail of the loss frequencies. Based on a preliminary data analysis, we finally chose age, insurance coverage, and race as the covariates for both N and Y; the other covariates were either not significant or led to worse AICs. Note that, in the mixed copula method, the covariates for frequency and severity need not be the same, although we chose the same set of covariates in our analysis. We have not compared all possible distributions for the marginal regression models, but the lognormal and Zipf models are good enough to capture the main features of the dataset. In the following, we discuss the two marginal regressions in turn.
4.2 Marginal regression
Now, denote the covariate vector of individual i by $\mathbf{x}_i$, i = 1, . . . , 2263. The marginal regression model for the loss frequency using Zipf's distribution can be written as
$$f_N(n \mid s, m) = \mathbb{P}[N = n \mid s, m] = \frac{n^{-s}}{\sum_{i=1}^{m} i^{-s}}, \qquad n = 1, 2, \ldots, m, \quad s > 0, \qquad (9)$$
where s > 0 and m ∈ {1, 2, 3, . . . } are the parameters of Zipf's distribution, and m is the maximum value of N; for this dataset, we chose m = 98, the maximum number of visits observed. Zipf's distribution follows a power law, so its right tail is heavier than that of the commonly used Poisson distribution. Zipf's distribution can be viewed as a discretized Pareto distribution, and the value of s determines the tail heaviness: a larger s corresponds to a lighter right tail, and vice versa. The covariates are introduced as follows:
$$\ln(s_i) = \mathbf{x}_i^{T}\boldsymbol{\eta}, \qquad i = 1, \ldots, 2263,$$
where η is the vector of regression coefficients for the loss frequency (including the intercept). We use a form that is linear in $\mathbf{x}_i$ to demonstrate how to apply the regression model. However, a nonlinear relation between ln(s_i) and $\mathbf{x}_i$ could lead to a better fit, in which case a transformation, such as polynomials or splines, can first be applied to $\mathbf{x}_i$.
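The Zipf regression log-likelihood implied by (9) and the log link ln(s_i) = x_i^T η can be sketched as below. This is a minimal illustration, not the author's code; the function names are hypothetical, and the design matrix X is assumed to contain a leading column of ones for the intercept.

```python
import math

def zipf_logpmf(n, s, m):
    """Log of the truncated Zipf pmf in (9): n^{-s} / sum_{i=1}^{m} i^{-s}."""
    if not 1 <= n <= m:
        return -math.inf
    log_norm = math.log(sum(i ** (-s) for i in range(1, m + 1)))
    return -s * math.log(n) - log_norm

def zipf_regression_loglik(eta, X, counts, m=98):
    """Sum of Zipf log-pmfs with s_i = exp(x_i^T eta) (log link keeps s_i > 0)."""
    total = 0.0
    for x_i, n_i in zip(X, counts):
        s_i = math.exp(sum(b * x for b, x in zip(eta, x_i)))
        total += zipf_logpmf(n_i, s_i, m)
    return total
```

In practice this function would be passed, with its sign flipped, to a numerical optimizer to obtain the MLE of η.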
The lognormal model for the loss severity can be written as
$$f_Y(y \mid \mu, \sigma) = \frac{1}{\sigma y \sqrt{2\pi}} \exp\left\{-\frac{(\ln y - \mu)^2}{2\sigma^2}\right\},$$
where µ is the location parameter and σ is the scale parameter. The covariates are introduced through the equation
$$\mu_i = \mathbf{x}_i^{T}\boldsymbol{\gamma}, \qquad i = 1, \ldots, 2263,$$
where γ is the corresponding vector of regression coefficients (including the intercept). We assume that the scale parameter σ is the same for all values of the covariates.
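A matching sketch for the lognormal severity regression follows, again hypothetical and not the author's code. It parameterizes the common scale as ln(σ), which matches how σ is reported in Table 2 and keeps σ positive during unconstrained optimization.

```python
import math

def lognormal_logpdf(y, mu, sigma):
    """Log of the lognormal density f_Y(y | mu, sigma) for y > 0."""
    z = (math.log(y) - mu) / sigma
    return -math.log(sigma * y * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def lognormal_regression_loglik(gamma, log_sigma, X, y):
    """Log-likelihood with mu_i = x_i^T gamma and a common sigma = exp(log_sigma)."""
    sigma = math.exp(log_sigma)
    return sum(
        lognormal_logpdf(y_i, sum(g * x for g, x in zip(gamma, x_i)), sigma)
        for x_i, y_i in zip(X, y)
    )
```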
The maximum likelihood estimates (MLEs) of the regression coefficients and of the scale parameter (on the log scale, ln(σ)) are reported in Table 2. To diagnose how well the two regression models fit the dataset, we use normalized QQ plots of quantile residuals (Dunn and Smyth, 1996), shown in Figure 5. Figure 5 indicates that the lognormal and Zipf regression models fit the dataset quite well. Therefore, in the copula regression model, we keep these two distributions for the marginals and incorporate the GGS copula to capture the dependence structure.
Here we have both a continuous and a discrete response variable, and the procedures for obtaining normalized QQ plots differ between the two cases. We refer to Dunn and Smyth (1996) for details and only outline the main steps here.
For the average expense, which is continuous, the normalized QQ plot is obtained as follows: (1) compute the cumulative probability of each observed response from the fitted regression model; (2) transform the cumulative probabilities from step (1) by Φ^{-1}, the quantile function of the standard normal distribution, to obtain the sample quantiles; (3) plot the sample quantiles from step (2) against the theoretical quantiles of the standard normal distribution, calculated from the ranks of the sample quantiles.
For the number of visits, which is discrete, the procedure is: (1) compute the cumulative probabilities at each observed response y and at y − 1 from the fitted model; (2) draw a probability uniformly at random between these two cumulative probabilities; (3) transform the probability from step (2) by Φ^{-1} to obtain the sample quantiles; (4) plot the sample quantiles against the theoretical quantiles, obtained in the same way as for continuous variables.
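The two procedures above can be sketched as follows. This is a minimal illustration of randomized quantile residuals; the function names and the fitted Zipf model (s = 2, m = 98) used in the demo are hypothetical, and the plotting positions (k − 0.5)/n are one common choice for the theoretical quantiles, not necessarily the one used for Figure 5.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution

def continuous_quantile_residuals(cdf_values):
    """Continuous case, step (2): r_i = Phi^{-1}(F(y_i))."""
    return [PHI.inv_cdf(u) for u in cdf_values]

def discrete_quantile_residuals(cdf_at_y, cdf_at_y_minus_1, seed=0):
    """Discrete case, steps (2)-(3): u_i ~ Uniform(F(y_i - 1), F(y_i)), r_i = Phi^{-1}(u_i)."""
    rng = random.Random(seed)
    return [PHI.inv_cdf(rng.uniform(lo, hi))
            for hi, lo in zip(cdf_at_y, cdf_at_y_minus_1)]

def theoretical_quantiles(n):
    """Standard normal quantiles at (k - 0.5)/n, k = 1..n, to pair with the sorted residuals."""
    return [PHI.inv_cdf((k - 0.5) / n) for k in range(1, n + 1)]

# Hypothetical fitted Zipf model (s = 2, m = 98) and a few observed counts
s, m = 2.0, 98
weights = [k ** (-s) for k in range(1, m + 1)]
total = sum(weights)
cdf = [0.0]                       # cdf[k] = F(k), with F(0) = 0
for w in weights:
    cdf.append(cdf[-1] + w / total)
counts = [1, 1, 2, 1, 3, 1, 5, 2]
residuals = discrete_quantile_residuals([cdf[n] for n in counts],
                                        [cdf[n - 1] for n in counts])
qq_pairs = list(zip(theoretical_quantiles(len(residuals)), sorted(residuals)))
```

If the fitted model is correct, the residuals are approximately standard normal, so the points in qq_pairs should lie close to the 45-degree line.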
Figure 5: Normalized QQ plots of quantile residuals for the regressions on the marginals: the left panel ("Normal QQ plot for average costs") is for the lognormal regression on average expenses, and the right panel ("Normal QQ plot for number of visits") is for the Zipf regression on the number of visits per year. Both panels plot sample quantiles against theoretical quantiles.
Table 2: Estimates of marginal and copula regression models
                           Marginal            GGS
Component    Parameter     Estimate   s.e.     Estimate   s.e.
Frequency    Intercept      0.855     0.039     0.869     0.038
             age           -0.002     0.001    -0.002     0.001
             ins(2)        -0.114     0.029    -0.108     0.028
             ins(3)        -0.085     0.053    -0.118     0.052
             race(2)        0.044     0.041     0.026     0.040
             race(3)        0.138     0.075     0.133     0.072
             race(4)        0.155     0.036     0.133     0.035
Severity     Intercept      5.911     0.096     5.889     0.094
             age            0.005     0.001     0.005     0.001
             ins(2)        -0.707     0.070    -0.685     0.069
             ins(3)        -0.541     0.132    -0.447     0.130
             race(2)        0.115     0.104     0.168     0.101
             race(3)        0.025     0.177     0.064     0.172
             race(4)        0.387     0.089     0.412     0.087
             ln(σ)          0.418     0.015     0.421     0.015
Dependence   ln(α)          -         -         5.476     0.011
             ln(β)          -         -         2.430     0.022
4.3 Copula regression
For a bivariate Archimedean copula $C(u, v) = \psi(\psi^{-1}(u) + \psi^{-1}(v))$, the density function can be written as
$$c(u, v) = \frac{\psi''(\psi^{-1}(u) + \psi^{-1}(v))}{\psi'(\psi^{-1}(u))\,\psi'(\psi^{-1}(v))}. \qquad (10)$$
For the GGS copula, based on (8),
$$\psi'(x) = -\Gamma(\alpha - \beta, x^{1/\beta})/\Gamma(\alpha); \qquad \psi''(x) = x^{\alpha/\beta - 2}\exp\{-x^{1/\beta}\}/(\beta\Gamma(\alpha)).$$
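Evaluating the density (10) with a numerically inverted generator can be sketched as below. Because the GGS generator involves incomplete gamma functions, which Python's standard library does not provide, this sketch uses the Clayton generator ψ(x) = (1 + x)^{-1/θ} as a hypothetical stand-in whose density is known in closed form; for the GGS copula one would substitute its ψ and the derivatives displayed above. Bisection is one simple choice of numerical method, relying on ψ being strictly decreasing with ψ(0) = 1.

```python
import math

def invert_generator(psi, u, tol=1e-13, max_iter=200):
    """Solve psi(x) = u for x >= 0 by bisection (psi strictly decreasing, psi(0) = 1)."""
    lo, hi = 0.0, 1.0
    while psi(hi) > u:            # grow the bracket until psi(hi) <= u
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi(mid) > u:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol * (1.0 + hi):
            break
    return 0.5 * (lo + hi)

def archimedean_density(u, v, psi, dpsi, d2psi):
    """Equation (10): psi''(a + b) / (psi'(a) psi'(b)), a = psi^{-1}(u), b = psi^{-1}(v)."""
    a = invert_generator(psi, u)
    b = invert_generator(psi, v)
    return d2psi(a + b) / (dpsi(a) * dpsi(b))

# Hypothetical stand-in: Clayton generator psi(x) = (1 + x)^(-1/theta), theta > 0
theta = 2.0

def psi(x):
    return (1.0 + x) ** (-1.0 / theta)

def dpsi(x):
    return -(1.0 / theta) * (1.0 + x) ** (-1.0 / theta - 1.0)

def d2psi(x):
    return (1.0 / theta) * (1.0 / theta + 1.0) * (1.0 + x) ** (-1.0 / theta - 2.0)

density = archimedean_density(0.3, 0.7, psi, dpsi, d2psi)
```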
The copula density in (10) can then be evaluated, with the inverse ψ^{-1} obtained numerically. Since ψ is strictly monotone, a numerical method usually works very well for computing ψ^{-1}. To avoid calculating Γ(α) for large α (e.g., Γ(α) cannot be calculated in the R software (R Development Core Team, 2012) on a 32-bit Windows 7 operating system when α ≥ 172, because the value exceeds the largest double-precision number), we calculate Γ(α, x)/Γ(α) using Equation (11) below, which allows α ≥ 172. In fact, the calculation can be done for α ∈ ℝ.
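Note that
$$\Gamma(\alpha, x) = (\alpha - 1)\Gamma(\alpha - 1, x) + x^{\alpha-1}e^{-x}; \qquad \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1); \qquad \gamma(\alpha, x) = (\alpha - 1)\gamma(\alpha - 1, x) - x^{\alpha-1}e^{-x},$$
where γ(α, x) is the lower incomplete gamma function, α ∈ ℝ and α ∉ {−1, −2, . . . }; we refer to Fisher et al. (2003) for a relevant discussion of the cases where α is a negative integer.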
We deal with numerical issues arising from large parameter values as follows. Letting $[\alpha] := \max\{z \in \mathbb{Z} : z < \alpha\}$ and $\xi := \alpha - [\alpha]$,
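$$\frac{\Gamma(\alpha, x)}{\Gamma(\alpha)} = \frac{\Gamma(\xi, x)}{\Gamma(\xi)} + \frac{e^{-x}x^{\xi}}{\Gamma(\xi)} \times \frac{1}{\xi} \times \Bigl\{1 + \frac{x}{1+\xi}\Bigl(1 + \frac{x}{2+\xi}\Bigl(1 + \cdots \Bigl(1 + \frac{x}{[\alpha]-2+\xi}\Bigl(1 + \frac{x}{[\alpha]-1+\xi}\Bigr)\Bigr)\cdots\Bigr)\Bigr)\Bigr\}. \qquad (11)$$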
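Equation (11) can be sketched as below, evaluating the nested bracket from the innermost factor outward (Horner style), so that only Γ(ξ + 1) with ξ ∈ (0, 1] is ever computed and Γ(α) itself is never formed. The base value Γ(ξ, x)/Γ(ξ) is supplied by the caller: for integer α, ξ = 1 and Γ(1, x)/Γ(1) = e^{-x} exactly, which the demo uses; for non-integer α one could, if SciPy is available, pass scipy.special.gammaincc. The function names are hypothetical, and for very large x the bracket can itself overflow, in which case a log-scale variant would be needed.

```python
import math

def strict_floor(a):
    """[a] := max{z integer : z < a}, so that xi = a - [a] lies in (0, 1]."""
    return math.ceil(a) - 1

def reg_upper_gamma(alpha, x, q_base):
    """Gamma(alpha, x) / Gamma(alpha) via Equation (11), for alpha > 0.

    q_base(xi, x) must return Gamma(xi, x) / Gamma(xi) for xi in (0, 1].
    """
    n = strict_floor(alpha)
    xi = alpha - n
    if n == 0:                      # alpha already lies in (0, 1]
        return q_base(xi, x)
    bracket = 1.0                   # innermost "1" of the nested bracket
    for k in range(n - 1, 0, -1):
        bracket = 1.0 + x / (k + xi) * bracket
    # e^{-x} x^xi / (xi * Gamma(xi)) = e^{-x} x^xi / Gamma(xi + 1)
    return q_base(xi, x) + math.exp(-x) * x ** xi / math.gamma(xi + 1.0) * bracket

def q_at_one(xi, x):
    """Exact base case for integer alpha (xi = 1): Gamma(1, x) / Gamma(1) = exp(-x)."""
    if xi != 1.0:
        raise ValueError("q_at_one only handles xi = 1")
    return math.exp(-x)

q = reg_upper_gamma(200.0, 180.0, q_at_one)   # alpha = 200 > 172; Gamma(200) is never formed
```

For integer α, Γ(α, x)/Γ(α) equals the probability that a Poisson(x) variable is at most α − 1, which gives an independent check of the recursion.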
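To calculate Γ(α − β, x)/Γ(α), we can write
$$\begin{aligned}\frac{\Gamma(\alpha-\beta, x)}{\Gamma(\alpha)} &= \frac{\Gamma(\alpha-\beta, x)}{\Gamma(\alpha-\beta)} \times \frac{\Gamma(\alpha-\beta)}{\Gamma(\alpha)} \\ &= \frac{\Gamma(\alpha-\beta, x)}{\Gamma(\alpha-\beta)} \times \frac{(\alpha-\beta-1)\times\cdots\times(\alpha-\beta-[\alpha-\beta])\times\Gamma(\alpha-\beta-[\alpha-\beta])}{(\alpha-1)\times\cdots\times(\alpha-[\alpha])\times\Gamma(\alpha-[\alpha])},\end{aligned} \qquad (12)$$
and then the first factor in (12) can be calculated using Equation (11) again, with α replaced by α − β.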
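The second factor in (12) can be sketched as follows. Because the numerator and denominator products in (12) are exactly Γ(α − β) and Γ(α), multiplying them out in floating point would overflow for large α; this sketch therefore sums the logarithms of the factors, which is an implementation choice assumed here, not something stated in the text. All factors are positive when α − β > 0, which the sketch assumes. The names are hypothetical; the demo plugs in the GGS estimates from Table 2 only for illustration.

```python
import math

def strict_floor(a):
    """[a] := max{z integer : z < a}."""
    return math.ceil(a) - 1

def log_gamma_via_recursion(a):
    """ln Gamma(a) = sum_{k=1}^{[a]} ln(a - k) + ln Gamma(a - [a]), for a > 0."""
    n = strict_floor(a)
    return sum(math.log(a - k) for k in range(1, n + 1)) + math.log(math.gamma(a - n))

def gamma_ratio(alpha, beta):
    """Second factor of (12): Gamma(alpha - beta) / Gamma(alpha), assuming alpha - beta > 0."""
    return math.exp(log_gamma_via_recursion(alpha - beta) - log_gamma_via_recursion(alpha))

alpha_hat, beta_hat = math.exp(5.476), math.exp(2.430)   # GGS estimates from Table 2
ratio = gamma_ratio(alpha_hat, beta_hat)
```

Multiplying ratio by the output of Equation (11) evaluated at α − β then gives Γ(α − β, x)/Γ(α), i.e., −ψ'(x) at x replaced by x^{1/β}.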
When calculating ψ''(x) for large x, computing $x^{\alpha/\beta - 2}$ directly can overflow. We therefore first compute ln(ψ''(x)):
$$\ln(\psi''(x)) = (\alpha/\beta - 2)\ln(x) - x^{1/\beta} - \ln(\beta) - \ln(\Gamma(\alpha)).$$
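The log-scale formula above can be sketched as follows, using Python's math.lgamma so that ln Γ(α) is obtained without forming Γ(α), which exceeds the double-precision range for α above about 171. The function name is hypothetical, and plugging in the GGS estimates from Table 2 is only for illustration.

```python
import math

def log_psi2_ggs(x, alpha, beta):
    """ln psi''(x) = (alpha/beta - 2) ln x - x^(1/beta) - ln beta - ln Gamma(alpha), x > 0."""
    return ((alpha / beta - 2.0) * math.log(x) - x ** (1.0 / beta)
            - math.log(beta) - math.lgamma(alpha))

alpha_hat, beta_hat = math.exp(5.476), math.exp(2.430)   # GGS estimates from Table 2
# Finite, although x^(alpha/beta - 2) is about 10^571 here and not representable
value = log_psi2_ggs(1e30, alpha_hat, beta_hat)
```

The density (10) can then be assembled on the log scale as ln ψ''(s) − ln|ψ'(a)| − ln|ψ'(b)| and exponentiated only at the end.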
Due to space limitations, we refer to Krämer et al. (2013) for how to incorporate a copula into the two marginal regression models to obtain the MLEs of the regression coefficients (η, γ) and of the dependence parameters (α, β). The MLEs are reported in Table 2; the likelihood is that of the overall model, with the marginals and the dependence structure fitted simultaneously.
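For one discrete margin (N) and one continuous margin (Y), a standard way to write a joint likelihood contribution is f(n, y) = f_Y(y)[h(F_N(n) | F_Y(y)) − h(F_N(n − 1) | F_Y(y))], where h(u | v) = ∂C(u, v)/∂v; this form is assumed here to correspond to the construction in Krämer et al. (2013). For an Archimedean copula, h(u | v) = ψ'(ψ^{-1}(u) + ψ^{-1}(v))/ψ'(ψ^{-1}(v)). The sketch uses the closed-form Clayton h-function as a hypothetical stand-in; Clayton cannot produce upper tail negative dependence, so for the analysis above the GGS h-function would replace it. The marginal parameters in the demo are hypothetical.

```python
import math
from statistics import NormalDist

def clayton_h(u, v, theta):
    """h(u | v) = dC(u, v)/dv for the Clayton copula (stand-in for the GGS copula)."""
    if u <= 0.0:
        return 0.0
    t = u ** (-theta) + v ** (-theta) - 1.0
    return v ** (-theta - 1.0) * t ** (-1.0 / theta - 1.0)

def mixed_loglik_term(n, y, F_N, f_Y, F_Y, h):
    """ln f(n, y) = ln f_Y(y) + ln[h(F_N(n) | F_Y(y)) - h(F_N(n - 1) | F_Y(y))]."""
    v = F_Y(y)
    return math.log(f_Y(y)) + math.log(h(F_N(n), v) - h(F_N(n - 1), v))

# Hypothetical marginals: Zipf(s = 2, m = 98) for N and lognormal(mu = 6, sigma = 1.5) for Y
s, m = 2.0, 98
cum = [0.0]
for k in range(1, m + 1):
    cum.append(cum[-1] + k ** (-s))

def F_N(n):
    return cum[max(0, min(n, m))] / cum[-1]

severity = NormalDist(6.0, 1.5)   # distribution of ln Y

def F_Y(y):
    return severity.cdf(math.log(y))

def f_Y(y):
    return severity.pdf(math.log(y)) / y

ll = mixed_loglik_term(3, 400.0, F_N, f_Y, F_Y, lambda u, v: clayton_h(u, v, 2.0))
```

Summing such terms over individuals, with the marginal parameters driven by the covariates, gives the overall log-likelihood that is maximized jointly in (η, γ, σ) and the copula parameters.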
Based on the MLEs of the dependence parameters, the estimated upper and lower tail orders are $\hat\kappa_U = 21.02$ and $\hat\kappa_L = 1.06$, respectively. The estimated upper tail order is greater than 2, the dimension, so the fitted copula has upper tail negative dependence, which is consistent with the preliminary plot in the left panel of Figure 4. In contrast, the estimated lower tail order suggests positive dependence in the lower tail.
We estimate the aggregate loss in two ways, based on two different assumptions. First, we can use the mixed copula approach (Krämer et al., 2013), assuming that loss frequency and loss severity are dependent according to the dependence parameters obtained from the GGS copula. Second, we may assume that loss frequency and loss severity are independent, conduct the marginal regressions separately, and use the sum over individuals of the product of the expected number of visits and the expected cost per visit as the aggregate loss. In Table 3, we report the estimated aggregate losses from the two approaches, together with the actual aggregate loss for comparison. The likelihood used to compute the AIC for the mixed copula approach is the one obtained when fitting the regression coefficients and the dependence parameters simultaneously. The likelihood used to compute the AIC under the independence assumption is the product of the likelihoods of the two marginal regressions. Since the GGS copula includes the independence copula as a special case, we can also set the two dependence parameters of the GGS copula to α = 2 and β = 1 to calculate the expected aggregate loss and the likelihood of the model under independence, which leads to the same results as reported in Table 3.
Table 3: Aggregate loss comparisons, where AIC = −2 × log likelihood + 2 × number of parameters.
                        GGS copula    Independence    Data
Aggregate loss (USD)    5,733,236     8,153,765       5,371,218
AIC                     41,812        41,869          –
Table 3 shows that the mixed copula model based on the GGS copula fits the data better overall, with a smaller AIC, and that the GGS copula regression substantially outperforms the independence model in assessing the aggregate loss: its estimate (5,733,236 USD) is much closer to the actual aggregate loss (5,371,218 USD) than that of the independence model (8,153,765 USD). The empirical analysis suggests that, when the upper tail is negatively dependent, a misspecified independence model, which is often used in aggregate loss modeling, may overestimate the aggregate loss. Our results also complement a finding from an auto insurance claims dataset analyzed in Krämer et al. (2013), where a mild positive dependence between loss frequency and loss severity appears and the independence model underestimates the aggregate loss.
The MEPS dataset used for the empirical analysis actually contains the total expense per year for each individual. For this dataset, one could instead model the aggregate loss and the number of visits jointly as a bivariate response, in which case a copula that allows positive upper tail dependence could be used. As the above empirical analysis suggests, the dependence between the yearly total expense and the number of visits per year is not linear, and there may be different degrees of positive dependence for different values of the covariates. To this end, one could also consider a full-range tail dependence copula and allow the dependence parameter to change with the covariates. We refer to Hua and Xia (2014) for a more detailed discussion of regression on the dependence parameter with full-range tail dependence copulas.
5 Concluding remark
In insurance data, one often observes two distinctly non-Gaussian phenomena. First, the univariate marginals are often skewed, and their right tails may be light or heavy. Second, the dependence structure between the univariate marginals, or between their transformed forms, often cannot be well captured by a covariance matrix. Copulas have proved to be a very useful tool in such situations. Moreover, statistical inference on high-risk scenarios often plays a critical role; as in the case studied in Section 4, it may change the overall assessment dramatically. Therefore, when choosing a copula to model the dependence structure, we must pay particular attention to the tail behavior of the candidate copula models. An ideal candidate copula has few parameters but a wide range of dependence in both the upper and lower tails; its range of dependence should cover the range suggested by the observed data and beyond; and it should be feasible to implement.
Bearing these criteria in mind, we first narrow down the families of copulas and consider only Archimedean copulas, because they are suitable for constructing dependence structures that are asymmetric between the upper and lower tails and for incorporating tail negative dependence. Some sufficient conditions for upper tail negative dependence have been derived from a scale mixture representation of Archimedean copulas. Through this theoretical study, we have constructed new parametric copulas that have simple forms and desirable properties. We implement the GGS copula in a mixed copula regression model. An upper tail negative dependence pattern between loss severity and loss frequency has been detected and modeled for a medical expenditure dataset. The mixed copula regression with the GGS copula provides a substantially better assessment of total losses, whereas assuming that loss severity and frequency are independent overestimates total losses.
Since the yearly total expenditure is available in the dataset, one may consider using a Tweedie model (Tweedie, 1984) to model the aggregate loss directly. A comparison between Tweedie regression and the mixed copula regression based on the GGS copula would be interesting. A more challenging question is whether one or two dependence parameters can be added to the Tweedie model so that loss frequency, loss severity, and their dependence structure can be taken into account simultaneously.
Acknowledgment
This research was partially supported by a grant from the Casualty Actuarial Society through an Individual Grants Competition, and partially supported by the Research and Artistry Grant from Northern Illinois University. The author thanks the reviewers of these grants for their time, and Professor Edward W. Frees for helpful comments on a preliminary manuscript, which led to an improved presentation of the paper. All remaining errors are the author's own.
References
Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Number 1972. Dover Publications.
Bingham, N. H., Goldie, C. M., and Teugels, J. L. (1987). Regular Variation, volume 27 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge.
Charpentier, A. and Segers, J. (2009). Tails of multivariate Archimedean copulas. Journal of Multivariate Analysis, 100(7):1521–1537.
Czado, C., Kastenmeier, R., Brechmann, E. C., and Min, A. (2012). A mixed copula model for insurance claims and claim sizes. Scandinavian Actuarial Journal, 2012(4):278–305.
Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5(3):236–244.
Embrechts, P. and Goldie, C. M. (1980). On closure and factorisation properties of subexponential and related distributions. Journal of the Australian Mathematical Society, Series A, 29:243–256.
Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling Extremal Events, volume 33. Springer-Verlag, Berlin.
Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pages 209–230.
Fisher, B., Jolevska-Tuneska, B., and Kılıçman, A. (2003). On defining the incomplete gamma function. Integral Transforms and Special Functions, 14(4):293–299.
Frees, E. W. (2010). Regression Modeling with Actuarial and Financial Applications. Cambridge University Press.
Gschlößl, S. and Czado, C. (2007). Spatial modelling of claim frequency and claim size in non-life insurance. Scandinavian Actuarial Journal, 2007(3):202–225.
Hua, L. and Joe, H. (2011). Tail order and intermediate tail dependence of multivariate copulas. Journal of Multivariate Analysis, 102:1454–1471.
Hua, L. and Joe, H. (2012). Tail comonotonicity: properties, constructions, and asymptotic additivity of risk measures. Insurance: Mathematics and Economics, 51:492–503.
Hua, L. and Joe, H. (2013). Intermediate tail dependence: a review and some new results. In Li, H. and Li, X., editors, Stochastic Orders in Reliability and Risk: In Honor of Professor Moshe Shaked, chapter 15, pages 291–311. Springer.
Hua, L. and Xia, M. (2014). Assessing high-risk scenarios by full-range tail dependence copulas. North American Actuarial Journal, 18(3):363–378.
Joe, H. (1997). Multivariate Models and Dependence Concepts, volume 73 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.
Joe, H. and Hu, T. (1996). Multivariate distributions from mixtures of max-infinitely divisible distributions. Journal of Multivariate Analysis, 57(2):240–265.
Krämer, N., Brechmann, E. C., Silvestrini, D., and Czado, C. (2013). Total loss estimation using copula-based regression models. Insurance: Mathematics and Economics, 53(3):829–839.
Larsson, M. and Nešlehová, J. (2011). Extremal behavior of Archimedean copulas. Advances in Applied Probability, 43(1):195–216.
Malov, S. (2001). On finite-dimensional Archimedean copulas. In Asymptotic Methods in Probability and Statistics with Applications, pages 19–35. Birkhäuser.
McNeil, A. J. and Nešlehová, J. (2009). Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. Annals of Statistics, 37(5B):3059–3097.
McNeil, A. J. and Nešlehová, J. (2010). From Archimedean to Liouville copulas. Journal of Multivariate Analysis, 101(8):1772–1790.
Nelsen, R. B. (2006). An Introduction to Copulas. Springer Series in Statistics. Springer, New York, second edition.
R Development Core Team (2012). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Tweedie, M. (1984). An index which distinguishes between some important exponential families. In Statistics: Applications and New Directions: Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, pages 579–604.
Zipf, G. K. (1932). Selected Studies of the Principle of Relative Frequency in Language. Harvard University Press.