Professor, Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Hannover, Germany
Introduction
Many disorders in animals are observed more frequently in certain breeds and, within breeds, more often in the same families. A disorder is considered familial when families are observed with more than one affected member. Familial disorders may have a genetic contribution. The same is often claimed for disorders that show a breed disposition. On the other hand, genetically caused diseases do not necessarily lead to breed differences in incidence but will contribute to variation among families within breeds. A useful starting point for answering the question of whether a disorder is inherited is to draw pedigrees, which give an initial impression of the distribution of affected and non-affected animals and of how frequently the disorder is transmitted from one generation to the next. General evidence for a genetic contribution to a disorder is given when environmental factors can be excluded as the only cause and a significant proportion of the phenotypic variation can be explained by genetic models. With increasing molecular genetic data, the type of gene action can be characterized for individual genes on the basis of known DNA sequence variation, and the nature of complex genetic traits can be understood much better.
The presentation gives an overview of the model components included in estimating the mode of inheritance from phenotypic data and of further developments for incorporating molecular genetic data into the analyses.
Segregation Analysis
Segregation analysis is employed to determine whether familial data for particular disorders or other traits are compatible with specific modes of inheritance. Modes of inheritance tested in segregation analyses include monogenic (Mendelian), digenic and polygenic models. In addition to the specific genetic hypothesis under consideration, age of onset, sex effects and sampling scheme can be taken into account. Simple segregation analysis tests the segregation ratio θ (the expected proportion of affected offspring) under a specified sampling scheme and mating type. Pedigrees used for segregation analysis may come from specifically planned matings, from randomly sampled pedigrees with arbitrary structure, or from pedigrees sampled through ascertained cases in clinics or veterinary practice. Planned matings can be tested for specific modes of inheritance more easily than pedigrees with arbitrary structure, missing data and many inbred animals. In the case of a rare disease and an autosomal dominant hypothesis, the segregation ratio θ is assumed to be 0.5 because families segregating for the trait most likely result from matings of heterozygous affected animals with homozygous non-carriers. As long as the estimated segregation ratio is not significantly different from θ = 0.5, this mode of inheritance is considered compatible with the data. Different methods for estimating θ have been developed and are easily applied (Singles Method, Weinberg's General Proband Method). These simple approaches to segregation analysis often encounter problems when different mating types have to be considered and several hypotheses are more or less equally likely. Complex segregation analyses have been developed to allow more factors to vary and to relax the assumptions that have to be made for the model tested. The likelihood functions are evaluated by maximum likelihood or by Markov chain Monte Carlo approaches (Gibbs sampling).
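As a minimal sketch of the simple test described above, the following Python snippet compares an observed segregation ratio with θ = 0.5 using an exact two-sided binomial test. The offspring counts are hypothetical, and the sketch assumes planned matings of heterozygous affected animals with non-carriers, so that no ascertainment correction is needed. It is not an implementation of the Singles Method or of Weinberg's General Proband Method.

```python
from math import comb

def binomial_two_sided_p(affected, total, theta0=0.5):
    """Exact two-sided binomial p-value: the summed probability of all
    outcomes that are no more likely than the observed one under theta0."""
    probs = [comb(total, k) * theta0**k * (1 - theta0)**(total - k)
             for k in range(total + 1)]
    observed = probs[affected]
    # A small relative tolerance guards against floating-point ties.
    tail = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, tail)

# Hypothetical data: 40 offspring from affected x unaffected matings,
# 17 of them affected.
n_affected, n_total = 17, 40
theta_hat = n_affected / n_total
p_value = binomial_two_sided_p(n_affected, n_total)

# The dominant hypothesis (theta = 0.5) is rejected only if p_value < 0.05.
compatible_with_dominant = p_value >= 0.05
```

With data collected through affected animals, the same comparison would be biased upwards unless the ascertainment scheme is taken into account.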
Complex Segregation Analysis
Complex segregation analysis is based on a mathematical model that incorporates several functionally independent components. These components accommodate arbitrary mating types, different modes of monogenic or oligogenic inheritance (major genes), polygenic and non-genetic variation in addition to major genes, and different data types such as binary, categorical and continuous data. In addition, age of onset of a disease and sampling scheme (random pedigrees versus non-randomly selected pedigrees) can be modeled. The basic model as formulated in the Elston-Stewart algorithm was the basis for the more complex models. The first component of the Elston-Stewart algorithm describes the joint distribution of genotypes of mating individuals, whereby these genotypic distributions stem from a single locus with two alleles (monogenotype), a few loci with two alleles each (oligogenotype) or a polygenotypic distribution with an infinite number of genotypes (polygenotype). The second component specifies the relationship between genotypes and phenotypes, separately for each genotype (penetrance function). Mathematically, the phenotype investigated is modeled as a conditional probability given the genotype assumed in the model. The simplest genetic model for a dichotomous trait, a completely penetrant autosomal dominant locus with two alleles, is then completely defined by the following genotype-to-phenotype relationships: gAA(1) = gAa(1) = 1, gaa(1) = 0 and gAA(0) = gAa(0) = 0, gaa(0) = 1. That is, the conditional probability of being affected (= 1) equals unity for the genotypes AA and Aa, and the conditional probability of being unaffected (= 0) equals unity for the genotype aa. Similarly, if a completely penetrant recessive trait is assumed, we have the following conditional distributions: gaa(1) = 1, gAa(1) = gAA(1) = 0, gaa(0) = 0, gAa(0) = gAA(0) = 1.
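The genotype-to-phenotype relationships above can be written down as small lookup tables. The following Python sketch is illustrative only: the function names, the use of Hardy-Weinberg genotype frequencies and the allele frequency of 0.1 are assumptions that do not come from the text.

```python
# Penetrance tables: P(affected | genotype) for a two-allele autosomal locus.
DOMINANT = {"AA": 1.0, "Aa": 1.0, "aa": 0.0}
RECESSIVE = {"AA": 0.0, "Aa": 0.0, "aa": 1.0}

def penetrance(table, genotype, phenotype):
    """Return g_genotype(phenotype); phenotype 1 = affected, 0 = unaffected."""
    p_affected = table[genotype]
    return p_affected if phenotype == 1 else 1.0 - p_affected

def hardy_weinberg(freq_A):
    """Genotype frequencies under Hardy-Weinberg proportions (an assumption)."""
    freq_a = 1.0 - freq_A
    return {"AA": freq_A**2, "Aa": 2 * freq_A * freq_a, "aa": freq_a**2}

def phenotype_likelihood(table, genotype_freqs, phenotype):
    """Likelihood of the phenotype of one unrelated individual: the sum over
    genotypes of genotype frequency times penetrance."""
    return sum(genotype_freqs[g] * penetrance(table, g, phenotype)
               for g in genotype_freqs)

freqs = hardy_weinberg(0.1)  # hypothetical frequency of allele A
dominant_prevalence = phenotype_likelihood(DOMINANT, freqs, 1)
recessive_prevalence = phenotype_likelihood(RECESSIVE, freqs, 1)
```

Incomplete penetrance would be represented in the same tables by values between 0 and 1.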
Two- or three-locus models give rise to many more possible ways (phenograms) in which the oligogenotype can be related to the phenotype. If we do not wish to assume complete penetrance, we can introduce a specific penetrance for each distinct genotype or group of genotypes. For X-linked loci, the conditional distributions of phenotypes have to be defined for males and females separately. Furthermore, traits expressed only in males or only in females can be modeled via the penetrance parameter by allowing the trait to be expressed in one sex only. Just as the phenotypic distribution may be sex-dependent, the disorder considered may have a variable age of onset, so that the observation of whether the disorder is expressed depends on the age at examination of each individual. The probability that an individual with genotype AA, Aa or aa is affected by a specific age then depends on the age-related susceptibility of the genotype to the disorder. When we turn to polygenotypes, we use normal distribution functions. In the case of a binary or categorical phenotype, this model corresponds to the threshold or liability model. The polygenotypes are normally distributed with genetic variance σ_G², and the residual variance is σ_E². An individual is affected, or mildly or severely affected in the categorical case, when its liability exceeds the corresponding threshold. The threshold may also depend on the genotype of an additional monogenic locus.
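The threshold (liability) model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: liability is taken to be normally distributed with mean 0 and variance σ_G² + σ_E², and the variance components and genotype-specific thresholds are hypothetical values chosen for the example.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def prob_affected(threshold, var_g, var_e, mean=0.0):
    """P(liability > threshold) for liability ~ N(mean, var_g + var_e)."""
    sd = sqrt(var_g + var_e)
    return 1.0 - normal_cdf((threshold - mean) / sd)

# Hypothetical variance components of liability.
VAR_G, VAR_E = 0.4, 0.6
heritability = VAR_G / (VAR_G + VAR_E)

# Hypothetical thresholds that depend on the genotype at an additional
# monogenic locus: genotype aa lowers the threshold and raises the risk.
THRESHOLDS = {"AA": 2.0, "Aa": 2.0, "aa": 0.5}
risk = {g: prob_affected(t, VAR_G, VAR_E) for g, t in THRESHOLDS.items()}
```

For a categorical phenotype with several grades, one threshold per grade boundary would be used in the same way.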
The mode of inheritance describes how genetic variability is passed on from one generation to the next and is summarized mathematically by the genotypic distributions of the offspring conditional on the parental genotypes. Let us assume that an individual has parents with genotypes s and t. The conditional probabilities for the genotypes of this individual can then be viewed as elements of a stochastic matrix called the genetic transition matrix, P(gi | gF, gM), that is, the probability of the individual's genotype gi given the genotypes gF = s and gM = t of father and mother. All types of monogenic and oligogenic inheritance can be parameterized in terms of transmission probabilities. In the autosomal monogenic model with alleles A and B, the transmission probabilities are the probabilities that an individual with genotype AA, AB or BB transmits the allele A to its offspring. Using the Mendelian transmission probabilities τAA = 1, τAB = 0.5 and τBB = 0, the probability of genotype AA for an individual with parents s and t is τsτt, the probability of genotype AB is τs(1-τt) + τt(1-τs), and the probability of genotype BB is (1-τs)(1-τt). Extension to several unlinked or linked loci is straightforward. Linked loci require recombination rates among loci as further parameters. Polygenic inheritance under an additive model can be represented by a transmission of 0.5 of the parental polygenotypic value for any polygenotype. The polygenotypes of offspring are then distributed around the mid-parent value of the polygenotypic effects with variance σ_G²/2.
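The transmission-probability formulas above translate directly into code. The following Python sketch builds one row of the genetic transition matrix for a given pair of parental genotypes; the function and variable names are illustrative assumptions.

```python
# Mendelian transmission probabilities: P(parent transmits allele A).
TAU = {"AA": 1.0, "AB": 0.5, "BB": 0.0}

def offspring_distribution(s, t, tau=TAU):
    """P(offspring genotype | parental genotypes s and t)."""
    tau_s, tau_t = tau[s], tau[t]
    return {
        "AA": tau_s * tau_t,
        "AB": tau_s * (1 - tau_t) + tau_t * (1 - tau_s),
        "BB": (1 - tau_s) * (1 - tau_t),
    }

# Full transition matrix: one row per ordered pair of parental genotypes.
transition_matrix = {(s, t): offspring_distribution(s, t)
                     for s in TAU for t in TAU}

het_by_het = transition_matrix[("AB", "AB")]  # 1/4 AA, 1/2 AB, 1/4 BB
```

Leaving the values in `TAU` as free parameters instead of fixing them at 1, 0.5 and 0 is one way to test whether the data are compatible with Mendelian transmission.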
The sampling scheme describes how individuals were selected from the population for study. Random sampling means that we take a random sample of individuals from a population and then augment this sample by including all or a random sample of relatives up to a certain degree of relatedness. When well-designed recording schemes are introduced, random samples of progeny or sibships with their ancestors can be collected. Restricting these samples to a specific geographic area is not critical as long as individuals outside this area are not selected according to their phenotype or genotype. Rare conditions can hardly be studied in random samples because many uninformative families would be collected. Typically in this situation, families are included in the study because at least one member of the family is affected. This kind of non-random sampling procedure is characterized by the type of ascertainment. Complete ascertainment is given when a sibship with at least one affected member enters the sample independently of the number of additional affected members. The opposite extreme to complete ascertainment is single ascertainment, where the probability that an affected individual is brought into the study tends to zero, so that practically no family is ascertained through more than one affected member. Incomplete multiple ascertainment is the situation between single and complete ascertainment. To ensure a valid segregation analysis, the kind of ascertainment should be identified, because methods for estimating the segregation ratio depend on how the families have been brought into the study. A likelihood function based on the components of the segregation analysis model can be derived and maximized for the data observed. Since the likelihood function includes the different types of genetic models as well as non-genetic factors, submodels can be tested against the most general model.
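The effect of the ascertainment scheme on the estimate of the segregation ratio can be illustrated with a small Python sketch. It assumes the standard textbook likelihoods for sibships of size s with r affected members: a binomial distribution truncated at r ≥ 1 for complete ascertainment, and a distribution weighted by r for single ascertainment. The sibship data, the function names and the grid search are hypothetical choices for this example, not a method taken from the text.

```python
from math import comb, log

def loglik_complete(theta, sibships):
    """Log-likelihood under complete ascertainment: binomial truncated at r >= 1."""
    total = 0.0
    for s, r in sibships:
        binom = comb(s, r) * theta**r * (1 - theta)**(s - r)
        total += log(binom / (1 - (1 - theta)**s))
    return total

def loglik_single(theta, sibships):
    """Log-likelihood under single ascertainment: the proband is set aside and
    the remaining s - 1 sibs follow an ordinary binomial distribution."""
    total = 0.0
    for s, r in sibships:
        total += log(comb(s - 1, r - 1) * theta**(r - 1) * (1 - theta)**(s - r))
    return total

def grid_mle(loglik, sibships, steps=1000):
    """Crude maximum likelihood estimate of theta on a grid over (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda theta: loglik(theta, sibships))

# Hypothetical sibships as (size, number affected), all with >= 1 affected.
sibships = [(4, 1), (4, 1), (4, 2), (5, 1), (5, 2), (3, 1), (6, 2), (4, 1)]

naive = sum(r for _, r in sibships) / sum(s for s, _ in sibships)
theta_complete = grid_mle(loglik_complete, sibships)
theta_single = grid_mle(loglik_single, sibships)
```

In this example the uncorrected proportion of affected sibs is larger than either corrected estimate, which is why the kind of ascertainment has to be identified before the segregation ratio is tested.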
Inferences can be made for both continuously and categorically distributed data and for genetic models that include monogenic, digenic and polygenic inheritance as well as mixtures of monogenic and polygenic or oligogenic and polygenic inheritance. A genetic background of the trait analysed is indicated when the model containing only non-genetic factors can be rejected and models including genetic components explain a significant proportion of the phenotypic variation.
A likelihood ratio test statistic is used to compare a specific null hypothesis (H0), defined by a restricted model, against the most general (unrestricted) model. The test statistic asymptotically follows a χ²-distribution, and significance levels can be obtained from this distribution. The degrees of freedom are given by the difference in the number of independently estimated parameters between the models compared. The information criterion of Akaike (AIC) can be used as an additional measure to choose the most parsimonious model with the best fit to the data. The model with the smallest AIC fits the data best with a minimum number of parameters, but all hypotheses that cannot be rejected against the most general model using the likelihood ratio test must also be considered as possible. The AIC cannot be used to exclude a hypothesis if the corresponding model was not rejected against the most general model by the likelihood ratio test.
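The model comparison described above can be sketched as follows. The log-likelihoods and parameter counts are hypothetical values chosen for illustration, and the critical values are the standard 5% points of the χ²-distribution for 1 to 5 degrees of freedom.

```python
# Upper 5% critical values of the chi-square distribution by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def likelihood_ratio_test(loglik_restricted, k_restricted, loglik_general, k_general):
    """Return (test statistic, degrees of freedom, rejected at the 5% level)."""
    statistic = 2.0 * (loglik_general - loglik_restricted)
    df = k_general - k_restricted
    return statistic, df, statistic > CHI2_CRIT_05[df]

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * k - 2.0 * loglik

# Hypothetical results: model name -> (maximized log-likelihood, parameters).
models = {
    "general": (-100.0, 6),
    "monogenic": (-101.5, 3),
    "polygenic": (-103.0, 2),
    "environmental": (-112.0, 1),
}

general_loglik, general_k = models["general"]
summary = {}
for name, (loglik, k) in models.items():
    if name == "general":
        continue
    stat, df, rejected = likelihood_ratio_test(loglik, k, general_loglik, general_k)
    summary[name] = {"lrt": stat, "df": df, "rejected": rejected, "aic": aic(loglik, k)}

# Models that cannot be rejected remain possible, whatever their AIC.
possible = [name for name, res in summary.items() if not res["rejected"]]
best_by_aic = min(possible, key=lambda name: summary[name]["aic"])
```

In this made-up example the purely environmental model is rejected, the monogenic model has the smallest AIC, and the polygenic model still has to be considered possible because it is not rejected against the general model.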
Conclusions
Complex segregation analysis is a powerful tool to detect major gene variation. Quantitative genetic models rely on the assumption of many (infinitely many) loci with very small and equal effects. This assumption is severely compromised when major genes are segregating. Extensions and improvements of the algorithms developed for the simple segregation models make it possible to estimate major genotype effects within the framework of the methodology developed for quantitative genetic analysis. Gibbs sampling can be employed to estimate non-genetic effects, genotype frequencies and their associated genotypic effects, and quantitative genetic variation, taking all relationships among the animals into account. When information on genetic markers in population-wide linkage disequilibrium or on mutations of genes associated with trait variation can be included in the analysis, the genotypic distributions no longer need to be estimated and inferences on the genotypic effects are much more precise. Such genetic polymorphisms enable us to model gene actions and their interactions in networks for complex genetic traits.