Practical Application of Sample Size Determination Models for Assessment of Mortality Outcomes in Swine Field Trials



Summary
Mortality within livestock production has a substantial impact on the economic sustainability of enterprises. The livestock industry has a unique opportunity to maximize animal welfare and production efficiency by minimizing morbidity and mortality. To do so, investigators must balance the use of limited resources against the number of animals necessary for robust research. The objective of the study was to illustrate the use of readily implementable statistical models to determine the sample size necessary to detect statistically significant differences between groups with varying levels of mortality. To this end, a series of examples was created in which the unit to which a treatment is independently applied (experimental unit, EU) is either half of a 1,200-pig barn (group-level) or an individual pen (pen-level, 25 pigs per pen, 48 total pens) within a 1,200-pig barn. These examples and corresponding models can be readily adapted and implemented using SAS software to meet individual needs. Model inputs include the number of pigs, barns, and pens when appropriate, and the model output is the calculated statistical power. When the EU is the half-barn and mortality is measured at the group level, 7 barns would be necessary to detect a 1 percentage unit difference in mortality (2% vs. 1% for the two groups, respectively). As the difference in mortality between groups increases, the number of barns needed to detect that difference decreases, and vice versa. When the EU is the individual pen containing 25 pigs and mortality is measured at the pen level, a total of 240 pens, or 5 barns, would be necessary to detect the same 1 percentage unit reduction in mortality from 2% to 1%. The group-level and pen-level approaches require a relatively similar number of pigs to detect differences. For example, when comparing mortalities of 3% vs. 4%, the number of barns necessary is 11 for group-level mortality and 9 for pen-level mortality.
The models currently proposed incorporate appropriate design structure features for a series of different study designs and assume that observed mortality follows a binomial distribution. Performing proper sample size calculations prior to the initiation of research trials is critical to determine the necessary size of the trial.

Introduction
Mortality represents a significant loss and inefficiency within livestock production systems. The livestock industry has a unique opportunity to maximize animal welfare and production efficiency by minimizing morbidity and mortality using scientifically rigorous methods. Historically, the primary outcomes of interest in many animal experiments were growth performance and feed efficiency. As such, experiments were designed using power calculations with these primary outcomes in mind. Because mortality has historically been a secondary outcome of interest, most studies lack sufficient statistical power to detect the observed differences in mortality. Moreover, power calculations for responses that do not follow a Gaussian (normal) distribution, such as mortality, are more complicated than standard power calculations, and accounting for data structure, including blocking and subsampling, further complicates this challenge. In the current report, we describe how a practical tool called "What Would Fisher Do?" (WWFD) tables, described by Stroup, 4 can be used to visualize the key aspects of an experimental design and provide a clear link for transferring the design to a statistical model. Once a clear layout of the design and treatment structure has been created, the study design can be incorporated into power calculation models using statistical software and tailored to the specific research question of interest. Our objective was to illustrate the use of readily implementable models to determine the sample size necessary to detect statistically significant differences between groups with varying levels of mortality. Using these models as a base, the reader can tailor them to the specific research question and determine the number of animals necessary to detect the anticipated difference in mortality or in other responses that may be assumed to follow a binomial distribution.

Procedures
Scenarios
Two primary scenarios are described in the current report: 1) treatment is randomly applied to half of a barn with the other half of the barn serving as the control, and 2) treatment is randomly applied to different pens within a barn. In scenario 1, the half-barn (group) serves as both the experimental unit (EU) and the observational unit (OU), with 1,200 pigs per barn as the default barn size. In scenario 2, pen serves as both the EU and OU, with 1,200 pigs per barn and 48 pens per barn as the default parameters. In both cases, mortality is expressed per EU as the proportion of dead pigs relative to the number of pigs at the beginning of the experiment. In such scenarios, a binomial distribution is often used to model the response. For a binomial response, mu is the number of events (numerator; e.g., the number of animals showing the characteristic of interest, with each animal answering a yes/no question), and the number of trials (N; denominator) corresponds to the number of independent observations for each EU. 5 For example, if pen serves as the EU with 25 pigs in each pen, mu represents the number of pigs that died (or, inversely, the number surviving) and N consists of 25 independent trials (N = 25). With a binomial outcome distribution, the amount of information depends on the number of trials per EU as well as the number of EU.
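The point that information depends on the number of trials per EU can be illustrated with the textbook standard error of a binomial proportion. The sketch below is only an illustration under simplifying assumptions: it ignores between-barn variance and is not the GLIMMIX-based model used in this report. The function name `binomial_se` and the 2% mortality value are chosen here for the example.

```python
import math


def binomial_se(p: float, n_trials: int) -> float:
    """Standard error of an observed proportion from n_trials independent yes/no trials."""
    return math.sqrt(p * (1.0 - p) / n_trials)


# Illustrative assumption: true mortality of 2% in a single EU.
se_pen = binomial_se(0.02, 25)         # pen as EU, N = 25 pigs
se_half_barn = binomial_se(0.02, 600)  # half-barn as EU, N = 600 pigs

print(f"SE of mortality proportion, one pen:       {se_pen:.4f}")
print(f"SE of mortality proportion, one half-barn: {se_half_barn:.4f}")
print(f"Ratio (pen / half-barn):                   {se_pen / se_half_barn:.2f}")
```

A single pen therefore gives a much noisier mortality estimate than a single half-barn, which is why a pen-level design needs many more EU to reach comparable power.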

Development of Power Calculation Models
All models currently presented use a binomial distribution of observed mortality. Further discussion regarding model development theory is provided by Stroup. 2 As a brief overview of the process, an example dataset is generated that reflects the design features of the experiment and the expected mean responses. From there, the example data are analyzed holding variance components constant to obtain the F-statistic for the treatment effect along with its numerator and denominator degrees of freedom. The non-centrality parameter is calculated as the numerator degrees of freedom multiplied by this F-statistic, and the critical F-value is determined assuming α = 0.05 (probability of rejecting the null hypothesis given the null hypothesis is true). Power (1 - β) is then calculated from the critical F-value, numerator degrees of freedom, denominator degrees of freedom, and non-centrality parameter. These examples and corresponding models can be readily adapted and implemented using SAS software. All analyses were performed using the GLIMMIX procedure of SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA).
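The final power step can be written compactly. The notation below is introduced here for illustration and does not appear in the report: ν1 and ν2 are the numerator and denominator degrees of freedom, F_obs is the treatment F-statistic from analyzing the example dataset, and λ is the non-centrality parameter. The three lines mirror the SAS data step shown with the figures, which uses FINV for the critical value and PROBF for the non-central F probability.

```latex
\begin{align*}
F_{\text{crit}} &= F^{-1}_{\nu_1,\nu_2}(1-\alpha), \qquad \alpha = 0.05,\\
\lambda &= \nu_1 \, F_{\text{obs}},\\
\text{Power} = 1-\beta &= 1 - \Pr\!\left[\, F_{\nu_1,\nu_2,\lambda} \le F_{\text{crit}} \,\right].
\end{align*}
```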
The inputs in the models provided include the number of units (barns, pens, and pigs) and the anticipated difference in mortality percentage between the two groups. The output of the model is the calculated power (probability of rejecting the null hypothesis given the alternate hypothesis is true). The models can be used to evaluate how varying inputs result in different statistical power. When using the models to determine how many units ("sample size") are necessary to have sufficient power for a given mortality difference, the number of units can be increased until calculated power reaches a level deemed appropriate by the researcher (often 0.80 or 80%). For group-level mortality, the unit of change implemented in the model is increasing the number of barns by 1 until the desired power is achieved. For pen-level mortality, if 48 pens or fewer are necessary to detect the desired difference, the unit of change is two pens at a time to ensure that there are equal numbers of pens on each treatment. For pen-level mortality combinations where using 48 pens (1 barn) does not result in a calculated power of 80% or greater, multiple barns must be used, with barn serving as the blocking factor, to reach the desired threshold. In these scenarios, the unit of change implemented by the user is the number of whole barns. Thus, if 48 pens are not sufficient to detect the desired difference with the desired power, the multi-barn pen-level model would next use 2 barns (96 pens), then 3 barns (144 pens), and so on until the desired power is achieved. This scenario is constructed as a generalized randomized block design, meaning that there are multiple observations (pens) assigned to each treatment within block (barn). Such a design assumes no block × treatment interaction, or simply put, that all treatments respond similarly across all barns.
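The stepping rules described above (one barn at a time, two pens at a time, or whole barns of 48 pens) amount to a simple search. The sketch below shows that search only. The two power curves are hypothetical placeholders invented for the example and are not the GLIMMIX-based power calculation; in practice `power_fn` would wrap the actual model for the design of interest.

```python
from typing import Callable, Optional


def first_sufficient(power_fn: Callable[[int], float], start: int, step: int,
                     max_units: int, target: float = 0.80) -> Optional[int]:
    """Step the unit count upward until power_fn(units) reaches target.

    Returns the first unit count meeting the target, or None if max_units
    is exceeded first.
    """
    units = start
    while units <= max_units:
        if power_fn(units) >= target:
            return units
        units += step
    return None


# Placeholder power curves (hypothetical, for illustration only).
def toy_barn_power(barns: int) -> float:
    return 1.0 - 0.75 ** barns


def toy_pen_power(pens: int) -> float:
    return 1.0 - 0.99 ** pens


# Group-level: add one barn at a time.
barns_needed = first_sufficient(toy_barn_power, start=2, step=1, max_units=50)

# Pen-level within one barn: add two pens at a time, up to 48 pens.
pens_one_barn = first_sufficient(toy_pen_power, start=2, step=2, max_units=48)

# Pen-level across barns: if one barn is not enough, add whole barns (48 pens).
pens_multi_barn = None
if pens_one_barn is None:
    pens_multi_barn = first_sufficient(toy_pen_power, start=96, step=48,
                                       max_units=48 * 20)

print(barns_needed, pens_one_barn, pens_multi_barn)
```

With these placeholder curves the single-barn search fails, so the search moves to whole barns, which is the same fallback the text describes.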
The models can easily be adapted to a non-generalized, randomized complete block design with subsampling (conventional RCBD with subsampling) by adding a block × treatment random effect and a corresponding estimate of its variance. Doing so requires an accurate estimate of the block × treatment variance and reduces the degrees of freedom available for interpreting the treatment effect compared with a generalized randomized block design. For both the group-level and multi-barn pen-level models, barn serves as a blocking factor, and an estimate of between-barn variability must be included in the model. For the baseline models, a value of 0.1162 is used, based on the between-site variance observed in a commercial, multi-site trial (unpublished data). The specific value used, however, does not substantially affect the model outcome because all treatment comparisons are made within block. This can be demonstrated by entering a wide range of values into either model: the calculated power changes only slightly, without affecting interpretation. We therefore use 0.1162 as the starting value in the calculations, but recognize that the specific value has little impact on calculated power in these models.

Interpretation of "What Would Fisher Do?" Tables
The use of WWFD tables was described by Stroup 2 and practical implementation was demonstrated by Bello and Renter. 6 A WWFD table connects a visual plot plan, which outlines the process that generated the data and the treatment structure, to the specification of a linear predictor that can easily be translated into statistical software. WWFD tables can be equally useful for experimental and observational studies. As a brief summary of how to construct and interpret them, all aspects pertaining to the design structure, or the process that generated the data, reside in the series of columns below "design structure," whereas components of treatment effects or predictor variables reside below "treatment structure." Within the design structure column, the sum of degrees of freedom (df) must equal the number of observations minus 1. In addition, a blocking factor occurs in the row above the treatment effect for which it serves as a block, and the EU for each effect in the treatment column resides in the row just below the respective treatment effect. The bottom row of the WWFD table denotes the OU. In designs without subsampling, repeated measures, split-plot, or strip-plot features, the EU and OU can be the same physical entity. More complex design structures, however, can result in situations where the EU and OU are different entities. In the WWFD table, this appears as the EU for the relevant treatment effect residing on a different row than the unit of observation. Once a WWFD table is properly constructed, components included in the design structure column can be converted to "random" effects, and components of the treatment structure columns can be included as "fixed" effects following mixed model theory. Greater detail regarding the development and implementation of WWFD tables can be found elsewhere. 2,4
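The df bookkeeping in a WWFD table can be checked with a few lines of arithmetic. The sketch below uses the df rules given in the footnotes of the multi-barn pen-level table (barns minus 1, treatments minus 1, pens within barn, and pen within barn × treatment obtained by subtraction) for the 5-barn, 48-pen, 2-treatment example. The function name `wwfd_df` and the dictionary keys are illustrative choices, not part of the report.

```python
def wwfd_df(n_barns: int, pens_per_barn: int, n_treatments: int) -> dict:
    """Degrees of freedom for a generalized randomized block design
    with barn as block and pen as both EU and OU."""
    barn_df = n_barns - 1
    trt_df = n_treatments - 1
    pen_in_barn_df = (pens_per_barn - 1) * n_barns
    # EU-level df left after removing the treatment df
    pen_in_barn_x_trt_df = pen_in_barn_df - trt_df
    total_df = n_barns * pens_per_barn - 1
    return {
        "barn": barn_df,
        "treatment": trt_df,
        "pen(barn)": pen_in_barn_df,
        "pen(barn x treatment)": pen_in_barn_x_trt_df,
        "total": total_df,
    }


df = wwfd_df(n_barns=5, pens_per_barn=48, n_treatments=2)

# Design structure column: barn + pen(barn) must equal observations - 1.
design_sum = df["barn"] + df["pen(barn)"]
# Combined table: barn + treatment + pen(barn x treatment) must also match.
combined_sum = df["barn"] + df["treatment"] + df["pen(barn x treatment)"]

print(df)
print(design_sum == df["total"], combined_sum == df["total"])
```

If either sum fails to equal the number of observations minus 1, the table has a missing or double-counted source of variation.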

Results and Discussion
When treatment is applied to a half-barn, or group, the number of barns needed to detect significant differences in mortality between two treatments is provided in Table 1. To detect a statistically significant difference between two groups with mortalities of 2% and 1%, respectively, a total of 7 barns would be needed. As the difference in observed mortality between groups increases (moving towards the right side of Table 1 within a row), the number of barns needed to detect the respective difference decreases. Conversely, as the difference in mortality percentage between groups decreases (moving down within a column), the number of barns necessary to detect differences increases.
When treatment is applied to the individual pen, the number of pens (and barns, if more than 1 barn is necessary to detect the desired difference) is provided in the lower section of Table 1. Using the same mortality levels as in the group-level example above (2% and 1%), a total of 240 pens (5 barns) would be necessary with 1,200 pigs per barn and 48 pens per barn, where half of the pens within each barn are randomly allotted to treatment and half to control.
Group-level and pen-level mortality require relatively similar numbers of pigs to detect a given difference. For example, when comparing treatments with 3% vs. 4% mortality, 11 barns are necessary if half-barn is the EU, whereas 9 barns are necessary when pen is the EU. Between these two approaches, the number of EU differs greatly (11 for half-barn and 216 for pen), but the amount of information within each EU also differs greatly (Table 2). For group-level mortality, a smaller number of EU are present, but each EU contains a large amount of information, represented by the number of individual pigs or trials per EU. Pen-level mortality has more EU, but each EU does not carry as much information. This results in a relatively close number of total pigs necessary (13,200 and 10,800 pigs for group and pen-level, respectively). Calculated power for binomial processes is a function of both the number of EU and the amount of information per EU.
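The totals quoted for the 3% vs. 4% comparison follow directly from the barn counts and the default barn layout. The short sketch below reproduces that arithmetic. The function name `design_size` and its dictionary keys are illustrative, and the even split of EU between the two treatments is taken from the scenarios described in this report.

```python
def design_size(n_barns: int, eu_per_barn: int, pigs_per_barn: int = 1200) -> dict:
    """Count EU and pigs for a two-treatment design blocked by barn."""
    total_eu = n_barns * eu_per_barn
    return {
        "barns": n_barns,
        "total_eu": total_eu,
        "eu_per_treatment": total_eu // 2,          # EU split evenly between 2 treatments
        "pigs_per_eu": pigs_per_barn // eu_per_barn,  # binomial trials (N) per EU
        "total_pigs": n_barns * pigs_per_barn,
    }


group_level = design_size(n_barns=11, eu_per_barn=2)   # half-barn as EU
pen_level = design_size(n_barns=9, eu_per_barn=48)     # pen as EU

for name, d in (("group-level", group_level), ("pen-level", pen_level)):
    print(name, d)
```

The group-level design has few EU with 600 trials each, while the pen-level design has many EU with 25 trials each, yet the total pig counts differ by only 2,400.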
The models are described in WWFD tables (Tables 3 to 5) to illustrate the data structure, and corresponding statistical models and code are provided in Figures 1 to 3. These models can be tailored to the specific question of interest by changing the highlighted areas, provided users understand the statistical concepts and the study treatment and design features highlighted in this report. This information can be used by researchers to determine the most appropriate study design and the sample size necessary to detect differences of interest. Performing proper sample size calculations prior to the initiation of research trials is critical to determine the necessary size of the trial. This ensures that the trial is large enough to detect the anticipated differences between groups while optimizing the use of limited resources.
It is very important to recognize that proper use of this information requires background knowledge regarding statistical analysis, including the understanding of EU, OU, and design features such as blocking, subsampling, other hierarchical levels (farms, sites, states, etc.), repeated measures, and adjustments for multiple comparisons. When these assumptions change based on the experiment being planned, appropriate modifications must be made to the models. It is always recommended to engage a statistician in the early stages of experimental design to ensure all facets of the proposed experiment are appropriately accounted for, to reduce the risk of making inappropriate production decisions. When utilizing these models, it is necessary to have a basic level of proficiency using SAS software to ensure that the models are appropriately implemented and accurately reflect the characteristics of the study. Lastly, it is not recommended that these models be used in a manner that assumes one size fits all. Rather, we provide the information as a foundation from which the reader can use appropriate content expertise and statistical principles to assist with designing experiments.

Swine Day 2020

Table 2
Level         EU     Barns   Total EU   N     Total pigs
Group-level   Barn   11      22         600   13,200
Pen-level     Pen    9       432        25    10,800
1 Assumptions include each barn contains 1,200 pigs. For group-level mortality, pigs are randomly assigned to one of two treatments in a randomized complete block design with barn serving as blocking factor, and the experimental unit and observational unit are both assumed to be group within barn (1/2 of barn). For pen-level mortality, each barn contains 48 pens (25 pigs/pen). Pens of pigs are randomly assigned to 1 of 2 treatment groups (treatment and control). The experimental unit and observational unit are assumed to be pen, indicating a generalized randomized block design. Between-block variance is estimated to be 0.1162 for both scenarios.
2 EU = experimental unit. OU = observational unit. N = number of trials (pigs) per EU.
3 Assumes that treatment is randomly applied to half-barn. An example would be two feed lines within a barn with each feeding half of the pigs within barn.

Tables 1 and 4. Highlighted areas can be modified for situation-specific application.

1 Generalized randomized block design. Assumes no block × treatment interaction or that treatment behaves similarly across all blocks. Barn serves as blocking factor and pen is assumed to be the experimental and observational unit.
2 In this example, 5 barns would be used [degrees of freedom (df) calculated as # of barns - 1] with 2 treatments (df calculated as # of treatments - 1). Pen nested within barn is the experimental unit for treatment and observational unit, so df are calculated as (# of pens within each barn - 1) × number of barns. The df for pen within cross product of barn and treatment is calculated by subtracting df for treatment from df for pen nested within barn. Total number of observations is 240 in this example and total df equals (# of observations - 1).

    data Fpower;
      set tests3;
      alpha = 0.05;  /* type I error rate (95% confidence) */
      Fcrit = Finv(1-alpha, numdf, dendf, 0);                /* critical F under the null */
      noncent_parm = numdf*Fvalue;                           /* non-centrality parameter */
      Power = 1 - ProbF(Fcrit, numdf, dendf, noncent_parm);  /* calculated power */
    run;
    proc print data=Fpower;
    run;

Tables 1 and 5. Highlighted areas can be modified for situation-specific application.