Effect of sampling method on the accuracy and precision of estimating the mean pig weight of the population


Summary
Producers have adopted marketing strategies such as topping to help reduce economic losses from weight discounts at the processing plant. Despite adopting these strategies, producers are still missing target weights and incurring discounts. One contributing factor is the error of the sampling methods that producers use to estimate the mean weight of the population when determining the optimal time to top pigs. The standard sample size that has been adopted by many producers is 30 pigs. Our objective was to determine the best method for selecting 30 pigs to improve the accuracy and precision of estimating the mean pig weight of the population. Using a computer program developed in R (R Foundation for Statistical Computing, Vienna, Austria), we generated 10,000 sample means for different sampling procedures on 3 different datasets. Using this program we evaluated: (1) taking a completely random sample of 30 pigs from the barn, (2) taking a varying number of pigs per pen to achieve a total sample size of 30 pigs, (3) selecting the heaviest and lightest pig (determined visually) from 15 pens and calculating the mean of those pigs, and (4) calculating the median of the selected pigs.
Among the 3 datasets, taking a completely random sample of 30 pigs from the barn resulted in a range between the upper and lower confidence limits as high as 23 lb. Increasing the number of pens sampled while keeping the sample size constant reduced the range between the upper and lower confidence limits; however, the confidence interval (the range where 95% of weight estimates would fall) was still as wide as 24 lb (241 to 265 lb) when only 30 pigs were sampled. Although the range was reduced, the reduction was not enough to make increasing the number of pens sampled a practical means of estimating the mean pig weight of the barn. Selecting the heaviest and lightest pigs in 15 pens and taking the mean of the sample reduced the range between the upper and lower confidence limits by 31 to 53%. Although the precision of the sample was improved, the accuracy of the sampling method decreased, with the mean of the 10,000 simulations up to 8 lb lighter than the mean of the population.
Selecting the heaviest and lightest pigs can be a valuable method for improving the precision in estimating the mean of the population, but adjustments to the sampling procedure need to be developed to improve its accuracy.

Introduction
Swine producers must meet processing plant requirements for specific weights of pigs as well as weight ranges to avoid economic penalties. In attempts to reduce these economic penalties, producers have adopted marketing practices such as topping, or marketing the heaviest pigs several weeks before the expected barn closeout. Because pig BW typically approximates a normal distribution, subsampling methods to predict the average weight of pigs in the barn can be used to model distributions of BW within the barn. The standard sample size that has been adopted by many producers is 30 pigs. Previous data from Kansas State University reported that for a set sample size, increasing the number of pens sampled could reduce the error in estimating the mean pig weight of the population (Paulk et al., 2011). To maximize economic return when marketing pigs, the precision of sampling pigs needs further improvement; therefore, our objective was to determine the best method of selecting 30 pigs to improve the accuracy and precision of estimating the mean pig weight of the population.

Procedures
A total of 3 datasets (A, B, and C) were used to evaluate the effect of sampling method on the accuracy and precision of estimating the mean pig weight in the barn. The first sampling method tested was a completely random sample of 30 pigs from the barn, disregarding pen arrangements. The second sampling method varied the number of pigs sampled per pen (1 to 30 pigs) across an increasing number of pens to achieve a total sample size of 30 pigs. The third and fourth sampling methods consisted of selecting the heaviest and lightest pig (determined visually) from 15 pens (30 pigs total) and calculating the mean and median of the selected pigs, respectively.
Dataset A was derived from Groesbeck et al. (2007). Dataset A (Figure 1) comprised a total of 1,260 pigs in 48 pens with 23 to 28 pigs per pen. The mean, median, standard deviation, and CV of the population were 253.0 lb, 254 lb, 32.8 lb, and 13.0%, respectively. Datasets B and C were obtained for the purposes of this experiment. Dataset B was obtained from a commercial finishing site in northern Iowa. Pigs (PIC C42 × PIC 359) weighed for Dataset B were from a single barn that was classified as healthy by the attending veterinarian. The barn was filled with pigs over a 1-wk period, and pigs were gate cut as they came off the truck to randomly place them in pens. Dataset B (Figure 2) contained a total of 1,261 pigs weighed (population mean = 213.5 lb, median = 214 lb, standard deviation = 21.5 lb, and CV = 10.1%) and housed in 19 pens with 56 to 81 pigs per pen. Dataset C was derived from a different commercial site in northern Iowa that consisted of pigs (Genetiporc F25 × G performer boar) that were weaned during a porcine reproductive and respiratory syndrome (PRRS) outbreak at the sow farm. The barn was filled with pigs over a 1-wk period, and pigs were gate cut as they came off the truck. Dataset C (Figure 3) comprised a total of 1,069 pigs weighed (population mean = 222.4 lb, median = 224 lb, standard deviation = 32.0 lb, and CV = 14.4%) from 40 pens with 20 to 35 pigs per pen.
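As a quick arithmetic check on the population statistics above, the CV is the standard deviation expressed as a percentage of the mean. The short Python sketch below recomputes it from the rounded values reported in the text; the function name `cv_percent` is hypothetical, and because the inputs are already rounded, the results may differ slightly from values computed on the raw weights.

```python
def cv_percent(sd_lb, mean_lb):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return 100.0 * sd_lb / mean_lb


# Rounded (standard deviation, mean) pairs reported for each dataset, in lb.
reported = {"A": (32.8, 253.0), "B": (21.5, 213.5), "C": (32.0, 222.4)}

for name, (sd_lb, mean_lb) in reported.items():
    print(f"Dataset {name}: CV = {cv_percent(sd_lb, mean_lb):.1f}%")
```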
A program was coded using R (R Foundation for Statistical Computing, Vienna, Austria) to demonstrate the error associated with varying sampling methods when estimating the mean weight of the population. For the first sampling method, the program was designed to take a completely random sample of the designated sample size, disregarding pen arrangements, and calculate the mean of this sample. The program conducted this sampling technique 10,000 times, generating 10,000 sample means. The 10,000 sample means were sorted from least to greatest, and a 95% confidence interval (CI) was generated by selecting the 9,751st observation (upper CI) and the 250th observation (lower CI). The distance between the upper and lower CI represents the range of the mean estimations. A similar analysis was conducted using R for the remaining sampling methods. For sampling methods 3 and 4, marketers provided by Suidae Health and Production, Algona, IA, were used to select the heaviest and lightest pigs in each pen. One marketer, marketer 1, was provided for Dataset B, and two marketers, marketers 2 and 3, were provided for Dataset C. The percentages of accurately selected pigs for each dataset are presented in Table 1. Selection accuracy was incorporated into sampling methods 3 and 4 for Dataset A based on the selection accuracy of the 2 marketers from Dataset C.
The probability of selecting the 1st, 2nd, 3rd, 4th, or 5th heaviest pig was 50, 25, 15, 5, and 5%, respectively, and the probability of selecting the 1st, 2nd, 3rd, 4th, or 5th lightest pig was 70, 15, 5, 5, and 5%, respectively. These probabilities were chosen because Datasets A and C had similar pen arrangements. To account for selection accuracy in the simulations, a rank was assigned to the heaviest and lightest pig selected by the marketer in each pen. Next, these ranks were combined into a list for each group of selected pigs, the heaviest and the lightest. For each pen selected, a rank was randomly drawn; therefore, for Dataset A, if the 1st pen randomly selected were pen 8, one pig selected from pen 8 would have a 50, 25, 15, 5, and 5% chance of being the 1st, 2nd, 3rd, 4th, or 5th heaviest pig, and the other pig selected would have a 70, 15, 5, 5, and 5% chance of being the 1st, 2nd, 3rd, 4th, or 5th lightest pig, respectively.
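The rank-based selection and the percentile CI described above can be illustrated with a minimal Python sketch. This is not the authors' R program, which is not reproduced in this report. The barn here is synthetic (normally distributed weights with a mean and standard deviation borrowed from Dataset A), and every function and variable name is hypothetical. Only the rank probabilities, the 15 pens, the 10,000 simulations, and the 250th and 9,751st observations come from the text.

```python
import random
import statistics

# Rank probabilities stated in the text (1st..5th heaviest / lightest pig).
RANK_PROBS_HEAVY = [0.50, 0.25, 0.15, 0.05, 0.05]
RANK_PROBS_LIGHT = [0.70, 0.15, 0.05, 0.05, 0.05]


def make_synthetic_barn(n_pens=48, pigs_per_pen=26, mean=253.0, sd=32.8, seed=1):
    """Hypothetical stand-in for a real dataset: a list of pens of weights (lb)."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, sd) for _ in range(pigs_per_pen)]
            for _ in range(n_pens)]


def select_heavy_light(pen, rng):
    """Return (heavy, light) picks, allowing for imperfect visual selection."""
    ordered = sorted(pen)
    heavy_rank = rng.choices(range(5), weights=RANK_PROBS_HEAVY)[0]
    light_rank = rng.choices(range(5), weights=RANK_PROBS_LIGHT)[0]
    return ordered[-1 - heavy_rank], ordered[light_rank]


def simulate(barn, n_pens_sampled=15, n_sims=10_000, use_median=False, seed=2):
    """Repeat the heaviest/lightest sampling and return (mean, lower, upper)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        pens = rng.sample(barn, n_pens_sampled)
        weights = [w for pen in pens for w in select_heavy_light(pen, rng)]
        if use_median:
            estimates.append(statistics.median(weights))   # method 4
        else:
            estimates.append(statistics.fmean(weights))    # method 3
    estimates.sort()
    lower = estimates[n_sims * 25 // 1000 - 1]   # 250th of 10,000 (0-based 249)
    upper = estimates[n_sims * 975 // 1000]      # 9,751st of 10,000 (0-based 9750)
    return statistics.fmean(estimates), lower, upper


if __name__ == "__main__":
    barn = make_synthetic_barn()
    sim_mean, lower, upper = simulate(barn)
    print(f"mean of simulations = {sim_mean:.1f} lb, "
          f"CI = {lower:.1f} to {upper:.1f} lb, range = {upper - lower:.1f} lb")
```

Because the synthetic weights are symmetric and independent of pen, the output will not reproduce the values in Table 2; the sketch only shows the mechanics of drawing a rank per pen and reading the CI off the sorted simulation results.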

Results and Discussion
Notably, the random samples in this study were generated using a computer program; samples taken from the barn are not truly random unless pigs are individually identified and preselected, rather than selected by the marketer.
When asked to identify the heaviest pig in the pen, marketers 1, 2, and 3 identified the heaviest pig in 47.4, 43.5, and 55.0% of the pens and the 2nd heaviest pig in 5.3, 35.0, and 25.0% of the pens, respectively (Figures 2, 3, and 4; Table 1). The pigs identified by marketers 1, 2, and 3 were within the actual 5 heaviest pigs in 68, 100, and 95% of the pens, respectively. When asked to select the lightest pig, marketers 1, 2, and 3 identified the lightest pig in 57.9, 75.0, and 68.4% of the pens and the 2nd lightest pig in 21.1, 17.5, and 10.5% of the pens, respectively (Figures 2, 3, and 4; Table 1). The pigs identified by marketers 1, 2, and 3 were within the actual 5 lightest pigs in 79.5, 100, and 100% of the pens, respectively.
When taking a completely random sample of 30 pigs from Datasets A, B, and C, the range between the upper and lower CI was 23.0, 15.0, and 22.5 lb, respectively. For Datasets A and C, when sampling 15 pigs from 2 pens, the estimated range between the upper and lower CI was 32.0 and 47.8 lb, respectively, but when sampling 1 pig from 30 pens, the ranges between the upper and lower CI were 23.1 and 20.3 lb, respectively (Table 2). For Dataset B, when sampling 30 pigs from 1 pen, the estimated range between the upper and lower CI was 38.3 lb, but when sampling 2 pigs from 15 pens, the range between the upper and lower CI was 14.8 lb; therefore, increasing the number of pens used to sample 30 pigs can reduce the range between the upper and lower CI by 28, 61, and 58% in Datasets A, B, and C, respectively.
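The pens-by-pigs comparison above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R program: the barn is synthetic, the between-pen standard deviation (`pen_sd`) is an invented parameter added so that pens differ from one another, and all function names are hypothetical. The `percent_reduction` helper simply recomputes the 28, 61, and 58% figures from the CI ranges reported in the text.

```python
import random
import statistics


def make_synthetic_barn(n_pens=40, pigs_per_pen=27, mean=222.4, sd=32.0,
                        pen_sd=10.0, seed=1):
    """Hypothetical barn; pen_sd is an assumed between-pen spread (lb)."""
    rng = random.Random(seed)
    barn = []
    for _ in range(n_pens):
        pen_mean = rng.gauss(mean, pen_sd)
        barn.append([rng.gauss(pen_mean, sd) for _ in range(pigs_per_pen)])
    return barn


def ci_range(barn, n_pens_sampled, pigs_per_pen_sampled, n_sims=10_000, seed=2):
    """Return (lower, upper, range) of the simulated sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        pens = rng.sample(barn, n_pens_sampled)
        sample = [w for pen in pens
                  for w in rng.sample(pen, pigs_per_pen_sampled)]
        means.append(statistics.fmean(sample))
    means.sort()
    lower = means[n_sims * 25 // 1000 - 1]   # 250th of 10,000
    upper = means[n_sims * 975 // 1000]      # 9,751st of 10,000
    return lower, upper, upper - lower


def percent_reduction(range_before, range_after):
    """Percentage by which the CI range shrinks."""
    return 100.0 * (range_before - range_after) / range_before


if __name__ == "__main__":
    barn = make_synthetic_barn()
    for n_pens, n_pigs in [(2, 15), (15, 2), (30, 1)]:
        lower, upper, width = ci_range(barn, n_pens, n_pigs)
        print(f"{n_pigs} pigs from {n_pens} pens: range = {width:.1f} lb")
    # CI ranges reported in the text for Datasets A, B, and C.
    for before, after in [(32.0, 23.1), (38.3, 14.8), (47.8, 20.3)]:
        print(f"{before} -> {after} lb: {percent_reduction(before, after):.0f}%")
```

The simulated ranges depend entirely on the assumed `pen_sd`; with no between-pen variation, spreading the 30 pigs over more pens would matter much less.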
Selecting the heaviest and lightest pigs in 15 pens and taking the mean of the sample reduced the range between the upper and lower CI by 31 to 53%, but because specific pigs were selected, bias was introduced into the sampling procedure. This bias resulted in increased systematic error, or reduced accuracy, with the mean of the 10,000 simulations being less than the actual mean of the respective population. When pigs were selected based on the estimated selection (Dataset A), marketer 1 (Dataset B), marketer 2 (Dataset C), and marketer 3 (Dataset C), the means of the 10,000 simulations were 245.0, 207.7, 219.8, and 221.8 lb, respectively, whereas the actual means of Datasets A, B, and C were 253.0, 213.5, and 222.4 lb, respectively. The deviation in accuracy of the mean can be influenced by the shape of the population distribution and the accuracy of the marketer when selecting both the heaviest and lightest pigs. Taking the median of the selected pigs did not further reduce the range between the upper and lower 95% CI.
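The size of the bias follows directly from the means reported above. The Python sketch below subtracts each population mean from the corresponding mean of the 10,000 simulations; the function name `bias_lb` and the data layout are hypothetical, and the numbers are the rounded values from the text.

```python
# Population means (lb) reported for each dataset.
population_mean = {"A": 253.0, "B": 213.5, "C": 222.4}

# (selection source, dataset, mean of the 10,000 simulations in lb).
simulated = [
    ("estimated selection", "A", 245.0),
    ("marketer 1", "B", 207.7),
    ("marketer 2", "C", 219.8),
    ("marketer 3", "C", 221.8),
]


def bias_lb(simulated_mean, true_mean):
    """Systematic error: negative values mean the estimate is too light."""
    return simulated_mean - true_mean


for label, dataset, sim_mean in simulated:
    diff = bias_lb(sim_mean, population_mean[dataset])
    print(f"{label} (Dataset {dataset}): {diff:+.1f} lb")
```

The largest underestimate, 8.0 lb for Dataset A, corresponds to the "up to 8 lb lighter" figure in the Summary.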
Sample size, method, variation, and distribution of pigs within a barn can substantially affect the precision of estimating the mean weight of all pigs in the barn. It is important for producers to take this into consideration when weighing pigs prior to topping to make marketing decisions. Calculating the mean of the selected heaviest and lightest pigs in each pen can improve the precision of estimating the mean; however, adjustments to the sampling method need to be determined to improve its accuracy.

1 A total of 1,260 pigs were used (mean = 253.0 lb, median = 254 lb, standard deviation = 32.8 lb, and CV = 12.98%) with 23 to 28 pigs per pen and a total of 48 pens.
2 Thirty pigs were randomly selected from the barn.
3 The number of random pigs selected from the number of randomly selected pens.
4 Selecting the heaviest and lightest pig (determined visually) from 15 pens and calculating the mean from those two pigs. The 15 pen means were averaged to obtain an estimated weight of the barn.
5 Selecting the heaviest and lightest pig (determined visually) from 15 pens and calculating the median from those pigs.

Effect of Sampling Method on the Accuracy and Precision of Estimating the Mean Pig Weight of the Population
C. B. Paulk, G. L. Highland, M. D. Tokach, J. L. Nelssen, S. S. Dritz, R. D. Goodband, J. M. DeRouchey, and K. D. Haydon

Figure 3. Histogram of Dataset C and marketer 2's selections. A total of 1,069 pigs were weighed (population mean = 222.4 lb, median = 224 lb, standard deviation = 32.0 lb, and CV = 14.4%) with 40 pens and 20 to 35 pigs per pen. The marketer selected the heaviest and lightest pig in each pen. The 2 histograms of the marketer's selections are superimposed on the population histogram.

Figure 4. Histogram of Dataset C and marketer 3's selections. A total of 1,069 pigs were weighed (population mean = 222.4 lb, median = 224 lb, standard deviation = 32.0 lb, and CV = 14.4%) with 40 pens and 20 to 35 pigs per pen. The marketer selected the heaviest and lightest pig in each pen. The 2 histograms of the marketer's selections are superimposed on the population histogram.

Table 1. The percentage of the selected pigs as the actual n heaviest or lightest pig

Table 2. The resulting mean, upper 95% confidence interval (CI), lower 95% CI, and range for the various sampling methods to give a total sample size of 30 pigs
