A Simulation Study for Performance Comparison between Generalized Linear Mixed Modeling (GLMM) and Generalized Regression Neural Networks (GRNN) in Joint Modeling of Mixed Responses

Joint modeling of mixed responses has become a popular research area due to its applicability in many disciplines. When there is an association between two responses, a joint model will provide improved results than modeling the responses separately. In this study, the joint modeling of survival and count variables was carried out using Generalized Linear Mixed Modeling (GLMM) and Generalized Regression Neural Network (GRNN) to compare their performances under the setting of clustered data. A joint model of survival and count variables that was developed by joining the Discrete Time Hazard Model (DTHM) and Poisson Regression model was used in this study as the GLMM model. A simulation study was carried out under three different sample sizes; n = 20, 100, and 500, and for three levels of correlations between two responses: low (r = 0.30), moderate low (r = 0.30), and high (r = 0.30). The root mean square error, absolute mean error and correlation coefficient between actual and predicted response data were calculated to compare the performances of GLMM and GRNN models. The results revealed that the GRNN has a better fit in general, but under large sample sizes and high correlations between response variables, GLMM outperformed the GRNN. Application of these methods to a data set from poultry industry further confirmed the suitability of Generalized Regression Neural Network fit for joint modeling of real-world data.


INTRODUCTION
Joint modeling of data with mixed responses has become a popular research area in recent times. Many researchers have used joint modeling techniques in various disciplines (Rizopoulos, 2012;Gardiner, 2013;Elashoff et al., 2016;Hapugoda and Sooriyarachchi, 2017a;2017b;2018). When there is an association between the two responses, a joint model will provide interesting and improved results than modeling the responses separately.
In the literature, many approaches of joint modeling could be identified (Rizopoulos, 2012;Gardiner, 2013). Two such methods include mixed modeling approach and artificial neural network approach (Hapugoda and Sooriyarachchi, 2017b). It will be beneficial to be aware of the performances of these approaches in joint modeling, while concluding the better fit of the models to the data.
Many researchers have been carried out on comparative studies to identify the performances of different models using different techniques (Comrie, 1997;Yay and Akinci, 2009;Hapugoda and Sooriyarachchi, 2017a). Survival time data (i.e. the duration of time between the initial point up to the occurrence of a particular event) and count data (i.e. the number of observations) are some of the commonly recorded data in many fields of studies. In many cases these two variables can be correlated (Hapugoda and Sooriyarachchi, 2017b). In addition, the hierarchical nature (i.e. in the form of clusters) of data could also be seen in many fields of studies. If a set of data consists of all the abovementioned forms, joint modeling of these two variables can be carried out considering the clustering effect. In that case, it is also better to identify which approach can be utilized to obtain more accurate results.
Thus, the objective of this study was to compare the performances of the models obtained by using the mixed modeling approach and artificial neural network approach for such data. A simulation study was carried out for this purpose. This concept was then applied to some data from the poultry industry of Sri Lanka.
The overview of the paper is as follows: the methodologies of the two approaches used in this study are explained in the 'Methodology' section, the details of the simulation study carried out are explained in the 'Simulation study' section, and the application of the concept to the poultry industry is described with a discussion in the 'Results and Discussion' section.

Generalized Linear Mixed Modeling Approach
A joint model of survival and count was developed by joining the Discrete Time Hazard Model (DTHM) and Poisson Regression model by the same authors of this article (Hapugoda and Sooriyarachchi, 2017a;2017b;2018). The DTHM was used to convert continuous survival times with the censoring information combined to it into a binary response variable with time indicators as explanatory variables. The underlying theory of the joint model is as follows: The responses of analysis were: Yij1 (binarysurvived/ not survived status of the batch of eggs) and Yij2 (count -number of eggs), where i stands for the batch of breeders. Here, the hatchability occurs in time interval j for the i th batch, given that the eggs from i th batch survived in interval j-1 (hazard at time Tj), and xij are covariates at the batch level i where Tij contains the unit-coded time indicators for the time periods assessed in the study (for an observation at time j, Tij = 1 and for a time k where k ≠ j, Tk = 0). Variables that impact Y = (Y1, Y2) are the explanatory variables denoted by Xij (i=1: N). To formulate a joint model, Generalized Linear Mixed Models (GLMM) can be used to form marginal models for each response by considering mean E(Yl/Xij) and variance Var(Yl/Xij) where l=1,2. The approach to link the responses is by structuring a covariance matrix Var(Yl/Xij) to include potential correlations.
In GLM, lk(E(Yijk/Xij)) = Xijk'βk , k=1,2 and lk is the link function. Here, l1(u) is the logit link and l2(u) is the log link. A structural formulation of the model is given as: and l2(y'ij2)=log (µi 2) = log ( ) + 0 + where 0 = 0+ u0j and u0j ~ N(0,σu 2 ) For simplicity, we assume that both random effects are the same (u0j) and have the same variance (σu 2 ). The joint model variance matrix is as follows: | 51 GLIMMIX will structure the variance matrix of Y = (Y1, Y2) as follows: where Ri is a user specified 2x2 covariance structure and Ai is the diagonal matrix of the variances of (Y1, Y2).

Neural Network Approach
Artificial Neural Networks (ANNs) technique is widely used as an alternative approach for conventional statistical techniques, due to its remarkable ability to derive meaning from complicated data by extracting patterns and detecting trends that are too complex to be noticed by either humans or other computer intensive techniques (Sehrawat et al., 2015) Among the different types of neural network techniques present, it is vital to identify the most appropriate technique to model the data under consideration. As the response variables are continuous (as the original continuous survival variable was considered in neural network) and discrete (count), the ANN technique that can address such variables should be chosen.
Generalized Regression Neural Network (GRNN) approach was examined to study whether it is a better approach to model survival and count variables together.
GRNN was proposed by Donald F. Specht in 1990 and falls into the category of probabilistic neural networks (Specht, 1990). The probability density function used in GRNN is the Normal Distribution. For each training sample, Xi is used as the mean of a Normal Distribution for n number of samples. The distance, Di, between the training sample and the point of prediction, is used as a measure of how well each training sample can represent the position of prediction, X. (See equations (5) and (6)). (5) The smoothness parameter () is the network parameter of this procedure (i.e. the parameter of the Gaussian curves). It was suggested to use the holdout method to select a good value of  (Specht, 1990).
The method to obtain the best neural network is based on selecting the best parameters and best input combination. It was identified that the tenfold stratified cross validation method is a better approach to select the best model in terms of the input combination (Kohavi, 1995;Cui et al., 2008). Hence, the same method has been used in this study.

Performance Comparison
Measuring performance of the models is required to compare the two models and conclude the best model. There are several statistical measures which are used to measure the model performance.
According to literature, the measures such as root mean square error (RMSE), absolute mean error (AME) and correlation coefficient between actual response data with predicted response data (R) to compare the performances of two or more models, when at least one model under consideration is based on neural network approach (Yay and Akinci, 2009;Hapugoda et al., 2013;Hapugoda and Sooriyarachchi, 2017a). Equations (7) to (9)

Simulation Study
To assess the performance of the two models, a simulation study was carried out and the details are presented in this section.
A joint model should be developed when there exists any kind of association/ correlation between the response variables. A correlated survival time and count variables were simulated using correlated Exponential and Poisson variables based on Multivariate Normal distribution (Smart, 2015). Later the continuous survival data was categorized in a biologically plausible way. Here, to represent the cluster effect in the simulated data, each dataset was divided into three equal sections to represent three clusters. The data generation for this simulation study was based on the previous work by the same authors (Hapugoda and Sooriyarachchi, 2018).
In different scenarios, the correlation among the two response variables can be low, medium, or high. However, to generalize the developed model for any situation, the simulation study was carried out under these three cases for three different sample sizes (n = 20, 100, and 500).
The correlations considered were, r = 0.3, 0.5 and 0.75 for low, moderate, and high correlations respectively. The significances of the correlation coefficients with respect to the sample size were tested at 5% significance level, which reveals significance in all cases. Under each case, 1000 simulated data sets were generated using R software. PROC GLIMMIX procedure in SAS software was used to model the data using mixed modeling approach while Matlab software was used to model the data using a neural network approach.
RMSE, MAE and correlation coefficient (R) between actual response data with predicted response data were recorded for both models in each dataset. Here, the model based on generalized linear mixed modelling approach was named as GLMM model and the model based on the neural network approach was named as GRNN model. The results of these are presented in Tables 1 and 2.  Given high correlations between the responses for all sample sizes, GRNN model recorded a better performance in significantly higher number of times. However, when the sample size is largest (500) and correlation is taken to be high, GRNN performed worse than the GLMM a few times. These few cases resulted in extremely large values of RMSE and MAE for GRNN as the mean was larger for these statistics. This shows that under this scenario the GRNN is not dependable.
It can be seen that the GRNN model has a better fit than the GLMM model for cases when sample size is small or moderate, especially when the correlation is high or moderate. For the two larger sample sizes, both models were comparable for a small correlation. However, when the sample size was large, and the correlation was high the GLMM model outperformed the GRNN model according to the RMSE and MAE. The difference in R under this scenario was negligible.
It can be understood that the GRNN model has a better fit than the GLMM model in most of the cases. Though, the differences of mean values of RMSE, MAE and R were notably diverse in the small sample size (20) (in Table 1), both models were comparable for the two larger sample sizes when correlation was small or moderate. But the strength of the GLMM model is indicated for large samples and high correlation. One may argue that this is the situation in which a joint model would really be more efficient than univariate models.

Application to Poultry Data
To perform the comparison study in a real data set, the following analysis was carried out. This work is an extension to previous study by the same authors, based on the data in poultry industry (Hapugoda and Sooriyarachchi, 2017a). The data set consisted of 632 observations.
The data from a broiler grandparent farm from Sri Lanka were available for nine batches of breeders maintained over a two-and half-year period. The variables were breeder age, fertility rate, length of storage period for eggs, the number of hatching eggs and the number of non-hatching eggs for each batch at different time points. The description on the variables of the data are shown in table 3.
In the poultry industry, the accurate prediction of 'the number of chicks that can be obtained by hatching a known number of eggs' (number of hatching eggs) is the most important factor for the economic success of a breeder operation. This prediction is dependent on breeder factors such as breeder age and fertility rate (Reis et al., 1997;Lapao et al., 1999;Tona et al., 2002;Petek and Dikmen, 2006;Hapugoda et al., 2013).
Though the storage of hatching eggs is a necessary part of the commercial incubation, dramatic decline in the number of hatching eggs is observed in broiler breeder eggs stored for extended periods and there is an interaction between length of storage and breeder age (Reis et al., 1997;Lapao et al., 1999;Petek and Dikmen, 2006). In practice, the eggs are hatched based on the demand (selling of one-day-old chicks is the objective of breeder operation). Here, the eggs from older breeders with less fertility are given the priority, irrespective of the length of storage (in days). This is due to the practice of the farm from which the data were obtained, based on the assumption that the hatchability is dependent on breeder age.

Fitting a GLMM Model
If a joint model could be developed to estimate the number of hatching eggs with their length of storage considering breeder factors such as age and fertility rate, it will be beneficial for the success of the operation of poultry industry. Such a model has been developed in a previous study using generalized linear mixed modeling approach by the same authors (Hapugoda and Sooriayarachchi, 2017a). The Spearman correlation coefficient between these 2 variables is 0.60 which indicates that a joint model is more plausible than 2 univariate models. The purpose of this analysis is to compare the performance of the GLMM model with the model developed using neural network.

Fitting a GRNN Model
To achieve this purpose, a GRNN model was fitted to the data in the following manner. The best model could be obtained when the best σ value was considered, in which the sum of square error was minimized. In this analysis, the best σ value was recorded as 0.5.
For the analysis, the dataset was split to two categories, as training set (90% of data) and test set (10% of data). The ten-fold stratified cross validation method was used to identify the best division of data and this was carried out by using different columns of the input matrix to include in the training set and test set. For example, the first combination was 1 to 9 columns of input matrix as the training set while 10 th column was taken as the test set. Likewise, 10 different models were considered to choose the best division of the data. Thus, for different sets of training data and test data divisions used in this analysis, the plots of σ values with their corresponding sum of square error (SSE) values were obtained. The best model was selected by considering the best σ values identified, which generated the smallest sum of squares value. Here, the best division of data was identified as; taking the 1 st column of input matrix as test set, while the rest of the data were considered as the training set. The recorded SSE value for this combination was 38.01.
The obtained neural network consisted of 4 input nodes, 2 output nodes and 2 hidden layers, where the two hidden layers consisted of 568 nodes and 2 nodes, respectively.

Performance Comparison
RMSE, MAE and correlation coefficient between actual response data with predicted response data were recorded for both models for the data. Here again, the model based on generalized linear mixed modelling approach was named as GLMM model and the model based on neural network approach was named as GRNN model. The results of these are presented in table 4. Due to the better fit of the GLMM under ideal conditions and due to the black box nature of the neural networks, Statisticians often prefer to use GLMMs over GRNNs, though Computer Scientists and Data Scientists may have different views. This is also because, interpretations on the effects of variables cannot be carried out using a GRNN model, though it can be found out which variable is more important in defining the model output. The parameter estimates of the final GLMM model can be interpreted using the parameter estimates of the model obtained.

CONCLUSION
This study compared the performance between GLMM and GRNN in joint modeling of mixed responses. A simulation study was carried out to assess the performance using measures of RMSE, MAE and correlation coefficient between actual response data with predicted response data. This simulation study revealed better fit of GRNN compared to GLMM for small to moderate samples under all conditions and for large samples under low and moderate correlation. However, for the ideal case of joint modeling, that is when the samples are large, and the correlation is high the GLMM model outperformed the GRNN model. This could possibly be due to the method of estimation in GLMM which is Maximum Likelihood via Laplace approximation. Maximum Likelihood Estimates are asymptotically unbiased and asymptotically Normally distributed. Regarding high correlation, the bivariate model would really gain maximum superiority over univariate models only when the correlation between responses are high.
These methods were applied to a practical data set having a large sample size and moderate correlation. As expected, based on the simulations, this exercise revealed better fit of GRNN model, under the tested scenario.