Weighted Modelling and Forecasting of Cocoa Production in Ghana: A Multivariate Approach

In this paper, we develop models for forecasting annual cocoa production in Ghana. Instead of using the ‘best’ model for forecasting, a weighting scheme was applied to all competing models to obtain a weighted model. The weighting scheme used in this paper is the weighted ranking procedure. Annual data on cocoa production, export earnings, exchange rate and domestic processing in Ghana from 1970 to 2012 were used for this study. The forecast accuracy of the weighted vector error correction model (VECM) was compared with that of the ‘best’ vector error correction model to validate the model. The forecast from the weighted approach performed better than that of the ‘best’ model. The actual production values were regressed on the weighted predicted values to show whether the weighted VECM was adequate to explain the variations in annual cocoa production. The adjusted $R^2$ was 0.952, indicating that the weighted VECM explained 95.2% of the variability in annual production. Hence, the weighted vector error correction model is a better statistical technique for forecasting cocoa production in Ghana.


INTRODUCTION
Cocoa is one of the most important crops in the economy of Ghana. It contributed about 3.4% to total gross domestic product annually and an average of 29% to total annual export revenue between 1990 and 1999 (Anonymous, 2001). In terms of employment, the cocoa sector employs about 60% of the national agricultural labour force. In volume of production, Ghana is reported to be the second largest cocoa producer in the world, accounting for about 21% of total production (International Cocoa Organization, ICCO, 2006).
Production levels of Ghana's cocoa consistently declined from 568,000 (Mt) in 1965 to their lowest level of 160,000 (Mt) in 1983 (Abekoe et al., 2002). Since the mid-1980s, production levels have risen gradually to an average of 400,000 (Mt) during the late 1990s (Abekoe et al., 2002), which is still below the production levels attained in the mid-1960s. Generally, the productivity of cocoa (yield per hectare) in the country is among the lowest in the world (ICCO, 2005). The highest productivity of cocoa is in Malaysia (1800 kg ha$^{-1}$), followed by Ivory Coast (800 kg ha$^{-1}$), while it is 360 kg ha$^{-1}$ in Ghana (Abekoe et al., 2002). Thus, a study that can forecast cocoa production in Ghana will be very useful for policy making and decisions.
In the literature, several econometric models have been developed for the Ghanaian cocoa sector since the 1960s (Bulir, 1998). However, all these studies concern the cocoa supply function (Bulir, 2002). None of these researchers has developed a model of the entire Ghanaian cocoa sector. Bulir (1998) remarked of these models that "Most of the researches to date suffer from the problem associated with the estimation of non-stationary time series and arbitrary choice of lag structures; so, these models have been unable to explain the massive decline in recorded cocoa output".
In time series analysis, one is faced with the challenge of choosing the "best" model among many candidate models for forecasting. Usually, one has to go through a series of tests to arrive at the "best" model. Our preliminary analysis and the available literature show that the model preferred by a test or information criterion does not necessarily do better than other competing models in terms of prediction risk. Chatfield (2004) and Hoeting et al. (1999) have used the term 'model uncertainty' to capture the difficulty of identifying the best model. To address this challenge, combining forecasts was introduced over the past three decades (Bates and Granger, 1969; Clemen, 1989), and various methods have been proposed. Thus, when there is substantial uncertainty in finding the best model, an alternative method, such as a combined model, should be considered.
The following weighting schemes are most often distinguished: equal weights, Akaike weights, optimized and constrained weights, and Bayesian weights. The weighting scheme used in this paper is the weighted ranking procedure. Economic growth occurs along many dimensions, and a single cause is often not enough to explain growth (Armah, 2008). Thus, the aim of this study is to forecast cocoa production while considering other influential factors. In this study, we compare the forecasts of cocoa production from the weighted and the single "best" model approaches.

Data source
Annual data of cocoa production, export earnings, exchange rate and domestic processing spanning from 1970 to 2012 were obtained from the Ghana Cocoa Board, Accra.

Error correction model
The error correction model is used when the time series are not stationary but are cointegrated. The concept of cointegration is explained below.

Cointegration
In univariate time series models, time series that have a unit root need to be modelled in first differences. In multivariate models, things become more interesting. It is possible for two nonstationary time series with unit roots to have a linear relationship that produces a stationary disturbance. That is, in a multivariate situation it is possible to remove unit roots without taking differences. This has important implications for model specification.
Recall that a time series is said to be I($d$) if it must be differenced $d$ times to become stationary and invertible. We restrict our study to I(0) and I(1) time series.

Definition
Two I(1) time series $y_{1,t}$ and $y_{2,t}$ are said to be cointegrated if there exists a linear combination of the form $z_t = \beta_1 y_{1,t} + \beta_2 y_{2,t}$ that is I(0). Writing $y_t = (y_{1,t}, y_{2,t})'$ and $\beta = (\beta_1, \beta_2)'$, so that the cointegrating relationship is written $z_t = \beta' y_t$, $\beta$ is called the cointegrating vector. The cointegrating vector is not unique, since any nonzero multiple of $\beta$ also yields a stationary combination. Therefore, it is common to choose one of the variables to have a coefficient of one in the cointegrating vector, which then uniquely identifies the rest of the vector. This choice of variable is referred to as the normalization of the cointegrating vector.
The cointegrating relationship is often interpreted as a long-run or equilibrium relationship between the variables. Statistically, the idea is that the variables are I(1) and therefore tend to wander randomly over time. However, the cointegrating relationship means there is some relationship from which the variables deviate only in a stationary manner. In many applications, such statistical relationships are equated with economic equilibrium.
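The idea of two wandering series tied together by a stationary combination can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the cocoa series: the function name, the coefficient 2.0, the sample length and the seed are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_cointegrated_pair(n=500, beta=2.0, seed=0):
    """Simulate two I(1) series that share one random-walk trend.

    y2 is a pure random walk; y1 = beta * y2 + stationary noise,
    so z = y1 - beta * y2 is stationary by construction.
    """
    rng = random.Random(seed)
    y1, y2 = [], []
    trend = 0.0
    for _ in range(n):
        trend += rng.gauss(0.0, 1.0)  # common stochastic trend
        y2.append(trend)
        y1.append(beta * trend + rng.gauss(0.0, 1.0))
    return y1, y2


y1, y2 = simulate_cointegrated_pair()
# Cointegrating vector (1, -2), normalised so that y1 has coefficient one.
z = [a - 2.0 * b for a, b in zip(y1, y2)]
print("variance of y1:", statistics.pvariance(y1))
print("variance of z: ", statistics.pvariance(z))
```

Each series on its own drifts without bound, while the combination `z` stays close to zero; this is the sense in which deviations from the long-run relationship are stationary.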
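Cointegration can also exist in a general multivariate time series setting. Suppose $y_t = (y_{1,t}, \ldots, y_{n,t})'$ is a vector of I(1) time series. If there exists a vector $\beta = (\beta_1, \ldots, \beta_n)'$ such that $\beta' y_t$ is I(0), then $y_t$ is cointegrated with cointegrating vector $\beta$. However, in multivariate time series it is possible that there is more than one cointegrating vector. These cointegrating vectors are linearly independent, meaning that one is not a linear function of the others. The number of linearly independent cointegrating vectors is called the cointegrating rank. The two common tests to determine the cointegrating rank are the trace and the maximum eigenvalue tests. The hypotheses of the test are
$H_0$: the number of cointegrating vectors is $r$,
$H_1$: the number of cointegrating vectors is $r + 1$.
This alternative applies to the maximum eigenvalue test; for the trace test, the alternative is that there are more than $r$ cointegrating vectors. The two statistics are:
$$\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i), \qquad \lambda_{\max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1}),$$
where $\hat{\lambda}_i$ is the estimated value of the $i$th ordered eigenvalue and $T$ is the sample size.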
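To make the two rank statistics concrete, the sketch below evaluates the standard Johansen trace and maximum eigenvalue formulas for a given set of estimated eigenvalues. The eigenvalues and the sample size used here are hypothetical and are not taken from Table 1; the function names are likewise illustrative, and no critical values are computed.

```python
import math


def trace_statistic(eigenvalues, r, T):
    """Trace statistic: -T * sum over i = r+1..n of ln(1 - lambda_i)."""
    lam = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    return -T * sum(math.log(1.0 - value) for value in lam[r:])


def max_eigen_statistic(eigenvalues, r, T):
    """Maximum eigenvalue statistic: -T * ln(1 - lambda_{r+1})."""
    lam = sorted(eigenvalues, reverse=True)
    return -T * math.log(1.0 - lam[r])


# Hypothetical eigenvalues for a four-variable system and a sample of 41.
eigs = [0.55, 0.25, 0.10, 0.02]
T = 41
for r in range(len(eigs)):
    print(r, trace_statistic(eigs, r, T), max_eigen_statistic(eigs, r, T))
```

The trace statistic for rank $r$ is the sum of the maximum eigenvalue statistics from $r$ upward, so the two coincide when $r = n - 1$. In practice each statistic is compared with tabulated critical values, starting from $r = 0$.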

Vector error correction models
The appropriate model for cointegrated time series is called a Vector Error Correction Model (VECM) and is a rearranged, restricted form of a VAR. An error correction model is parameterized so that the variables tend to revert to the equilibrium relationship specified by the cointegrating vector. In general, a VAR($p$) model is rearranged to give a VECM of the form
$$\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t,$$
where $\Pi = \alpha\beta'$, $\beta$ contains the cointegrating vectors, $\alpha$ contains the adjustment coefficients, the $\Gamma_i$ are the coefficient matrices of the lagged differences, and $\varepsilon_t$ is the error term. Note that a VAR of order $p$ translates to a VECM with $p - 1$ lagged differences of $y_t$. A VECM thus consists of a mixture of variables in levels and in first differences. If we applied the univariate modelling strategy of taking first differences of any I(1) time series, and hence fitted a VAR in first differences, the resulting model would be misspecified because of the omitted error correction term. Conversely, we cannot use a VAR in levels to model cointegrated time series because the resulting inference in the presence of nonstationarity would not be valid. In the presence of cointegration, a VECM is required.
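As a worked illustration of this rearrangement, consider a VAR(2) without deterministic terms. The coefficient matrices $A_1$ and $A_2$, the error term $\varepsilon_t$, and the symbols $\Pi$ and $\Gamma_1$ are notation assumed here for the sketch.

```latex
\begin{align*}
y_t &= A_1 y_{t-1} + A_2 y_{t-2} + \varepsilon_t \\
y_t - y_{t-1} &= (A_1 - I)\, y_{t-1} + A_2 y_{t-2} + \varepsilon_t \\
&= (A_1 + A_2 - I)\, y_{t-1} - A_2 (y_{t-1} - y_{t-2}) + \varepsilon_t \\
\Delta y_t &= \Pi\, y_{t-1} + \Gamma_1 \Delta y_{t-1} + \varepsilon_t,
\qquad \Pi = A_1 + A_2 - I, \quad \Gamma_1 = -A_2 .
\end{align*}
```

The levels term $\Pi y_{t-1}$ is the error correction term, and with $p = 2$ there is exactly $p - 1 = 1$ lagged difference. When the series are cointegrated, $\Pi$ has reduced rank equal to the cointegrating rank.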

Estimation of weighted ranking procedure
The weighted ranking procedure performs better than weighting schemes such as Akaike weight and equal weight. Thus, we considered the weighted ranking procedure in this paper for better forecast.
The basis of the weighted ranking procedure is that each competing model has some potential to predict the future value of a series, since the true model is unknown. Thus, we allow each model in the competing set to forecast. We then rank the models by predictive performance: the model with the lowest forecast accuracy measure (that is, the smallest forecast error) is placed first and assigned the highest rank, and so on in that order. The steps of the weighted ranking procedure are:
1. Fit a set of competing models to a dataset. The criterion for including a model in the set of competing models is $p < 5$ (that is, the lag length of the model should be less than 5). This criterion is somewhat subjective; however, it is founded on the principle of parsimony (i.e., a model with fewer estimated parameters is desirable).
2. Forecast with each model in the set based on the 'out-of-sample' or 'in-sample' data.
3. Calculate the forecast accuracy measure of each model, e.g., MSFE, MAPE, etc.
4. Rank the models in the set by their forecast accuracy measure, so that the model with the lowest forecast accuracy measure receives the highest rank.
5. Sum the ranks and divide each individual rank by the total of the ranks to obtain the corresponding model weight.
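Thus, we can express the proposed weight as:
$$w_h = \frac{R_h}{\sum_{j=1}^{H} R_j}, \qquad h = 1, \ldots, H,$$
where $R_h$ is the rank assigned to model $g_h$ and $H$ is the number of competing models.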
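The ranking steps above can be sketched in a few lines. The function name and the three model labels are illustrative, the error values are hypothetical rather than taken from the paper's tables, and ties are not handled in this sketch.

```python
def rank_weights(accuracy):
    """Map {model: forecast error measure} to {model: weight}.

    The model with the lowest error receives the highest rank (H for H
    models), the next lowest receives H - 1, and so on. Each weight is
    the model's rank divided by the sum of all ranks.
    """
    ordered = sorted(accuracy, key=accuracy.get, reverse=True)  # worst first
    ranks = {model: position for position, model in enumerate(ordered, start=1)}
    total = sum(ranks.values())
    return {model: rank / total for model, rank in ranks.items()}


# Hypothetical forecast errors (percent) for three candidate models.
errors = {"model_a": 9.0, "model_b": 6.0, "model_c": 7.5}
weights = rank_weights(errors)
print(weights)
```

Because the weights depend only on the ordering of the models, they are insensitive to how large the gaps between the accuracy measures are; this is a design choice of rank-based weighting rather than a property of the data.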

Weighted VECM model
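Once the weights have been derived, we combine the parameter estimates of the entire set of models by applying their respective model weights. The weighted parameter estimates can be defined as
$$\hat{\varphi}_i = \frac{\sum_{h \in \mathcal{M}_i} w_h \, \hat{\varphi}_{i,h}}{\sum_{h \in \mathcal{M}_i} w_h}.$$
Here, $\hat{\varphi}_{i,h}$ denotes the estimator of $\varphi_i$ based on model $g_h$, $w_h$ is the weight of model $g_h$, and $\mathcal{M}_i$ is the set of models in which the parameter $\varphi_i$ is present. When a parameter appears in every competing model, the denominator equals one and the expression reduces to the weighted sum $\sum_h w_h \hat{\varphi}_{i,h}$.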
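A minimal sketch of combining parameter estimates across models follows. It assumes that a parameter missing from some models (for example, a higher-order lag) is averaged only over the models that contain it, with the weights renormalised accordingly; the parameter names and values are hypothetical.

```python
def combine_estimates(estimates, weights):
    """Weighted average of each parameter over the models that contain it.

    estimates: {model: {parameter_name: value}}
    weights:   {model: weight}
    """
    numer, denom = {}, {}
    for model, params in estimates.items():
        w = weights[model]
        for name, value in params.items():
            numer[name] = numer.get(name, 0.0) + w * value  # weighted sum
            denom[name] = denom.get(name, 0.0) + w          # weights present
    return {name: numer[name] / denom[name] for name in numer}


# Hypothetical estimates: the second lag exists only in the larger model.
estimates = {
    "VECM(1)": {"d_production_lag1": 0.2},
    "VECM(2)": {"d_production_lag1": 0.4, "d_production_lag2": -0.1},
}
weights = {"VECM(1)": 1 / 3, "VECM(2)": 2 / 3}
combined = combine_estimates(estimates, weights)
print(combined)
```

A parameter shared by all models becomes a plain weighted sum, while a parameter found in a single model keeps that model's estimate unchanged.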

RESULTS AND DISCUSSION
Based on the correlation analysis, the exogenous variables for the production variable are export earnings, exchange rate and domestic processing. None of these variables are stationary (Table A.1 and Table A.4). However, these variables became stationary after differencing once.

Test of cointegration
Since these variables are not stationary, an unrestricted cointegration rank test was performed to determine whether they are cointegrated in the long run. The results are reported in Table 1. Both the trace statistic and the maximum eigenvalue statistic indicate that there is one cointegrating equation at the 5% level of significance. The two conditions for using the vector error correction model (nonstationarity and cointegration) are therefore met, and a VEC model is fitted to the production variable.

Estimation of weights
As indicated earlier, the criterion for including a model in the set of competing models is $p < 5$ (that is, the lag length of the model should be less than 5). Thus, the set of competing models consists of four VECMs (i.e., VECM ($p$), where $p$ = 1, 2, 3, 4). We allow each competing model to make a forecast of production and then rank the models based on their respective forecast accuracy measure, MAPE. This is illustrated in Table 2. The forecast accuracy measure improves as the lag length of the model increases. Several diagnostic tests were performed on each of the four models in Table 2: inverse roots of the AR characteristic polynomial, Granger causality (block exogeneity Wald) tests, a normality test and a serial correlation test. Although VECM (3) and VECM (4) have lower MAPE, they did not pass the other diagnostic tests. Thus, VECM (2) was selected as the "best" model according to the conventional method (Table A.3).
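For the four candidate models, the weights implied by the ranking rule can be worked out by hand. The arithmetic below assumes that the MAPE ordering is strict, with VECM (4) lowest and VECM (1) highest as described above, and that there are no ties. The ranks $R_p$ and weights $w_p$ are derived from the rule itself rather than read from Table 2.

```latex
\begin{align*}
(R_1, R_2, R_3, R_4) &= (1, 2, 3, 4), \qquad \sum_{p=1}^{4} R_p = 10, \\
(w_1, w_2, w_3, w_4) &= \left(\tfrac{1}{10}, \tfrac{2}{10}, \tfrac{3}{10}, \tfrac{4}{10}\right)
 = (0.1,\ 0.2,\ 0.3,\ 0.4).
\end{align*}
```

Under this assumption the model with the longest lag contributes most to the combination, yet every candidate still contributes something.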

Estimation of the weighted model
We multiply the parameter estimates of the four competing models by their corresponding model weights; these products are called the weightage estimates. For each parameter, we then add the weightages across the competing models in which the parameter is present and divide by the sum of the weights of those models.

Forecast accuracy measure
We derived predicted values for all the observations, that is, from 1970 to 2013. However, only the observations from 2000 to 2013 were considered in calculating the forecast accuracy measure. Here, we compared the mean absolute percentage error (MAPE) of the combined forecast with that of the 'best' model from the conventional approach. The forecast accuracy measures of the combined forecast and of the 'best' model for production are given in Table 3.
The percentage error of the 'best' model varied from 0.4% to 21.5%, while that of the combined forecast varied from 0.73% to 20.14% between 2000 and 2012. However, the combined forecast has the lower overall forecast accuracy measure, a MAPE of 6.6%, which is desirable. The combined forecast value for 2013 is 1,136,353.88, which is higher than the "best" model forecast (i.e., 1,097,086). Thus, the combined forecast method, which is based on the weighted ranking approach, is recommended for forecasting production series in multivariate modelling.
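The accuracy measure used in this comparison can be computed as in the sketch below. The function name and the three data points are hypothetical and are unrelated to the values in Table 3.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute error of each period relative to the actual value.
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical series: the per-period errors are 10%, 5% and 0%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(mape(actual, predicted))
```

Because each error is scaled by the actual value, MAPE is unit-free, which makes it convenient for comparing models fitted to the same series; it is undefined when an actual value is zero.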

Combined long run relationship
The cointegration test revealed a single long-run relationship between production and export earnings, exchange rate and domestic processing. However, since the cointegration equation differs across the VECM models, a weighted cointegration equation from all the VECM models is desirable. We derived the weighted cointegration equation by multiplying the coefficient estimates of each VECM model by the corresponding model weight. The weighted cointegration equation for production, which gives the combined long-run relationship of these variables, is reported in Table 4. Previous domestic processing, exchange rate and export earnings are significantly associated with production in the long run; the coefficients of all three are statistically significant at the 5% level.
In EViews, each VECM model has two sets of estimates as output, the cointegration equation and the error correction equations, with their corresponding $R^2$ values. Since the above cointegration equation is based on weightages, we produce a weighted adjusted $R^2$ by multiplying the adjusted $R^2$ of each VECM model by its model weight and summing the products. The adjusted $R^2$ is 0.753; this means that, in the long run, the model explains 75.3% of the variability in annual production through the variation in exchange rate, domestic processing and export earnings. The partial regression coefficients suggest that, with the other variables held constant, a 1 unit increase in domestic processing leads to a decrease of 3.576 units in production; a 1 unit increase in export earnings leads to an increase of 0.000235 units in production; and a 1 unit increase in exchange rate leads to a decrease of 286791 units in production. Thus, the results indicate that the annual production of cocoa is characterized by the annual exchange rate, export earnings and domestic processing.

Regression model for production
Here, our focus is to construct a regression model of the actual production values on the weighted predicted values of production. The regression model will establish whether the weighted predicted values can explain the actual production values.
Thus, we apply the weight of each model to its corresponding forecast values and then sum the weighted forecast values to obtain the predictor variable. In the fitted regression model, the predicted value is significantly associated with the actual production of cocoa. The adjusted $R^2$ is 0.952; this means that the model explains 95.2% of the variability in annual production through the variation in the predicted value (which is driven by exchange rate, domestic processing and export earnings). The model as a whole is also statistically significant, since the p-value associated with the F-statistic is 0.00.
The Breusch-Godfrey serial correlation LM test [F-statistic = 0.167, p-value = 0.995] failed to reject the null hypothesis that the residuals are uncorrelated, which is a good indication. Likewise, the Breusch-Pagan-Godfrey heteroscedasticity test [F-statistic = 0.00252, p-value = 0.96] suggested that the variance of the residuals is constant, which is good for our model.
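The regression of actual on predicted values and its adjusted $R^2$ can be sketched as follows. This assumes a simple linear regression with an intercept and a single regressor; the function name and the data are hypothetical, and the diagnostic tests reported above are not reproduced.

```python
def simple_ols(x, y):
    """Fit y = a + b * x by least squares.

    Returns (intercept, slope, r_squared, adjusted_r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjustment for one regressor plus an intercept: (n - 1) / (n - 2).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return a, b, r2, adj_r2


# Hypothetical weighted predictions (x) and actual values (y).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [1.1, 1.9, 3.2, 3.9, 5.1]
intercept, slope, r2, adj_r2 = simple_ols(predicted, actual)
print(intercept, slope, r2, adj_r2)
```

A slope near one and an intercept near zero would indicate that the predictions track the actual values without systematic bias, which is a stronger statement than a high adjusted $R^2$ alone.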

CONCLUSION
In this study, the production of cocoa in Ghana was modelled using the weighted ranking procedure and the 'best' model from a multivariate time series approach. It was shown that the forecast from the weighted VECM outperformed that of the 'best' VECM. Moreover, the predicted values from the weighted VECM explained 95.2% of the variability in annual production. Thus, for accurate forecasting of cocoa production in Ghana, we recommend the use of the weighted ranking procedure.