Unobserved Components Model for Forecasting Sugarcane Production in Sri Lanka

The Unobserved Components Model (UCM) is a structural time series model that decomposes the response series into latent components such as trend, cycle, and seasonal effects, together with linear and nonlinear regression effects. The UCM combines the capabilities of the Autoregressive Integrated Moving Average (ARIMA) model with the interpretability of smoothing models. This study was carried out to forecast sugarcane production in Sri Lanka using the UCM. The best-fitting model was selected based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), followed by residual analysis. The selected model was used to make sample-period forecasts (1979 to 2013) and post-sample-period forecasts (2014 to 2018). Forecasting accuracy was evaluated using the mean absolute percentage error (MAPE). A linear trend model with a zero-variance slope and two cycles (adjusted $R^2$ = 0.77) was selected as the best among the tested UCMs for the cane production data. The MAPE was 10.56% for sample-period forecasts and 4.01% for post-sample-period forecasts. The predicted cane production for 2019 was 813,888 ± 293,891 tons.



INTRODUCTION
National level agricultural production forecasts are important for making policy decisions.
Yield-forecasting systems are therefore available for major crops such as coconut, paddy, rubber, and tea in Sri Lanka. However, no attempts to forecast national-level sugarcane production in Sri Lanka have been reported. This paper therefore attempts to forecast sugarcane production using time series analysis.
Time series forecasting refers to the use of statistical models to forecast future events based on known past events. The Autoregressive Integrated Moving Average (ARIMA) methodology is widely used to model time series in agricultural production. The main limitation of the ARIMA approach is that it can be applied only when the series to be analysed is stationary or can be differenced into a stationary process (Koopman and Ooms, 2010). In addition, for small data sets the correlogram and partial autocorrelation function used in ARIMA modelling are less informative, resulting in inappropriate model specifications and predictions (Brintha et al., 2014). Moreover, the ARIMA methodology is empirical in nature and fails to explain the underlying mechanism (Brintha et al., 2014). Unobserved components models (UCM) can be used as an alternative approach to overcome these problems (Harvey, 1996; Koopman and Ooms, 2010). The UCM analyses and forecasts time series data by breaking down the response series into latent components that are useful in explaining and predicting its behaviour, such as trends, seasonal factors, cycles, and regression effects due to predictor series. The UCM combines the capabilities of the ARIMA model with the interpretability of the smoothing model (Koopman and Harvey, 2003).
Several studies have reported effective use of the UCM for forecasting agricultural production. Ravichandran and Muthuraman (2006) forecasted rice production in India using the UCM approach. Brintha et al. (2014) employed the UCM methodology to forecast annual coconut production in Sri Lanka. Rajarathinam et al. (2016) found that the UCM can effectively model wheat production in India. Singh et al. (2014) used the UCM to forecast gram production in India and discussed the relative merits of structural models compared to ARIMA. Sugarcane production is influenced by cultivated area, environmental factors, technological advancements, changes in management, and yield variations due to 6-8 year ratooning cycles, and these underlying factors are not consistent over time. Therefore, structural models that can capture latent components should be suitable for modelling the cane production data. The aim of this study was to model and forecast the annual national sugarcane production in Sri Lanka using the UCM.

Data used in the study
Annual sugarcane production data from 1979 to 2018, obtained from the Sugarcane Research Institute and the Central Bank Reports of Sri Lanka, were used for the study.

UCM
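The basic UCM consists of trend, cycle, seasonal, and irregular components and is specified in the following form (Harvey and Stock, 1993):

$$y_t = \mu_t + \psi_t + \gamma_t + \varepsilon_t$$

where $y_t$ is the observed series, $\mu_t$ is the trend, $\psi_t$ is the cycle, $\gamma_t$ is the seasonal component, and $\varepsilon_t$ is the irregular component.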

Estimating trend effect
The trend effect of the UCM can be modelled as a local level model (LLM), i.e. a random walk model (RWM), or as a local linear trend (LLT) (Harvey, 2001; Harvey and Koopman, 2009).

In the local linear trend model, the trend is modelled as a stochastic component with a varying level ($\mu_t$) and slope ($\beta_t$). The LLT model can be described by the following equations.

Level: $$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2)$$

Slope: $$\beta_t = \beta_{t-1} + \xi_t, \qquad \xi_t \sim N(0, \sigma_\xi^2)$$

The disturbances $\eta_t$ and $\xi_t$ are assumed to be mutually independent. If $\sigma_\eta^2$ is set to zero, the resulting model has a smooth trend. If $\sigma_\xi^2$ is set to zero, the resulting model has a linear trend with a fixed slope. If both variances are zero, the resulting model has a deterministic linear time trend (equation 5).

$$\mu_t = \mu_0 + \beta_0 t \qquad (5)$$
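The effect of the two variance restrictions can be illustrated with a short simulation. The following is a minimal sketch in Python, not the estimation procedure used in the study; the function name `simulate_llt` and all numerical values are hypothetical and chosen only for illustration.

```python
import random


def simulate_llt(n, level0, slope0, sd_level, sd_slope, seed=0):
    """Simulate the trend of a local linear trend model (illustrative only)."""
    rng = random.Random(seed)
    level, slope = level0, slope0
    path = []
    for _ in range(n):
        path.append(level)
        # Level: previous level plus previous slope plus level disturbance (eta).
        level = level + slope + rng.gauss(0.0, sd_level)
        # Slope: random walk driven by the slope disturbance (xi).
        slope = slope + rng.gauss(0.0, sd_slope)
    return path


# Both variances zero: the trend is the straight line mu_0 + beta_0 * t.
deterministic = simulate_llt(5, level0=100.0, slope0=2.0, sd_level=0.0, sd_slope=0.0)
print(deterministic)  # [100.0, 102.0, 104.0, 106.0, 108.0]

# Slope variance zero: linear trend with a fixed slope and a noisy level.
fixed_slope = simulate_llt(40, 100.0, 2.0, sd_level=5.0, sd_slope=0.0)

# Level variance zero: smooth trend whose slope drifts slowly.
smooth = simulate_llt(40, 100.0, 2.0, sd_level=0.0, sd_slope=0.5)
```

Plotting `fixed_slope` and `smooth` side by side shows the difference between a trend with a constant drift and one whose direction changes gradually.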

Estimating cyclic effect
Cyclic effects are similar to seasonal effects, but the period is not known in advance and is determined from the data. A periodic pattern can be expressed as a sum of cycles. A cycle $\psi_t$ has a frequency $\lambda$, measured in radians. The period of a cycle is the time taken to go through its complete sequence of values and is equal to $2\pi/\lambda$. Depending on two parameters ($\alpha$ and $\beta$), cyclical fluctuations can be expressed as a mixture of sine and cosine waves (Harvey and Stock, 1993):

$$\psi_t = \alpha \cos(\lambda t) + \beta \sin(\lambda t)$$

where $\sqrt{\alpha^2 + \beta^2}$ and $\arctan(\beta/\alpha)$ represent the amplitude and phase, respectively. Following Harvey (1996), cycles can be made stochastic by allowing the parameters $\alpha$ and $\beta$ to evolve over time:

$$\begin{bmatrix} \psi_t \\ \psi_t^* \end{bmatrix} = \rho \begin{bmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{bmatrix} \begin{bmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{bmatrix} + \begin{bmatrix} \nu_t \\ \nu_t^* \end{bmatrix}$$

Here $\rho$ is the damping factor, where $0 \le \rho \le 1$. The terms $\nu_t$ and $\nu_t^*$ are uncorrelated white noise disturbances with zero mean and common variance $\sigma_\nu^2$. This results in a damped stochastic cycle that has time-varying amplitude and phase. The parameters of the cycle component are the damping factor $\rho$, the frequency $\lambda$, and the variance $\sigma_\nu^2$ of the disturbance term $\nu_t$. When the damping factor is less than one, $\psi_t$ is stationary. If $\lambda$ is equal to zero or $\pi$, the model reduces to a first-order autoregressive process.
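The roles of the frequency and the damping factor can be checked numerically. The sketch below, in Python, iterates the standard damped stochastic cycle recursion; the function `simulate_cycle`, its starting values, and the chosen period are assumptions made for illustration and are not taken from the fitted model.

```python
import math
import random


def simulate_cycle(n, period, rho, sd, psi0=1.0, psi_star0=0.0, seed=0):
    """Iterate a damped stochastic cycle with the given period (illustrative)."""
    lam = 2.0 * math.pi / period  # frequency in radians
    rng = random.Random(seed)
    psi, psi_star = psi0, psi_star0
    out = []
    for _ in range(n):
        out.append(psi)
        # Rotate the pair (psi, psi*) by lam, shrink by rho, add disturbances.
        new_psi = rho * (math.cos(lam) * psi + math.sin(lam) * psi_star)
        new_star = rho * (-math.sin(lam) * psi + math.cos(lam) * psi_star)
        psi = new_psi + rng.gauss(0.0, sd)
        psi_star = new_star + rng.gauss(0.0, sd)
    return out


# rho = 1 and no disturbance: a persistent cycle that repeats every 8 steps.
persistent = simulate_cycle(17, period=8.0, rho=1.0, sd=0.0)

# rho < 1 and no disturbance: the amplitude shrinks by rho at every step.
damped = simulate_cycle(17, period=8.0, rho=0.9, sd=0.0)

print(round(persistent[8], 6), round(damped[8], 6))
```

With a damping factor of one the cycle returns to its starting value after one full period, whereas with a damping factor of 0.9 the value after one period is scaled by 0.9 raised to the power 8.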

Assessing the model fit
The best-fitting model was selected based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), as shown in equations 9 and 10.

$$\mathrm{AIC} = -2 \log L + 2n \qquad (9)$$

$$\mathrm{BIC} = -2 \log L + n \log T \qquad (10)$$

where $L$ denotes the full likelihood value of the fitted model, $n$ is the number of free parameters estimated in the chosen model, and $T$ is the total number of observations used to estimate the candidate model. These two criteria are useful for discriminating among competing UCMs. The model that minimizes both measures was selected as the best-fitting model.
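Equations 9 and 10 translate directly into code. The following Python sketch compares two candidate models; the candidate names, log-likelihoods, and parameter counts are hypothetical values used only to show the mechanics, not results from this study.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2n."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + n log T."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Hypothetical candidates: name -> (log-likelihood, number of free parameters).
candidates = {
    "two cycles": (-480.0, 8),
    "one cycle": (-488.0, 5),
}
n_obs = 40  # hypothetical number of observations

scores = {
    name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in candidates.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
print(best_by_aic, best_by_bic)
```

Because the BIC penalty grows with log T, it penalizes extra parameters more heavily than the AIC whenever T exceeds about 7, so the two criteria need not always agree.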

Residual analysis
Residual analysis is important for checking model adequacy, which requires the residuals to be independently and identically distributed. Residual diagnostic plots were used to check the normality (histogram and Q-Q plot) and the whiteness (ACF and PACF) of the residuals.
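The whiteness check can be sketched as a comparison of sample autocorrelations with approximate 95% bounds of ±1.96/√T. The Python code below is an illustration under that common approximation; the residuals are simulated placeholders rather than the residuals of the fitted model, and the normality tests are not covered here.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Placeholder residuals (hypothetical): 40 independent normal draws.
rng = random.Random(1)
residuals = [rng.gauss(0.0, 1.0) for _ in range(40)]

acf = sample_acf(residuals, max_lag=10)
bound = 1.96 / math.sqrt(len(residuals))  # approximate 95% limit
flagged = [k for k in range(1, 11) if abs(acf[k]) > bound]
print("lags outside the bounds:", flagged)

# A strongly autocorrelated series for contrast: lag-1 autocorrelation is -0.95.
alternating = [1.0, -1.0] * 10
alt_acf = sample_acf(alternating, max_lag=1)
```

For white residuals, few or no lags should fall outside the bounds, while the alternating series is flagged immediately at lag 1.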

Forecasting and accuracy checking
After the assumptions on the residuals were verified, the selected model was used to make sample-period forecasts (1979 to 2013) and post-sample-period forecasts (2014 to 2018). Forecasting accuracy was evaluated using the mean absolute percentage error (MAPE). After evaluating the model accuracy, the forecasts were updated using the full data set (1979 to 2018). Final conclusions on the forecasts were based on the full data set.
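The accuracy measure can be computed as follows. This Python sketch uses the standard definition of MAPE (the mean of the absolute errors relative to the actual values, expressed as a percentage); the production figures are hypothetical and serve only as a worked example.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical actual and forecast values (not data from this study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

# Relative errors are 0.10, 0.10, and 0.00, so the MAPE is about 6.67%.
print(round(mape(actual, predicted), 2))
```

Computing the measure separately on the estimation period and on the held-out years gives the sample-period and post-sample-period values respectively.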

RESULTS AND DISCUSSION
The time series plot of sugarcane production from 1979 to 2018 is depicted in Figure 1. Up to the mid-1980s, cane production varied between 200,000 t and 330,000 t, with contributions from the Kantale and Hingurana sugar industries. In the mid-1980s, the sugar industry expanded with the establishment of two new sugar industries, Pelwatte and Sevanagala. Cane production therefore increased gradually from 1986 to 1996 as the cultivated extent expanded with the contribution of four sugar industries. However, the sugar industry declined after its restructuring in the early 1990s, owing to the closure of the sugar mills in Kantale in early 1992 and in Hingurana in 1997. Cane production therefore declined gradually from 1998 to 2011. With the restarting of the Hingurana sugar industry, cane production showed an upward trend from 2012.

Figure 1: Time series plot of national cane production in Sri Lanka during 1979-2018.
After observing the general pattern, the UCM technique was used to detect all possible time series components (level, slope, cycle, and irregular component) in the sugarcane production series.
In the first stage, the analysis aimed to detect the time-varying components in the data. The error variances of the irregular, level, slope, and cyclic components are the "free parameters" of the model, and their estimates are given in Table 1. These estimates and their corresponding t-values were used to test the null hypothesis that the corresponding component is non-stochastic. The results revealed that the disturbance variances of the level and slope components were not significant. The slope recorded the highest p-value among the free parameters (Table 1). This suggests that the slope is not time-varying and should be made deterministic, i.e. treated as a constant with zero variance. The damping factors and frequencies of the cycles are highly significant; therefore, the cyclic terms contribute to the model and should be retained.
In the second stage, a significance analysis of the components was carried out to decide which components to retain in the model, by testing the following hypotheses using the chi-square statistic.
H0: The considered component is not significant
H1: The considered component is significant
The results of the analysis are given in Table 2; they revealed that the level and cycles are significant. Therefore, these components should be retained in the model. The irregular component was also retained in the model as a matter of principle, since it is a stochastic component. Brintha et al. (2014) employed the UCM to analyse coconut production in Sri Lanka, found that the level and slope components were non-stochastic, and reported similar results regarding the significance of the level and slope components. Rajarathinam et al. (2016) analysed the area, production, and productivity of the wheat crop in India using the UCM and found that a UCM with zero slope variance was suitable for forecasting wheat production.
In the next step, the data set was re-analysed with the slope made deterministic. The lowest values of the information criteria (AIC = 977.96 and BIC = 991.06), together with the residual analysis, suggested that the linear trend model with a zero-variance slope and two cycles is the best among the tested UCMs for the cane production time series of Sri Lanka. The likelihood algorithm converged after 28 iterations. The selected model had an adjusted $R^2$ of 0.77, which can be considered a good fit.
A summary of the cycles is presented in Table 3. The estimated periods of the cycles are 8.6 and 31.09 years. For cycle 1, the estimated damping factor is equal to 1, implying that the periodic pattern reflected by cycle 1 is persistent, as shown in Figure 2. The damping factor of cycle 2 is also equal to one, which shows that the effect of a shock persists in the data. However, it was not possible to detect such persistence from the cyclic pattern depicted by the second cycle (Figure 3), since the 31-year period (about three-quarters of the full data set) means that the series covers only about one complete cycle.
The smooth trend for cane production is depicted in Figure 4. Predictions for cane production show an increasing trend. The expected increase may be due to expansion of the cultivated area, higher yields from new high-yielding sugarcane varieties, and other technological advances.
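The reported periods can be converted to frequencies through the relation period = 2π/λ, and compared with the length of the series. The short Python sketch below does this arithmetic with the two estimated periods quoted above; the variable names are illustrative only.

```python
import math

# Estimated cycle periods in years, as reported for the selected model.
periods = {"cycle 1": 8.6, "cycle 2": 31.09}

# Number of annual observations from 1979 to 2018, inclusive.
n_years = 2018 - 1979 + 1

frequencies = {}
cycles_in_sample = {}
for name, period in periods.items():
    frequencies[name] = 2.0 * math.pi / period    # radians per year
    cycles_in_sample[name] = n_years / period     # repetitions in the data

for name in periods:
    print(name, round(frequencies[name], 4), round(cycles_in_sample[name], 2))
```

The first cycle repeats more than four times within the 40 observations, whereas the second completes little more than one repetition, which is why its persistence is hard to judge visually.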

Residual analysis
The panel of residual diagnostic plots for the fitted model is shown in Figure 5. The histogram and quantile plots are consistent with normality of the residuals, which was confirmed by non-significant Anderson-Darling (p = 0.9) and Shapiro-Wilk (p > 0.1) normality tests. The ACF and PACF plots do not exhibit violations of the whiteness assumption; the correlations at all non-zero lags appear to be non-significant. The predicted values of cane production are close to the actual values, indicating the accuracy of the model (Figure 6). Table 4 presents the forecasted values of cane production and their confidence intervals based on the selected UCM.