REFERENCE EVAPOTRANSPIRATION ESTIMATED FROM AIR TEMPERATURE USING THE MARS REGRESSION TECHNIQUE

Reference evapotranspiration (ETo) is an important component for determining crop water requirements. To estimate this variable accurately, the Food and Agriculture Organization (FAO) proposed the Penman-Monteith equation; however, it demands a large amount of meteorological data, which restricts its use. In this context, this study compares the performance of the Penman-Monteith equation using only measured air temperature (PMT) and the Hargreaves-Samani (HS) equation with that of the multivariate adaptive regression splines (MARS) technique for estimating daily ETo from air temperature data alone. Daily meteorological data from 2002 to 2016 were used. The data were collected from weather stations located in Florianópolis-SC, Manaus-AM and Petrolina-PE, selected to capture different climatic conditions. MARS models were developed for each weather station, and the PMT and HS equations were locally calibrated. The performances of the original and calibrated equations and of the MARS models were evaluated using the root mean square error, mean absolute error, mean bias error and coefficient of determination. The ETo estimated by the Penman-Monteith method with full data was used as the reference for developing the MARS models, calibrating the equations and evaluating the performance of the models under study. Calibration of the HS and PMT equations improved their accuracy relative to the original equations. The MARS technique performed well, outperforming the original and calibrated PMT and HS equations with lower errors and higher coefficients of determination, and can be considered an alternative to empirical methods.


INTRODUCTION
Evapotranspiration is an important component for determining crop water requirements. It is also a key factor controlling several hydrological processes and in the planning and management of water resources and the estimation of irrigation water requirements, among others (WEN et al., 2015; YASSIN et al., 2016).
Crop evapotranspiration is usually determined from the reference evapotranspiration (ETo). To estimate this variable accurately, the Food and Agriculture Organization (FAO) proposed the Penman-Monteith (PM) equation (ALLEN et al., 1998), which is considered a standard method and has proven quite efficient for ETo estimation (JHAJHARIA et al., 2014; GAO et al., 2015). In addition, this method can be used without additional adjustments (PEREIRA et al., 2015).
Despite its good performance, the PM equation demands a large amount of meteorological data, such as air temperature, solar radiation, wind speed and relative humidity (ESTÉVEZ et al., 2016; HOBBINS, 2016), which restricts its use given the lack of these data in many locations (ALMOROX et al., 2015), especially for small farmers. Because meteorological data are commonly unavailable, Allen et al. (1998) suggested alternative procedures that allow the PM equation to be used without all the meteorological data normally required. Thus, the Penman-Monteith equation can be used even when only air temperature is measured, with the other variables estimated. This temperature-only approach is usually called Penman-Monteith temperature (PMT).
According to Traore et al. (2010), an ideal method for estimating ETo should use as few input variables as possible without affecting the accuracy of the estimate. Temperature-based models are especially interesting since this variable can be easily and widely measured (MENDICINO; SENATORE, 2013; ALMOROX et al., 2015). Thus, several studies have sought to estimate ETo with only temperature data (ALMOROX et al., 2015; MEHDIZADEH et al., 2017).
The well-known Hargreaves-Samani (HS) equation can also be used when only air temperature data are available (ALLEN et al., 1998). Almorox et al. (2015) reported the HS equation as one of the best temperature-based equations at the global scale. According to Noia et al. (2014), the HS equation can be used for irrigation scheduling.
In addition to the empirical equations, soft computing techniques such as artificial neural networks, support vector machines, genetic algorithms and others can be used (GOCIĆ et al., 2015; FENG et al., 2017). These almost always perform better than traditional methods (SHIRI et al., 2014; FENG et al., 2017; MEHDIZADEH et al., 2017). According to Mehdizadeh et al. (2017), these techniques are suitable for modeling complex and nonlinear problems such as ETo.
Among the soft computing methods, multivariate adaptive regression splines (MARS) is a regression technique whose fitted model can be written as an algebraic equation, unlike models such as neural networks, which require specific software to be used. MARS is a nonparametric regression method originally proposed by Friedman (1991). It is used to study the nonlinear relation between a response variable and a set of predictor variables, and has been successfully applied in several areas of knowledge (GARCÍA NIETO; ÁLVAREZ ANTÓN, 2014; KOC; BOZDOGAN, 2015; KISI; PARMAR, 2016; DEO et al., 2017).
In this context, this study aimed to compare the performance of the HS and PMT equations with that of the MARS technique for estimating daily reference evapotranspiration using only air temperature data.

MATERIAL AND METHODS
For the study, daily meteorological data from 2002 to 2016 were obtained from the National Institute of Meteorology (INMET) of Brazil through the Meteorological Database for Teaching and Research (BDMEP). The data were collected from three Brazilian weather stations (Figure 1), selected to capture different climatic conditions. The stations, located at Florianópolis-SC, Manaus-AM and Petrolina-PE, belong respectively to the Köppen classes Cfa, Am and BSh.
The collected data were divided into two parts: development/calibration data (2002 to 2011), used to develop the MARS models and to calibrate the equations under study; and test data (2012 to 2016), used to evaluate the performance of the models. According to Shiri et al. (2015), using independent data for performance evaluation makes the results more reliable. The climatic characteristics of the study sites for the development/calibration and test periods are given in Table 1. Maximum and minimum air temperature (°C), relative humidity (%), sunshine duration (h) and wind speed (m s⁻¹) data were used. The wind speed, measured at 10 m height, was converted to 2 m height, and the sunshine duration was used to calculate the solar radiation, following the recommendations of Allen et al. (1998).
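As an illustration of these two conversions, the sketch below uses the logarithmic wind profile and the Ångström relation as given by Allen et al. (1998). The function names are chosen here for illustration, and the Ångström coefficients 0.25 and 0.50 are the FAO default values, assumed because the text does not state which coefficients were adopted.

```python
import math

def wind_speed_2m(u_z, z=10.0):
    """Convert wind speed (m s-1) measured at height z (m) to 2 m height."""
    return u_z * 4.87 / math.log(67.8 * z - 5.42)

def extraterrestrial_radiation(lat_deg, doy):
    """Return (Ra in MJ m-2 d-1, maximum sunshine duration N in h)
    for a latitude in degrees (negative in the south) and a day of year."""
    phi = math.radians(lat_deg)
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * doy / 365.0)     # inverse Earth-Sun distance
    delta = 0.409 * math.sin(2.0 * math.pi * doy / 365.0 - 1.39)  # solar declination (rad)
    ws = math.acos(-math.tan(phi) * math.tan(delta))              # sunset hour angle (rad)
    ra = (24.0 * 60.0 / math.pi) * 0.0820 * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    n_max = 24.0 / math.pi * ws
    return ra, n_max

def solar_radiation_from_sunshine(n_hours, n_max, ra, a_s=0.25, b_s=0.50):
    """Angstrom relation: Rs (MJ m-2 d-1) from measured sunshine duration.
    a_s and b_s are assumed default coefficients, not values from the study."""
    return (a_s + b_s * n_hours / n_max) * ra
```

For example, `extraterrestrial_radiation(-20.0, 246)` gives the Ra and N needed by `solar_radiation_from_sunshine` for 3 September at 20° S.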
Days with missing or inconsistent data were eliminated. Data were considered inconsistent when the minimum temperature was greater than the maximum temperature, when the sunshine duration was negative or greater than the maximum possible sunshine duration, when the relative humidity was negative or greater than 100%, or when the wind speed at 10 m height was negative or greater than 20 m s⁻¹. All the evaluated weather stations presented good data quality, with less than 5% of errors.
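These screening rules can be written as a single daily check. The sketch below is one possible reading of the criteria; the function name and argument layout are hypothetical, and missing values are assumed to be represented by `None`.

```python
def is_consistent(tmax, tmin, sunshine, n_max, rh, wind_10m):
    """Return True when a daily record passes the consistency rules.

    tmax, tmin in degrees C; sunshine and n_max (maximum possible sunshine
    duration) in h; rh in %; wind_10m in m s-1 measured at 10 m height.
    """
    values = (tmax, tmin, sunshine, n_max, rh, wind_10m)
    if any(v is None for v in values):      # missing data
        return False
    if tmin > tmax:                         # temperature inversion in the record
        return False
    if sunshine < 0 or sunshine > n_max:    # impossible sunshine duration
        return False
    if rh < 0 or rh > 100:                  # impossible relative humidity
        return False
    if wind_10m < 0 or wind_10m > 20:       # wind speed outside the accepted range
        return False
    return True
```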
The daily reference evapotranspiration (ETo) was estimated by the Hargreaves-Samani (HS) equation (Equation 1), by the Penman-Monteith equation using only air temperature (PMT) and by the multivariate adaptive regression splines (MARS) technique.
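As an illustration, the HS equation in the form given by Allen et al. (1998), ETo = 0.0023 · 0.408 · Ra · (Tmean + 17.8) · (Tmax − Tmin)^0.5 with Ra in MJ m⁻² d⁻¹, can be sketched as follows. The function and argument names are assumptions made here, and Tmean is taken as the average of Tmax and Tmin.

```python
import math

def hargreaves_samani(tmax, tmin, ra):
    """Daily ETo (mm d-1) from the Hargreaves-Samani equation.

    tmax, tmin: maximum and minimum air temperature (degrees C)
    ra: extraterrestrial radiation (MJ m-2 d-1)
    """
    tmean = (tmax + tmin) / 2.0
    # 0.408 converts Ra from MJ m-2 d-1 to equivalent evaporation in mm d-1
    temp_range = max(tmax - tmin, 0.0)  # guard against a negative range
    return 0.0023 * 0.408 * ra * (tmean + 17.8) * math.sqrt(temp_range)
```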
ETo was calculated by the PMT method using Equation 2. To overcome the unavailability of relative humidity, solar radiation and wind speed data, the actual vapour pressure and the solar radiation were estimated by Equations 3 and 4, respectively, and the wind speed was set at 2 m s⁻¹. All the procedures were performed as recommended by Allen et al. (1998).
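A minimal sketch of the PMT procedure is given below, assuming the missing-data forms of Allen et al. (1998): dew point temperature approximated by Tmin for the actual vapour pressure, and solar radiation estimated from the temperature range as Rs = kRs · (Tmax − Tmin)^0.5 · Ra. The adjustment coefficient `k_rs` (0.16 for interior and 0.19 for coastal sites in the FAO guidelines), the function names and the use of station elevation for atmospheric pressure are assumptions of this sketch, not values reported in the study.

```python
import math

def sat_vapour_pressure(t):
    """Saturation vapour pressure (kPa) at air temperature t (degrees C)."""
    return 0.6108 * math.exp(17.27 * t / (t + 237.3))

def pmt_eto(tmax, tmin, ra, elevation, k_rs=0.16, u2=2.0):
    """Daily ETo (mm d-1) by Penman-Monteith with temperature data only.

    ra: extraterrestrial radiation (MJ m-2 d-1); elevation in m;
    k_rs: assumed radiation adjustment coefficient; u2: fixed wind speed (m s-1).
    """
    tmean = (tmax + tmin) / 2.0
    delta = 4098.0 * sat_vapour_pressure(tmean) / (tmean + 237.3) ** 2
    pressure = 101.3 * ((293.0 - 0.0065 * elevation) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure                  # psychrometric constant (kPa per degree C)
    es = (sat_vapour_pressure(tmax) + sat_vapour_pressure(tmin)) / 2.0
    ea = sat_vapour_pressure(tmin)               # dew point assumed equal to Tmin
    rso = (0.75 + 2e-5 * elevation) * ra         # clear-sky solar radiation
    rs = min(k_rs * math.sqrt(max(tmax - tmin, 0.0)) * ra, rso)
    rns = (1.0 - 0.23) * rs                      # net shortwave, albedo 0.23
    tk4 = ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) / 2.0
    rnl = 4.903e-9 * tk4 * (0.34 - 0.14 * math.sqrt(ea)) * (1.35 * rs / rso - 0.35)
    rn = rns - rnl                               # net radiation; soil heat flux taken as 0
    numerator = 0.408 * delta * rn + gamma * 900.0 / (tmean + 273.0) * u2 * (es - ea)
    return numerator / (delta + gamma * (1.0 + 0.34 * u2))
```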
The HS and PMT equations were used in their original and locally calibrated forms. The calibration was carried out by simple linear regression with the ETo estimated by the PM method with full data as the benchmark, as suggested by Allen et al. (1998). Thus, a linear regression was fitted with the ETo values estimated by PM as the dependent variable and those estimated by the equation to be calibrated as the independent variable. The obtained intercept (a) and slope (b) were used as calibration parameters, according to the following equation:
$$ET_{ocal} = a + b \, ET_{o} \qquad (5)$$
where: ETocal - reference evapotranspiration calculated by the calibrated equation, mm d⁻¹; a and b - calibration parameters obtained by linear regression; ETo - reference evapotranspiration calculated by the equation to be calibrated, mm d⁻¹.
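The calibration step amounts to an ordinary least squares fit followed by a linear correction. A minimal sketch, with hypothetical function names and plain Python lists as inputs, could look like this:

```python
def fit_calibration(eto_equation, eto_pm):
    """Fit eto_pm = a + b * eto_equation by ordinary least squares.

    eto_equation: ETo from the equation to be calibrated (independent variable)
    eto_pm: ETo from Penman-Monteith with full data (dependent variable)
    Returns the intercept a and the slope b.
    """
    n = len(eto_equation)
    mean_x = sum(eto_equation) / n
    mean_y = sum(eto_pm) / n
    sxx = sum((x - mean_x) ** 2 for x in eto_equation)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(eto_equation, eto_pm))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def apply_calibration(eto_equation, a, b):
    """Apply the linear correction to each daily estimate."""
    return [a + b * x for x in eto_equation]
```

In the study design, `fit_calibration` would be applied to the 2002 to 2011 data and `apply_calibration` to the 2012 to 2016 test data.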
Multivariate adaptive regression splines (MARS) is a nonparametric regression technique capable of determining the relations between input and output variables without any prior assumption about their form, modeling nonlinearities and interactions and automatically selecting the predictor variables of real importance. In MARS, basis functions are fitted over different intervals of the independent variables; the initial and final points of these intervals are called knots (MEHDIZADEH et al., 2017). The basis functions operate according to the following equations.
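MARS basis functions are commonly written as the hinge pair max(0, x − t) and max(0, t − x), where t is the knot (Friedman, 1991). The sketch below evaluates an additive model built from such functions. The helper names and the term format are hypothetical, and the coefficients and knots are invented for illustration only; they are not the models of Table 2.

```python
def hinge_pos(x, knot):
    """Basis function max(0, x - knot): active above the knot."""
    return max(0.0, x - knot)

def hinge_neg(x, knot):
    """Basis function max(0, knot - x): active below the knot."""
    return max(0.0, knot - x)

def mars_predict(x, intercept, terms):
    """Evaluate an additive MARS-style model.

    x: dict of predictor values, e.g. {"tmax": ..., "tmin": ..., "ra": ...}
    terms: list of (coefficient, variable, knot, direction), where
           direction > 0 selects max(0, x - knot), otherwise max(0, knot - x)
    """
    total = intercept
    for coef, var, knot, direction in terms:
        basis = hinge_pos(x[var], knot) if direction > 0 else hinge_neg(x[var], knot)
        total += coef * basis
    return total

# Invented coefficients and knots, for illustration only
example_terms = [
    (0.12, "tmax", 28.0, +1),
    (-0.08, "tmax", 28.0, -1),
    (0.05, "ra", 30.0, +1),
]
eto = mars_predict({"tmax": 32.0, "tmin": 21.0, "ra": 36.0}, 3.0, example_terms)
```

Because the model is a sum of such terms, it can be transcribed into a spreadsheet or any other environment as an algebraic expression. Interaction terms, which are products of hinge functions, are left out of this sketch.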
A MARS model is built in two steps. In the first, an over-fitted model with a large number of knots is produced. In the second, a pruning technique is used to remove the redundant knots (KISI, 2015), creating a simpler model. More details about MARS can be found in Cheng and Cao (2014).
The MARS models were developed using the maximum and minimum air temperature and the extraterrestrial radiation as input variables, the same ones used in the HS and PMT methods. The ETo estimated by the PM method with full data was adopted as the benchmark. The technique was implemented in the Python programming language (www.python.org) with the py-earth library (contrib.scikit-learn.org/py-earth).
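A minimal sketch of how such a model might be fitted with py-earth is shown below. The helper names and the `max_degree=1` setting are assumptions, since the text does not report the hyperparameters that were used; `Earth`, `fit`, `predict` and `summary` are part of the py-earth interface, and NumPy is required by the library.

```python
def make_inputs(tmax, tmin, ra):
    """Stack the three predictors into rows [tmax, tmin, ra], one per day."""
    if not (len(tmax) == len(tmin) == len(ra)):
        raise ValueError("all input series must have the same length")
    return [[a, b, c] for a, b, c in zip(tmax, tmin, ra)]

def fit_mars(rows, eto_pm, max_degree=1):
    """Fit a MARS model with py-earth; eto_pm is the PM benchmark ETo.

    max_degree is an assumed setting, not one reported in the study.
    """
    # Imported here so that make_inputs works without py-earth installed
    import numpy as np
    from pyearth import Earth

    model = Earth(max_degree=max_degree)
    # fit() runs the forward (over-fitting) pass and then the pruning pass
    model.fit(np.asarray(rows, dtype=float), np.asarray(eto_pm, dtype=float))
    return model
```

After fitting on the development data, `model.predict(...)` would give the daily ETo estimates for the test period, and `print(model.summary())` lists the retained basis functions and their coefficients.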
The performances of the studied models were evaluated using the ETo calculated by the PM method with full data as the reference. The root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE) and coefficient of determination (R²) were calculated according to the following equations.
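These indices can be computed as in the sketch below. Two conventions are assumed here rather than taken from the text: MBE is the mean of estimated minus reference values, so that positive values indicate overestimation, and R² is the squared Pearson correlation between estimated and reference values.

```python
import math

def rmse(obs, pred):
    """Root mean square error between reference (obs) and estimated (pred) ETo."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def mbe(obs, pred):
    """Mean bias error; positive values indicate overestimation (assumed sign)."""
    return sum(p - o for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination as the squared Pearson correlation (assumed)."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)
```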

RESULTS AND DISCUSSION
The graphs in Figure 2 show that the behavior of the methods under study varied among the evaluated weather stations. Unlike the PM method, which can be applied to several types of climate with good results due to its physical basis (BERTI et al., 2014), empirical equations and soft computing models with limited climatic data show variable performance according to the climatic conditions of the sites where they are used (SHIRI et al., 2014; LIU et al., 2017). In general, they tend to perform better when employed under the same conditions as their development site.
Consistent with this variability as a function of climatic conditions, the MARS models obtained in this study (Table 2) showed different relationships between the climatic parameters (temperature and extraterrestrial radiation) and ETo at the different locations. Thus, using only local data leads to models specific to a particular location.
Figure 2 also allows the effect of the calibrations on the PMT and HS equations to be verified graphically, by the position of each model's trend line in relation to the 1:1 line. After the calibration there was, in general, better agreement between the values predicted by the PMT and HS equations and those considered as reference (PM). In addition, the values estimated by the MARS models were, in general, better than those estimated by the original and calibrated PMT and HS equations.
The performance of the PMT and HS equations before and after the local calibration, as well as the performance of the MARS models, was also evaluated based on the statistical indices presented in Figure 2.
For the Florianópolis weather station, the original PMT and HS methods presented similar performances. However, the HS method showed a slight tendency to overestimate ETo, as indicated by its higher MBE relative to that of the PMT method. After local calibration, the two methods still performed similarly, with a slight superiority of the HS method. It should be noted that the calibration promoted more significant performance improvements only for the HS method.
At the Manaus weather station, the original PMT equation performed better than the original HS equation, which overestimated ETo, with MBE equal to 0.47. Considering that high relative humidity (RH) predominates at the Manaus station, this finding agrees with Hargreaves and Allen (2003), who suggested that the HS equation overestimates ETo under high RH.
After the calibration, the HS method performed better, with reduced RMSE and MAE values. However, it began to show a tendency to underestimate ETo, with MBE equal to -0.36. On the other hand, the PMT method performed worse after the calibration and also acquired a tendency to underestimate ETo. Despite this decline, the calibrated PMT method remained slightly better than the calibrated HS method.
In general, calibration tends to improve the results obtained by a given model (KISI; ZOUNEMAT-KERMANI, 2014). Thus, the opposite outcome may be associated with changes in climatic conditions between the calibration and test periods. In this sense, Table 1 shows a reduction of 5.6% in the annual mean RH between the calibration period (2002-2011) and the test period (2012-2016), in addition to a small increase in the mean annual ETo (0.2 mm d⁻¹). Possibly, such changes contributed to the decline in the performance of the PMT equation after the calibration.
Unlike the findings for the Florianópolis station, the MARS model developed for the Manaus station was clearly superior to the original and calibrated PMT and HS equations. It obtained smaller errors (RMSE and MAE) than the others, and its greater precision should be emphasized: it presented R² equal to 0.73, while the best evaluated equation (PMT) obtained R² equal to 0.59.
For the Petrolina station, the HS equation performed better than the PMT equation before the calibration. However, after being calibrated, the two equations performed very similarly. The calibration significantly improved the performance of both methods, with RMSE reduced by 51 and 66% and MAE by 40 and 55% for the PMT and HS methods, respectively. It is also emphasized that before the calibration both methods tended to underestimate ETo, whereas after the calibration this tendency was eliminated.
In agreement with the present study, Gavilán et al. (2006), using the HS equation, found that ETo was underestimated when wind speed exceeded 1.5 m s⁻¹, with more evident underestimation at wind speeds greater than 2 m s⁻¹. Although the wind speed at the Florianópolis station is similar to that at Petrolina, the high RH at Florianópolis, which in turn may promote an overestimation of ETo, tends to compensate for the underestimation trend. In addition, this behavior could be related to the greater proximity of the Florianópolis station to the ocean.
The MARS model developed for the Petrolina station outperformed the HS and PMT equations even after their calibration, presenting RMSE, MAE, MBE and R² equal to 0.63, 0.47, -0.09 and 0.73, respectively.
In the present study, improvements were obtained with the local calibration of the PMT and HS equations, mainly at the Petrolina station. Improvements after the calibration of empirical methods were also reported by several authors, such as Patel et al. (2015), Djaman et al. (2016) and Borges Júnior et al. (2017). Thus, calibration is an important tool to improve the performance of ETo estimation models, adapting them to local climatic conditions. The MARS technique performed well at all the studied sites, outperforming the original and calibrated PMT and HS methods. Mehdizadeh et al. (2017) reported that MARS outperformed several empirical equations in their original and calibrated forms. These results indicate that this technique can be used to estimate ETo in locations where only air temperature data are available.

CONCLUSIONS
Calibration of the HS and PMT equations improved performance relative to the original equations, increasing the methods' accuracy.
The MARS technique performed well, outperforming the original and calibrated PMT and HS equations, and can be considered an alternative to the empirical methods.