Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Regional flood frequency analysis using data-driven models (M5, random forest, and ANFIS) and a multivariate regression method in ungauged catchments

Regional flood frequency analysis using data-driven models (M5, random forest, and ANFIS) and a... Flooding is recognized worldwide joined of the most expensive natural hazards. To adopt proper structural and nonstruc- tural measurements for controlling and mitigating the rising flood risk, the availability of streamflow values along a river is essential. This raises concerns in the hydrological assessment of poorly gauged or ungauged catchments. In this regard, several flood frequency analysis approaches have been conducted in the literature including index flow method (IFM), square grids method (SGM), hybrid method (HM), as well as the conventional multivariate regression method (MRM). While these approaches are often based on assumptions that simplify the complex nature of the hydrological system, they might not be able to address uncertainties associated with the complexity of the system. One of the powerful tools to deal with this issue is data-driven model that can be easily adopted in complex systems. The objective of this research is to utilize three different data-driven models: random forest (RF), adaptive neuro-fuzzy inference system (ANFIS), and M5 decision tree algorithm to predict peak flow associated with various return periods in ungauged catchments. Results from each data-driven model were assessed and compared with the conventional multivariate regression method. Results revealed all the three data-driven models performed better than the multivariate regression method. Among them, the RF model not only demonstrated the superior performance of peak flow prediction compared to the other algorithms but also provided insight into the complexity of the system through delivering a mathematical formulation. Keywords Flood frequency · M5 · RF · Regression · ANFIS Introduction used method for estimating flood risk at target locations in river basins where streamflow measurements are either Floods are among the costliest natural hazards experienced limited or unavailable (Griffis and Stedinger 2007; Leclerc in most of the places in the world, which results in heavy and Ouarda 2007; Zaman et al. 2012, Smith et al. 2015 and losses of life and economic damages (Gao et  al. 2018). Lotfirad et al. 2018) The first RFFA was undertaken by the Regional flood frequency analysis (RFFA) is an especially USGS in New England (Kinnison and Colby 1945). Lately, RFFA has received significant heed for design and manage- ment of water substructures such as dams, reservoirs, and bridges to slight flood risks and hence financial devastations. * Heidar Zarei zareih@scu.ac.ir Much of this attention has occurred in response to water- related hazards such as the flood in regions where minute or Hassan Esmaeili-Gisavandani esmaeili.gisavandani@gmail.com no data is available on peak flows such as the Indus floods in East of Iran. Mohammad Reza Fadaei Tehrani mfadaei@nri.ac.ir RFFA has been used as a crucial tool for many applica- tions for example (i) water substructure, (ii) flood preserva- Department of Hydrology and Water Resources, Faculty tion projects, (iii) land cover planning, and other hydrologic of Water and Environmental Engineering, Shahid Chamran studies. The regional techniques consist of a multivariate University of Ahvaz, Ahvaz, Iran 2 statistical structure derived from catchment characteristics Faculty of Water and Electricity Industry Training Institute, data (Rao and Srinivas 2008). In this process, the region of Tehran, Iran Vol.:(0123456789) 1 3 139 Page 2 of 11 Applied Water Science (2023) 13:139 influence approach can be formed where some catchments being induced into linear patches; these models provide a are pooled together based on vicinity in geographic. Conse- representation that is reproducible and understandable by quently, an optimum region is made based on some objective practitioners. (Solomatine and Xue 2004; Jothiprakash and function (Holmes et al. 2002; Aziz et al. 2010). Kote 2011). The target of RFFA is predicting river peak flow The relationship among flood flow values, physiographi- associated with various return periods in ungauged catch- cal and/or morphological characteristics of a catchment is a ments and also reduce uncertainty to evaluate the flooding fundamental framework for RFFA. Various methods such as (Merz and Blöschl 2008; Zaman et al. 2012; Shu and Ouarda multivariate regression method (MRM), square grids method 2012; Leščešen et al. 2019). The objectives of this study are (SGM) and hybrid method (HM) have been developed in two-fold. Firstly, the aim is to estimate the RFFA utilizing the literature for RFFA; each approach has its advantages the random forest and M5 algorithms. Secondly, the goal is and disadvantages. Among these methods, MRM has been to estimate flood occurrences in data-deficient catchments widely used and presented in previous studies (Golestani within the western region of Iran. et al. 2010; Malekinezhad et al. 2011; Latt et al. 2014). Recently, data-driven models have been widely adopted in hydrological studies. Data-driven models can handle the Materials and methods nonlinearity and uncertainty in hydrological data very effec- tively. These methods are extensively used for rainfall-runoff Out of eighty-nine stream gauges, thirty-two stations were modeling, water resource management modeling, and sus- used due to the availability of data. The data were obtained pended sediment modeling take; for example, Kumar et al. for the period of 1987-2018. The study area, including (2015) provide the details of the application of data-driven thirty-two stream gauges, is located in the west of Iran. From models in regional flood frequency estimation which is the homogeneous stations, twenty-seven stations were used explored. Regional flood frequency relationships are devel- for calibration and five stations for validation of the models oped employing data-driven models viz. ANN and FIS for (Fig. 1b). In fact, five stations were used for validation which lower Godavari subzone 3(f) of India and the same have was not used for the modeling and after making model, each been compared with the regional relationships derived using data-driven model validated with these stations. To approach the L-moments approach. a unique model, the return period was taken into account Recently, the data-driven models such as random forest as an independent factor. The study considered the annual (RF), fuzzy computing techniques and M5 decision tree, maximum instantaneous peak flow. Kolmogorov–Smirnov in complex modeling of the flood, have been developed test in EasyFit software 5.6 was used to estimate peak flow by Solomatine and Xue (2004); Aziz et al. (2014); Sehgal with different return periods based on the best distribution et al (2014); Kumar et al (2015); Latt et al. (2015); Deo function (Shokouhifar et al. 2022). and Şahin (2016); Esmaeili-Gisavandani (2017); Ghum- man et al. (2018); Zahiri and Nezaratian (2020); Desai and Study area Ouarda (2021); Adib et al. (2022); Jahangir et al. (2022). Among the numerous data mining techniques, ANFIS is the The Karkheh River basin is located in the west of Iran most widely used approaches in various water-related areas. (Fig. 1a). The Karkheh River basin covers 51,230 square Being an accurate predictive tool, the ANFIS technique has, kilometers in parts of six Iranian provinces. The Karkheh however, an inherent disadvantage that often results in hesi- river length is approximately 900 km. (Fallah-Mehdipour tating to interpret their outputs. This is because of being et al. 2020). This study considers RFFA using three data- a black box and consequently the nature of their solution driven models (M5, RF and ANFIS) and a multivariate is hazy. There might be a variation between networks of regression method in ungauged catchments. The further the same architecture trained on the same dataset due to detail of Karkheh Basin could be found in many papers (see, the arbitrary nature of the internal representation. (Witten e.g., Gheitasi 2016; Zamani et al. 2015). and Frank 2000). Srinivas et al. (2008); Aziz et al. (2017); Esmaelili-Gisavandani et al. (2017); Sharifi Garmdareh et al. Data (2018), and Zalnezhad et al. (2022) attempted to shed light on the structure of ANFIS using regional flood analysis and The following data were used in this study: (i) annual maxi- the methods of recovering rules. However, few studies have mum instantaneous peak flow that was obtained from the been used the application of the M5 algorithm and random Ministry of Energy of Iran and (ii) the physiographic char- forest in RFFA. Therefore, in this study, random forest acteristics of the catchment were extracted from the ALOS- and M5 algorithms were used to investigate peak flow and PALSAR satellite with a spatial resolution of 12.5 m (https:// compare ANFIS with the multivariate regression method. asf.alask a.edu ). The extraction of physiographic characteris- The most advantage of M5 and RF have been classified by tics was carried out in the ArcGIS software 10.5. 1 3 Applied Water Science (2023) 13:139 Page 3 of 11 139 Fig. 1 Location of the Karkheh basin in Iran Table 1 Descriptive statistics of physiographic characteristics Once trained, data-driven modeling becomes a parametric description of the function. Out of several possible data- MAX MIN Ave Std driven methods, ANFIS is the most widely used ones in A (Km ) 26,187.02 8.17 2501.95 5269.77 water resource applications, whereas less attention has been H (m) 2621.88 1043.75 1737.68 354.91 directed toward the RF and M5 model trees. 70% of the data L (Km) 420.82 4.77 88.18 99.15 were used for training and 30% for testing in all models. A S 37.67 8.14 19.12 6.69 brief description of the methods mentioned above, is sum- marized as follows: M5 model The data utilized in this study, including the annual maximum flood (Q), ranges from 17.5 to 1337.8  m /s. The M5 model is a data-driven model proposed by Quinlan flood discharge was calculated with a return period (T) of 2, 10, 100, 1000 years. Drainage areas (A) range from 8.17 (1992) and mainly employed in the realm of water science (Rahimikhoob 2014; Kisi and Kilic 2016; Kisi and Parmar to 26,187.02  km . The range of the height of each sub-basin (H) is 1043 to 2621 m; the range of stream length (L) is 2016). Continuously, the final structure together with the dependent leaves is shown as a tree in Fig. 2b. The further between 4.77 to 420.82 km and catchment slope (S) varying from 8.14% to 37.67%. Table 1 presents the physiographic detail of M5 model could be found in many papers (e.g., Farajpanah et al. 2020; Adib et al. 2023). characteristics of the studied catchments. Random forest Data‑driven modeling The RF method is nonparametric and belongs to the family Data-driven modeling relies on relationships between measured data without a need for a priori knowledge of of ensemble methods. The RF method consists of a set of regression trees used to reconstruct educational data. Typi- the physical system behavior (Jones et al. 2013; Ashrafza- deh et al. 2020; Biazar et al. 2020; Jafarpour et al. 2022). cally, a set of basic training examples is formed. Combining 1 3 139 Page 4 of 11 Applied Water Science (2023) 13:139 Fig. 2 Schematic of M5Tree: a splitting the input space, b the resultant dendriform (Wang et al. 2017) three parameters in RF is essential. The first is how many adaptive node, is shown in Fig. 3. The further detail of ANFIS trees should be created, the second is how many variables could be found in many books (e.g., Azar 2010; Esmaeili- are involved in creating a node for each network, and the Gisavandani 2017; Adib et al. (2021). third parameter is the size of the node, which indicates the depth of the regression tree created. One of the advantages Evaluation criteria of this method is that there is no need to prune the trees during modeling and classification (Esmaeili-Gisavandani Normal root-mean-square error (NRMSE) and correlation et al. 2022). coefficient (R ) were used to evaluate model performance: (Y − X ) i i ANFIS R = 1 − (1) (X − X ) i i Adaptive neuro-fuzzy inference system (ANFIS) could be a multilayer feed-forward network where each node performs (Y − X ) a selected function on incoming signals (Jang 1993; Heddam i i NRMSE = (2) 2014). An ordinary architecture of an ANFIS, during which n a circle indicates a set node, whereas a square indicates an Fig. 3 Sugeno ANFIS system equivalent to the system 1 3 Applied Water Science (2023) 13:139 Page 5 of 11 139 Table 2 The four combinations of input datasets where X and Y are the observed and estimated values and X i i i are the average values of observation, and n represents the Combinations Inputs Output number of data. A comparison of the correlation coefficient 1 T, L Q and RMSE values recognizes a better performance. The best 2 T, A, L Q model has higher value of R and a smaller value of RMSE. 3 T, H, A, L Q 4 T, A, H, L, S Q QT: high flow in the different return period, T: return period, L: length of streamflow, A: Area, S: slop, H: height Results According to Table 2, four combinations of input data were used in the MRM, ANFIS, M5 and RF models to peak flow Table 3 The ANFIS performance in different models with different return periods for regional flood frequency Combinations Number Membership function NRMSE R2 analysis (RFFA) (Table 3). of rules (Trimf) 0.871 0.72 1 9 (Trapmf) 0.876 0.71 ANFIS results (Gaussmf) 0.875 0.70 (Gauss2mf) 0.875 0.73 To calculate RFFA with the ANFIS model, for any combi- (Trimf) 0.821 0.84 nation, an optimum number of membership functions was 2 27 (Trapmf) 0.826 0.83 specified based on trial and error. The best type of mem- (Gaussmf) 0.885 0.84 bership function was recognized from between bell-shaped (Gauss2mf) 0.884 0.84 (gbellmf), trapezoidal-shaped (tramf), triangular-shaped (Trimf) 0.871 0.88 (trimf), Gaussian (gaussmf) and Gaussian 2 (gauss2mf) by 3 81 (Trapmf) 0.875 0.86 repeated model training and testing based on every member- (Gaussmf) 0.876 0.85 ship function number and type via trial and error. Based on (Gauss2mf) 0.866 0.86 the correlation coefficient (R ) and root-mean-square error (Trimf) 0.851 0.92 (RMSE), combinations 4 (R = 0.92 and NRMSE = 0.851) is 4 243 (Trapmf) 0.856 0.89 better performance than the others (Fig. 4). (Gaussmf) 0.865 0.89 (Gauss2mf) 0.864 0.91 The bold value for combination number indicates the best input com- bination identified by model evaluation criteria Fig. 4 The tree diagram generated by the M5 model for the case study 1 3 139 Page 6 of 11 Applied Water Science (2023) 13:139 Table 4 Evaluation criteria for 2 M5 results Combinations NRMSE R the M5 model 1 0.76 0.88 The M5 model tree does not require to set any user-defined 2 0.61 0.89 parameters. In addition, the M5 model can provide the num- 3 0.69 0.91 ber of linear relations which can be easily used to predict the 4 0.45 0.95 RFFA, as shown in Fig. 5. As shown in Table 4, the M5 model results indicated that The bold value for combina- input combination 4 gave a better performance than the other tion number indicates the best input combination identified by combinations (R = 0.95 and NRMSE = 0.45). The tree rela- model evaluation criteria tionships of the M5 model for the best combination of the inputs are presented in the appendix. As shown in Fig. 5, the peak flow values estimated by each model are compared. RF has the best performance in RF results peak flow estimation, while MRM has the worst. Further - more, most of the models underestimated the peak flow in As shown in Table 5, the RF model results indicated that the 2-year and 10-year return periods, while most overesti- input combination 4 gave a better performance than the other mated the peak flow in the 100-year and 1000-year return combinations (R = 0.96 and NRMSE = 0.223). periods. Fig. 5 Comparison of peak flow estimated by models in the validation phase 1 3 Applied Water Science (2023) 13:139 Page 7 of 11 139 Table 5 Evaluation criteria for 2 model is based on regression, the ANFIS model is based Combinations NRMSE R the M5 model on fuzzy logic, the M5 model is based on classification, 1 0.341 0.88 and the RF model is based on ensemble learning under 2 0.341 0.91 supervision. Models require inputs such as area, stream 3 0.225 0.95 length, basin slope, basin height, and return period num- 4 0.223 0.96 ber, which can all be derived from topography. Also, the best combination of inputs belonged to combination 4 with The bold value for combina- a higher correlation coefficient and lower NRMSE. Based tion number indicates the best input combination identified by on the excellent results obtained in estimating peak flow in model evaluation criteria the calibration and verification stages, particularly using the RF model, it is clear that this study is far more effec- As Fig. 6 illustrates performance of the used models in tive than similar studies whose inputs and modeling pro- calibration (twenty-seven stream gauges) and validation (5 cess were incredibly complex. It is evident from Fig. 6 that stream gauges) stages, according to the Taylor diagrams the models used for estimating peak flow had a favorable (Fig. 6), the performance of the RF model is the best. In the performance, especially the RF model, with better accu- return periods of 2, 10, and 1000 years, the M5 model ranks racy in the short-term return period of 2 and 10 years than second after the RF, but in the 100-year return period, the in the long-term return period (100 and 1000 years). ANFIS model ranks second after RF. RFFA makes a relationship between flood frequency and physiographical characteristics of catchments to esti- mate flood in ungagged regions like Rahman et al. (2020). Discussion In this regard, the performances of the RF and M5 tree network as piecewise linear functions, ANFIS and multi- This study aims to provide a relatively simple method to variate regression method were evaluated to estimate flood estimate peak flow amounts in ungauged region based on frequency in the ungagged sub-catchments like Vafakhah their physiographic characteristics. To achieve this, data- and Bozchaloei (2020). driven models of varying natures were used. The MRM Fig. 6 Performance of RF, M5, ANFIS, and MRM in estimating flood frequency in five validation stream gauges at 2,10,100, and 1000 return periods 1 3 139 Page 8 of 11 Applied Water Science (2023) 13:139 A comparison of the correlation coefficient and root- LM number = 2. mean-squared error values indicated an improved perfor- Q = 0.4087 × T + 0.0992 × A—0.4854 × H + 1.0686 × L + mance obtained from the data-driven model compared to 27.7689 × S + 287.7801. traditional methods such as the multivariate regression If H < = 1767.802 and S > 16.041 and H < = 1639.885 (MRM) Method. However, the performance of the RF model and A < = 72.041 and T < = 17.5 and A < = 35.66 then LM is almost similar to the M5 and ANFIS models. number = 3. LM number = 3. Q = 9.915 × T—0.1856 × A—3.4894 × H + 3.4408 × L + Conclusions 19.681 × S + 5136.1149. If H < = 1767.802 and S > 16.041 and H < = 1639.885 Knowing the magnitude of historical floods in a particular and A < = 72.041 and T < = 17.5 and A > 35.66 then LM area is crucial for designing hydraulic structures. Small and number = 4. medium watersheds often lack ground flow measurement LM number = 4. stations due to the costs involved in building and maintain- Q = 8.2118 × T—1.1938 × A—3.4894 × H + 3.4408 × L + ing them. In contrast, hydraulic structures need to be built 19.681 × S + 5144.5592. on rivers in these areas in order to develop civil and agricul- If H < = 1767.802 and S > 16.041 and H < = 1639.885 tural activities. Therefore, the flood discharge design must and A < = 72.041 and T > 17.5 then LM number = 5. be determined. This study used machine learning models to LM number = 5. estimate the peak flow of ungauged watersheds. Q = 1.6026 × T + 0.045 × A—4.3676 × H + 3.4408 × L + The following model performance was found in this 19.681 × S + 6531.1303. study: If H < = 1767.802 and S > 16.041 and H < = 1639.885 The procreated dendriform structure of multi-linear mod- and A > 72.041 and T < = 17.5 then LM number = 6. els utilized in RF and M5 is comprehensible and straightfor- LM number = 6. ward to grasp for decision-makers. It also provides an hon- Q = 21.6586 × T + 0.0392 × A—3.0971 × H + 3.4408 × L est overview of the relationships between the physiographic + 19.681 × S + 5203.2448. characteristics of the watershed; If H < = 1767.802 and S > 16.041 and H < = 1639.885 The RF and M5 model permits to simply create a family and A > 72.041 and T > 17.5 then LM number = 7. of explainable models with a varied number of component LM number = 7. models and thus varied strength and correctness; Q = 2.7201 × T + 0.0339 × A—3.0971 × H + 3.4408 × L + Modeling with the RF and M5 are the fastest data-driven 19.681 × S + 5622.2802. models (proceeding of data with RF and M5 is faster than If H < = 1767.802 and S > 16.041 and H > 1639.885 and ANFIS); L < = 67.812 then LM number = 8. The information encapsulated in RF and M5 algorithms LM number = 8. can potentially assist in variable selection and the evalua- Q = 0.8159 × T—0.0317 × A—1.332 × H + 4.5635 × L + tion of their relationships when processing data with other 19.681 × S + 1703.4591. models. For instance, M5 can aid users in determining the If H < = 1767.802 and S > 16.041 and H > 1639.885 and sensitivity of the data. L > 67.812 and T < = 17.5 and A < = 6591.094 then LM number = 9. LM number = 9. Appendix Q = 10.8508 × T—0.0197 × A—1.332 × H + 4.0254 × L + 19.681 × S + 1833.1274. Regression expressions of the best M5 model If H < = 1767.802 and S > 16.041 and H > 1639.885 and L > 67.812 and T < = 17.5 and A > 6591.094 then LM Here, the linear regressions of the best results of M5 number = 10. modeling. LM number = 10. If H < = 1767.802 and S < = 16.041 and T < = 17.5 then Q = 12.7516 × T—0.0197 × A—1.332 × H + 4.0254 × L + LM number = 1. 19.681 × S + 1851.5772. LM number = 1. If H < = 1767.802 and S > 16.041 and H > 1639.885 and Q = 0.3532 × T + 0.0288 × A—0.4854 × H + 1.0686 × L + L > 67.812 and T > 17.5 then LM number = 11. 27.7689 × S + 323.3353. LM number = 11. If H < = 1767.802 and S < = 16.041 and T > 17.5 then Q = 1.1691 × T—0.0107 × A—1.332 × H + 4.0254 × L LM number = 2. + 19.681 × S + 1967.2823. 1 3 Applied Water Science (2023) 13:139 Page 9 of 11 139 adaptation, distribution and reproduction in any medium or format, If H > 1767.802 and A < = 157.306 t hen LM as long as you give appropriate credit to the original author(s) and the number = 12. source, provide a link to the Creative Commons licence, and indicate LM number = 12. if changes were made. The images or other third party material in this Q = 0.1579 × T + 0.0084 × A—0.3096 × H + 0.4856 × L article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not + 15.8241 × S + 331.4818. included in the article's Creative Commons licence and your intended If H > 1767.802 and A > 157.306 and S < = 20.719 and use is not permitted by statutory regulation or exceeds the permitted T < = 17.5 then LM number = 13. use, you will need to obtain permission directly from the copyright LM number = 13. holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. Q = 0.2033 × T + 0.0109 × A—0.2609 × H + 0.4856 × L + 16.9375 × S + 225.23134818. If H > 1767.802 and A > 157.306 and S < = 20.719 and References T > 17.5 and H < = 1959.692 and A < = 1418.605 then LM number = 14. Adib A, Zaerpour A, Kisi O, Lotfirad M (2021) A rigorous wave- LM number = 14. let-Packet transform to retrieve snow depth from SSMIS data Q = 0.2281 × T + 0.031 × A—0.2609 × H + 0.4856 × L and evaluation of its reliability by uncertainty parameters. + 16.9375 × S + 212.4058. Water Resour Manage 35:2723–2740. https:// doi. org/ 10. 1007/ s11269- 021- 02863-x If H > 1767.802 and A > 157.306 and S < = 20.719 and Adib A, Farajpanah H, Shoushtari MM, Lotfirad M, Saeedpanah I, T > 17.5 and H < = 1959.692 and A > 1418.605 then LM Sasani H (2022) Selection of the best machine learning method number = 15.* for estimation of concentration of different water quality param- LM number = 15. eters. Sustain Water Resour Manag 8(6):172. https://doi. or g/10. 1007/ s40899- 022- 00765-3 Q = 0.2382 × T + 0.031 × A—0.2609 × H + 0.4856 × L Adib A, Kalantarzadeh SSO, Shoushtari MM, Lotfirad M, Liaghat + 16.9375 × S + 218.4885. A, Oulapour M (2023) Sensitive analysis of meteorological data If H > 1767.802 and A > 157.306 and S < = 20.719 and and selecting appropriate machine learning model for estima- T > 17.5 and H > 1959.692 then LM number = 16. tion of reference evapotranspiration. Appl Water Sci 13(3):83. https:// doi. org/ 10. 1007/ s13201- 023- 01895-5 LM number = 16. Ashrafzadeh A, Kişi O, Aghelpour P, Biazar SM, Masouleh MA Q = 0.2111 × T + 0.0109 × A—0.2609 × H + 0.4856 × L (2020) Comparative study of time series models, support vector + 16.9375 × S + 254.9837. machines, and GMDH in forecasting long-term evapotranspira- If H > 1767.802 and A > 157.306 and S > 20.719 and tion rates in northern Iran. J Irrig Drain Eng 146(6):04020010. https:// doi. org/ 10. 1061/ (ASCE) IR. 1943- 4774. 00014 71 T < = 17.5 then LM number = 17. Aziz K, Rahman A, Fang G, Shrestha S (2014) Application of artifi- LM number = 17. cial neural networks in regional flood frequency analysis: a case Q = 0.3026 × T + 0.2288 × A—0.2609 × H + 0.4856 × L study for Australia. Stoch Env Res Risk Assess 28(3):541–554. + 19.3761 × S + 20.518. https:// doi. org/ 10. 1007/ s00477- 013- 0771-5 Aziz K, Haque MM, Rahman A, Shamseldin AY, Shoaib M (2017) If H > 1767.802 and A > 157.306 and S > 20.719 and Flood estimation in ungauged catchments: application of arti- T > 17.5 then LM number = 18. ficial intelligence-based methods for Eastern Australia. Stoch LM number = 18. Env Res Risk Assess 31(6):1499–1514. https://doi. or g/10. 1007/ Q = 0.3336 × T + 0.3017 × A—0.2609 × H + 0.4856 × L s00477- 016- 1272-0 Aziz K, Rahman A, Fang G, Haddad K, & Shrestha S (2010) Design + 19.3761 × S—22.9879. flood estimation for ungauged catchments: application of artificial neural networks for eastern Australia. In: World Environmental and Water Resources Congress 2010: Challenges of Change (pp Authors contributions The authors declare that they have contribution 2841–2850). doi:https:// doi. org/ 10. 1061/ 41114 (371) 293 in the preparation of this manuscript. Biazar SM, Rahmani V, Isazadeh M, Kisi O, Dinpashoh Y (2020) New input selection procedure for machine learning methods in Funding The authors did not receive support from any organization estimating daily global solar radiation. Arab J Geosci 13:1–17. for the submitted work. https:// doi. org/ 10. 1007/ s12517- 020- 05437-0 Deo RC, Şahin M (2016) An extreme learning machine model for the Declarations simulation of monthly mean streamflow water level in eastern Queensland. Environ Monit Assess 188(2):90. https:// doi. org/ Conflict of interest The authors have no conflicts of interest to declare 10. 1007/ s10661- 016- 5094-9 that are relevant to the content of this article. Desai S, Ouarda TB (2021) Regional hydrological frequency analy- sis at ungauged sites with random forest regression. J Hydrol Ethical approval The manuscript is an original work with its own 594:125861. https:// doi. org/ 10. 1016/j. jhydr ol. 2020. 125861 merit, has not been previously published in whole or in part, and is not Esmaeili Gisavandani H (2017) Evaluation of the ability of adap- being considered for publication elsewhere. tive neuro-fuzzy interface system, artificial neural network and regression to regional flood analysis. J Water Soil Conserv 24(3):149–166. https:// doi. org/ 10. 22069/ JWFST. 2017. 11413. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, 1 3 139 Page 10 of 11 Applied Water Science (2023) 13:139 Esmaeili-Gisavandani H, Farajpanah H, Adib A, Kisi O, Riyahi MM, Kisi O, Parmar KS (2016) Application of least square support vector Lotfirad M, Salehpoor J (2022) Evaluating ability of three types of machine and multivariate adaptive regression spline models in discrete wavelet transforms for improving performance of different long term prediction of river water pollution. J Hydrol 534:104– ML models in estimation of daily-suspended sediment load. Arab 112. https:// doi. org/ 10. 1016/j. jhydr ol. 2015. 12. 014 J Geosci 15(1):1–13. https://doi. or g/10. 1007/ s12517- 021- 09282-7 Kumar R, Goel NK, Chatterjee C, Nayak PC (2015) Regional flood Fallah-Mehdipour E, Bozorg-Haddad O, Loáiciga HA (2020) frequency analysis using soft computing techniques. Water Climate-environment-water: integrated and non-integrated Resour Manage 29(6):1965–1978. https:// doi. or g/ 10. 1007/ approaches to reservoir operation. Environ Monit Assess s11269- 015- 0922-1 192(1):60. https:// doi. org/ 10. 1007/ s10661- 019- 8039-2 Latt ZZ (2015) Application of feedforward artificial neural network Farajpanah H, Lotfirad M, Adib A, Esmaeili-Gisavandani H, Kisi Ö, in Muskingum flood routing: a black-box forecasting approach Riyahi MM, Salehpoor J (2020) Ranking of hybrid wavelet-AI for a natural river system. Water Resour Manage 29(14):4995– models by TOPSIS method for estimation of daily flow discharge. 5014. https:// doi. org/ 10. 1007/ s11269- 015- 1100-1 Water Supply 20(8):3156–3171. https://doi. or g/10. 2166/ w s.2020. Latt ZZ, Wittenberg H (2014) Improving flood forecasting in a devel - 211 oping country: a comparative study of stepwise multiple linear Gao W, Shen Q, Zhou Y, Li X (2018) Analysis of flood inundation regression and artificial neural network. Water Resour Manage in ungauged basins based on multi-source remote sensing data. 28(8):2109–2128. https:// doi. org/ 10. 1007/ s11269- 014- 0600-8 Environ Monit Assess 190(3):129. https:// doi. or g/ 10. 1007/ Leclerc M, Ouarda TBMJ (2007) Non stationary regional frequency s10661- 018- 6499-4 analysis at ungaged sites. J Hydrol 343(3):254–265. https://doi. Gheitasi M (2016) Flood frequency analysis of the maximum annual org/ 10. 1016/j. jhydr ol. 2007. 06. 021 discharge of rivers in Lorestan province (case study: Karkheh Leščešen I, Urošev M, Dolinaj D, Pantelić M, Telbisz T, Varga G, watershed in Lorestan province) (Doctoral dissertation, Univer- Milošević D (2019) Regional flood frequency analysis based on sity of Zabol) L-moment approach case study Tisza river basin. Water Resour Ghumman AR, Ahmad S, Hashmi HN (2018) Performance assessment 46(6):853–860. https:// doi. org/ 10. 1134/ S0097 80781 90600 6X of artificial neural networks and support vector regression models Lotfirad M, Adib A, Haghighi A (2018) Estimation of daily runoff for stream flow predictions. Environ Monit Assess 190(12):704. using of the semi-conceptual rainfall-runoff IHACRES model https:// doi. org/ 10. 1007/ s10661- 018- 7012-9 in the Navrood watershed (a watershed in the Gilan province. Golestani M, Kavianpour MR, & Hedayatizade M (2010, November) Iran J Ecohydrol 5(2):449–460 Determination of homogeneous regions case study: South-East Malekinezhad H, Nachtnebel HP, Klik A (2011) Comparing the Urmia Lake Catchment, Iran. In: 2010 2nd International Confer- index-flood and multiple-regression methods using L-moments. ence on Chemical, Biological and Environmental Engineering (pp Phys Chem Earth, Parts a/b/c 36(1–4):54–60. https://doi. or g/10. 71–74). IEEE, doi: https://doi. or g/10. 1109/ ICBEE. 2010. 56489 35 1016/j. pce. 2010. 07. 013 Gris ffi VW, Stedinger JR (2007) The use of GLS regression in regional Merz R, Blöschl G (2008) Flood frequency hydrology: 2. Combin- hydrologic analysis. J Hydrol 344(1):82–95. https:// doi. org/ 10. ing data evidence. Water Resour Res. https:// doi. org/ 10. 1029/ 1016/j. jhydr ol. 2007. 06. 0232007w r0067 44 Heddam S (2014) Modeling hourly dissolved oxygen concentration Quinlan JR (1992, November) Learning with continuous classes. In: (DO) using two different adaptive neuro-fuzzy inference systems 5th Australian joint conference on artificial intelligence (Vol (ANFIS): a comparative study. Environ Monit Assess 186(1):597– 92, pp 343–348) 619. https:// doi. org/ 10. 1007/ s10661- 013- 3402-1 Rahimikhoob A (2014) Comparison between M5 model tree and neu- Holmes MGR, Young AR, Gustard A, Grew R (2002) A region of ral networks for estimating reference evapotranspiration in an influence approach to predicting flow duration curves within arid environment. Water Resour Manage 28(3):657–669. https:// ungauged catchments. Hydrol Earth Syst Sci 6:721–731doi. org/ 10. 1007/ s11269- 013- 0506-x Jafarpour M, Adib A, Lotfirad M (2022) Improving the accuracy Rahman AS, Khan Z, Rahman A (2020) Application of independ- of satellite and reanalysis precipitation data by their ensem- ent component analysis in regional flood frequency analysis: ble usage. Appl Water Sci 12(9):232. h t t p s : / / d o i . o rg / 1 0 . 1 0 0 7 / comparison between quantile regression and parameter regres- s13201- 022- 01750-z sion techniques. J Hydrol 581:124372. https://doi. or g/10. 1016/j. Jahangir MS, Biazar SM, Hah D, Quilty J, Isazadeh M (2022) Investi-jhydr ol. 2019. 124372 gating the impact of input variable selection on daily solar radia- Rao AR, Srinivas VV (2008) Regionalization of watersheds: an tion prediction accuracy using data-driven models: a case study in approach based on cluster analysis. Springer Science and Busi- northern Iran. Stoch Env Res Risk Assess 36(1):225–249. https:// ness Media, Cham doi. org/ 10. 1007/ s00477- 021- 02070-5 Sehgal V, Sahay RR, Chatterjee C (2014) Effect of utilization of Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference sys- discrete wavelet components on flood forecasting perfor- tem. IEEE Trans Syst Man Cybern 23(3):665–685. https:// doi. mance of wavelet based ANFIS models. Water Resour Manage org/ 10. 1109/ 21. 256541 28(6):1733–1749. https:// doi. org/ 10. 1007/ s11269- 014- 0584-4 Jones RM, Liu L, Dorevitch S (2013) Hydrometeorological variables Sharifi Garmdareh E, Vafakhah M, Eslamian SS (2018) Regional predict fecal indicator bacteria densities in freshwater: data- flood frequency analysis using support vector regression in driven methods for variable selection. Environ Monit Assess arid and semi-arid regions of Iran. Hydrol Sci J 63(3):426–440. 185(3):2355–2366. https:// doi. org/ 10. 1007/ s10661- 012- 2716-8https:// doi. org/ 10. 1080/ 02626 667. 2018. 14320 56 Jothiprakash V, Kote AS (2011) Effect of pruning and smoothing while Shokouhifar Y, Lotfirad M, Esmaeili-Gisavandani H, Adib A (2022) using M5 model tree technique for reservoir inflow prediction. J Evaluation of climate change effects on flood frequency in arid Hydrol Eng 16(7):563–574 and semi-arid basins. Water Supply 22(8):6740–6755. https:// Kinnison HB, Colby BR (1945) Flood formulas based on drainage doi. org/ 10. 2166/ ws. 2022. 271 basin characteristics. Trans Am Soc Civ Eng 110(1):849–876 Shu C, Ouarda TB (2012) Improved methods for daily streamflow Kisi O, Kilic Y (2016) An investigation on generalization ability of estimates at ungauged sites. Water Resour Res. https:// doi. org/ artificial neural networks and M5 model tree in modeling refer -10. 1029/ 2011W R0115 01 ence evapotranspiration. Theoret Appl Climatol 126(3–4):413– 425. https:// doi. org/ 10. 1007/ s00704- 015- 1582-z 1 3 Applied Water Science (2023) 13:139 Page 11 of 11 139 Smith A, Sampson C, Bates P (2015) Regional flood frequency analysis approaches. Environ Sci Pollut Res. https:// doi. or g/ 10. 1007/ at the global scale. Water Resour Res 51(1):539–553. https:// doi. s11356- 020- 07802-8 org/ 10. 1002/ 2014W R0158 14 Zalnezhad A, Rahman A, Vafakhah M, Samali B, Ahamed F (2022) Solomatine DP, Xue Y (2004) M5 model trees and neural networks: Regional flood frequency analysis using the FCM-ANFIS algo- application to flood forecasting in the upper reach of the Huai rithm: a case study in South-Eastern Australia. Water 14(10):1608. River in China. J Hydrol Eng 9(6):491–501. https:// doi. org/ 10. https:// doi. org/ 10. 3390/ w1410 1608 1061/ (ASCE) 1084- 0699(2004)9: 6(491) Zaman MA, Rahman A, Haddad K (2012) Regional flood frequency Srinivas VV, Tripathi S, Rao AR, Govindaraju RS (2008) Regional analysis in arid regions: a case study for Australia. J Hydrol flood frequency analysis by combining self-organizing feature 475:74–83. https:// doi. org/ 10. 1016/j. jhydr ol. 2012. 08. 054 map and fuzzy clustering. J Hydrol 348(1–2):148–166. https:// Zamani R, Tabari H, Willems P (2015) Extreme streamflow drought doi. org/ 10. 1016/j. jhydr ol. 2007. 09. 046 in the Karkheh river basin (Iran): probabilistic and regional Vafakhah M, Bozchaloei SK (2020) Regional analysis of flow duration analyses. Nat Hazards 76(1):327–346. https:// doi. org/ 10. 1007/ curves through support vector regression. Water Resour Manage s11069- 014- 1492-x 34(1):283–294. https:// doi. org/ 10. 1007/ s11269- 019- 02445-y Wang L, Kisi O, Zounemat-Kermani M, Zhu Z, Gong W, Niu Z, Liu Publisher's Note Publisher's Note Springer Nature remains neutral Z (2017) Prediction of solar radiation in China using different with regard to jurisdictional claims in published maps and institutional adaptive neuro-fuzzy methods and M5 model tree. Int J Climatol affiliations. 37(3):1141–1155. https:// doi. org/ 10. 1002/ joc. 4762 Zahiri J, Nezaratian H (2020) Estimation of transverse mix- ing coefficient in streams using M5, MARS, GA, and PSO 1 3 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Applied Water Science Springer Journals

Regional flood frequency analysis using data-driven models (M5, random forest, and ANFIS) and a multivariate regression method in ungauged catchments

Loading next page...
 
/lp/springer-journals/regional-flood-frequency-analysis-using-data-driven-models-m5-random-qvkFpAjJsg

References (56)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
ISSN
2190-5487
eISSN
2190-5495
DOI
10.1007/s13201-023-01940-3
Publisher site
See Article on Publisher Site

Abstract

Flooding is recognized worldwide joined of the most expensive natural hazards. To adopt proper structural and nonstruc- tural measurements for controlling and mitigating the rising flood risk, the availability of streamflow values along a river is essential. This raises concerns in the hydrological assessment of poorly gauged or ungauged catchments. In this regard, several flood frequency analysis approaches have been conducted in the literature including index flow method (IFM), square grids method (SGM), hybrid method (HM), as well as the conventional multivariate regression method (MRM). While these approaches are often based on assumptions that simplify the complex nature of the hydrological system, they might not be able to address uncertainties associated with the complexity of the system. One of the powerful tools to deal with this issue is data-driven model that can be easily adopted in complex systems. The objective of this research is to utilize three different data-driven models: random forest (RF), adaptive neuro-fuzzy inference system (ANFIS), and M5 decision tree algorithm to predict peak flow associated with various return periods in ungauged catchments. Results from each data-driven model were assessed and compared with the conventional multivariate regression method. Results revealed all the three data-driven models performed better than the multivariate regression method. Among them, the RF model not only demonstrated the superior performance of peak flow prediction compared to the other algorithms but also provided insight into the complexity of the system through delivering a mathematical formulation. Keywords Flood frequency · M5 · RF · Regression · ANFIS Introduction used method for estimating flood risk at target locations in river basins where streamflow measurements are either Floods are among the costliest natural hazards experienced limited or unavailable (Griffis and Stedinger 2007; Leclerc in most of the places in the world, which results in heavy and Ouarda 2007; Zaman et al. 2012, Smith et al. 2015 and losses of life and economic damages (Gao et  al. 2018). Lotfirad et al. 2018) The first RFFA was undertaken by the Regional flood frequency analysis (RFFA) is an especially USGS in New England (Kinnison and Colby 1945). Lately, RFFA has received significant heed for design and manage- ment of water substructures such as dams, reservoirs, and bridges to slight flood risks and hence financial devastations. * Heidar Zarei zareih@scu.ac.ir Much of this attention has occurred in response to water- related hazards such as the flood in regions where minute or Hassan Esmaeili-Gisavandani esmaeili.gisavandani@gmail.com no data is available on peak flows such as the Indus floods in East of Iran. Mohammad Reza Fadaei Tehrani mfadaei@nri.ac.ir RFFA has been used as a crucial tool for many applica- tions for example (i) water substructure, (ii) flood preserva- Department of Hydrology and Water Resources, Faculty tion projects, (iii) land cover planning, and other hydrologic of Water and Environmental Engineering, Shahid Chamran studies. The regional techniques consist of a multivariate University of Ahvaz, Ahvaz, Iran 2 statistical structure derived from catchment characteristics Faculty of Water and Electricity Industry Training Institute, data (Rao and Srinivas 2008). In this process, the region of Tehran, Iran Vol.:(0123456789) 1 3 139 Page 2 of 11 Applied Water Science (2023) 13:139 influence approach can be formed where some catchments being induced into linear patches; these models provide a are pooled together based on vicinity in geographic. Conse- representation that is reproducible and understandable by quently, an optimum region is made based on some objective practitioners. (Solomatine and Xue 2004; Jothiprakash and function (Holmes et al. 2002; Aziz et al. 2010). Kote 2011). The target of RFFA is predicting river peak flow The relationship among flood flow values, physiographi- associated with various return periods in ungauged catch- cal and/or morphological characteristics of a catchment is a ments and also reduce uncertainty to evaluate the flooding fundamental framework for RFFA. Various methods such as (Merz and Blöschl 2008; Zaman et al. 2012; Shu and Ouarda multivariate regression method (MRM), square grids method 2012; Leščešen et al. 2019). The objectives of this study are (SGM) and hybrid method (HM) have been developed in two-fold. Firstly, the aim is to estimate the RFFA utilizing the literature for RFFA; each approach has its advantages the random forest and M5 algorithms. Secondly, the goal is and disadvantages. Among these methods, MRM has been to estimate flood occurrences in data-deficient catchments widely used and presented in previous studies (Golestani within the western region of Iran. et al. 2010; Malekinezhad et al. 2011; Latt et al. 2014). Recently, data-driven models have been widely adopted in hydrological studies. Data-driven models can handle the Materials and methods nonlinearity and uncertainty in hydrological data very effec- tively. These methods are extensively used for rainfall-runoff Out of eighty-nine stream gauges, thirty-two stations were modeling, water resource management modeling, and sus- used due to the availability of data. The data were obtained pended sediment modeling take; for example, Kumar et al. for the period of 1987-2018. The study area, including (2015) provide the details of the application of data-driven thirty-two stream gauges, is located in the west of Iran. From models in regional flood frequency estimation which is the homogeneous stations, twenty-seven stations were used explored. Regional flood frequency relationships are devel- for calibration and five stations for validation of the models oped employing data-driven models viz. ANN and FIS for (Fig. 1b). In fact, five stations were used for validation which lower Godavari subzone 3(f) of India and the same have was not used for the modeling and after making model, each been compared with the regional relationships derived using data-driven model validated with these stations. To approach the L-moments approach. a unique model, the return period was taken into account Recently, the data-driven models such as random forest as an independent factor. The study considered the annual (RF), fuzzy computing techniques and M5 decision tree, maximum instantaneous peak flow. Kolmogorov–Smirnov in complex modeling of the flood, have been developed test in EasyFit software 5.6 was used to estimate peak flow by Solomatine and Xue (2004); Aziz et al. (2014); Sehgal with different return periods based on the best distribution et al (2014); Kumar et al (2015); Latt et al. (2015); Deo function (Shokouhifar et al. 2022). and Şahin (2016); Esmaeili-Gisavandani (2017); Ghum- man et al. (2018); Zahiri and Nezaratian (2020); Desai and Study area Ouarda (2021); Adib et al. (2022); Jahangir et al. (2022). Among the numerous data mining techniques, ANFIS is the The Karkheh River basin is located in the west of Iran most widely used approaches in various water-related areas. (Fig. 1a). The Karkheh River basin covers 51,230 square Being an accurate predictive tool, the ANFIS technique has, kilometers in parts of six Iranian provinces. The Karkheh however, an inherent disadvantage that often results in hesi- river length is approximately 900 km. (Fallah-Mehdipour tating to interpret their outputs. This is because of being et al. 2020). This study considers RFFA using three data- a black box and consequently the nature of their solution driven models (M5, RF and ANFIS) and a multivariate is hazy. There might be a variation between networks of regression method in ungauged catchments. The further the same architecture trained on the same dataset due to detail of Karkheh Basin could be found in many papers (see, the arbitrary nature of the internal representation. (Witten e.g., Gheitasi 2016; Zamani et al. 2015). and Frank 2000). Srinivas et al. (2008); Aziz et al. (2017); Esmaelili-Gisavandani et al. (2017); Sharifi Garmdareh et al. Data (2018), and Zalnezhad et al. (2022) attempted to shed light on the structure of ANFIS using regional flood analysis and The following data were used in this study: (i) annual maxi- the methods of recovering rules. However, few studies have mum instantaneous peak flow that was obtained from the been used the application of the M5 algorithm and random Ministry of Energy of Iran and (ii) the physiographic char- forest in RFFA. Therefore, in this study, random forest acteristics of the catchment were extracted from the ALOS- and M5 algorithms were used to investigate peak flow and PALSAR satellite with a spatial resolution of 12.5 m (https:// compare ANFIS with the multivariate regression method. asf.alask a.edu ). The extraction of physiographic characteris- The most advantage of M5 and RF have been classified by tics was carried out in the ArcGIS software 10.5. 1 3 Applied Water Science (2023) 13:139 Page 3 of 11 139 Fig. 1 Location of the Karkheh basin in Iran Table 1 Descriptive statistics of physiographic characteristics Once trained, data-driven modeling becomes a parametric description of the function. Out of several possible data- MAX MIN Ave Std driven methods, ANFIS is the most widely used ones in A (Km ) 26,187.02 8.17 2501.95 5269.77 water resource applications, whereas less attention has been H (m) 2621.88 1043.75 1737.68 354.91 directed toward the RF and M5 model trees. 70% of the data L (Km) 420.82 4.77 88.18 99.15 were used for training and 30% for testing in all models. A S 37.67 8.14 19.12 6.69 brief description of the methods mentioned above, is sum- marized as follows: M5 model The data utilized in this study, including the annual maximum flood (Q), ranges from 17.5 to 1337.8  m /s. The M5 model is a data-driven model proposed by Quinlan flood discharge was calculated with a return period (T) of 2, 10, 100, 1000 years. Drainage areas (A) range from 8.17 (1992) and mainly employed in the realm of water science (Rahimikhoob 2014; Kisi and Kilic 2016; Kisi and Parmar to 26,187.02  km . The range of the height of each sub-basin (H) is 1043 to 2621 m; the range of stream length (L) is 2016). Continuously, the final structure together with the dependent leaves is shown as a tree in Fig. 2b. The further between 4.77 to 420.82 km and catchment slope (S) varying from 8.14% to 37.67%. Table 1 presents the physiographic detail of M5 model could be found in many papers (e.g., Farajpanah et al. 2020; Adib et al. 2023). characteristics of the studied catchments. Random forest Data‑driven modeling The RF method is nonparametric and belongs to the family Data-driven modeling relies on relationships between measured data without a need for a priori knowledge of of ensemble methods. The RF method consists of a set of regression trees used to reconstruct educational data. Typi- the physical system behavior (Jones et al. 2013; Ashrafza- deh et al. 2020; Biazar et al. 2020; Jafarpour et al. 2022). cally, a set of basic training examples is formed. Combining 1 3 139 Page 4 of 11 Applied Water Science (2023) 13:139 Fig. 2 Schematic of M5Tree: a splitting the input space, b the resultant dendriform (Wang et al. 2017) three parameters in RF is essential. The first is how many adaptive node, is shown in Fig. 3. The further detail of ANFIS trees should be created, the second is how many variables could be found in many books (e.g., Azar 2010; Esmaeili- are involved in creating a node for each network, and the Gisavandani 2017; Adib et al. (2021). third parameter is the size of the node, which indicates the depth of the regression tree created. One of the advantages Evaluation criteria of this method is that there is no need to prune the trees during modeling and classification (Esmaeili-Gisavandani Normal root-mean-square error (NRMSE) and correlation et al. 2022). coefficient (R ) were used to evaluate model performance: (Y − X ) i i ANFIS R = 1 − (1) (X − X ) i i Adaptive neuro-fuzzy inference system (ANFIS) could be a multilayer feed-forward network where each node performs (Y − X ) a selected function on incoming signals (Jang 1993; Heddam i i NRMSE = (2) 2014). An ordinary architecture of an ANFIS, during which n a circle indicates a set node, whereas a square indicates an Fig. 3 Sugeno ANFIS system equivalent to the system 1 3 Applied Water Science (2023) 13:139 Page 5 of 11 139 Table 2 The four combinations of input datasets where X and Y are the observed and estimated values and X i i i are the average values of observation, and n represents the Combinations Inputs Output number of data. A comparison of the correlation coefficient 1 T, L Q and RMSE values recognizes a better performance. The best 2 T, A, L Q model has higher value of R and a smaller value of RMSE. 3 T, H, A, L Q 4 T, A, H, L, S Q QT: high flow in the different return period, T: return period, L: length of streamflow, A: Area, S: slop, H: height Results According to Table 2, four combinations of input data were used in the MRM, ANFIS, M5 and RF models to peak flow Table 3 The ANFIS performance in different models with different return periods for regional flood frequency Combinations Number Membership function NRMSE R2 analysis (RFFA) (Table 3). of rules (Trimf) 0.871 0.72 1 9 (Trapmf) 0.876 0.71 ANFIS results (Gaussmf) 0.875 0.70 (Gauss2mf) 0.875 0.73 To calculate RFFA with the ANFIS model, for any combi- (Trimf) 0.821 0.84 nation, an optimum number of membership functions was 2 27 (Trapmf) 0.826 0.83 specified based on trial and error. The best type of mem- (Gaussmf) 0.885 0.84 bership function was recognized from between bell-shaped (Gauss2mf) 0.884 0.84 (gbellmf), trapezoidal-shaped (tramf), triangular-shaped (Trimf) 0.871 0.88 (trimf), Gaussian (gaussmf) and Gaussian 2 (gauss2mf) by 3 81 (Trapmf) 0.875 0.86 repeated model training and testing based on every member- (Gaussmf) 0.876 0.85 ship function number and type via trial and error. Based on (Gauss2mf) 0.866 0.86 the correlation coefficient (R ) and root-mean-square error (Trimf) 0.851 0.92 (RMSE), combinations 4 (R = 0.92 and NRMSE = 0.851) is 4 243 (Trapmf) 0.856 0.89 better performance than the others (Fig. 4). (Gaussmf) 0.865 0.89 (Gauss2mf) 0.864 0.91 The bold value for combination number indicates the best input com- bination identified by model evaluation criteria Fig. 4 The tree diagram generated by the M5 model for the case study 1 3 139 Page 6 of 11 Applied Water Science (2023) 13:139 Table 4 Evaluation criteria for 2 M5 results Combinations NRMSE R the M5 model 1 0.76 0.88 The M5 model tree does not require to set any user-defined 2 0.61 0.89 parameters. In addition, the M5 model can provide the num- 3 0.69 0.91 ber of linear relations which can be easily used to predict the 4 0.45 0.95 RFFA, as shown in Fig. 5. As shown in Table 4, the M5 model results indicated that The bold value for combina- input combination 4 gave a better performance than the other tion number indicates the best input combination identified by combinations (R = 0.95 and NRMSE = 0.45). The tree rela- model evaluation criteria tionships of the M5 model for the best combination of the inputs are presented in the appendix. As shown in Fig. 5, the peak flow values estimated by each model are compared. RF has the best performance in RF results peak flow estimation, while MRM has the worst. Further - more, most of the models underestimated the peak flow in As shown in Table 5, the RF model results indicated that the 2-year and 10-year return periods, while most overesti- input combination 4 gave a better performance than the other mated the peak flow in the 100-year and 1000-year return combinations (R = 0.96 and NRMSE = 0.223). periods. Fig. 5 Comparison of peak flow estimated by models in the validation phase 1 3 Applied Water Science (2023) 13:139 Page 7 of 11 139 Table 5 Evaluation criteria for 2 model is based on regression, the ANFIS model is based Combinations NRMSE R the M5 model on fuzzy logic, the M5 model is based on classification, 1 0.341 0.88 and the RF model is based on ensemble learning under 2 0.341 0.91 supervision. Models require inputs such as area, stream 3 0.225 0.95 length, basin slope, basin height, and return period num- 4 0.223 0.96 ber, which can all be derived from topography. Also, the best combination of inputs belonged to combination 4 with The bold value for combina- a higher correlation coefficient and lower NRMSE. Based tion number indicates the best input combination identified by on the excellent results obtained in estimating peak flow in model evaluation criteria the calibration and verification stages, particularly using the RF model, it is clear that this study is far more effec- As Fig. 6 illustrates performance of the used models in tive than similar studies whose inputs and modeling pro- calibration (twenty-seven stream gauges) and validation (5 cess were incredibly complex. It is evident from Fig. 6 that stream gauges) stages, according to the Taylor diagrams the models used for estimating peak flow had a favorable (Fig. 6), the performance of the RF model is the best. In the performance, especially the RF model, with better accu- return periods of 2, 10, and 1000 years, the M5 model ranks racy in the short-term return period of 2 and 10 years than second after the RF, but in the 100-year return period, the in the long-term return period (100 and 1000 years). ANFIS model ranks second after RF. RFFA makes a relationship between flood frequency and physiographical characteristics of catchments to esti- mate flood in ungagged regions like Rahman et al. (2020). Discussion In this regard, the performances of the RF and M5 tree network as piecewise linear functions, ANFIS and multi- This study aims to provide a relatively simple method to variate regression method were evaluated to estimate flood estimate peak flow amounts in ungauged region based on frequency in the ungagged sub-catchments like Vafakhah their physiographic characteristics. To achieve this, data- and Bozchaloei (2020). driven models of varying natures were used. The MRM Fig. 6 Performance of RF, M5, ANFIS, and MRM in estimating flood frequency in five validation stream gauges at 2,10,100, and 1000 return periods 1 3 139 Page 8 of 11 Applied Water Science (2023) 13:139 A comparison of the correlation coefficient and root- LM number = 2. mean-squared error values indicated an improved perfor- Q = 0.4087 × T + 0.0992 × A—0.4854 × H + 1.0686 × L + mance obtained from the data-driven model compared to 27.7689 × S + 287.7801. traditional methods such as the multivariate regression If H < = 1767.802 and S > 16.041 and H < = 1639.885 (MRM) Method. However, the performance of the RF model and A < = 72.041 and T < = 17.5 and A < = 35.66 then LM is almost similar to the M5 and ANFIS models. number = 3. LM number = 3. Q = 9.915 × T—0.1856 × A—3.4894 × H + 3.4408 × L + Conclusions 19.681 × S + 5136.1149. If H < = 1767.802 and S > 16.041 and H < = 1639.885 Knowing the magnitude of historical floods in a particular and A < = 72.041 and T < = 17.5 and A > 35.66 then LM area is crucial for designing hydraulic structures. Small and number = 4. medium watersheds often lack ground flow measurement LM number = 4. stations due to the costs involved in building and maintain- Q = 8.2118 × T—1.1938 × A—3.4894 × H + 3.4408 × L + ing them. In contrast, hydraulic structures need to be built 19.681 × S + 5144.5592. on rivers in these areas in order to develop civil and agricul- If H < = 1767.802 and S > 16.041 and H < = 1639.885 tural activities. Therefore, the flood discharge design must and A < = 72.041 and T > 17.5 then LM number = 5. be determined. This study used machine learning models to LM number = 5. estimate the peak flow of ungauged watersheds. Q = 1.6026 × T + 0.045 × A—4.3676 × H + 3.4408 × L + The following model performance was found in this 19.681 × S + 6531.1303. study: If H < = 1767.802 and S > 16.041 and H < = 1639.885 The procreated dendriform structure of multi-linear mod- and A > 72.041 and T < = 17.5 then LM number = 6. els utilized in RF and M5 is comprehensible and straightfor- LM number = 6. ward to grasp for decision-makers. It also provides an hon- Q = 21.6586 × T + 0.0392 × A—3.0971 × H + 3.4408 × L est overview of the relationships between the physiographic + 19.681 × S + 5203.2448. characteristics of the watershed; If H < = 1767.802 and S > 16.041 and H < = 1639.885 The RF and M5 model permits to simply create a family and A > 72.041 and T > 17.5 then LM number = 7. of explainable models with a varied number of component LM number = 7. models and thus varied strength and correctness; Q = 2.7201 × T + 0.0339 × A—3.0971 × H + 3.4408 × L + Modeling with the RF and M5 are the fastest data-driven 19.681 × S + 5622.2802. models (proceeding of data with RF and M5 is faster than If H < = 1767.802 and S > 16.041 and H > 1639.885 and ANFIS); L < = 67.812 then LM number = 8. The information encapsulated in RF and M5 algorithms LM number = 8. can potentially assist in variable selection and the evalua- Q = 0.8159 × T—0.0317 × A—1.332 × H + 4.5635 × L + tion of their relationships when processing data with other 19.681 × S + 1703.4591. models. For instance, M5 can aid users in determining the If H < = 1767.802 and S > 16.041 and H > 1639.885 and sensitivity of the data. L > 67.812 and T < = 17.5 and A < = 6591.094 then LM number = 9. LM number = 9. Appendix Q = 10.8508 × T—0.0197 × A—1.332 × H + 4.0254 × L + 19.681 × S + 1833.1274. Regression expressions of the best M5 model If H < = 1767.802 and S > 16.041 and H > 1639.885 and L > 67.812 and T < = 17.5 and A > 6591.094 then LM Here, the linear regressions of the best results of M5 number = 10. modeling. LM number = 10. If H < = 1767.802 and S < = 16.041 and T < = 17.5 then Q = 12.7516 × T—0.0197 × A—1.332 × H + 4.0254 × L + LM number = 1. 19.681 × S + 1851.5772. LM number = 1. If H < = 1767.802 and S > 16.041 and H > 1639.885 and Q = 0.3532 × T + 0.0288 × A—0.4854 × H + 1.0686 × L + L > 67.812 and T > 17.5 then LM number = 11. 27.7689 × S + 323.3353. LM number = 11. If H < = 1767.802 and S < = 16.041 and T > 17.5 then Q = 1.1691 × T—0.0107 × A—1.332 × H + 4.0254 × L LM number = 2. + 19.681 × S + 1967.2823. 1 3 Applied Water Science (2023) 13:139 Page 9 of 11 139 adaptation, distribution and reproduction in any medium or format, If H > 1767.802 and A < = 157.306 t hen LM as long as you give appropriate credit to the original author(s) and the number = 12. source, provide a link to the Creative Commons licence, and indicate LM number = 12. if changes were made. The images or other third party material in this Q = 0.1579 × T + 0.0084 × A—0.3096 × H + 0.4856 × L article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not + 15.8241 × S + 331.4818. included in the article's Creative Commons licence and your intended If H > 1767.802 and A > 157.306 and S < = 20.719 and use is not permitted by statutory regulation or exceeds the permitted T < = 17.5 then LM number = 13. use, you will need to obtain permission directly from the copyright LM number = 13. holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. Q = 0.2033 × T + 0.0109 × A—0.2609 × H + 0.4856 × L + 16.9375 × S + 225.23134818. If H > 1767.802 and A > 157.306 and S < = 20.719 and References T > 17.5 and H < = 1959.692 and A < = 1418.605 then LM number = 14. Adib A, Zaerpour A, Kisi O, Lotfirad M (2021) A rigorous wave- LM number = 14. let-Packet transform to retrieve snow depth from SSMIS data Q = 0.2281 × T + 0.031 × A—0.2609 × H + 0.4856 × L and evaluation of its reliability by uncertainty parameters. + 16.9375 × S + 212.4058. Water Resour Manage 35:2723–2740. https:// doi. org/ 10. 1007/ s11269- 021- 02863-x If H > 1767.802 and A > 157.306 and S < = 20.719 and Adib A, Farajpanah H, Shoushtari MM, Lotfirad M, Saeedpanah I, T > 17.5 and H < = 1959.692 and A > 1418.605 then LM Sasani H (2022) Selection of the best machine learning method number = 15.* for estimation of concentration of different water quality param- LM number = 15. eters. Sustain Water Resour Manag 8(6):172. https://doi. or g/10. 1007/ s40899- 022- 00765-3 Q = 0.2382 × T + 0.031 × A—0.2609 × H + 0.4856 × L Adib A, Kalantarzadeh SSO, Shoushtari MM, Lotfirad M, Liaghat + 16.9375 × S + 218.4885. A, Oulapour M (2023) Sensitive analysis of meteorological data If H > 1767.802 and A > 157.306 and S < = 20.719 and and selecting appropriate machine learning model for estima- T > 17.5 and H > 1959.692 then LM number = 16. tion of reference evapotranspiration. Appl Water Sci 13(3):83. https:// doi. org/ 10. 1007/ s13201- 023- 01895-5 LM number = 16. Ashrafzadeh A, Kişi O, Aghelpour P, Biazar SM, Masouleh MA Q = 0.2111 × T + 0.0109 × A—0.2609 × H + 0.4856 × L (2020) Comparative study of time series models, support vector + 16.9375 × S + 254.9837. machines, and GMDH in forecasting long-term evapotranspira- If H > 1767.802 and A > 157.306 and S > 20.719 and tion rates in northern Iran. J Irrig Drain Eng 146(6):04020010. https:// doi. org/ 10. 1061/ (ASCE) IR. 1943- 4774. 00014 71 T < = 17.5 then LM number = 17. Aziz K, Rahman A, Fang G, Shrestha S (2014) Application of artifi- LM number = 17. cial neural networks in regional flood frequency analysis: a case Q = 0.3026 × T + 0.2288 × A—0.2609 × H + 0.4856 × L study for Australia. Stoch Env Res Risk Assess 28(3):541–554. + 19.3761 × S + 20.518. https:// doi. org/ 10. 1007/ s00477- 013- 0771-5 Aziz K, Haque MM, Rahman A, Shamseldin AY, Shoaib M (2017) If H > 1767.802 and A > 157.306 and S > 20.719 and Flood estimation in ungauged catchments: application of arti- T > 17.5 then LM number = 18. ficial intelligence-based methods for Eastern Australia. Stoch LM number = 18. Env Res Risk Assess 31(6):1499–1514. https://doi. or g/10. 1007/ Q = 0.3336 × T + 0.3017 × A—0.2609 × H + 0.4856 × L s00477- 016- 1272-0 Aziz K, Rahman A, Fang G, Haddad K, & Shrestha S (2010) Design + 19.3761 × S—22.9879. flood estimation for ungauged catchments: application of artificial neural networks for eastern Australia. In: World Environmental and Water Resources Congress 2010: Challenges of Change (pp Authors contributions The authors declare that they have contribution 2841–2850). doi:https:// doi. org/ 10. 1061/ 41114 (371) 293 in the preparation of this manuscript. Biazar SM, Rahmani V, Isazadeh M, Kisi O, Dinpashoh Y (2020) New input selection procedure for machine learning methods in Funding The authors did not receive support from any organization estimating daily global solar radiation. Arab J Geosci 13:1–17. for the submitted work. https:// doi. org/ 10. 1007/ s12517- 020- 05437-0 Deo RC, Şahin M (2016) An extreme learning machine model for the Declarations simulation of monthly mean streamflow water level in eastern Queensland. Environ Monit Assess 188(2):90. https:// doi. org/ Conflict of interest The authors have no conflicts of interest to declare 10. 1007/ s10661- 016- 5094-9 that are relevant to the content of this article. Desai S, Ouarda TB (2021) Regional hydrological frequency analy- sis at ungauged sites with random forest regression. J Hydrol Ethical approval The manuscript is an original work with its own 594:125861. https:// doi. org/ 10. 1016/j. jhydr ol. 2020. 125861 merit, has not been previously published in whole or in part, and is not Esmaeili Gisavandani H (2017) Evaluation of the ability of adap- being considered for publication elsewhere. tive neuro-fuzzy interface system, artificial neural network and regression to regional flood analysis. J Water Soil Conserv 24(3):149–166. https:// doi. org/ 10. 22069/ JWFST. 2017. 11413. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, 1 3 139 Page 10 of 11 Applied Water Science (2023) 13:139 Esmaeili-Gisavandani H, Farajpanah H, Adib A, Kisi O, Riyahi MM, Kisi O, Parmar KS (2016) Application of least square support vector Lotfirad M, Salehpoor J (2022) Evaluating ability of three types of machine and multivariate adaptive regression spline models in discrete wavelet transforms for improving performance of different long term prediction of river water pollution. J Hydrol 534:104– ML models in estimation of daily-suspended sediment load. Arab 112. https:// doi. org/ 10. 1016/j. jhydr ol. 2015. 12. 014 J Geosci 15(1):1–13. https://doi. or g/10. 1007/ s12517- 021- 09282-7 Kumar R, Goel NK, Chatterjee C, Nayak PC (2015) Regional flood Fallah-Mehdipour E, Bozorg-Haddad O, Loáiciga HA (2020) frequency analysis using soft computing techniques. Water Climate-environment-water: integrated and non-integrated Resour Manage 29(6):1965–1978. https:// doi. or g/ 10. 1007/ approaches to reservoir operation. Environ Monit Assess s11269- 015- 0922-1 192(1):60. https:// doi. org/ 10. 1007/ s10661- 019- 8039-2 Latt ZZ (2015) Application of feedforward artificial neural network Farajpanah H, Lotfirad M, Adib A, Esmaeili-Gisavandani H, Kisi Ö, in Muskingum flood routing: a black-box forecasting approach Riyahi MM, Salehpoor J (2020) Ranking of hybrid wavelet-AI for a natural river system. Water Resour Manage 29(14):4995– models by TOPSIS method for estimation of daily flow discharge. 5014. https:// doi. org/ 10. 1007/ s11269- 015- 1100-1 Water Supply 20(8):3156–3171. https://doi. or g/10. 2166/ w s.2020. Latt ZZ, Wittenberg H (2014) Improving flood forecasting in a devel - 211 oping country: a comparative study of stepwise multiple linear Gao W, Shen Q, Zhou Y, Li X (2018) Analysis of flood inundation regression and artificial neural network. Water Resour Manage in ungauged basins based on multi-source remote sensing data. 28(8):2109–2128. https:// doi. org/ 10. 1007/ s11269- 014- 0600-8 Environ Monit Assess 190(3):129. https:// doi. or g/ 10. 1007/ Leclerc M, Ouarda TBMJ (2007) Non stationary regional frequency s10661- 018- 6499-4 analysis at ungaged sites. J Hydrol 343(3):254–265. https://doi. Gheitasi M (2016) Flood frequency analysis of the maximum annual org/ 10. 1016/j. jhydr ol. 2007. 06. 021 discharge of rivers in Lorestan province (case study: Karkheh Leščešen I, Urošev M, Dolinaj D, Pantelić M, Telbisz T, Varga G, watershed in Lorestan province) (Doctoral dissertation, Univer- Milošević D (2019) Regional flood frequency analysis based on sity of Zabol) L-moment approach case study Tisza river basin. Water Resour Ghumman AR, Ahmad S, Hashmi HN (2018) Performance assessment 46(6):853–860. https:// doi. org/ 10. 1134/ S0097 80781 90600 6X of artificial neural networks and support vector regression models Lotfirad M, Adib A, Haghighi A (2018) Estimation of daily runoff for stream flow predictions. Environ Monit Assess 190(12):704. using of the semi-conceptual rainfall-runoff IHACRES model https:// doi. org/ 10. 1007/ s10661- 018- 7012-9 in the Navrood watershed (a watershed in the Gilan province. Golestani M, Kavianpour MR, & Hedayatizade M (2010, November) Iran J Ecohydrol 5(2):449–460 Determination of homogeneous regions case study: South-East Malekinezhad H, Nachtnebel HP, Klik A (2011) Comparing the Urmia Lake Catchment, Iran. In: 2010 2nd International Confer- index-flood and multiple-regression methods using L-moments. ence on Chemical, Biological and Environmental Engineering (pp Phys Chem Earth, Parts a/b/c 36(1–4):54–60. https://doi. or g/10. 71–74). IEEE, doi: https://doi. or g/10. 1109/ ICBEE. 2010. 56489 35 1016/j. pce. 2010. 07. 013 Gris ffi VW, Stedinger JR (2007) The use of GLS regression in regional Merz R, Blöschl G (2008) Flood frequency hydrology: 2. Combin- hydrologic analysis. J Hydrol 344(1):82–95. https:// doi. org/ 10. ing data evidence. Water Resour Res. https:// doi. org/ 10. 1029/ 1016/j. jhydr ol. 2007. 06. 0232007w r0067 44 Heddam S (2014) Modeling hourly dissolved oxygen concentration Quinlan JR (1992, November) Learning with continuous classes. In: (DO) using two different adaptive neuro-fuzzy inference systems 5th Australian joint conference on artificial intelligence (Vol (ANFIS): a comparative study. Environ Monit Assess 186(1):597– 92, pp 343–348) 619. https:// doi. org/ 10. 1007/ s10661- 013- 3402-1 Rahimikhoob A (2014) Comparison between M5 model tree and neu- Holmes MGR, Young AR, Gustard A, Grew R (2002) A region of ral networks for estimating reference evapotranspiration in an influence approach to predicting flow duration curves within arid environment. Water Resour Manage 28(3):657–669. https:// ungauged catchments. Hydrol Earth Syst Sci 6:721–731doi. org/ 10. 1007/ s11269- 013- 0506-x Jafarpour M, Adib A, Lotfirad M (2022) Improving the accuracy Rahman AS, Khan Z, Rahman A (2020) Application of independ- of satellite and reanalysis precipitation data by their ensem- ent component analysis in regional flood frequency analysis: ble usage. Appl Water Sci 12(9):232. h t t p s : / / d o i . o rg / 1 0 . 1 0 0 7 / comparison between quantile regression and parameter regres- s13201- 022- 01750-z sion techniques. J Hydrol 581:124372. https://doi. or g/10. 1016/j. Jahangir MS, Biazar SM, Hah D, Quilty J, Isazadeh M (2022) Investi-jhydr ol. 2019. 124372 gating the impact of input variable selection on daily solar radia- Rao AR, Srinivas VV (2008) Regionalization of watersheds: an tion prediction accuracy using data-driven models: a case study in approach based on cluster analysis. Springer Science and Busi- northern Iran. Stoch Env Res Risk Assess 36(1):225–249. https:// ness Media, Cham doi. org/ 10. 1007/ s00477- 021- 02070-5 Sehgal V, Sahay RR, Chatterjee C (2014) Effect of utilization of Jang JSR (1993) ANFIS: adaptive-network-based fuzzy inference sys- discrete wavelet components on flood forecasting perfor- tem. IEEE Trans Syst Man Cybern 23(3):665–685. https:// doi. mance of wavelet based ANFIS models. Water Resour Manage org/ 10. 1109/ 21. 256541 28(6):1733–1749. https:// doi. org/ 10. 1007/ s11269- 014- 0584-4 Jones RM, Liu L, Dorevitch S (2013) Hydrometeorological variables Sharifi Garmdareh E, Vafakhah M, Eslamian SS (2018) Regional predict fecal indicator bacteria densities in freshwater: data- flood frequency analysis using support vector regression in driven methods for variable selection. Environ Monit Assess arid and semi-arid regions of Iran. Hydrol Sci J 63(3):426–440. 185(3):2355–2366. https:// doi. org/ 10. 1007/ s10661- 012- 2716-8https:// doi. org/ 10. 1080/ 02626 667. 2018. 14320 56 Jothiprakash V, Kote AS (2011) Effect of pruning and smoothing while Shokouhifar Y, Lotfirad M, Esmaeili-Gisavandani H, Adib A (2022) using M5 model tree technique for reservoir inflow prediction. J Evaluation of climate change effects on flood frequency in arid Hydrol Eng 16(7):563–574 and semi-arid basins. Water Supply 22(8):6740–6755. https:// Kinnison HB, Colby BR (1945) Flood formulas based on drainage doi. org/ 10. 2166/ ws. 2022. 271 basin characteristics. Trans Am Soc Civ Eng 110(1):849–876 Shu C, Ouarda TB (2012) Improved methods for daily streamflow Kisi O, Kilic Y (2016) An investigation on generalization ability of estimates at ungauged sites. Water Resour Res. https:// doi. org/ artificial neural networks and M5 model tree in modeling refer -10. 1029/ 2011W R0115 01 ence evapotranspiration. Theoret Appl Climatol 126(3–4):413– 425. https:// doi. org/ 10. 1007/ s00704- 015- 1582-z 1 3 Applied Water Science (2023) 13:139 Page 11 of 11 139 Smith A, Sampson C, Bates P (2015) Regional flood frequency analysis approaches. Environ Sci Pollut Res. https:// doi. or g/ 10. 1007/ at the global scale. Water Resour Res 51(1):539–553. https:// doi. s11356- 020- 07802-8 org/ 10. 1002/ 2014W R0158 14 Zalnezhad A, Rahman A, Vafakhah M, Samali B, Ahamed F (2022) Solomatine DP, Xue Y (2004) M5 model trees and neural networks: Regional flood frequency analysis using the FCM-ANFIS algo- application to flood forecasting in the upper reach of the Huai rithm: a case study in South-Eastern Australia. Water 14(10):1608. River in China. J Hydrol Eng 9(6):491–501. https:// doi. org/ 10. https:// doi. org/ 10. 3390/ w1410 1608 1061/ (ASCE) 1084- 0699(2004)9: 6(491) Zaman MA, Rahman A, Haddad K (2012) Regional flood frequency Srinivas VV, Tripathi S, Rao AR, Govindaraju RS (2008) Regional analysis in arid regions: a case study for Australia. J Hydrol flood frequency analysis by combining self-organizing feature 475:74–83. https:// doi. org/ 10. 1016/j. jhydr ol. 2012. 08. 054 map and fuzzy clustering. J Hydrol 348(1–2):148–166. https:// Zamani R, Tabari H, Willems P (2015) Extreme streamflow drought doi. org/ 10. 1016/j. jhydr ol. 2007. 09. 046 in the Karkheh river basin (Iran): probabilistic and regional Vafakhah M, Bozchaloei SK (2020) Regional analysis of flow duration analyses. Nat Hazards 76(1):327–346. https:// doi. org/ 10. 1007/ curves through support vector regression. Water Resour Manage s11069- 014- 1492-x 34(1):283–294. https:// doi. org/ 10. 1007/ s11269- 019- 02445-y Wang L, Kisi O, Zounemat-Kermani M, Zhu Z, Gong W, Niu Z, Liu Publisher's Note Publisher's Note Springer Nature remains neutral Z (2017) Prediction of solar radiation in China using different with regard to jurisdictional claims in published maps and institutional adaptive neuro-fuzzy methods and M5 model tree. Int J Climatol affiliations. 37(3):1141–1155. https:// doi. org/ 10. 1002/ joc. 4762 Zahiri J, Nezaratian H (2020) Estimation of transverse mix- ing coefficient in streams using M5, MARS, GA, and PSO 1 3

Journal

Applied Water ScienceSpringer Journals

Published: Jun 1, 2023

Keywords: Flood frequency; M5; RF; Regression; ANFIS

There are no references for this article.