Soil & Water Res., X:X | DOI: 10.17221/128/2025-SWR
Balancing data quality in predictive geochemical mapping using machine learning: A Czech regional case study on topsoil nickel
Original Paper
- 1 Research Institute for Soil and Water Conservation, Prague, Czech Republic
Machine learning makes geochemical mapping highly adaptable, as its data-driven nature allows predictions to evolve with new information. In this study, topsoil nickel (Ni) data were compiled from various sources, each with different sampling times and analytical methods. To incorporate such imbalanced data effectively into spatial modelling, it was necessary to test how data uncertainty propagated to the final maps. A comprehensive benchmark of the quantile random forest algorithm was conducted to identify the conditions under which the model performs optimally. Predictive maps of topsoil Ni at a 20-metre resolution were subsequently generated and compared using a multi-faceted evaluation strategy. This approach assessed how model adjustments – particularly those addressing the uncertainty introduced by the regression-based conversion of legacy measurements – affected model performance. Extensive benchmarking revealed that, while out-of-sample validation showed only modest improvements (e.g., root mean square error (RMSE) decreased from 12.6 to 11.2 mg/kg) when training data, covariates, or algorithm parameters were modified, the resulting prediction grids differed substantially. The analysis also demonstrated that output variability across model scenarios occurred at different spatial scales: weighting approaches had localised effects, whereas high variability in the input data propagated more broadly across the region.
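The abstract refers to a regression-based conversion of legacy measurements and to weighting approaches without giving formulas. The sketch below illustrates one common way such a step can work, under stated assumptions: an ordinary least squares conversion is fitted on samples measured by both a legacy and a reference method, each converted value receives the standard OLS prediction variance, and that variance is turned into a training weight. The calibration data, the `quality_weight` heuristic and the reference-method variance `sigma2_ref` are hypothetical and chosen for illustration; this is not the procedure published in the paper.

```python
"""Sketch: converting legacy measurements by regression and down-weighting them.

Assumptions (hypothetical, not the authors' published procedure):
- a set of samples measured by both the legacy and the reference method
  is available for calibration;
- converted legacy values get a training weight that shrinks as their
  conversion (prediction) variance grows.
"""
from math import sqrt


def fit_conversion(legacy, reference):
    """Ordinary least squares fit of reference = a + b * legacy on paired samples."""
    n = len(legacy)
    if n < 3:
        raise ValueError("need at least 3 paired samples")
    x_bar = sum(legacy) / n
    y_bar = sum(reference) / n
    sxx = sum((x - x_bar) ** 2 for x in legacy)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(legacy, reference))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(legacy, reference))
    s2 = sse / (n - 2)  # residual variance of the calibration fit
    return {"a": a, "b": b, "s2": s2, "n": n, "x_bar": x_bar, "sxx": sxx}


def convert(model, x0):
    """Converted value and its OLS prediction variance for a new legacy value x0."""
    value = model["a"] + model["b"] * x0
    var_pred = model["s2"] * (
        1 + 1 / model["n"] + (x0 - model["x_bar"]) ** 2 / model["sxx"]
    )
    return value, var_pred


def quality_weight(var_pred, sigma2_ref):
    """Hypothetical weight: 1 for reference data (var_pred = 0), lower as var_pred grows."""
    return sigma2_ref / (sigma2_ref + var_pred)


def rmse(observed, predicted):
    """Root mean square error, as used for out-of-sample validation."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


if __name__ == "__main__":
    # Synthetic paired Ni values in mg/kg, for illustration only.
    legacy = [8.0, 15.0, 22.0, 30.0, 41.0, 55.0]
    reference = [10.5, 17.0, 25.5, 31.0, 44.5, 57.0]
    model = fit_conversion(legacy, reference)
    sigma2_ref = 4.0  # assumed measurement-error variance of the reference method
    for x0 in (12.0, 35.0, 80.0):
        value, var_pred = convert(model, x0)
        w = quality_weight(var_pred, sigma2_ref)
        print(f"legacy {x0:5.1f} -> {value:5.1f} mg/kg, var {var_pred:6.2f}, weight {w:.2f}")
```

The prediction variance grows with the distance of `x0` from the calibration mean, so the extrapolated value (legacy 80 mg/kg, outside the calibration range) receives the smallest weight. Such weights could then be passed to a learner that accepts case weights, for example through the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`. For scale, the RMSE change reported above, from 12.6 to 11.2 mg/kg, is a relative reduction of about 11% (1.4 / 12.6).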
Keywords: data uncertainty; prediction maps; topsoil geochemistry
Received: October 27, 2025; Accepted: March 10, 2026; Prepublished online: April 24, 2026
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.