Search Results
-
- Resource Type:
- Article
- Creator:
- Millard, Koreen and Richardson, Murray
- Abstract:
- Random Forest (RF) is a widely used algorithm for classification of remotely sensed data. Through a case study in peatland classification using LiDAR derivatives, we present an analysis of the effects of input data characteristics on RF classifications (including RF out-of-bag error, independent classification accuracy and class proportion error). Training data selection and specific input variables (i.e., image channels) have a large impact on the overall accuracy of the image classification. High-dimension datasets should be reduced so that only uncorrelated important variables are used in classifications. Despite the fact that RF is an ensemble approach, independent error assessments should be used to evaluate RF results, and iterative classifications are recommended to assess the stability of predicted classes. Results are also shown to be highly sensitive to the size of the training data set. In addition to being as large as possible, the training data sets used in RF classification should also (a) be randomly distributed or created in a manner that allows for the class proportions of the training data to be representative of actual class proportions in the landscape; and (b) have minimal spatial autocorrelation to improve classification results and to mitigate inflated estimates of RF out-of-bag classification accuracy.
- Date Created:
- 2016-05-17
-
- Resource Type:
- Thesis
- Creator:
- Millard, Koreen
- Abstract:
- Peatland ecosystems exhibit a wide range of biophysical conditions and Synthetic Aperture Radar (SAR) remote sensing provides a method to collect information about these conditions across large areas. The overarching purpose of this thesis was to advance understanding of SAR backscatter response to peatland hydrology and develop new approaches for remote mapping and monitoring of peatland environments with SAR. Specifically, this thesis aimed to 1) improve methods for peatland ecosystem mapping and classification accuracy assessment with a Random Forest classifier; and 2) develop methods for surface soil moisture and water table depth retrieval in peatlands using SAR remote sensing data. At Alfred Bog, a peatland in eastern Ontario, Canada, a Random Forest classification workflow was developed and enabled the creation of a site-wide peatland ecosystem map, which was used to better understand the SAR response to hydrological and vegetation conditions. For the retrieval of surface hydrologic information, SAR data were compared with trends in soil moisture, water table and vegetation spatial variability and change over time. Various polarimetric parameters were used to build statistical models of soil moisture and, in some cases, resulted in high explained variance but independent validation indicated that models were over-fit. These results are important, as many examples were found in the literature where, through statistical models, SAR was reported to be a strong predictor of soil moisture but models were not validated. To determine if models could predict soil moisture from SAR at times when no field measured data existed, linear mixed-effects models were built. These accounted for the temporal autocorrelation due to the repeated measures design of field data. While some models resulted in high explained variance, most of the explained variance was attributed to the variability between peatland classes and/or the specific date that the image was acquired, rather than the SAR data itself. Overall, this thesis points to some fundamental limitations on our ability to accurately monitor peatland hydrology with SAR due to the complexity of the scattering response. It highlights a need for extensive field monitoring campaigns and testing to further refine approaches for remote hydrologic monitoring in natural environments.
- Thesis Degree:
- Doctor of Philosophy (Ph.D.)
- Thesis Degree Discipline:
- Geography
- Date Created:
- 2016
-
- Resource Type:
- Article
- Creator:
- Banks, Sarah N., Millard, Koreen, Behnamian, Amir, White, Lori, Richardson, Murray, and Pasher, Jon
- Abstract:
- Random Forests variable importance measures are often used to rank variables by their relevance to a classification problem and subsequently reduce the number of model inputs in high-dimensional data sets, thus increasing computational efficiency. However, as a result of the way that training data and predictor variables are randomly selected for use in constructing each tree and splitting each node, it is also well known that if too few trees are generated, variable importance rankings tend to differ between model runs. In this letter, we characterize the effect of the number of trees (ntree) and class separability on the stability of variable importance rankings and develop a systematic approach to define the number of model runs and/or trees required to achieve stability in variable importance measures. Results demonstrate that either a large ntree for a single model run or averaged values across multiple model runs with fewer trees is sufficient for achieving stable mean importance values. While the latter is far more computationally efficient, both methods tend to lead to the same ranking of variables. Moreover, the optimal number of model runs differs depending on the separability of classes. Recommendations are made to users regarding how to determine the number of model runs and/or trees that are required to achieve stable variable importance rankings.
- Date Created:
- 2017-09-15