Search Results
-
- Resource Type:
- Article
- Creator:
- Banks, Sarah N., Millard, Koreen, Behnamian, Amir, White, Lori, Richardson, Murray, and Pasher, Jon
- Abstract:
- Random Forests variable importance measures are often used to rank variables by their relevance to a classification problem and subsequently to reduce the number of model inputs in high-dimensional data sets, thus increasing computational efficiency. However, because the training data and predictor variables used to construct each tree and split each node are selected at random, it is also well known that if too few trees are generated, variable importance rankings tend to differ between model runs. In this letter, we characterize the effect of the number of trees (ntree) and of class separability on the stability of variable importance rankings, and we develop a systematic approach to defining the number of model runs and/or trees required to achieve stable variable importance measures. Results demonstrate that either a large ntree in a single model run or importance values averaged across multiple model runs with fewer trees is sufficient to achieve stable mean importance values. While the latter is far more computationally efficient, both methods tend to produce the same ranking of variables. Moreover, the optimal number of model runs differs depending on the separability of the classes. Recommendations are made to users on determining the number of model runs and/or trees required to achieve stable variable importance rankings. (An illustrative sketch of the single-large-run versus averaged-runs comparison follows this record.)
- Date Created:
- 2017-09-15
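The stability comparison described in this abstract can be illustrated with a short experiment. The sketch below is not the authors' code: it uses scikit-learn's Gini importance on synthetic data as a stand-in for the paper's importance measures, and every parameter value (tree counts, number of runs, data dimensions) is an illustrative assumption.

```python
# A minimal sketch (not the authors' code) of checking variable-importance
# rank stability in Random Forests: compare a single large-ntree run against
# the average over several small-ntree runs. Uses scikit-learn's Gini
# importance on synthetic data; all parameter values are illustrative.
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Option 1: one model run with a large number of trees.
big = RandomForestClassifier(n_estimators=2000, random_state=0).fit(X, y)

# Option 2: importances averaged over several runs with fewer trees each.
runs = [RandomForestClassifier(n_estimators=250, random_state=s)
        .fit(X, y).feature_importances_ for s in range(20)]
avg = np.mean(runs, axis=0)

# If both strategies are stable, the variable rankings should agree closely.
rho, _ = spearmanr(big.feature_importances_, avg)
print(f"Spearman rank correlation between strategies: {rho:.3f}")
```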
-
- Resource Type:
- Conference Proceeding
- Creator:
- Whitehead, Anthony D.
- Abstract:
- We present a method of segmenting video to detect cuts with accuracy equal to or better than both histogram-based and other feature-based methods, while running faster than other feature-based methods. By tracking features at corners rather than along lines, we are able to reliably detect events such as cuts, fades, and salient frames. Experimental evidence shows that the method withstands high-motion situations better than existing methods. Initial implementations using full-sized video frames achieve processing rates of 10-30 frames per second, depending on the level of motion and the number of features being tracked; this includes the time to decompress the MPEG frames. (A rough sketch of the corner-tracking idea follows this record.)
- Date Created:
- 2003-01-01
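As a rough illustration of corner-based cut detection (not the paper's exact algorithm), the sketch below uses OpenCV's corner detector and Lucas-Kanade optical flow, and flags a cut when the fraction of corners that survive tracking between consecutive frames collapses. The survival threshold and detector parameters are assumed values.

```python
# A rough sketch (assumed thresholds, not the paper's algorithm) of cut
# detection via corner tracking: detect corners in each frame, track them
# into the next frame with Lucas-Kanade optical flow, and flag a cut when
# the fraction of corners that survive tracking collapses.
import cv2

def detect_cuts(path, survival_threshold=0.2):
    cap = cv2.VideoCapture(path)
    ok, prev = cap.read()
    cuts, idx = [], 0
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                          qualityLevel=0.01, minDistance=7)
        if corners is not None and len(corners) > 0:
            _, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                    corners, None)
            # At a hard cut, almost no corners track across the boundary.
            if status.mean() < survival_threshold:
                cuts.append(idx + 1)
        prev_gray, idx = gray, idx + 1
    cap.release()
    return cuts
```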
-
- Resource Type:
- Conference Proceeding
- Creator:
- Nayak, Amiya, Du, Jingzhe, and Kranakis, Evangelos
- Abstract:
- We describe a novel distributed storage protocol for Disruption (Delay) Tolerant Networks (DTNs). Since DTNs cannot guarantee network connectivity at all times, distributed data storage and lookup must be performed in a store-and-forward manner. In this work, we define local distributed location regions, called cells, to facilitate the data storage and lookup process. Nodes in a cell move within that cell with high probability. Our protocol stores data items in cells, which are organized hierarchically to reduce the routing information kept at each node. Multiple copies of a data item may be stored at different nodes to counter the disruptive nature of DTNs. Because cells are relatively stable regions, data exchange overhead among nodes is reduced. Through experimentation, we show that the proposed distributed storage protocol achieves higher successful data storage ratios, with lower delays and more limited data item exchange requirements, than other protocols in the literature. (A toy sketch of cell-based replication follows this record.)
- Date Created:
- 2010-08-27
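The cell-based replication idea can be sketched as a toy simulation. Everything below (the Cell class, replica count, lookup strategy) is a hypothetical illustration rather than the authors' protocol: a data item is replicated on several nodes within its home cell, so a lookup succeeds if any replica in the cell is reachable.

```python
# A toy sketch (hypothetical structures, not the authors' protocol) of
# cell-based replicated storage in a DTN: each data item is replicated on
# several nodes within its home cell, so a lookup only has to reach the cell.
import random

class Cell:
    """A relatively stable region whose member nodes rarely leave it."""
    def __init__(self, cell_id, nodes):
        self.cell_id = cell_id
        self.nodes = nodes          # node_id -> {key: value} local store

    def store(self, key, value, copies=3):
        # Replicate on multiple nodes to counter disruption in the network.
        chosen = random.sample(list(self.nodes), min(copies, len(self.nodes)))
        for node in chosen:
            self.nodes[node][key] = value

    def lookup(self, key):
        # Any reachable replica within the cell can answer the query.
        for store in self.nodes.values():
            if key in store:
                return store[key]
        return None

cell = Cell("A1", {n: {} for n in range(10)})
cell.store("sensor-42", b"reading")
print(cell.lookup("sensor-42"))
```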
-
- Resource Type:
- Conference Proceeding
- Creator:
- Yanikomeroglu, Halim and Al-Ahmadi, Saad
- Date Created:
- 2009-10-19
-
- Resource Type:
- Conference Proceeding
- Creator:
- Oommen, B. John, Zhan, Justin, and Crisostomo, Johanna
- Abstract:
- Anomaly detection involves identifying observations that deviate from the normal behavior of a system. One way to achieve this is to identify the phenomena that characterize "normal" observations; new observations are then classified as normal or anomalous based on the characteristics learned from the normal data. Most state-of-the-art approaches, especially those belonging to the family of parameterized statistical schemes, work under the assumption that the underlying distributions of the observations are stationary. That is, they assume that the distributions learned during the training (or learning) phase, though unknown, are not time-varying, and that the same distributions remain relevant as new observations are encountered. Although such a "stationarity" assumption is reasonable for many applications, there are anomaly detection problems where stationarity cannot be assumed. For example, in network monitoring, the patterns learned to represent normal behavior may change over time due to factors such as network infrastructure expansion, new services, and growth of the user population. Similarly, in meteorology, identifying anomalous temperature patterns requires taking seasonal changes in normal observations into account. Detecting anomalies or outliers under these circumstances introduces several challenges. Indeed, the ability to adapt to changes in non-stationary environments is necessary so that anomalous observations can be identified even as what would otherwise be classified as normal behavior changes. In this paper, we propose applying weak estimation theory to anomaly detection in dynamic environments; in particular, we apply it to detect anomalous activity in system calls. Our experimental results demonstrate that our proposal is both feasible and effective for detecting such anomalous activity. (A minimal sketch of the weak-estimation idea follows this record.)
- Date Created:
- 2012-09-22
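A minimal sketch of the weak-estimation idea appears below. It assumes a multiplicative weak-estimator update (in the style of stochastic learning weak estimators) and an illustrative L1-distance threshold; the number of call types and all parameter values are assumptions, not the paper's exact scheme.

```python
# A minimal sketch (assumed update rule and threshold, not the paper's exact
# scheme) of tracking a non-stationary system-call distribution with a weak
# estimator and flagging observation windows that drift from it.
import numpy as np

LAMBDA = 0.99          # forgetting factor: smaller -> faster adaptation

def weak_estimator_update(p, symbol):
    """One multinomial weak-estimation step for an observed symbol index."""
    p = LAMBDA * p                      # decay all estimates...
    p[symbol] += 1.0 - LAMBDA           # ...and reinforce the observed one
    return p / p.sum()                  # renormalize against rounding drift

def is_anomalous(window, p, threshold=0.3):
    """Flag a window whose empirical call distribution drifts from p."""
    counts = np.bincount(window, minlength=len(p)).astype(float)
    empirical = counts / counts.sum()
    return np.abs(empirical - p).sum() > threshold   # L1 distance

# Track the "normal" distribution over a stream of system-call IDs.
p = np.full(8, 1 / 8)                   # 8 hypothetical call types
for call in np.random.randint(0, 8, size=1000):
    p = weak_estimator_update(p, call)

# A burst of a single unusual call type should be flagged as anomalous.
print(is_anomalous(np.full(50, 3), p))
```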