Estimating the number of lines of code of the latest releases of open-source applications using nonlinear regression models
Abstract
Early estimation of the number of lines of code in the latest releases of software, including open-source applications, is important because it directly affects effort prediction for their development and subsequent improvement. The aim of the research is to build several regression models for early estimation of the number of lines of code in the latest releases of open-source applications. The research object is the process of such estimation, and the research subject is the regression models used for it. For this purpose, the models, confidence intervals, and prediction intervals of two nonlinear regressions with two predictors were constructed using the Box-Cox three-variate normalizing transformation and specialized techniques. These techniques rely on multiple nonlinear regression analysis with multivariate normalizing transformations and account for the presence of outliers in multidimensional non-Gaussian data. The two nonlinear regression models built in the paper depend on two predictors: the number of classes in the latest release and the number of classes in the first release. The first model has good quality, whereas the second can be used only for estimating the conditional mean and has poor quality for predicting the response as a dependent random variable. We compared the quality of the two constructed regression models with that of two linear support vector regressions; the quality of these models is similar.
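As an illustration of how a nonlinear regression can arise from a normalizing transformation, the following minimal Python sketch applies a component-wise Box-Cox transform to the two predictors, evaluates a linear model in the transformed space, and maps the result back with the inverse transform. It is a sketch under stated assumptions, not the models from the paper: the function names, the transformation parameters (`lambdas`), and the regression coefficients (`coeffs`) are hypothetical placeholders, and the joint parameter estimation and outlier handling are not shown.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if abs(lam) < 1e-12:
        return math.log(x)  # limiting case lam -> 0
    return (x ** lam - 1.0) / lam


def inv_box_cox(z: float, lam: float) -> float:
    """Inverse Box-Cox transform: maps a transformed value back."""
    if abs(lam) < 1e-12:
        return math.exp(z)
    base = lam * z + 1.0
    if base <= 0:
        raise ValueError("value lies outside the range of the transform")
    return base ** (1.0 / lam)


def predict_loc(classes_latest, classes_first, lambdas, coeffs):
    """Estimate lines of code from two class counts.

    lambdas = (lam_y, lam_1, lam_2) and coeffs = (b0, b1, b2) are
    hypothetical inputs; in practice they would be estimated from data.
    """
    lam_y, lam_1, lam_2 = lambdas
    b0, b1, b2 = coeffs
    z1 = box_cox(classes_latest, lam_1)  # transformed first predictor
    z2 = box_cox(classes_first, lam_2)   # transformed second predictor
    z_y = b0 + b1 * z1 + b2 * z2         # linear model in transformed space
    return inv_box_cox(z_y, lam_y)       # back to the original scale
```

Because the inverse transform is applied to a linear combination of transformed predictors, the resulting dependence of the response on the original predictors is nonlinear whenever the transformation parameters differ from 1.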
The constructed models were also compared with nonlinear regression models that depend only on the number of classes in the current release and were built using the Box-Cox bivariate transformation. Compared with those single-predictor models, the constructed models show a larger multiple coefficient of determination, a smaller mean magnitude of relative error, a larger percentage of predictions falling within 25 percent of the actual values, and narrower confidence and prediction intervals. These results indicate that the constructed models with two predictors have better quality. Further research may use other data sets to construct nonlinear regression models for early estimation of the number of lines of code in the latest releases of open-source applications under other restrictions on the predictors.
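The quality measures named above can be computed as in the following minimal Python sketch. It assumes the commonly used definitions: the magnitude of relative error (MRE) is |actual - predicted| / actual, MMRE is its mean, PRED(0.25) is the share of predictions with MRE not exceeding 0.25, and the coefficient of determination is one minus the ratio of the residual to the total sum of squares. The function names are illustrative, and the paper may compute these measures with different conventions.

```python
def mmre(actual, predicted):
    """Mean magnitude of relative error (smaller is better)."""
    errors = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def pred(actual, predicted, level=0.25):
    """Fraction of predictions whose relative error is at most `level`."""
    hits = sum(
        1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level
    )
    return hits / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

These functions take the actual and predicted numbers of lines of code as equal-length sequences of positive values, so that two competing models can be compared on the same data set.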

