# Effort Estimation

## Description:

Effort estimation consists of predicting how many hours of work and how many workers are needed to develop a project. The effort invested in a software project is probably one of the most important and most analysed variables of project management in recent years. Determining its value at the start of a software project makes it possible to plan the forthcoming activities adequately. Estimation and prediction still suffer from a number of unsolved problems and errors, and good results depend on taking previous projects into account. Estimating effort with a high degree of reliability remains an unsolved problem, one that the project manager has to face from the very beginning of the project.

Several methods have been used to analyse the data, but the reference technique has always been classic regression. Because classic regression fixes the form of the model in advance, it becomes necessary to use other techniques that search the space of non-linear relationships. Some works in the field have built models (as equations) based on size, which is the factor that most affects the cost (effort) of the project [Dol00],[KT85]. The equation that relates size and effort can be adjusted for environmental factors such as productivity, tools, complexity of the product and others. The equations are usually adjusted by the analyst to fit the real data.
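As an illustration of how a size-effort equation can be adjusted to real data, the following is a minimal sketch. It assumes a power-law form, effort = a * size^b, fitted by least squares on the logarithms of both variables. The functional form, the function names and the data are assumptions made for illustration and are not taken from [Dol00] or [KT85].

```python
import math


def fit_power_law(sizes, efforts):
    """Fit effort = a * size**b by least squares on log-transformed data.

    The power-law form is an assumption of this sketch, not the
    equation used in the cited works.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the straight line log(effort) = log(a) + b * log(size)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate_effort(a, b, size):
    """Apply the fitted equation to a new project size."""
    return a * size ** b


# Made-up data generated from effort = 2 * size**1.1 (illustration only)
sizes = [10.0, 20.0, 40.0, 80.0]
efforts = [2.0 * s ** 1.1 for s in sizes]

a, b = fit_power_law(sizes, efforts)
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"estimated effort for size 50: {estimate_effort(a, b, 50.0):.1f}")
```

On noise-free data the fit recovers the generating coefficients. On real project data the residuals show how far a single equation is from describing every project.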

From this perspective, different equation patterns have emerged [Dol00],[Hu97], but none of them has produced enough evidence to be considered the definitive cost function, if there is one. Nevertheless, the property that an estimation equation has to satisfy is that the model should reliably estimate the majority of the real values.

So far it has not been possible to obtain an equation, set of equations or pattern of equations that satisfies this premise, and therefore there is no reference against which to compare. It can thus be assumed that equations are not a good tool for obtaining an optimal prediction.


## Instances and best known solutions for those instances:

Estimating the effort invested in the development of software projects becomes a complicated problem if appropriate models are not available. Unfortunately, this is the current situation, since software development companies do not keep the necessary records. Years of investigation are required to gather the volume of information needed to make predictions with a good level of reliability and a low error margin.

The available data sets are not the most suitable, because of their small size and limited number of variables, and because they depend on the particular circumstances of each company. The quality of the prediction can improve if more appropriate data sets become available and the methods are studied in greater depth.

Sets of data are provided below. Each set contains information about a number of software development projects. For each project there are two variables: the independent variable, which is the size of the generated code (measured in lines of code or function points), and the dependent variable, which is the effort (time) invested in the development of the project. Columns "Size" and "Effort" show the unit of measure used. Column "Projects" shows the number of projects in the data set.

| Data | Projects | Size | Effort |
|---|---|---|---|
| Abran | 21 | function points | person-days |
| Bailey | 18 | thousands of lines of code | man-months |
| Belady | 33 | lines of code | man-months |
| Heiat | 35 | lines of code | person-hours |
| Kitchen | 33 | function points | man-months |

Here we present some results extracted from [RGH04] and [Dol00]. Part of the data analysis was done with a tool called WEKA, which includes methods such as KNN, linear regression, neural networks and K*. The experiments with the KNN method used the values 3 and 4 for the constant k (and so are named knn-3 and knn-4 in the tables). Neural networks (NN) used the backpropagation algorithm with 20 neurons in one hidden layer and 500 training epochs. LR stands for linear regression and AR for arithmetic regression. The tools used in [Dol00] were curve fitting with quadratic, cubic and logarithmic functions (named "Curve" in the tables below) and genetic programming (GP). To measure the prediction capacity of the methods, two well-known measures have been used: PRED and MMRE.

The prediction at level l, PRED(l), is the quotient between the number of cases in which the estimated value is within the limit l of the real value (that is, its relative error does not exceed l) and the total number of cases.

MMRE is the Mean Magnitude of Relative Error, that is, the average over all projects of |real - estimated| / real. The criterion for considering a model good is MMRE < 0.25.
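The two measures can be sketched in a few lines of Python. This is a minimal illustration, assuming that the magnitude of relative error of a project is |real - estimated| / real. The function names and the sample values are made up for the example and do not come from the data sets.

```python
def mre(actual, estimated):
    """Magnitude of relative error of a single project."""
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean Magnitude of Relative Error over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose relative error does not exceed `level`."""
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= level)
    return hits / len(actuals)


# Made-up efforts (illustration only)
actual_effort = [100.0, 200.0, 50.0, 80.0]
estimated_effort = [110.0, 160.0, 80.0, 80.0]

print(f"MMRE       = {mmre(actual_effort, estimated_effort):.3f}")
print(f"PRED(0.25) = {pred(actual_effort, estimated_effort, 0.25):.2f}")
```

For these made-up numbers the relative errors are 0.1, 0.2, 0.6 and 0, so MMRE is 0.225 and PRED(0.25) is 0.75. Table 1 reports PRED as a percentage, which corresponds to multiplying the fraction by 100.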

Table 1: PRED(0.25) obtained by each method, in percent

| Method | Abran | Bailey | Belady | Heiat | Kitchen |
|---|---|---|---|---|---|
| Curve | 57.14 | 61.11 | 33.33 | 94.29 | 27.27 |
| GP | 77.30 | 73.70 | 35.30 | 94.40 | 32.40 |
| knn-3 | 76.19 | 66.67 | 33.33 | 88.57 | 27.27 |
| knn-4 | 76.19 | 66.67 | 33.33 | 91.43 | 39.39 |
| NN | 80.95 | 50.00 | 12.12 | 94.29 | 21.21 |
| LR | 66.67 | 61.11 | 12.12 | 91.43 | 12.12 |
| AR | 80.95 | 72.21 | 24.24 | 97.13 | 59.05 |
| K* | 80.95 | 72.22 | 90.91 | 97.14 | 84.85 |

Table 2: Mean Magnitude of Relative Error

| Method | Abran | Bailey | Belady | Heiat | Kitchen |
|---|---|---|---|---|---|
| Curve | 0.2364 | 0.2935 | 0.6528 | 0.0892 | 0.8458 |
| GP | 0.2560 | 0.2690 | 0.7100 | 0.0870 | 1.1430 |
| knn-3 | 0.2510 | 0.2546 | 0.9275 | 0.1014 | 0.6662 |
| knn-4 | 0.3013 | 0.2587 | 1.0298 | 0.1036 | 0.7267 |
| NN | 0.2178 | 0.2808 | 1.1772 | 0.1080 | 1.6275 |
| LR | 0.2722 | 0.2665 | 1.3900 | 0.1221 | 1.6900 |
| AR | 0.1593 | 0.2035 | 1.3534 | 0.0890 | 0.4545 |
| K* | 0.2511 | 0.2450 | 0.2352 | 0.0998 | 0.1495 |
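To give an idea of how a method such as knn-3 produces an estimate, the following is a minimal sketch of a k-nearest-neighbour estimator that uses size as the only variable and averages the effort of the k closest projects. It is not the WEKA implementation. The leave-one-out evaluation is an assumption of this sketch, since the validation protocol behind the tables is not stated here, and the project data are made up.

```python
def knn_estimate(train, size, k=3):
    """Estimate effort as the mean effort of the k projects closest in size.

    `train` is a list of (size, effort) pairs.
    """
    nearest = sorted(train, key=lambda p: abs(p[0] - size))[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


def leave_one_out(projects, k=3):
    """Estimate each project from all the others (assumed protocol)."""
    estimates = []
    for i, (size, _) in enumerate(projects):
        train = projects[:i] + projects[i + 1:]
        estimates.append(knn_estimate(train, size, k))
    return estimates


# Made-up (size, effort) pairs, illustration only
projects = [(10, 20.0), (12, 26.0), (15, 28.0),
            (40, 90.0), (42, 100.0), (45, 95.0)]

estimates = leave_one_out(projects, k=3)
for (size, effort), est in zip(projects, estimates):
    print(f"size={size:3d}  real={effort:6.1f}  estimated={est:6.1f}")
```

With so few projects, leaving one out forces the estimator to borrow a neighbour from a very different size range, which illustrates why small data sets limit the reliability of the prediction.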

## Related Papers:

[RGH04] M. Rodríguez, I. Galván, J.C. Hernández, P. Isasi, "An Estimate of the Necessary Effort in the Development of Software Projects", Proceedings of the Workshop on Intelligent Technologies for Software Engineering (WITSE04), pp. 309-319.

[Dol00] J.J. Dolado, "A validation of Component-based method for software size estimation", IEEE Transactions on Software Engineering, 26 (10) (2000), pp. 61-72.

[Hu97] Q. Hu, "Evaluating alternative software functions", IEEE Transactions on Software Engineering, 23 (6) (1997), pp. 379-387.

[KT85] B.A. Kitchenham, N.R. Taylor, "Software projects development cost estimation", Journal of Systems and Software, 5 (1985), pp. 267-278.