Empirical Software Engineering
Software engineering requires a cycle of model building, experimentation, learning, and re-modeling. The researcher's role is to understand the nature of the processes and products, and their relationships, in context:
- They (often) use laboratory settings to observe and manipulate the variables
  - What is the effect? Why is this so? ...
- They need to better understand how to build better systems
  - What is the problem? What are the potential solutions? What is the cost? To what extent do they solve the problem? ...
- Something is gained with software development: what, why, and how?
- There must be some room for improvement: what and where?
- A specific decision was taken: why and how?
Measurement
A measure is a mapping from an attribute of an entity to a measurement value, usually a numerical value, used to characterize and manipulate the attribute in a formal way. A basic characteristic of a measure is therefore that it must preserve the empirical observations of the attribute: if object A is longer than object B, the measure of A must be greater than the measure of B. We must be certain that the measure is valid:
- The measure must not violate any necessary properties of the attribute it measures
- It must be a proper mathematical characterization of the attribute
- Objective vs. subjective measures
- Direct vs. indirect measures
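As a minimal illustration of that last distinction (the module names and figures below are made up for this sketch), lines of code and defect counts are direct measures, while defect density is an indirect measure derived from them:

loc <- c(moduleA = 1200, moduleB = 450, moduleC = 3100)   # direct measure: lines of code
defects <- c(moduleA = 6, moduleB = 1, moduleC = 20)      # direct measure: defect count
defect.density <- defects / (loc / 1000)                  # indirect measure: defects per KLOC
defect.density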
Scales
The mapping from an attribute to a measurement value can be made in many different ways. Each different mapping of an attribute is a scale; for example, if the attribute is the length of an object, we can measure it in meters, centimeters, or inches, each of which is a different scale of the length measure. In some cases a transformation is required to convert the measure from one scale to another. An admissible transformation, also known as rescaling, is one that preserves the relationships among the objects. With the measures of the attribute we make statements about an object or about the relation between different objects; if the statements remain true when the measures are rescaled, they are called meaningful, otherwise they are meaningless. There are 4 types of scales used for measurement: nominal, ordinal, interval, and ratio.
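A minimal R sketch of an admissible transformation (the lengths are made-up values): converting centimetres to inches rescales the measure but preserves the ordering of the objects, so a statement such as "A is longer than B" remains meaningful.

length.cm <- c(A = 150, B = 120)     # lengths measured on the centimetre scale
length.in <- length.cm / 2.54        # admissible transformation (rescaling) to inches
length.cm["A"] > length.cm["B"]      # TRUE
length.in["A"] > length.in["B"]      # still TRUE: the statement survives rescaling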
Statistical Tests
Quantitative analysis of a particular set of data requires statistical tests. This type of testing deals with the presentation and numerical processing of the data, which may in turn be used to describe and graphically present interesting aspects of the data set. The goal of such testing is to learn about the distribution of the data, understand its nature, and identify outliers (abnormal/false data points). The main types of statistical tests, described below, are parametric and non-parametric tests.
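As a small illustrative sketch (with a made-up data vector), base R's summary and boxplot statistics can be used to describe the distribution and flag possible outliers; the 1.5 * IQR boxplot rule used here is just one common convention:

data <- c(25, 22, 28, 24, 26, 24, 22, 21, 23, 95)   # 95 is an abnormal (false) data point
summary(data)            # describes the distribution: minimum, quartiles, mean, maximum
boxplot.stats(data)$out  # points flagged as outliers by the 1.5 * IQR boxplot rule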
Parametric Tests
For these tests the data should be normally distributed and equally variant. Parametric tests use either interval or ratio scales, require complete information about the population being tested, and are applicable only to variables; their measure of central tendency is the mean of the population. The following are some of the parametric tests used for empirically evaluating data (a short sketch of two of them follows the table):
| Parametric Tests | Purpose |
|---|---|
| Welch's T-Test | Similar to the 2-sample T-test for comparing distributions; it estimates the variances and adjusts the degrees of freedom used in the test |
| Dunnett's Test / Williams Test | Instead of comparing all possible combinations, the test allows us to compare each group to a reference |
| Permutation Student's T-Test | A permutation variant of the Student's T-test; it deals with the limited floating-point precision that can bias p-value calculations based on asymptotic distributions of discrete test statistics |
| Jarque-Bera Test | Tests the normality of the data; it checks whether the sample data have kurtosis and skewness matching a normal distribution |
| Pearson's Correlation / Parametric Correlation | Evaluates the association between 2 or more variables by measuring a linear dependence between them; it assumes the data are normally distributed |
| Paired T-Test | A statistical procedure used to determine whether the mean difference between 2 sets of observations is 0; each subject or entity is measured twice, resulting in pairs of observations |
| Levene Test | Used to assess the equality of variances for a variable calculated for 2 or more groups; it checks whether the variances of the populations from which the different samples are drawn are equal |
| Un-Paired T-Test | Compares the means of 2 unmatched groups, assuming that the values of both groups follow a Gaussian distribution |
| One Way ANOVA | Also known as One Way Analysis of Variance; used to determine whether there are any statistically significant differences between the means of 2 or more independent (unrelated) groups |
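A brief sketch of two of the tests listed above that are not covered in the examples section further below (the before/after measurements are invented; PlantGrowth is a built-in R data set):

before <- c(12.1, 13.5, 11.8, 14.2, 12.9)          # each subject measured twice
after  <- c(11.4, 12.9, 11.1, 13.6, 12.0)
t.test(before, after, paired = TRUE)                # Paired T-Test: is the mean difference 0?
summary(aov(weight ~ group, data = PlantGrowth))    # One Way ANOVA: do the group means differ?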
Non-Parametric Tests
Non-parametric tests use either ordinal or nominal scales, do not require complete information about the population being tested, and are applicable to both variables and attributes. Such tests use the median as the measure of central tendency, and the data need not be normally distributed or equally variant. The following are some of the non-parametric tests used during empirical evaluation (a short sketch of two of them follows the table):
| Non-Parametric Tests | Purpose |
|---|---|
| Binomial Test | A method for testing a null hypothesis about a binomial distribution |
| Wilcoxon Test / Mann-Whitney U-Test | Also known as the Wilcoxon signed-rank test; used to compare 2 related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ |
| Kolmogorov-Smirnov Test | Tests the sameness of 2 independent samples from a continuous distribution; it is also used as a test for normality of the variables used as predictors in a regression model before the fit |
| Ad-hoc Modification of the Original T-Test | Also known as Tukey's test, Tukey's procedure, Tukey's honestly significant difference test, or Tukey's HSD; used to determine which means among a set of means differ from the rest |
| Discrete Cramer-von Mises Goodness-of-Fit Tests | A criterion for judging the goodness of fit of a cumulative distribution function; it does the same as the Kolmogorov-Smirnov test but is more powerful against a large class of alternative hypotheses |
| D'Agostino | Checks the normality of the data; based on the D statistic, it gives an upper and a lower critical value |
| F-Test | Most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled |
| Spearman's Correlation / Kendall Tau | Evaluates the association between 2 or more variables using ranks and does not require the data to be normally distributed |
| Bonferroni U-Test | A method to counteract the problem of inflated Type I errors when making multiple pairwise comparisons between different sub-groups; it is similar to Tukey's procedure |
| Bartlett's Test | Compares the variances of 2 or more samples in order to determine whether they are drawn from populations with equal variance; the test is, however, only applicable to normally distributed data |
| Kruskal-Wallis Test | Used for comparing two or more independent samples of equal or different sample sizes |
| Fligner-Killeen Test | Similar to the Levene test; it checks whether the variances in each group are the same and does not require the data to be normally distributed |
| Brown-Forsythe Test | Checks for homogeneity of variance |
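A short sketch of two of the non-parametric tests listed above that are not covered in the examples section further below (the data vectors are invented for illustration):

x <- c(1, 2, 2, 3, 3, 3, 4, 5)
y <- c(3, 4, 4, 5, 6, 6, 7, 8)
wilcox.test(x, y)                         # Mann-Whitney U-Test on two independent samples
size <- c(25, 22, 28, 24, 26, 30, 25, 24, 21, 27, 20, 22, 24, 23, 22)
location <- factor(rep(c("ForestA", "ForestB", "ForestC"), each = 5))
kruskal.test(size ~ location)             # Kruskal-Wallis Test across the three groups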
Distance Measures
The dist function in R computes and returns a distance matrix, using a chosen distance measure to compute the distances between the rows of a data matrix. Two of the available measures are:
- Minkowski Distance: computes the distance between the rows of a data matrix using the p-norm, the p-th root of the sum of the p-th powers of the differences of the components
- Manhattan Metric: computes the distance between the rows of a data matrix using the absolute distance between the 2 vectors
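A small sketch of both measures with R's dist function (the vectors mirror those used in the examples below); the p argument of the Minkowski method selects which p-norm is used:

x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x, y), method = "minkowski", p = 3)   # p-norm distance; p = 2 gives the Euclidean case
dist(rbind(x, y), method = "manhattan")          # absolute (city-block) distance between the rows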
Examples of the Statistical Tests
The examples presented below are taken from the sources of the information related to each test (see the references). For each test, the R command, example code, results, and a short analysis of the results are listed where available.
Fligner-Killeen Test
R command: fligner.test(size~location, data=sample.dataframe)
Example code:
size <- c(25,22,28,24,26,24,22,21,23,25,26,30,25,24,21,27,28,23,25,24,20,22,24,23,22,24,20,19,21,22)
location <- c(rep("ForestA",10), rep("ForestB",10), rep("ForestC",10))
sample.dataframe <- data.frame(size, location)
fligner.test(size~location, data=sample.dataframe)
Results:
Fligner-Killeen test of homogeneity of variances
data: size by location
Fligner-Killeen: med chi-squared = 0.9556, df = 2, p-value = 0.6201
Result analysis: The p-value obtained through the test shows that the variances are homogeneous.
Bartlett's Test
R command: bartlett.test(values~groups, dataset)
Example code:
attach(PlantGrowth)
bartlett.test(weight~group, PlantGrowth)
Results:
Bartlett test of homogeneity of variances
data: weight by group
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
Result analysis: The p-value being greater than 0.05 shows that H0, that the variances are the same for all groups, cannot be rejected.
Binomial Test
R command: binom.test(x, n, p = 0.5, alternative = c("two.sided", "less", "greater"), conf.level = 0.95)
Example code:
# Suppose that in coin tossing the chance of getting a head or a tail is 50%.
# In a real experiment we toss the coin 100 times and get 48 heads; is the original hypothesis true?
binom.test(48, 100)
Results:
Exact binomial test
data: 48 and 100
number of successes = 48, number of trials = 100, p-value = 0.7644
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval: 0.3790055 0.5822102
sample estimates: probability of success 0.48
Result analysis: The p-value being greater than 0.05 shows that H0, that the probability of getting a head is 0.5, cannot be rejected.
Permutation Student's T-Test
R command: perm.t.test(x, y, paired = FALSE, ...)
Example code:
library(RVAideMemoire)
response <- c(rnorm(5), rnorm(5, 2, 1))
fact <- gl(2, 5, labels = LETTERS[1:2])
# Unpaired test
perm.t.test(response~fact, nperm = 49)
# Paired test
perm.t.test(response~fact, paired = TRUE)
Kolmogorov-Smirnov Test
R command: ks.test(x, y)
Example code:
x <- c(1,2,2,3,3,3,3,4,5,6)
y <- c(2,3,4,5,5,6,6,6,6,7)
z <- c(12,13,14,15,15,16,16,16,16,17)
ks.test(x, y)
ks.test(y, z)
ks.test(z, x)
Cramer-von Mises Test for Normality
R command: cvm.test(x)
Example code:
cvm.test(rnorm(100, mean = 10, sd = 6))
cvm.test(runif(100, min = 2, max = 4))
Jarque-Bera Test
R command: jarqueberaTest(x, title = NULL, description = NULL)
The function returns the values of the 'W' statistic and the p-value.
D'Agostino
R command: dagoTest(x, title = NULL, description = NULL)
Manhattan Metric
R command: dist(rbind(x, y), method = "manhattan")
Example code:
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x, y), method = "manhattan")
Results:
  x
y 2
Result analysis: The distance between the rows is 2.
Minkowski Distance
R command: dist(rbind(x, y), method = "minkowski")
Example code:
x <- c(0, 0, 1, 1, 1, 1)
y <- c(1, 0, 1, 1, 0, 1)
dist(rbind(x, y), method = "minkowski")
Results:
         x
y 1.414214
Result analysis: The distance between the rows is 1.41.
Parametric (Pearson) Correlation
R command: cor(x, y, method = c("pearson", "kendall", "spearman")); cor.test(x, y, method = c("pearson", "kendall", "spearman"))
Example code:
my_data <- mtcars   # the example uses the built-in mtcars data set
res <- cor.test(my_data$wt, my_data$mpg, method = "pearson")
res
Results:
Pearson's product-moment correlation
data: my_data$wt and my_data$mpg
t = -9.559, df = 30, p-value = 1.294e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval: -0.9338264 -0.7440872
sample estimates: cor -0.8676594
Result analysis: The p-value of the test is 1.294e-10, which is less than the significance level alpha = 0.05. Thus wt and mpg are significantly correlated, with a correlation coefficient of -0.87.
Spearman Correlation
R command: cor(x, y, method = c("pearson", "kendall", "spearman")); cor.test(x, y, method = c("pearson", "kendall", "spearman"))
Example code:
res2 <- cor.test(my_data$wt, my_data$mpg, method = "spearman")
res2
Results:
Spearman's rank correlation rho
data: my_data$wt and my_data$mpg
S = 10292, p-value = 1.488e-11
alternative hypothesis: true rho is not equal to 0
sample estimates: rho -0.886422
Result analysis: The correlation coefficient between wt and mpg is -0.8864 and the p-value is 1.488e-11.
Welch's T-Test
R command: t.test(x, y)
Example code:
x = rnorm(10)
y = rnorm(10)
t.test(x, y)
Results:
Welch Two Sample t-test
data: x and y
t = -0.8103, df = 17.277, p-value = 0.4288
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: -1.0012220 0.4450895
sample estimates:
mean of x mean of y
0.2216045 0.4996707
Dunnett's Test / Williams Test
R command: test.out = glht(out, linfct = mcp(ZNGROUP = "Dunnett"))
Example code:
library(multcomp)
out = aov(DIVERSTY ~ ZNGROUP, data = d)   # fitted model, as shown in the output below
test.out = glht(out, linfct = mcp(ZNGROUP = "Dunnett"))
summary(test.out)
Results:
Multiple Comparisons of Means: Dunnett Contrasts
Fit: aov(formula = DIVERSTY ~ ZNGROUP, data = d)
Linear Hypotheses:
           Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0  0.23500    0.23303   1.008   0.6195
3 - 1 == 0 -0.07972    0.22647  -0.352   0.9701
4 - 1 == 0 -0.51972    0.22647  -2.295   0.0725 .
---
References:
[1] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/dist.html
[2] http://www.endmemo.com/program/R/binomial.php
[3] http://math.furman.edu/~dcs/courses/math47/R/library/fBasics/html/015D-OneSampleTests.html
[4] https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ks.test.html
[5] https://www.rdocumentation.org/packages/RVAideMemoire/versions/0.9-68/topics/perm.t.test
[6] https://www.rdocumentation.org/packages/dgof/versions/1.2/topics/cvm.test
[7] https://www.rdocumentation.org/packages/tsoutliers/versions/0.3/topics/jarque.bera.test
[8] https://www.pinterest.com/APstatistics/chapter-7-sampling-distributions/?lp=true
[9] http://cw.routledge.com/textbooks/9780415368780/e/CH26box.asp
[10] http://www.jisponline.com/article.asp?issn=0972-124X;year=2013;volume=17;issue=5;spage=577;epage=582;aulast=Avula
[11] http://keydifferences.com/difference-between-parametric-and-nonparametric-test.html
[12] https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests
[13] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881615/
[14] http://www.sthda.com/english/wiki/correlation-test-between-two-variables-in-r#pearson-correlation-formula
[15] https://statistics.berkeley.edu/computing/r-t-tests
[16] https://www.otexts.org/node/687