Software Engineering

Basic Logic - Type I and Type II Errors

Just because the Null hypothesis is rejected does not mean that it cannot be true. Likewise, failing to reject the Null hypothesis does not mean that it cannot be false. Hypothesis testing only ensures a valid argument - it cannot ensure the truth of the premises, so we still cannot prove the hypothesis to be either true or false. Instead, we can only reject or fail to reject the null hypothesis on the basis of the argument being valid or invalid.

There are two conditions where the conclusions drawn from hypothesis testing could be in error:

  1. We rejected H0, but it was actually True (we saw that B = 1, but it was due to something else). This is a Type I error and can happen if the occurrence of B was due to chance or some other stimulus, rather than being caused by A.
    One common cause of Type I errors is having multiple events that can cause B and failing to control those events. If either A or C can cause B, and we fail to control C, then we can get false results.
    Type I errors cause us to mistakenly think that the system has no defect. This is the most serious error that can be made in system testing.
    The chances of making a Type I error are reduced by increasing our confidence that the event A and the measurement B are within our control, and that B cannot be caused by chance or by external events other than A (test design). (“After all, it works on my PC!”) Type I errors can be reduced with Prescriptive testing.
     
  2. We failed to reject H0, but it was actually False and should have been rejected (we saw B = 0, but there really is a relationship). This is a Type II error and can happen if the test itself contains a defect.
    For example, we may be mistaken about the expected result. This would be a defect in the test case. We may also have an error in the test environment itself.
    Type II errors cause us to mistakenly believe that the system contains a defect. For example, the test step may be incorrect.
    The chances of making a Type II error are reduced by determining the conditions necessary to produce the reported anomaly, reproducing the anomaly, and conducting additional tests or reviews to determine where the defect lies (in statistics, this would be increasing the sample size). Type II errors can be reduced with Descriptive testing.

Type I errors result in shipping defective product. Type II errors result in longer test times to find out why we had the error.

Conclusion: Type II errors affect the schedule. Type I errors affect product quality.
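
To make the two failure modes concrete, the following is a minimal Python sketch (the function name and the probabilities are hypothetical, chosen for illustration) that simulates a test where an uncontrolled stimulus C can produce B even though the system is defective, and a defect in the test itself can suppress B even though the system is correct:

    import random

    def run_test(system_correct, p_stray_stimulus=0.05, p_bad_oracle=0.05):
        # A = system_correct: the software truly behaves as required.
        # B = the observed result (True = test passes).
        b = system_correct
        # An uncontrolled event C can also cause B: defect present, test passes.
        if not system_correct and random.random() < p_stray_stimulus:
            b = True    # Type I error
        # A defect in the test itself can mask B: system correct, test fails.
        if system_correct and random.random() < p_bad_oracle:
            b = False   # Type II error
        return b

    type1 = sum(run_test(system_correct=False) for _ in range(10000))
    type2 = sum(not run_test(system_correct=True) for _ in range(10000))
    print("Type I errors (defect shipped):", type1, "of 10000 runs")
    print("Type II errors (false alarm):  ", type2, "of 10000 runs")

Driving p_stray_stimulus toward zero corresponds to the test-design controls described in item 1; driving p_bad_oracle toward zero corresponds to the anomaly investigation described in item 2.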

The range of all possibilities becomes:

A   B   A -> B   Conclusions
0   0   1        System has defect and the test fails.
0   1   1        System has defect, but the test passes (Type I error).
1   0   0        System is correct, but the test fails (Type II error).
1   1   1        System is correct and the test passes.

The first and last rows are the conditions that we hope to find - the test correctly shows either that the system functions correctly or that it has a defect.
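
The table can also be generated mechanically. Here is a short Python sketch using the standard material-implication reading of A -> B (equivalent to NOT A OR B); the row labels are taken from the table above:

    labels = {
        (0, 0): "System has defect and the test fails.",
        (0, 1): "System has defect, but the test passes (Type I error).",
        (1, 0): "System is correct, but the test fails (Type II error).",
        (1, 1): "System is correct and the test passes.",
    }
    print("A  B  A->B  Conclusion")
    for a in (0, 1):
        for b in (0, 1):
            implies = int((not a) or b)   # material implication
            print(f"{a}  {b}  {implies}     {labels[(a, b)]}")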

To put this into more formal terms:

The theory A in this case is that the software functions correctly when we do a specific action, whose result is B.

The theory can be refuted by a single negative observation. That is, if A truly does imply B, and we observe that B does not occur, then A must be false:

A -> B
B = 0
Therefore (modus tollens), A = 0: the software does not function correctly when we do that specific action. Or ... we may have had a Type II error.
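
That the modus tollens step itself is valid can be checked by brute force. A minimal sketch that searches all four truth assignments for a counterexample (premises true, conclusion false) and finds none:

    # Modus tollens: premises (A -> B) and (not B); conclusion (not A).
    # The argument is valid iff no assignment makes the premises true
    # while the conclusion is false.
    counterexamples = [(a, b)
                       for a in (False, True)
                       for b in (False, True)
                       if ((not a) or b) and (not b) and a]
    print(counterexamples)   # [] -- no counterexample, so the argument is valid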

However, in any observation of B there are embedded assumptions.
Observing B assumes (P1 AND P2 AND P3 ... AND Pn)
By De Morgan's law, for B = 0 (the negation):
NOT B = (NOT P1 OR NOT P2 OR NOT P3 ... OR NOT Pn)
Therefore, the failure to observe B may be due to the failure of only one of the underlying assumptions, and B may actually be true! It is therefore valid to use another theory to reject an observation if that theory is known to be true.
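
Here is a minimal sketch of that point, with purely hypothetical assumption names: observing B really means observing the conjunction of every assumption, so a single false assumption flips the observation even when A holds.

    # Observing B depends on a conjunction of assumptions P1..Pn.
    assumptions = {
        "P1_probe_attached": True,
        "P2_clock_synced":   True,
        "P3_debug_log_on":   False,   # one assumption quietly failed
    }
    observed_b = all(assumptions.values())
    # De Morgan: not(P1 and P2 and P3) == (not P1) or (not P2) or (not P3)
    assert (not observed_b) == any(not p for p in assumptions.values())
    print("B observed?", observed_b)  # False, yet A may still be true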

In software testing, an anomaly review is the most common means of reviewing testing anomalies (Type II errors) and determining if the observations were correct or incorrect.

Conversely, suppose that we did, in fact, observe B = 1.

A -> B
B = 1
Therefore A = 1 (unfortunately, this is an invalid argument - the fallacy of affirming the consequent!)
Or ... we had a Type I error.
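
The same brute-force check used for modus tollens above exposes the problem: this time a counterexample exists, and it is exactly the Type I row of the table.

    # Affirming the consequent: premises (A -> B) and B; "conclusion" A.
    counterexamples = [(a, b)
                       for a in (False, True)
                       for b in (False, True)
                       if ((not a) or b) and b and (not a)]
    print(counterexamples)   # [(False, True)] -- A = 0, B = 1: the Type I row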

Our theory that A -> B includes a number of assumptions and pre-conditions (like the assumption that we correctly understood the requirements).

A -> B assumes (P1 AND P2 AND P3 ... AND Pn)
By De Morgan's law, the negation (or Null hypothesis) results in:
NOT (A -> B) assumes (NOT P1 OR NOT P2 OR NOT P3 ... OR NOT Pn)
So the apparent failure of the theory may be due to the failure of only one underlying assumption, which means we cannot prove the theory false by observing B; nor can we prove it true by observing B, because that is a formal logical fallacy.

In software testing, Type I errors go undetected until customers find them.

Again, we cannot prove that the software contains no defects, we can only demonstrate the existence of a defect.

References: "Falsifiability," Wikipedia, The Free Encyclopedia.