Reliability vs Validity

[Figure: Reliability and validity Venn diagram]


The quality of software is affected by the quality of the processes that create it: anything that affects the project may also affect the product. The same is true of testing. The better the quality of the testing, the better the chance of providing quality information to stakeholders, which is why the story of testing forms an important part of the story of quality. Testing employs qualitative research methods, so consistency and accuracy matter, and they can be assessed through reliability and validity (Kirk and Miller, 1986a; Robson, 2011a).

Reliability in science is about getting consistent results: the same experiment or field study should yield the same result every time. However, it's possible to get consistent results without them being correct. A miscalibrated instrument may always give the same incorrect reading, or the same cultural bias in a social group may mean an outside researcher always gets the same response, one that differs from what insiders would give.

Validity in science is about getting correct results: any experiment or field study should yield a truthful result. However, it's possible to get accurate results only some of the time, producing variation in the results. Damaged instruments may sometimes give false readings, or the same researcher may make different observations or assessments within the same context.

Reliability and validity can be achieved through diversity. There's no way to paint a good picture of quality with a single test or a single type of test. It's important that the results are accurate and paint as truthful a picture of quality as possible, not least because, since truth can't be known completely and every person is biased, the picture of quality will always be incomplete and biased. The more diverse the testing, the less chance that important problems go unnoticed. Testers therefore apply many different methods, approaches, tools, perspectives and people to the same product, function or requirement, as any one of these may support or refute claims and hypotheses about quality. Together, reliability and validity in testing help in finding the problems that matter, and only the problems that matter, reducing the impact of false positives and false negatives that would lead to incorrect information reaching stakeholders.
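As a minimal sketch of what diverse checking of a single function might look like, the example below mixes example-based checks with lightweight property-style checks (the function and its cases are illustrative, not from the original):

```python
import random

def normalise_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return " ".join(text.split())

# Example-based checks: specific, human-chosen cases.
assert normalise_whitespace("  hello   world ") == "hello world"
assert normalise_whitespace("") == ""

# Property-style checks: a different lens on the same function --
# many random inputs, asserting properties that must always hold.
for _ in range(100):
    text = "".join(random.choice(" \tabc") for _ in range(20))
    result = normalise_whitespace(text)
    assert "  " not in result        # no runs of spaces survive
    assert result == result.strip()  # no leading or trailing whitespace
```

Each style of check can catch problems the other misses, which is the point of diversity: the same product probed from more than one angle.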

False Positives, Negatives and Wrong Questions

[Figure: tick, cross and question mark crossed out]


Known as type 1, type 2 and type 3 errors, false positives, false negatives and wrong questions are the three kinds of issue that threaten the reliability and validity of any approach to science and software testing (Kaner, 2010; Kirk and Miller, 1986b; Page, Johnston and Rollison, 2009).

False positives are where testers highlight serious risk to stakeholders where no serious risk exists, including playing up a risk to be greater than it is. Equally, an experiment may yield a false observation (an illusion), leading the tester to believe a problem exists where there is none. This also extends to code assertions reporting a fail where no anticipated software failure has occurred.
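A common source of false positives in automated checks is a brittle assertion. A minimal sketch, assuming a simple pricing function (the names are illustrative):

```python
import math

def total_price(prices, tax_rate):
    """Sum the prices and apply a tax rate."""
    return sum(prices) * (1 + tax_rate)

# Brittle check: would report a failure even though the software behaves
# correctly, because binary floating point cannot represent 0.1 + 0.2 exactly.
# assert total_price([0.1, 0.2], 0.0) == 0.3   # false positive

# Tolerant check: passes, because it asks about the value that matters.
assert math.isclose(total_price([0.1, 0.2], 0.0), 0.3)
```

The product is fine in both cases; only the first check's exact equality manufactures a "failure" that isn't a real threat to value.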

Note that the testers' job is to bring serious risk to stakeholders' attention; it's not the testers' job to decide whether a serious risk requires mitigating. This may lead to many reported bugs being withdrawn or reclassified by stakeholders, but that does not make a bug a false positive so long as the tester made a valid case for a threat to value. False positives in this sense are more about misunderstandings, mistakes, applying insufficient context, too little critical thinking, or using an ineffective test approach.

False negatives are where testers fail to report serious risk to stakeholders where serious risk exists, including playing down a risk to be much less serious than it is. Equally, an experiment may yield a false observation (an illusion), leading the tester to believe no problem exists where there is one. This also extends to code assertions reporting a pass where an anticipated software failure has occurred.
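False negatives often come from assertions that are too weak to notice a defect. A minimal sketch, assuming a hypothetical discount function with a deliberate bug:

```python
def apply_discount(price, percent):
    # Bug: should be price * (1 - percent / 100); the division is missing.
    return price * (1 - percent)

# Weak check: passes despite the bug (a false negative), because it only
# asks whether the price went down, not whether it went down correctly.
assert apply_discount(100, 10) < 100     # passes: the result is -900

# A stronger check would expose the defect:
# assert apply_discount(100, 10) == 90   # fails: the result is -900
```

The weak assertion reports a pass, so the serious problem never reaches stakeholders.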

Wrong questions are looking for risk in the wrong place, performing the wrong test or writing code assertions that offer little to no value. Even if the test yields a correct result with no false positive or negative, the question itself isn't of much use and is better off not asked. Testing often has to be done under strict constraints where only a small amount of testing can be performed, so it's important to make sure the best possible testing is being done. Ask stakeholders and other team members what is important to test. Focus on major areas of risk, business value or developer uncertainty, and apply heuristics like bug clustering.
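A code assertion can be a wrong question even when it passes correctly. A minimal sketch, using an illustrative class (not from the original):

```python
class Invoice:
    def __init__(self, amount):
        self.amount = amount

    def total_with_tax(self, rate):
        return self.amount * (1 + rate)

# Wrong question: this check merely restates the constructor and can only
# fail if Python itself is broken -- a correct result of little value.
assert Invoice(100).amount == 100

# A better question probes logic that can actually go wrong:
assert Invoice(100).total_with_tax(0.2) == 100 * 1.2
```

The first assertion consumes testing effort while answering a question nobody needed answered; the second targets behaviour a stakeholder actually cares about.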



Testing should always be performed with serious thought to reliability and validity. Diversity is key to achieving both: vary approaches, mindsets and testers as much as possible, and test the same thing in different ways to ensure results are consistent and accurate, minimising false information being reported to stakeholders.




  • Kaner, C., 2010. BBST Foundations 3A: Oracles [online]. TestingEducation (BBST). Available at: Link, t.i. 6:27.
  • Kirk, J. and Miller, M., 1986. Reliability and Validity in Qualitative Research. 1st ed. Newbury Park: Sage, p. 9 (a), p. 29 (b).
  • Page, A., Johnston, K. and Rollison, B., 2009. How We Test Software at Microsoft. 1st ed. Redmond, WA: Microsoft, p. 221.
  • Robson, C., 2011. Real World Research. 3rd ed. Oxford: Wiley, pp. 18-19 (a).

Special Thanks

  • Sebastian Stautz (@SebiSolidwork) for corrections and assistance on reliability, validity and diversity in software testing
