Test Theory – Test Pappy

Testing, Quality, and my inability to teach

Hi, I’m Patrick and I’ve been a tester for 18 years now. And I have a problem: I don’t care about testing! I care about Quality! Yet people see and treat me as a tester.

I have to add that I don’t like testing as many people see it. And I don’t do testing as most people do it. Most colleagues I have worked with over the past two decades see testing as a way to verify functional correctness and sometimes even conformity with some non-functional requirements, such as load and performance.

My understanding of quality starts where most colleagues’ understanding ends – when explicit requirements are met. I see quality more like Joseph Juran defined it: “Fitness for use”. And the additions that have been made to Jerry Weinberg’s “Quality is value to someone who matters at some point in time” are very helpful in understanding the flow and continuous urge for adaptation when thinking about quality from my point of view. More on that in a later blog post.

Testing as an activity and my role as tester, especially as test automator, are for me the best means and position to influence a project’s and product’s quality. As long as you don’t put me in the waterfall-ish place after development finished and before delivery/operations and expect me to stay there! I’m usually all over the place, wherever people let me.

I couldn’t care less for approaches like decision matrix, boundary value analysis, path coverage, etc. Probably either because I learned about them in a formal way already in 2004 and have them internalized by now (I never explicitly use them!), or because they are the formal explanations of what I usually call “common sense of a tester”.

Tasks like “Test Analysis”, “Test Design”, and “Creating test cases based on requirements” go against my personal nature. I’m so much driven by the problems in front of me, the context I’m in, the problems that need to be solved, the things I’ve seen, explored and sensed, that coming up with “a full test set” up front is just not my style of working.

Quality is relative, quality is in a constant flow, quality is highly subjective, quality is everywhere and nowhere, quality can’t be predicted, quality cannot be put in numbers. And that’s why my style of “testing” follows the same behavior. I just cannot reduce my work to writing and executing test cases. I just can’t!

When I have to look into a bugfix, and I see the commit is a one-liner that fixes exactly the problem at hand, probably I have even seen the code in that area before myself, had a short chat with the developer and I came to the conclusion that we are here in what Cynefin describes as the obvious domain, I’m very much fine with closing the issue and not testing it any further. Some colleagues would describe this as “not testing”. For me that is a lot of testing, even if the actual task of creating a test idea/case and executing the code against some predefined scenario never happened.

Don’t get me wrong, I’m not against test documentation and all that, which is probably required and necessary in several contexts. But writing 95% of the documentation upfront, even before any line of code is written, and just adding the fact that I did what is written there, was never for me.

My “testing” is a complex and unpredictable process, that uses a lot of experience, common sense, systems thinking, domain knowledge, and many more aspects. Which is actually causing a big issue for me. I was so far unable to teach other testers, especially the younger generation, to mimic my approaches. Except one. But from a senior role that is sort of expected. Here is an attempt to describe why I am not able to do that.

Being the tester in a team means for me that I support my team in establishing and maintaining trust in the code we build and deliver, help us optimize the way we work, and help to come up with solutions for the problems our clients and we are facing. I try to open up bottlenecks, never be a bottleneck myself, enable the team to act fast, and most of all I help to uncover potential risks, so that we are at least aware of them, have talked about them and have included mitigations for them, if relevant, in the solution we came up with. I cannot describe what I’m doing any better or more precisely than that. Simply because I don’t know where I can help next.

Tomorrow I might:

write test designs, when I have to,
automate some test cases,
improve existing test scripts,
pair up with a developer
refactor the test automation framework,
pimp the Jenkins pipeline,
explore
step in for the product owner
participate actively in refinement meetings, and by actively I mean that I don’t only ask questions for clarification, I also propose actual solutions for the problems at hand,
I might pick up a story to implement,
do a code review
discuss with the business architect
help my tester colleagues
suggest architecture improvements
analyze test failures after the latest pipeline run,
discuss how we can reorganize the team structure to become better
update dependencies
sit in a meeting and just listen
step in for the scrum master
take care of the test infrastructure
or any other task that has to be done to deliver value to the customer, improve the quality of our own working, or just help future me to have a better day in a few weeks/months.

How many of these tasks do you read or expect in actual tester position descriptions? How many of those do you expect to be part of a tester training syllabus? And before the suggestion comes, I don’t see myself as a coach. I’m a hands-on person, I taste my own dog-food, and I want to stand behind things I propose. I lead by example, and hope that others are able to understand what I’m doing, and mimic my behavior to become better. Whenever I see behavior that is worth mimicking, I try to do that.

Quality, value, improvements, reducing waste, and making an impact drive my daily actions. I think, despite the level of Impostor Syndrome I suffer from, that I’m doing a good job and having a big impact on teams. At least that’s the feedback I get sometimes. I don’t even want to teach developers HOW to test. I’m rather good at helping developers WANT to test.
But please don’t assign me any rookie and expect me to teach them how to test. In about 17 of 18 cases I will most probably fail miserably.

Something that just came into my mind when reviewing the post a last time:
I try to positively influence systems to heal, maintain, and improve. And as an embedded tester I have the chance to do that from within the system. But what I am doing to achieve that is so much more than just “testing”.

How your personal understanding of “Quality” influences your way of testing

I want to offer you a hypothesis, a proposition that I don’t have proof for. But I believe I’m on the right way.

“Your personal definition of quality influences the way you test.”

Quality is a seven-letter word. I think that’s the only statement that we can commonly agree on. Quality is a very complex matter and I don’t want to go too deep into detail on that this time.

To make my point I’ll base this post on three common definitions of quality that your personal understanding might be more or less based on.

Q1: “Quality is conformance to requirements.” – Philip B. Crosby

Q2: “Quality is value to someone who matters at some point in time.” – Jerry Weinberg, extended by James Bach, Michael Bolton and Anne-Marie Charrett

Q3: “Quality is fitness for use.” – Joseph M. Juran

I want to describe the type of testers that I see behind those definitions. This is based on my mind model, my experience of 16 years and the people I work with and talk to. I know that this number is way too small to be representative. But maybe it is helpful for some, at least it helps me to understand people and their motivations, and their way of working, how projects are set up and so on.

Type “Quality is conformance to requirements.”

Frameworks like ISTQB and PMBOK are based on variations of Q1. And this totally makes sense from their point of view. That way testing is plannable and controllable, and you might even come up with metrics to make quality measurable, based on that definition. It’s a good way to define price tags for testing, which enables schemes like outsourcing testing.
The other definitions would not be able to serve that purpose in a similar way.

Testers with an understanding of quality like Q1 might tend towards test cases and test coverage metrics based on requirements. Waterfall-like approaches (including those covered as Agile) make them feel comfortable and standard test case deduction methods are their daily tools.

Projects and general product development with very specific lists of requirements, standards to adhere to and processes to follow would need a quality understanding like this.
Also people with a background in model-based testing might feel comfortable with this definition.
For concrete implementation projects they need to rely on customers that are able to express their requirements.

Type “Quality is value to someone who matters at some point in time.”

Proper Exploratory Testing in my opinion tends to be more based on definitions similar to Q2. Testers of this type explore the system under test from different angles, and exercise it based on the findings that they evaluate as most important. Test reports should inform decisions and rather tell a story than produce numbers. These testers are aware that they are not the customer or end user of the system, but they try to resemble them as well as possible.
I could imagine that people who see themselves as context-driven testers might have a quality definition based on Q2.

They understand that context matters most and the usefulness can change over time. This type of tester in my opinion is more aware of potential risks and trying to detect potential risks is more important than covering every edge case possible. They also understand that quality is different for different stakeholders and users.

Type “Quality is fitness for use.”

Approaches like observation, monitoring, testing in production, data analytics and the like belong more to quality definitions like Q3. Testers of this type want to see that the implementation works in the field. Testing before releasing to production is mostly used to minimize risks of massive failure. Carefully releasing software into the wild and rolling back in case of failure is their preferred way of checking code changes.

I’d assume a trend towards incorporating parts of definition Q2 in their understanding of quality.

Background story

I came to this hypothesis recently when I had to change teams in the same context and room from a customizing implementation team, to the product development team. I did not feel too well in the beginning after the change and I wanted to understand why. It’s not the people or the domain. It’s the way of working or rather the definition of quality you need to apply that defines the context. In my case I had to switch from a Q2-context to a Q1-context. Guess, what I prefer.

Summary

I believe that your personal definition of quality is a fundamental piece of the puzzle how you subconsciously work, how you test, how you design test strategies, how you’d set up a testing project and so forth.

Of course people can adapt to their current context and fulfill the requirements of the job, and do it well. But I assume they won’t feel as comfortable as they could. At least I don’t.

I need your help!

You made it this far, thanks for staying. I need your help! I would like to know if my hypothesis is worth following up on.
If you have a personal definition of quality, maybe it fits roughly to one of the three examples provided. And maybe you are aware of what kind of context you mostly enjoy working in.
Please let me know, if my generalized description above fits to your personal situation or not.
Thank you!

TestPappy on “tacit knowledge”

TL;DR: This article only reflects the highlights of a discussion on tacit knowledge, it will not describe what tacit knowledge is. My key takeaway was, based on the exchanges in this twitter conversation, there didn’t appear to be a shared understanding of ‘tacit knowledge’ by all participants of the discussion.

The concept of tacit and explicit knowledge is an important one to software testing. Testing usually is a highly brain engaging discipline. And making things explicit is part of our every day. Writing test scripts, writing documentation to share knowledge, reporting on our testing, etc. But what about the tacit part, that often stays tacit?

In a discussion that happened lately on Twitter there was a disagreement on the nature of tacit knowledge. I thought I had a certain understanding of what tacit knowledge is, but a personal discussion with a good friend left me not so sure anymore. There is explicit knowledge, knowledge that has been spoken, written, or somehow made available for others to consume without interacting with the initial knowledge owner. Then there is knowledge that is not explicit. You might call it “tacit” knowledge. So, a question that came up was, is tacit knowledge “only” very hard to describe or “simply” impossible to describe? Or is it “just” something that was not made explicit yet? That’s what I was trying to learn.

I posted a badly written question with a survey on Twitter the next morning:

What is tacit knowledge?

— Patrick Prill (@TestPappy) June 28, 2016

And it triggered some interesting discussions. I will only quote a few examples from the threads, out of context, so you had better check them on Twitter yourself if you are curious.
This will not be my definition of tacit knowledge, I will only summarize some insights and highlights I got from the discussions.

First of all I quickly realized that my question was badly explained and too ambiguous. I got the impression that several people thought that I don’t understand the principle of tacit knowledge. But that helped to trigger some further interesting thoughts.

@testpappy The description on wikipedia is not too bad (I think): https://t.co/UnOcODfgUI

— Stephan Kämper (@S_2K) June 28, 2016

Even if Wikipedia partially describes tacit knowledge as impossible to describe, alternating with “just” very hard to describe, most people participating in the survey agreed rather on “very hard to describe”.

The first (and longest) example used was learning how to ride a bicycle. Actually most of the examples were very technically oriented. The discussion also changed between “teaching” a robot to ride a bike and teaching a person to ride a bike.

@TestPappy @TheTestingMuse @steveo1967 Yes, my context is ride a bike successfully 1st time via any method you choose. Except riding it.

— Stephen Blower (@badbud65) June 28, 2016

@TestPappy Staying with bike's. I'd wager all I have for you to explain it and then get them to ride proficiency 1st time. @steveo1967

— Stephen Blower (@badbud65) June 28, 2016

For Stephen Blower it seemed to be very important that the learning from explicit knowledge is successful on the 1st try.
When I remember how I learned how to ride a bike, I remember a lot of failed attempts to do so. So is explicit knowledge sometimes held to different standards than tacit knowledge?
But even if it only works on the 10th try, that means it is possible, and the knowledge may only be hard to describe or not have been described yet.

And there was no agreement on the bicycle example being really a good example. Which for me showed an important thing. Several people that I all highly respect, especially for their deep understanding of many test related topics, were not able to agree on a “simple” example of tacit knowledge.

@TestPappy Answers to all of these questions are in Collins, /Tacit & Explicit Knowledge/. @badbud65

— Michael Bolton (@michaelbolton) June 28, 2016

Several times I got the tip to read Harry Collins’ book “Tacit and Explicit Knowledge”. But is that just an excuse, because people are not able to explain it in their own words?
For me that raised also the question, is that the only truth about tacit knowledge or are there other explanations existing as well? Maybe contradicting Collins?

@rajeshmathur does the book contain explicit knowledge about tacit knowledge? @MichelePlayfair

— Patrick Prill (@TestPappy) June 28, 2016

I was not happy to get answers with a reference to a book.

A new example came on the scene, using cooking, as in following a recipe (which in my opinion presumes a lot of tacit knowledge) or “simply” cooking based on what you know (your tacit knowledge).

.@TestPappy Impossible since it is an amalgam of experiences long forgotten that become part of one's being. Inexpressible, hence tacit.

— Mark Federman (@MarkFederman) June 28, 2016

Now that is a position that was completely new to me. “An experience long forgotten.”

But Mark Federman also came up with alternative reading material on the topic. The concept of “Ba” is very interesting on how knowledge can be managed within organizations and companies. It also describes the spiraling lifecycle how tacit knowledge becomes explicit, to then become tacit knowledge again for other people. It is though not talking about if tacit knowledge is hard or impossible to make explicit. It simply skips that part and assumes that knowledge is transferable from individuals to groups and organizations.

@TestPappy See this for either enlightenment or complete confusion: https://t.co/B37pUlJI65 "Ba" is the key (also to my org theory work)

— Mark Federman (@MarkFederman) June 28, 2016

One aspect that in my opinion makes a good example of tacit knowledge is emotions. Mohinder mentioned them briefly, but nobody picked it up from there.

@Bill_Matthews Observation isn't exact science provide qualitative data only.Emotions are context driven @badbud65 @steveo1967 @TestPappy

— Dr Mohinder Khosla (@mpkhosla) June 28, 2016

Emotions are actually very hard, if not impossible to describe, if you can’t rely on comparison or the other person having had the very same experience. But is that even possible? Is naming the emotion just shallow agreement? Do both really feel the same?
Are emotions an example of something that is impossible to describe in a sufficient way? Or do we tacitly agree on a more or less shallow understanding to make life not too complicated?

In the learning-to-ride-a-bike thread, John Stevenson tried to shift the focus away from the actual riding of a bike towards riding a bike in a cultural context.

@badbud65 @TestPappy Now what makes it interesting is context. Ask that robot to say cycle in Holland, then India, then China..

— John Stevenson (@steveo1967) June 28, 2016

@badbud65 @TestPappy Now tacit (cultural, social, customs) come into play. These may not be written down but held internally (tacit)

— John Stevenson (@steveo1967) June 28, 2016

One aspect that was important for me, some things are maybe possible to make explicit, but they are not worth it.

@Bill_Matthews I think a lot of tacit could be made explicit – but is the effort worth the value? @TestPappy @mpkhosla @badbud65

— John Stevenson (@steveo1967) June 28, 2016

My summary: Tacit knowledge seems to be of tacit nature by itself. Very hard to describe, though some, e.g. Collins, have done it. At least to a degree that satisfied many readers. What I can observe on Twitter or in blog posts is that people boldly use the term tacit knowledge as if it has a well-known meaning. But my impression is rather that it’s shallow agreement. Most people I know are aware of the concept of tacit and explicit knowledge, some read Collins, some read summaries, some read other interpretations, some learned from someone. But my short, and malformed, question and the following discussion showed me clearly that it seems not to be that clear what it actually is.

I, at least, have learned a lot about tacit knowledge in the course of the discussion. And I want to thank everyone involved for their input.

My 2 cents on metrics in software testing… (Part I)

As always, it depends on your situation and your context. This is my personal view on using metrics. And in this article I want to add information and thoughts about the different metrics that are commonly used in software testing projects, when it comes to measuring the success of Testing Service Providers.

In the past I gained a bit of experience with adding off-shore resources to test teams, working together with an off-shore dev team and using a complete near-shore test team for certain parts of a big integration project.
My new team has lately been supported, again, by a couple of testers from a Testing Service Provider located abroad. They were assigned to another project team for the first half of the year, and now they are back with my team. Since I only started working at the company in January 2013, I have had no chance yet to evaluate the quality of their work. So I started to look for ways to measure external testing sources effectively.

When I skimmed through the “Practical Approach to Software Metrics” by Cem Kaner, I came across some points that I want to summarize in my own words.
* We use metrics to gain information, but most metrics are invalid (to some degree) for that purpose.
* We have to learn about the strengths, weaknesses and risks of our tools / metrics, to improve them and mitigate risks.
* We need to look for the truth behind numbers.
* We need to use detailed, qualitative analysis to evaluate the validity and credibility of the metrics.
So this will be a rough guide for me to evaluate the metrics I found.

And I will never forget the statement, if you measure someone by numbers, the measured will become this number.
And there is a saying in Germany from the electrical engineers, that’s translated like: “who measures, measures crap”, in regards to influencing the system being measured by the measuring itself.

One of the first sources of information I found was this webinar by RBCS about “Measuring Testing Service Providers” that I read about on Twitter. Since I know Rex Black only from the Twitter bashing contests about “ISTQB” and “Best Practices”, my expectations were set to a certain level. But I have to tell you, I was not as disappointed as I expected to be.

Since RBCS is a testing service provider itself and coaches about measuring its own kind, that’s a bit like asking the wolf to help protect your sheep barn from wolves. But since the companies that use testing service providers are not interested in sharing their knowledge and experience with the world, all that’s out there is coming from TSPs and consulting agencies. But let’s take a look now at those metrics.

“Find defects”
Measure the count and priority of defects and compare them with the defects found in production. The metric is called “defect detection effectiveness” or “defect detection percentage” (DDP) and I first learned about it nearly a decade ago. It is used to evaluate the effectiveness of the different test stages. You simply count all bugs found in all test stages available, including production, and you compare the number of defects (usually also broken down by priority) of your test stage with the next one. From stage to stage you should find fewer defects. The theory expects good test stages to have a DDP of 85% to 90% and up.
This is only possible to measure once the product/project is live for a certain time frame (usually 90 days). So you get a result only way after the job has been finished. You only get valid results of your TSP if you outsourced the stage completely or have another way to limit the calculation to the work packages the TSP tested.
And you get valid results only for certain kinds of projects. You need a good base for your production bugs. Do you have many customers, a few or only one? How is the discipline of your customers when it comes to using the defect process? Are your customers telling you about every bug they found? And you will need some time to filter through the bugs to prepare the data for comparison.
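To make the calculation concrete, here is a minimal sketch of how a DDP per test stage could be computed. It assumes one common formulation (defects found by a stage divided by those defects plus everything that escaped to later stages, including production); the stage names and defect counts are invented for illustration and do not come from the webinar or any real project.

```python
def defect_detection_percentage(found_in_stage: int, found_later: int) -> float:
    """DDP of one test stage, in percent.

    found_in_stage: defects the stage itself reported
    found_later:    defects that escaped it and surfaced in any later
                    stage, including production
    """
    total = found_in_stage + found_later
    if total == 0:
        raise ValueError("no defects recorded, DDP is undefined")
    return 100.0 * found_in_stage / total


# Hypothetical defect counts per stage, in the order the stages run.
defects_per_stage = {
    "system test": 170,
    "acceptance test": 20,
    "production (first 90 days)": 10,
}

stages = list(defects_per_stage)
counts = list(defects_per_stage.values())
ddp_by_stage = {}
# Production is the last stage; nothing follows it, so it gets no DDP.
for index, stage in enumerate(stages[:-1]):
    escaped = sum(counts[index + 1:])
    ddp_by_stage[stage] = defect_detection_percentage(counts[index], escaped)

for stage, ddp in ddp_by_stage.items():
    print(f"{stage}: {ddp:.1f}%")
```

The arithmetic is trivial; the hard part remains everything described above, namely whether the production count is trustworthy and whether the defects can be attributed to the work packages the TSP actually tested.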

“Find important defects”
Of course this is a variant of the “Find defects” metric, focusing on e.g. priorities of critical and high or whatever grades you measure your defects with. So all restrictions of the “Find defects” metric apply here, too. Plus there is the risk that you might not have the same prioritization once the project is live. And the TSP might try to rate the defects it found higher to increase this metric.

“Cover the Test Basis”
I quote this directly from Rex’s slide: “Engagements should include clearly defined test scope (e.g., requirements, risks, etc.), which is the test basis”, then you might measure the test basis coverage.
That is basically a good idea. But how do you define whether a requirement, risk, etc. is covered completely, enough, at all, or to your satisfaction? Without a very good understanding of how to test each item on the list you measure, this metric shows nothing at all. And as always with a metric, every item counts the same. Is that the truth in your projects?
You might have a valid point here if you use the “metrics” used in a report dashboard, like for session based testing. (e.g. as described here)
But you still have to trust the TSP about the degree of coverage or you need a very exact description for every item.

“Report in Time”
This “metric” counts if the regular reports are delivered on time. Now that is nice to evaluate the testing skills of your TSP.
Yes, discipline is important for a TSP. But whether the report is on time tells you nothing about the quality of the report or the quality of the testing that the report is about. So, nice add-on, maybe, but not useful for the original purpose.

The next metric is only included because Rex mentioned it.
“Assign skilled, qualified testers”
And the corresponding metric would be “percentage qualified testers assigned”. I won’t count that as a reliable field for installing a metric, for many reasons. Qualifications, resumes and certifications can be trimmed in a certain way and to a certain degree. If you really speak with the people you will hire, there are always some who are good at self-marketing and some who are not so good. But that doesn’t say a thing about their abilities as a tester. And of course there is always a chance that you end up with another “resource” than you initially hired, because that resource was not available and of course you get someone with the same experience and quality. Right!

“Finish within approved budget”
Well, that could be either a metric or a criterion for finishing the tests. Stop testing, when you’re out of budget. But when it comes down to a metric, even Rex states, that you need a good estimation process and change request process in place. OK, but when you don’t hit the budget, was it the estimation or change request process or was it the performance of the TSP?
And Rex mentioned, of course, the positive return on investment. But why should I meet the budget 100% to keep it positive, Rex? Is your ROI, however you define that for testing, calculated so tightly that you cannot afford to spend even 10% more without additional benefit?

And now to the surprising part of Rex’s webinar. That’s a method I have already seen in action, but completely forgot about.
“Stakeholder Surveys”
“Meaningful, Actionable Results Reporting” and
“Defect Report Satisfaction”
Now that’s something where I see the aspect of quality measured. You ask the stakeholders and project members about an evaluation of different topics, give them grades like in school, and measure that over time.
If you can keep this on an objective level, and use good facts as reason and examples for your evaluation, that has in my opinion the most value.
Negative aspects about that “metric”. First the objectivity; if some project members don’t like each other or have other personal differences, that will influence the report. Second, using the results for project-political reasons (saw that the last time I participated in the evaluation process). That will falsify your context and with that the value of the metric. And last to name here, it is very intense and time-consuming to make this right. So far from this set of slides.

I know of a metric, that is pretty special, when it comes to measuring TSP.
“Number of test cases executed”
If you use a pre-scripted approach and have a certain number of test cases to execute, that is a well known way to measure your progress. You can split it by priorities if you want, but of course it doesn’t take into account a lot of other things, like size of the test cases, time for execution, and so on. It lacks a bit of context. And it tells you nothing about the quality of your TSP.
And who is writing the test cases? Do you have them already, and do you have experience with executing the test set? Great, then you will have a benefit for measuring the TSP, if you take the quality of each and every execution into the context. If the TSP writes the test cases itself or even gets paid per test case, that will be a mass production of stupid test cases, guaranteed.
I remember a special call for tender for a complex project. The customer wanted to pay per test case and wanted a rough number of planned test cases without giving much information about the infrastructure. Now that is one serious base for estimation and offering.

Lately I found a nice white-paper by Infosys: Realizing Efficiency and Effectiveness in Software Testing
What can be used for testing projects might be adapted to measuring outsourced testing as well. Some of the metrics were already covered by Rex’s slides, so I won’t go through all of them, only those that I find useful.

The “test progress curve” (S curve), well that’s one nice piece of theory. In 11 years of testing I have never seen an S-curve without faking. The theory behind it is quite simple and understood, but reality does not look like that. So even if you want to measure the test progress with this, keep in mind that the S-curve won’t stand for long. So the difficult task is where to set the expectation.
But you have to measure the test progress somehow, that’s for sure. So keep in mind, you might find the S in the end, but it won’t be there all the time or not at all.

“Test execution productivity trends” is a metric that I would like to try. Short description from the white paper: “The test execution productivity may be defined as the average no. of test cases executed by the team per unit of time.”
It might fit well into the theory of thread-based test management, I have to find out more about that. In case of using pre-scripted test cases, where you might have an experience-based baseline for execution length, this can be covered pretty well. I think the metric needs to be adapted to every project in a way that normalizes the measured values. Not every test case takes the same time to execute. You need to take into account the number of bugs found, problems with test environment availability, and simple things like meetings, status reporting and so on. So not a simple task, but you might get some good numbers if you can keep it up.
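As a rough illustration of what such a normalization could look like, the sketch below computes the white paper’s “test cases per unit of time” once against the booked hours and once against the hours left after subtracting time lost to meetings, reporting and environment downtime. The `Week` structure, its field names and all numbers are my own assumptions for the example, not something the white paper prescribes.

```python
from dataclasses import dataclass


@dataclass
class Week:
    label: str
    executed_test_cases: int
    team_hours_booked: float
    hours_lost: float  # meetings, status reporting, environment downtime


def productivity(executed: int, hours: float) -> float:
    """Average number of test cases executed per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return executed / hours


# Hypothetical figures for two weeks of test execution.
weeks = [
    Week("week 1", 120, 160.0, 40.0),
    Week("week 2", 90, 160.0, 70.0),
]

trend = []
for week in weeks:
    raw = productivity(week.executed_test_cases, week.team_hours_booked)
    net_hours = week.team_hours_booked - week.hours_lost
    normalized = productivity(week.executed_test_cases, net_hours)
    trend.append((week.label, raw, normalized))
    print(f"{week.label}: raw {raw:.2f}/h, normalized {normalized:.2f}/h")
```

In this made-up data the raw number drops in week 2 while the normalized number stays flat, which is exactly the kind of difference you need to be able to explain before judging anyone by the trend. Differences in test case size and the time spent on found bugs are not covered here and would need their own adjustment.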

That’s what I’ve found so far, very disappointing overall.

In my opinion you need to monitor at least progress and quality.
What metric you use depends on your project and the possibilities you have.
What you need to do if the metrics don’t hit the expectation depends on your project and the context why it missed the expectations.

If you’re already measuring your projects with some of the metrics described above and have never asked yourself what the numbers actually tell you, try an experiment/role play and try to explain the numbers from different positions. Don’t forget to subtract tacit knowledge in your experiment. Now re-evaluate the value of your metrics. If you still think those metrics are good, congratulations. I would like to hear about your successfully used metrics. Either via comment, email or Twitter.

This was only part 1 about this topic. I will try to write more about metrics I found and some of the metrics I tried.

Thinking about how to get a good regression test set

I used the phrase “regression testing” for about 10 years without ever thinking about the definition of the word regression. I just accepted the term as it was commonly and frequently used in our projects. That was before I enjoyed a EuroStar webinar with Michael Bolton: one hour of talking about the term regression and regression testing. This webinar changed my tester life.
One thing was that I began to read blogs and articles about testing; the other was that I started to think about many of the terms I used in my daily life and whether I had been using them wrongly so far. I tried to challenge some of my colleagues with discussions about the terms that are used every day in our project. The outcome was that they, too, had not spent much time thinking about those terms and had simply accepted them as they were.

Coming back to regression and regression testing. In both my old company and my new one, regression is used as a synonym for regression testing. A common problem seems to be that people don’t know what regression means. Maybe the reason is that I am German, and in German the word “Regression” is not part of the common vocabulary.

Regression: to regress, originating from the Latin word regressus, means to go back; as a noun, a backward movement.

Wikipedia says under “Software Regression”:
A software regression is a software bug which makes a feature stop functioning as intended after a certain event (for example, a system upgrade, system patching or a change to daylight saving time).

So what is our intention, when we speak of regression testing? To check if the (hopefully) unchanged features are still working as intended.

Instead of looking for a definition of regression testing, I want to use the four different but intersecting concepts, that Michael Bolton offered in the aforementioned webinar:

Any test that we’ve performed before.
A set of automated checks, run periodically and repeatedly.
Testing that we perform after some change.
Testing to probe whether quality has got worse.

When different roles / stakeholders in a project speak about regression testing, which concept or which mixture of concepts do they mean? My tip: ask them. In long-grown teams / projects it is interesting to see whether everyone is on the same page or whether the definitions vary. If they miss one of the concepts completely, challenge them by asking about it.

What test cases will be used?
On the one hand we can use any test cases that we’ve performed before, because their goals should be defined to check some features of the product at test that we want to re-check. But be careful when reusing test cases that were originally written to test a new requirement / feature / change request. Those test cases might go too deep into detail and be too time-consuming.
If you don’t have a good selection of test cases that you can use for regression testing, what have you done until now? Now is the time to start creating a regression test set.
When automated tests are available, great. Just run them all if possible. In case the features are still working the same as in the last version, all tests have to get the same result as they did on the last run. And yes that means, that we also would expect failed tests to fail the same way. How would you react if a test that failed before is now passing, without knowing of a bug fix for that problem?
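As a small illustration of “same result as on the last run”, the following sketch compares two runs and also flags tests that failed before and pass now. The result format (a plain mapping from test name to an outcome string) and all test names are assumptions for this example, not the output of any particular test tool.

```python
def diff_runs(previous: dict[str, str], current: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return every test whose outcome differs between two runs.

    Outcomes are plain strings, e.g. "pass" or "fail: <message>", so a test
    that fails in a different way than before is reported as well.
    """
    changes = {}
    for name in sorted(previous.keys() | current.keys()):
        old = previous.get(name, "not run")
        new = current.get(name, "not run")
        if old != new:
            changes[name] = (old, new)
    return changes


def unexpected_passes(changes: dict[str, tuple[str, str]]) -> list[str]:
    """Tests that failed in the last run and pass now; worth asking why."""
    return [
        name
        for name, (old, new) in changes.items()
        if old.startswith("fail") and new == "pass"
    ]


if __name__ == "__main__":
    last_run = {"login": "pass", "export": "fail: timeout", "search": "pass"}
    this_run = {"login": "pass", "export": "pass", "search": "fail: HTTP 500"}
    changes = diff_runs(last_run, this_run)
    print(changes)
    print(unexpected_passes(changes))
```

Every entry in the diff is a question to ask, not a verdict: a new failure may be a regression, and a surprising pass may be an unannounced fix or a test that no longer checks what it used to.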

Do we have to change the test set every time?
That depends on the reason or event that triggers the execution of the test set. If you change a feature, the related regression test cases have to be reworked. If you prepare for a major release change, what has been changed? Are your test cases general enough to be executed on the new version? If you switch the platform like the web server or the operating system, I would go with the last test set.
In case the software changes, check if those changes affect your test set.

And what is a short regression (test) doing?
I think we have all heard our project managers and stakeholders say: let’s cover this with a short regression (test). And by short they mean that you don’t have much time and budget for this. I will show one of the many ways to reduce your test set for a short regression test. Whether that fits your project, or whether it fits every time, you have to decide for yourself.

I have been wondering lately, mostly now that I have to test again: what does a good regression test case look like?
Since it all comes down to money in the end, my main aspects would be these:

check at least all the box features
reasonable in execution time
easy to maintain

For me the answer to this question would be, the test case fits into the big picture of my regression test set strategy. To meet those properties above, I created a strategy to rework the regression test sets of my projects.

My regression test set strategy
Get a good visual of your product at test (use Mindmaps, Visio charts, whatever suits you best). You should find all of your features represented in this picture. Try to make a picture of the specification, as good as you can get with the design techniques at hand.

Create a set of test cases that represent all of your features, also involving roles, use cases and business processes of your customers. Use scenario testing techniques together with claim testing, to cover your complete list of features. Create positive checks of all your features. When performing this list of test cases you get a result if all features are still working. At least for the well-minded users. This test set is in my opinion the smallest test set you should perform, when speaking of regression testing (Short regression test set). If you have to make this even shorter, use risk analysis to skip the not so important functions. But please remember to report, that you skipped it.
Since maintainability and reusability of those test cases are important, keep them at a level of detail that is sufficient for most testers to understand, yet high-level enough that you don’t have to change them every time someone changes a configuration or translation. E.g., if your product or feature is well documented on the GUI or more or less self-explanatory, you can skip most of the details of what to do in the test case. You can concentrate on the expectation and goal of the test case rather than on the test steps.
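The risk-based shortening mentioned above could be sketched like this. The risk scale, the time estimates and the greedy selection are assumptions chosen for illustration; your own risk analysis will look different. The important part is that the function also returns what was skipped, so that it can be reported.

```python
from dataclasses import dataclass


@dataclass
class FeatureCheck:
    feature: str
    risk: int     # hypothetical rating from 1 (low) to 5 (high)
    minutes: int  # estimated execution time


def shorten(checks: list[FeatureCheck], budget_minutes: int):
    """Greedy selection: take the riskiest checks first until the budget is used up.

    Returns (selected, skipped) so the skipped features can be reported.
    """
    selected, skipped = [], []
    remaining = budget_minutes
    for check in sorted(checks, key=lambda c: c.risk, reverse=True):
        if check.minutes <= remaining:
            selected.append(check)
            remaining -= check.minutes
        else:
            skipped.append(check)
    return selected, skipped


if __name__ == "__main__":
    checks = [
        FeatureCheck("login", risk=5, minutes=10),
        FeatureCheck("export", risk=2, minutes=30),
        FeatureCheck("search", risk=4, minutes=20),
        FeatureCheck("help pages", risk=1, minutes=5),
    ]
    selected, skipped = shorten(checks, budget_minutes=40)
    print("run:    ", [c.feature for c in selected])
    print("skipped:", [c.feature for c in skipped])
```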

The context-driven part
Now that we have checked that all our features are still working, let’s hunt for some bugs. Nobody said, that you cannot look for new bugs, when performing regression tests.
Use your positive check scenarios and combine them with different assignments (e.g. James Whittaker speaks of tours) to explore and test those features (use function and domain testing techniques or whatever comes to your mind). Not every tester thinks the same, not every tester is capable of the same test techniques. And that’s ok. Because bugs aren’t all the same either. And when it comes down to testing and looking for bugs you should leave your team members a certain degree of freedom what to do to your system. Most testers get new ideas how to break a feature during test execution. And using variation in your techniques is a good way to find new bugs. And in case a tester has no idea, what harm to do next, she can still check old session protocols to get an inspiration.
Take a test case or a subset of test cases and create a variant of them concentrating on different testing techniques to test those features for bugs. When those test cases are performed, you can report, that a search for bugs with certain techniques in this area was performed. Like all test cases, it cannot say, we tested this feature completely and it is bug-free. So this statement is enough for me.
Since you are never finished testing, please don’t argue that it would be better to write down all negative tests performed for a certain feature so that they can be repeated every time. I disagree. The test case will be too long and unmaintainable in no time. And why should you find more bugs by repeating the same tests over and over again? You can give hints in the test charter about which techniques could be used, but this should be the maximum influence on you or other testers. You should try to come up with new ideas every time. Try to learn new techniques and use them in those sessions. If you have a day of no inspiration, use old session protocols, talk with your team members or simply concentrate on positive tests. We all have those days sometimes.
If you have a special technique that you often find bugs with, feel free to use it in all of your test executions and mention it in the test charter, but don’t script it in detail. And if this technique really finds bugs always and everywhere, you should have a word with the development lead about how to improve the skills of the development team to prevent those bugs in the near future.

Make a list of the last fixed bugs found in production. Retest those bug fixes again. It’s really bad to bring an already fixed bug back into production. Try to retest all of the bugs on your list. Sometimes the reoccurrence of a low priority bug is more noticeable, than a high priority bug in a constellation that happens only now and then, and only a few users would ever see.

So what do we have:

a big picture of the product
matching test cases for verifying the features
a strategy for negative testing (bug hunting)
a couple of bug fixes to retest

This should leave us with good input to report to stakeholders, a big picture to understand the software, a reasonably maintainable test set and an approach for bug hunting which challenges your team, the capability to find new bugs and a reduced risk of reoccurring bugs that were fixed lately.

I don’t know if this is the best solution for my projects, but I know that after some analysis it should be a better solution than the existing one. It still has to prove this, though. I’m sure that this strategy has not been reworked for the last time. There is a big part of context-driven testing in it, which is the project-appropriate application of skill and judgment, as it is briefly and simply described on Cem Kaner’s blog. This also means that this concept might fit your situation only partially, or even not at all. Don’t forget to include the expectations of your stakeholders.
My original intention with this article is to get you to think about your own usage of the words regression and regression testing, and to give some hints on how to improve a strategy every now and then, when the context changes or when there are other or newly learned things that might fit better. Maybe the improvement of my project’s test strategy helps you to come to a better strategy for your project.

Update (11.06.2013): I could have saved a lot of time thinking about regression testing and writing this blog post if I had simply come across Iain McCowatt’s blog earlier. He wrote a five-part series about regression testing that goes even deeper. A must-read if you haven’t already read it: Exploring Uncertainty
But at least I now know that I’m not the only one thinking that way.

Comments are welcome!
