TestPappy on “tacit knowledge”

TL;DR: This article only reflects the highlights of a discussion on tacit knowledge; it will not describe what tacit knowledge is. My key takeaway, based on the exchanges in this Twitter conversation, was that the participants of the discussion did not appear to share one understanding of ‘tacit knowledge’.

The concept of tacit and explicit knowledge is an important one for software testing. Testing is usually a highly brain-engaging discipline, and making things explicit is part of our everyday work: writing test scripts, writing documentation to share knowledge, reporting on our testing, etc. But what about the tacit part, which often stays tacit?

In a recent discussion on Twitter there was a disagreement on the nature of tacit knowledge. I thought I had a certain understanding of what tacit knowledge is, but a personal discussion with a good friend left me not so sure anymore. There is explicit knowledge: knowledge that has been spoken, written, or somehow made available for others to consume without interacting with the initial knowledge owner. Then there is knowledge that is not explicit. You might call it “tacit” knowledge. So a question that came up was: is tacit knowledge “only” very hard to describe or “simply” impossible to describe? Or is it “just” something that has not been made explicit yet? That’s what I was trying to learn.

I posted a badly written question with a survey on Twitter the next morning:

And it triggered some interesting discussions. I will only quote a few examples from the threads, out of context, so you had better check them on Twitter yourself if you are curious.
This will not be my definition of tacit knowledge; I will only summarize some insights and highlights I got from the discussions.

First of all, I quickly realized that my question was badly explained and too ambiguous. I got the impression that several people thought that I didn’t understand the principle of tacit knowledge. But that helped to trigger some further interesting thoughts.

Even though Wikipedia alternates between describing tacit knowledge as impossible to describe and “just” very hard to describe, most people participating in the survey settled on “very hard to describe”.

The first (and longest) example used was learning how to ride a bicycle. Actually most of the examples were very technically oriented. The discussion also changed between “teaching” a robot to ride a bike and teaching a person to ride a bike.

For Stephen Blower it seemed to be very important that the learning from explicit knowledge is successful on the 1st try.
When I remember how I learned how to ride a bike, I remember a lot of failed attempts to do so. So is explicit knowledge sometimes held to different standards than tacit knowledge?
But even succeeding only on the 10th try means that it is possible, and the knowledge may be only hard to describe or simply not described yet.

And there was no agreement on whether the bicycle example is really a good example, which for me showed an important thing. Several people, all of whom I highly respect, especially for their deep understanding of many test-related topics, were not able to agree on a “simple” example of tacit knowledge.

Several times I got the tip to read Harry Collins’ book “Tacit and Explicit Knowledge”. But is that just an excuse, because people are not able to explain it in their own words?
For me that also raised the question: is that the only truth about tacit knowledge, or do other explanations exist as well? Maybe ones contradicting Collins?

I was not happy to get answers with a reference to a book.

A new example came on the scene, using cooking, as in following a recipe (which in my opinion presumes a lot of tacit knowledge) or “simply” cooking based on what you know (your tacit knowledge).

Now that is a position that was completely new to me. “An experience long forgotten.”

But Mark Federman also came up with alternative reading material on the topic. The concept of “Ba” offers a very interesting view on how knowledge can be managed within organizations and companies. It also describes the spiraling lifecycle in which tacit knowledge becomes explicit, to then become tacit knowledge again for other people. It does not, though, talk about whether tacit knowledge is hard or impossible to make explicit. It simply skips that part and assumes that knowledge is transferable from individuals to groups and organizations.

One thing that makes a good example of tacit knowledge, in my opinion, is emotions. Mohinder mentioned them briefly, but nobody picked it up from there.

Emotions are actually very hard, if not impossible to describe, if you can’t rely on comparison or the other person having had the very same experience. But is that even possible? Is naming the emotion just shallow agreement? Do both really feel the same?
Are emotions an example of something that is impossible to describe in a sufficient way? Or do we tacitly agree on a more or less shallow understanding to make life not too complicated?

In the learning-to-ride-a-bike thread, John Stevenson tried to shift the focus away from the actual riding of a bike and toward riding a bike in a cultural context.

One aspect that was important for me: some things may be possible to make explicit, but doing so is not worth the effort.

My summary: Tacit knowledge seems to be of a tacit nature itself. It is very hard to describe, though some, e.g. Collins, have done it, at least to a degree that satisfied many readers. What I can observe on Twitter or in blog posts is that people boldly use the term tacit knowledge as if it had a well-known meaning. But my impression is rather that it’s shallow agreement. Most people I know are aware of the concept of tacit and explicit knowledge; some read Collins, some read summaries, some read other interpretations, some learned from someone. But my short and malformed question and the following discussion showed me clearly that it is not that clear what it actually is.

I, at least, have learned a lot about tacit knowledge in the course of the discussion. And I want to thank everyone involved for their input.

 


My 2 cents on metrics in software testing… (Part I)

As always, it depends on your situation and your context. This is my personal view on using metrics. And in this article I want to add information and thoughts about the different metrics that are commonly used in software testing projects, when it comes to measuring the success of Testing Service Providers.

In the past I gained a bit of experience with adding off-shore resources to test teams, working together with an off-shore dev team and using a complete near-shore test team for certain parts of a big integration project.
Lately my new team has again been supported by a couple of testers from a Testing Service Provider located abroad. They were assigned to another project team for the first half of the year; now they are back on my team. Since I only started working at the company in January 2013, I have had no chance yet to evaluate the quality of their work. So I started to look for methods to measure external testing resources effectively.

When I skimmed through the “Practical Approach to Software Metrics” by Cem Kaner, I came across some points that I want to summarize in my own words.
* We use metrics to gain information, but most metrics are invalid (to some degree) for that purpose.
* We have to learn about the strengths, weaknesses and risks of our tools / metrics, to improve them and mitigate risks.
* We need to look for the truth behind numbers.
* We need to use detailed, qualitative analysis to evaluate the validity and credibility of the metrics.
So this will be a rough guide for me to evaluate the metrics I found.

And I will never forget the statement: if you measure someone by numbers, the measured will become this number.
And there is a saying among electrical engineers in Germany that translates roughly as “who measures, measures crap”, referring to the way the act of measuring influences the system being measured.

One of the first sources of information I found was this webinar by RBCS about “Measuring Testing Service Providers” that I read about on Twitter. Since I know Rex Black only from the Twitter bashing contests about “ISTQB” and “Best Practices”, my expectations were set to a certain level. But I have to tell you, I was not as disappointed as I expected to be.

Since RBCS is a testing service provider itself and coaches on measuring its own kind, that’s a bit like asking the wolf to help protect your sheep barn from wolves. But since the companies that use testing service providers are not interested in sharing their knowledge and experience with the world, all that’s out there comes from TSPs and consulting agencies. But let’s take a look now at those metrics.

“Find defects”
Measure the count and priority of defects and compare them with the defects found in production. The metric is called “defect detection effectiveness” or “defect detection percentage” (DDP) and I first learned about it nearly a decade ago. It is meant to evaluate the effectiveness of the different test stages. You simply count all bugs found in all available test stages, including production, and you compare the number of defects found in your test stage (usually also broken down by priority) with the number found in the following stages. From stage to stage you should find fewer defects. The theory expects good test stages to have a DDP of 85% to 90% and up.
This is only possible to measure once the product/project is live for a certain time frame (usually 90 days). So you get a result only way after the job has been finished. You only get valid results of your TSP if you outsourced the stage completely or have another way to limit the calculation to the work packages the TSP tested.
And you get valid results only for certain kinds of projects. You need a good base for your production bugs. Do you have many customers, a few or only one? How is the discipline of your customers when it comes to using the defect process? Are your customers telling you about every bug they found? And you will need some time to filter through the bugs to prepare the data for comparison.
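As a rough sketch of the calculation, assuming the common formulation in which a stage’s DDP is the share of defects it found out of all defects found in that stage and every later one (production included). The function name, stage names and counts are invented for illustration; they are not from Rex’s slides.

```python
def defect_detection_percentage(defects_by_stage, stage):
    """Return the DDP of `stage` as a percentage.

    defects_by_stage: list of (stage_name, defect_count) tuples in
    chronological order, with production as the last entry.
    """
    names = [name for name, _ in defects_by_stage]
    start = names.index(stage)
    found_here = defects_by_stage[start][1]
    # Defects found in this stage plus everything that slipped through to later stages.
    found_here_and_later = sum(count for _, count in defects_by_stage[start:])
    if found_here_and_later == 0:
        raise ValueError("no defects recorded from this stage onward")
    return 100.0 * found_here / found_here_and_later


# Hypothetical counts, collected after the 90-day production window.
stages = [("system test", 170), ("acceptance test", 20), ("production", 10)]
print(round(defect_detection_percentage(stages, "system test"), 1))      # 85.0
print(round(defect_detection_percentage(stages, "acceptance test"), 1))  # 66.7
```

The number is only as good as the production count at the end of the list, which is exactly the weak spot described above.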

“Find important defects”
Of course this is a variant of the “Find defects” metric, focusing on e.g. the priorities critical and high, or whatever grades you rate your defects with. So all restrictions of the “Find defects” metric apply here, too. Plus there is the risk that the prioritization is no longer the same once the project is live. And the TSP might try to rate the defects it found higher to increase this metric.

“Cover the Test Basis”
I quote this directly from Rex’s slide: “Engagements should include clearly defined test scope (e.g., requirements, risks, etc.), which is the test basis”, then you might measure the test basis coverage.
That is basically a good idea. But how do you define whether a requirement, risk, etc. is covered completely, enough, at all, or to your satisfaction? Without a very good understanding of how to test each item on the list you measure, this metric shows nothing at all. And as always with a metric, every item counts the same. Is that the truth in your projects?
You might have a valid point here if you use the “metrics” from a report dashboard, like the ones used for session-based testing (e.g. as described here).
But you still have to trust the TSP about the degree of coverage or you need a very exact description for every item.
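To make the “every item counts the same” problem a bit more tangible, here is a small sketch comparing a plain coverage percentage with one that weights each item by importance. The items, the weights and the covered flags are all made up for illustration; how you would actually assign weights is a judgment call that this sketch does not answer.

```python
def coverage(items, weighted=False):
    """Return the covered share of the test basis as a percentage.

    items: list of dicts with "name", "weight" and "covered" keys.
    With weighted=False every item counts as 1, as in the plain metric.
    """
    if not items:
        raise ValueError("empty test basis")
    total = sum(item["weight"] if weighted else 1 for item in items)
    covered = sum(
        (item["weight"] if weighted else 1) for item in items if item["covered"]
    )
    return 100.0 * covered / total


# Hypothetical test basis: one critical item is not covered, three minor ones are.
test_basis = [
    {"name": "payment processing", "weight": 10, "covered": False},
    {"name": "profile picture upload", "weight": 1, "covered": True},
    {"name": "footer links", "weight": 1, "covered": True},
    {"name": "newsletter opt-in", "weight": 1, "covered": True},
]
print(round(coverage(test_basis), 1))                 # 75.0
print(round(coverage(test_basis, weighted=True), 1))  # 23.1
```

Same test basis, same work done, and two very different numbers. Neither of them says anything about how well a “covered” item was actually tested.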

“Report in Time”
This “metric” counts whether the regular reports are delivered on time. Now that is a nice way to evaluate the testing skills of your TSP.
Yes, discipline is important for a TSP. But whether the report is on time tells you nothing about the quality of the report or the quality of the testing that the report is about. So, a nice add-on, maybe, but not useful for the original purpose.

The next metric is only included because Rex mentioned it.
“Assign skilled, qualified testers”
And the corresponding metric would be “percentage qualified testers assigned”. I won’t count that as a reliable basis for a metric, for many reasons. Qualifications, resumes and certifications can be trimmed in a certain way and to a certain degree. If you actually speak with the people you are going to hire, there are always some who are good at self-marketing and some who are not so good. But that doesn’t say a thing about their abilities as a tester. And of course there is always a chance that you end up with another “resource” than the one you initially hired, because that resource was not available, and of course you get someone with the same experience and quality. Right!

“Finish within approved budget”
Well, that could be either a metric or a criterion for finishing the tests. Stop testing, when you’re out of budget. But when it comes down to a metric, even Rex states, that you need a good estimation process and change request process in place. OK, but when you don’t hit the budget, was it the estimation or change request process or was it the performance of the TSP?
And Rex mentioned, of course, the positive return on investment. But why should I meet the budget 100% to keep it positive, Rex? Is your ROI, however you define that for testing, calculated so tightly that you cannot afford to spend even 10% more without additional benefit?

And now to the surprising part of Rex’s webinar. That’s a method I have already seen in action, but completely forgot about.
“Stakeholder Surveys”
“Meaningful, Actionable Results Reporting” and
“Defect Report Satisfaction”
Now that’s something where I see the aspect of quality being measured. You ask the stakeholders and project members to evaluate different topics, giving grades like in school, and measure that over time.
If you can keep this on an objective level, and use good facts as reasons and examples for your evaluation, this has the most value in my opinion.
There are negative aspects to that “metric”. First, objectivity: if some project members don’t like each other or have other personal differences, that will influence the report. Second, the results can be used for project-political reasons (I saw that the last time I participated in the evaluation process). That will falsify your context and with it the value of the metric. And last to name here, it is very intense and time-consuming to do this right. So much for this set of slides.

I know of a metric that is pretty special when it comes to measuring TSPs.
“Number of test cases executed”
If you use a pre-scripted approach and have a certain number of test cases to execute, that is a well-known way to measure your progress. You can split it by priority if you want, but of course it doesn’t take into account a lot of other things, like the size of the test cases, time for execution, and so on. It lacks a bit of context. And it tells you nothing about the quality of your TSP.
And who is writing the test cases? Do you have them already, and do you have experience with executing the test set? Great, then this will help you measure the TSP, if you take the quality of each and every execution into account. If the TSP writes the test cases itself or even gets paid per test case, that will be a mass production of stupid test cases, guaranteed.
I remember a special call for tender for a complex project. The customer wanted to pay per test case and wanted a rough number of planned test cases without giving much information about the infrastructure. Now that is one serious basis for an estimate and an offer.

Lately I found a nice white-paper by Infosys: Realizing Efficiency and Effectiveness in Software Testing
What can be used for testing projects might be adapted to measuring outsourced testing as well. Some of the metrics were already covered by Rex’s slides, so I will only go through the ones that I find useful.

The “test progress curve” (S-curve), well, that’s one nice piece of theory. In 11 years of testing I have never seen an S-curve without faking. The theory behind it is quite simple and well understood, but reality does not look like that. So even if you want to measure the test progress with this, keep in mind that the S-curve won’t hold for long. The difficult task is where to set the expectation.
But you have to measure the test progress somehow, that’s for sure. So keep in mind, you might find the S in the end, but it won’t be there all the time, or it might not be there at all.

“Test execution productivity trends” is a metric that I would like to try. Short description from the white paper: “The test execution productivity may be defined as the average no. of test cases executed by the team per unit of time.”
It might fit well into the theory of thread-based test management; I have to find out more about that. When using pre-scripted test cases, where you might have experience-based values for execution length, this can be covered pretty well. I think the metric needs to be adapted to every project in order to normalize the measured values. Not every test case takes the same time to execute. You need to take into account the number of bugs found, problems with test environment availability, and simple things like meetings, status reporting and so on. So it is not a simple task, but you might get some good numbers if you can keep it up.
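A minimal sketch of what such a normalization could look like. The white paper only gives the plain definition (test cases per unit of time); the effort weights per test case and the “hours lost” bucket are my own assumptions about how one might adapt it, and all numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Period:
    label: str
    executed_weights: list  # estimated effort per executed test case (1.0 = average case)
    hours_booked: float     # total hours the team booked on test execution
    hours_lost: float       # meetings, reporting, environment downtime, bug write-ups


def raw_productivity(period):
    """Plain definition: executed test cases per booked hour."""
    return len(period.executed_weights) / period.hours_booked


def normalized_productivity(period):
    """Effort-weighted test cases per hour actually available for execution."""
    net_hours = period.hours_booked - period.hours_lost
    if net_hours <= 0:
        raise ValueError("no net execution time in this period")
    return sum(period.executed_weights) / net_hours


# Hypothetical weeks: week 2 had longer test cases and an environment outage.
week1 = Period("week 1", [1.0] * 20, hours_booked=40, hours_lost=0)
week2 = Period("week 2", [2.0] * 8 + [1.0] * 4, hours_booked=40, hours_lost=10)

for week in (week1, week2):
    print(week.label,
          round(raw_productivity(week), 2),
          round(normalized_productivity(week), 2))
# week 1 0.5 0.5
# week 2 0.3 0.67
```

On the raw numbers, week 2 looks like a drop in productivity; after normalization it looks better than week 1. Which of the two is closer to the truth still depends on how much you trust the weights and the booked hours.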

That’s what I’ve found so far, very disappointing overall.

In my opinion you need to monitor at least progress and quality.
What metric you use depends on your project and the possibilities you have.
What you need to do if the metrics don’t hit the expectation depends on your project and the context why it missed the expectations.

If you’re already measuring your projects with some of the metrics described above and have never asked yourself what the numbers actually tell you, try an experiment/role play and try to explain the numbers from different positions. Don’t forget to subtract tacit knowledge in your experiment. Now re-evaluate the value of your metrics. If you still think those metrics are good, congratulations. I would like to hear about your successfully used metrics, either via comment, email or Twitter.

This was only part 1 about this topic. I will try to write more about metrics I found and some of the metrics I tried.

Thinking about how to get a good regression test set

I used the phrase “regression testing” for about 10 years, never even thinking about the definition of the word regression. I just accepted the term as it was used commonly and frequently in our projects. That was before I enjoyed a EuroSTAR webinar with Michael Bolton: one hour talking about the term regression and regression testing. This webinar changed my tester life.
One thing was that I began to read blogs and articles about testing; the other was that I started to think about many of the terms I used in my daily life and whether I had used them wrongly so far. I tried to challenge some of my colleagues with discussions about those terms that are used every day in our project. The outcome was that they, too, had not spent much time thinking about those terms and had simply accepted them as they were.

Coming back to regression and regression testing. In both my old company and my new one, regression is used as a synonym for regression testing. A common problem seems to be that people don’t know what regression means. Maybe the reason is that I am German, and in German the word “Regression” is not part of the common vocabulary.

Regression: “to regress”, originating from the Latin word regressus, means to go back; as a noun, it means a backward movement.

Wikipedia says under “Software Regression”:
A software regression is a software bug which makes a feature stop functioning as intended after a certain event (for example, a system upgrade, system patching or a change to daylight saving time).

So what is our intention, when we speak of regression testing? To check if the (hopefully) unchanged features are still working as intended.

Instead of looking for a definition of regression testing, I want to use the four different but intersecting concepts, that Michael Bolton offered in the aforementioned webinar:

  • Any test that we’ve performed before.
  • A set of automated checks, run periodically and repeatedly.
  • Testing that we perform after some change.
  • Testing to probe whether quality has got worse.

When different roles / stakeholders in a project speak about regression testing, which concept or which mixture of concepts do they mean? My tip: ask them. In long-established teams / projects it is interesting to see whether all are on the same page or whether the definitions vary. If they miss one of the concepts completely, challenge them by asking about it.

What test cases will be used?
On the one hand, we can use any test cases that we’ve performed before, because their goals should be defined to check some features of the product under test that we want to re-check. But be careful when reusing test cases that were originally written to test a new requirement / feature / change request. Those test cases might go too deep into detail and be too time-consuming.
If you don’t have a good selection of test cases that you can use for regression testing, what have you done until now? Now is the time to start creating a regression test set.
When automated tests are available, great. Just run them all if possible. In case the features are still working the same as in the last version, all tests have to get the same result as they did on the last run. And yes that means, that we also would expect failed tests to fail the same way. How would you react if a test that failed before is now passing, without knowing of a bug fix for that problem?
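The idea that any changed result deserves attention, including a former failure that now passes, can be sketched as a simple comparison of two runs. The test names and outcomes are invented, and the dict-of-outcomes format is just an assumption about what your test tool can export.

```python
def compare_runs(previous, current):
    """Return every test whose outcome differs between two runs.

    previous / current: dicts mapping test name -> outcome string.
    A test that exists in only one run is reported with "missing".
    """
    changes = {}
    for name in sorted(set(previous) | set(current)):
        before = previous.get(name, "missing")
        after = current.get(name, "missing")
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical results of the last run and the current run.
last_run = {"login": "pass", "export_csv": "fail", "search": "pass"}
this_run = {"login": "pass", "export_csv": "pass", "search": "fail"}

for name, (before, after) in compare_runs(last_run, this_run).items():
    print(f"{name}: {before} -> {after}")
# export_csv: fail -> pass
# search: pass -> fail
```

Both lines are worth a question. The second one is the obvious regression; the first one is the surprise fix that nobody told you about.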

Do we have to change the test set every time?
That depends on the reason or event that triggers the execution of the test set. If you change a feature, the related regression test cases have to be reworked. If you prepare for a major release change, what has been changed? Are your test cases general enough to be executed on the new version? If you switch the platform like the web server or the operating system, I would go with the last test set.
In case the software changes, check if those changes affect your test set.

And what is a short regression (test) doing?
I think we have all heard our project managers and stakeholders say, let’s cover this with a short regression (test). And by short, they mean you don’t have much time and budget for this. I will show one of the many ways to reduce your test set for a short regression test. Whether that fits your project, or fits every time, you have to decide yourself.

I have been wondering lately, mostly now that I have to test again: what does a good regression test case look like?
Since it all comes down to money in the end, my main aspects would be these:

  • check at least all the box features
  • reasonable in execution time
  • easy to maintain

For me the answer to this question would be, the test case fits into the big picture of my regression test set strategy. To meet those properties above, I created a strategy to rework the regression test sets of my projects.

My regression test set strategy
Get a good visual of your product under test (use mind maps, Visio charts, whatever suits you best). You should find all of your features represented in this picture. Try to make a picture of the specification, as well as you can with the design techniques at hand.

Create a set of test cases that represent all of your features, also involving roles, use cases and business processes of your customers. Use scenario testing techniques together with claim testing to cover your complete list of features. Create positive checks of all your features. When performing this list of test cases you get a result showing whether all features are still working, at least for well-meaning users. This test set is in my opinion the smallest test set you should perform when speaking of regression testing (short regression test set). If you have to make this even shorter, use risk analysis to skip the not so important functions. But please remember to report that you skipped them.
Since maintainability and reusability of those test cases is important, keep them at a level of detail that is sufficient for most testers to understand and high-level enough that you don’t have to change them every time someone changes a configuration or translation. For example, if your product or feature is well-documented in the GUI or more or less self-explanatory, you can skip most of the details of what to do in the test case. You can concentrate on the expectation and goal of the test case rather than on the test steps.
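One possible way to do the risk-based shortening, sketched under the assumption that you have a rough risk score and an estimated execution time per feature check. The greedy pick by risk, the scores and the minutes are all hypothetical; the important part is that the function hands back the skipped features so they end up in the report.

```python
def pick_short_regression_set(test_cases, budget_minutes):
    """Pick test cases by descending risk until the time budget is used up.

    test_cases: list of dicts with "feature", "risk" and "minutes" keys.
    Returns (selected, skipped) as lists of feature names.
    """
    selected, skipped = [], []
    remaining = budget_minutes
    for case in sorted(test_cases, key=lambda c: c["risk"], reverse=True):
        if case["minutes"] <= remaining:
            selected.append(case["feature"])
            remaining -= case["minutes"]
        else:
            # Not executed in this cycle: must show up in the report.
            skipped.append(case["feature"])
    return selected, skipped


# Hypothetical positive checks with made-up risk scores (higher = riskier).
cases = [
    {"feature": "search", "risk": 5, "minutes": 20},
    {"feature": "checkout", "risk": 9, "minutes": 30},
    {"feature": "profile", "risk": 2, "minutes": 15},
    {"feature": "login", "risk": 8, "minutes": 10},
]
selected, skipped = pick_short_regression_set(cases, budget_minutes=60)
print("executed:", selected)  # executed: ['checkout', 'login', 'search']
print("skipped:", skipped)    # skipped: ['profile']
```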

The context-driven part
Now that we have checked that all our features are still working, let’s hunt for some bugs. Nobody said, that you cannot look for new bugs, when performing regression tests.
Use your positive check scenarios and combine them with different assignments (e.g. James Whittaker speaks of tours) to explore and test those features (use function and domain testing techniques or whatever comes to your mind). Not every tester thinks the same, not every tester is capable of the same test techniques. And that’s ok. Because bugs aren’t all the same either. And when it comes down to testing and looking for bugs you should leave your team members a certain degree of freedom what to do to your system. Most testers get new ideas how to break a feature during test execution. And using variation in your techniques is a good way to find new bugs. And in case a tester has no idea, what harm to do next, she can still check old session protocols to get an inspiration.
Take a test case or a subset of test cases and create a variant of them concentrating on different testing techniques to test those features for bugs. When those test cases are performed, you can report, that a search for bugs with certain techniques in this area was performed. Like all test cases, it cannot say, we tested this feature completely and it is bug-free. So this statement is enough for me.
Since you are never finished testing, please don’t argue that it would be better to write down all negative tests performed for a certain feature and repeat them every time; I disagree. The test case will be too long and unmaintainable in no time. And why should you find more bugs when repeating the same tests over and over again? You can give hints in the test charter about what techniques you could use, but this should be the maximum influence on you or other testers. You should try to come up with new ideas every time. Try to learn new techniques and use them in those sessions. If you have a day of no inspiration, use old session protocols, talk with your team members or simply concentrate on positive tests. We all have those days sometimes.
If you have a special technique that you often find bugs with, feel free to use this technique in all of your test executions and mention it in the test charter, but don’t script it in detail. And if this technique really finds bugs always and everywhere, you should have a word with the development lead about how to improve the skills of the development team to prevent those bugs in the near future.

Make a list of the most recently fixed bugs that were found in production. Retest those bug fixes again. It’s really bad to bring an already fixed bug back into production. Try to retest all of the bugs on your list. Sometimes the reoccurrence of a low priority bug is more noticeable than a high priority bug in a constellation that happens only now and then, and that only a few users would ever see.

So what do we have:

  • a big picture of the product
  • matching test cases for verifying the features
  • a strategy for negative testing (bug hunting)
  • a couple of bug fixes to retest

This should leave us with good input to report to stakeholders, a big picture to understand the software, a reasonably maintainable test set and an approach for bug hunting which challenges your team, the capability to find new bugs and a reduced risk of reoccurring bugs that were fixed lately.

I don’t know if this is the best solution for my projects, but I know that after some analysis it should be a better solution than the existing one. It still has to prove this, though. I’m sure that this strategy has not been reworked for the last time. A big part of it is context-driven testing, which is the project-appropriate application of skill and judgment, as it is briefly and simply described on Cem Kaner’s blog. This also means that this concept might only fit partially, or even not at all, to your situation. Don’t forget to include the expectations of your stakeholders.
My original intention with this article is to get you to also think about your usage of the words regression and regression testing, and to give some hints on how to improve a strategy every now and then, when the context changes or when there are other or newly learned things that might fit better. Maybe the improvement of my project’s test strategy helps you to come to a better strategy for your project.

Update (11.06.2013): I could have saved a lot of time thinking about regression testing and writing this blog post if I had simply come across Iain McCowatt’s blog earlier. He wrote a five-part series about regression testing that goes even deeper. A must-read if you haven’t already done so: Exploring Uncertainty
But at least I now know that I’m not the only one thinking that way.

Comments are welcome!