Monitor your personal “test idea creation” process

At my job, last year was very busy. Unfortunately, that meant I did not have much time to take good care of my team members when it comes to coaching and mentoring. One of the things that interests me most about my colleagues is the way they think while testing: what their model looks like, how they approach a situation. If I know that, I can adapt my language and my “public” model, and I can help them understand things better and faster. It also lets me know which articles, books, webinars or courses to point them to in order to help improve their skills.

Lately I had a few spare hours and tried to improve their testing skills a bit further. My intention was to get them to actively think about the actions they take when they analyse a requirements document and create a test plan and strategy for approaching and testing those requirements. So I gave them the following instructions:

When you read a piece of information in the BRD (business requirements document) and turn it into a test idea, try to answer the following questions for yourself:

  • how did you come up with this idea?
  • why did you come up with this idea?
  • was this the first thing that popped up?
  • are there other things to mention (ideas, information, etc.)?
  • will you write down other ideas as well?
    • if not, why not?
    • if yes, all of them?
      • if not, why not?
  • where does this thing fit in the existing picture/model?
  • what picture/model, you ask?
    • if there is no picture/model, why not? Are you sure?
  • are there questions you want to ask?
    • to whom?
  • are there sources you want to use for more test ideas?
    • why did you not already use them?

Don’t answer them to me in the first place; answer them to yourself. Answer them for every single idea you come up with. If you start giving the same answers over and over again, why is that? Would you like to change something? If yes, where and why? There is a good reason for every one of those questions, so please take them seriously and take your time to think about them.

I can give you several reasons for each of the questions above; can you, too?

I know that this list of questions is not even close to complete. The intention was to start an active thinking process about what they are doing, when, and WHY. That way they might enable themselves to take a more structured approach and get ideas about where to improve, or find opportunities where I can help them improve or learn new things.

Bugs are like fruits

Just a short piece on a metaphor I like to use when someone tries to explain to me how important bug metrics are.

“Bugs are like a piece of fruit!” But every bug is a different piece of fruit, of a different size, a different degree of ripeness, and a different amount of calories. Now give that fruit to a developer. How many pieces of fruit can a developer eat (how many bugs can he fix) before he is stuffed (the end of the day)? Do you know how the developer eats the piece of fruit? Why not take a single grape, wash it, slice it up, remove the seeds, enjoy every bite and clean up afterwards? Or slice a watermelon and eat a couple of bites of every slice at the same time, leaving lots of watermelon still to eat. Which means the bug is not fully fixed, maybe only to a certain acceptable degree, or not at all. So back to the slices for more watermelon. If the fruit is not ripe yet (the bug is not fully described), it needs to ripen some more. And what if the developer grabs a bite, finds out he doesn’t like the taste, and hands it to some other developer?

So why do you count melons and apples and grapes with the same numbers and compare them? Or try to get any additional information out of those numbers?

Without more information on the sort of bug, how long it took to fix, what the root cause was, and whether the retest failed (maybe more than once), you just know there was a box full of fruit that development had to eat.

Just my 2 cents on why basic bug metrics are of no use without additional information, or even better, the story behind them.

 

Judging the Software Testing World Cup Preliminaries for Europe – Part 2

We are getting closer to announcing the winners. So here is my report on the hard part for the judges. You can read about the first part, the event, here.

The event itself went rather well. On Saturday around midnight Maik finished retrieving the results, so on Sunday morning I had my first 28 teams assigned for rating. I first retrieved all the reports, prepared a sheet, and spread them across all my devices so I could do some judging whenever and wherever time allowed.

First I looked briefly through all 28 reports to see what awaited me. I had set some expectations from my own experience, and of course there were already some guidelines for rating. Jackie sent a nice summary from her first round through her reports of what she liked and did not like. I fully agreed and had only one detail to add. So I knew my expectations were on the right track and that the other judges had a similar view.

Rating the reports was very interesting. At first I thought I could do a fair job by simply reading and rating. But details I only noticed in later reports, and found essential, were missing from the ones I had already rated. So back to the start: I prepared a spreadsheet with all the essential things I wanted to see, and off we went. The first round of rating turned out rather good and fair after all; I only had to make two minor corrections. And with the information captured in a sheet it was easier to rate and better for future ratings, because 28 won’t be the end.

The quality of the reports varied widely. From barely filled-in documents to 8-page reports, I had a good selection of everything. My summary of the test reports from the first 28 teams: there was some good work there. I was not highly impressed nor really blown away, but I would not expect that from such small teams in 3 hours, with no chance to prepare for the SUT and so much to do. Apart from a few teams who spent nearly no time on the report, most teams delivered a good and solid statement of their work. Well done!

The bug reports were of a different quality. Even teams who did not do so well on the test reports were quite good at finding and describing bugs. The main problem I saw was that some information was missing that was needed to completely understand the bug, the motivation behind it, and why development should fix this bug first. Still, judging from my list of individual findings, the first 8 teams found over 100 different bugs. So most teams entered rather a lot of bugs (most of “my” teams around 30) instead of really well-described bugs. I think the number of bugs “available” had an impact on the quality of the bug reports.
One thing I noticed: “there are four severities, so I have to use them all.” A couple of teams seem to have split their bugs and put something under each severity. That reminds me of my past life as a factory tester, when there were preferred / prescribed rates for the usage of test case priorities, bug severities and who knows what. Just my feeling.

Regarding the bug descriptions, I got the impression that most testers are good at them, which is nice to see, since bug reporting is a central part of a tester’s daily work. All teams I have rated so far had at least several good bug descriptions. To excel in that category I was looking for a lot of ingredients: the title should have some prefix, the description should follow some template, with additional information about reproduction, helpful screenshots, and an explanation of why this bug deserves that severity and a fix at all. If it was an improvement, it should be clearly marked as such. And I wanted to see that all team members use the same templates for title and description and show the same quality of bug description.
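
To make those ingredients a bit more tangible, here is a rough sketch of how I picture such a bug report, written as a hypothetical Python structure. The field names and the example content are purely my own illustration, not a prescribed SWTC template:

# Hypothetical outline of a bug report that would score well in my rating.
# Field names and example values are illustrative only.
bug_report = {
    "title": "[Search][UI] Result list loses its sorting after paging back",  # prefix marks the area
    "type": "bug",  # clearly separated from "improvement"
    "severity": "Medium",
    "severity_rationale": "Misleading for the user, but no data is lost and a workaround exists.",
    "steps_to_reproduce": [
        "Sort the search results by date.",
        "Go to page 2 of the results.",
        "Navigate back to page 1.",
    ],
    "expected_result": "The result list is still sorted by date.",
    "actual_result": "The list falls back to the default sorting.",
    "attachments": ["sorted_page1.png", "unsorted_after_paging.png"],
}

The point is not the exact field names, but that every report from a team follows the same recognisable shape.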

I have not seen many bugs where I would say I don’t understand what the tester wants to tell me. So the rating is on a rather high level and looking for perfection. I see it in my daily work: especially when the deadline comes closer and the bug counts increase, you should maintain a high level of bug report quality to help development and stakeholders triage correctly and fix the right bugs in the available time and budget. So even with 30+ bugs in 3 hours, you still have to keep the quality high. Many teams were above average here, so well done again!

The final round of judging for Europe is ongoing, I am through with nearly all my tasks, and I can’t wait for the winners to be announced.

At this point I want to congratulate all teams for participating in the contest and doing a very good job! Thank you all; judging was interesting and fun for me, and I really enjoyed the work.

Weekend Testing America – Testing Deep (WTA-52)

Somehow I managed to join my first session of Weekend Testing, in my case session WTA-52 of Weekend Testing America.

It was a very promising topic, “Testing Deep”. What does it mean, when do we know we are there, and is there a point of “enough”?

Introduction

The session was facilitated by Justin Rohrman, because Michael Larsen was busy, and there were 13 participants including Justin and Michael. I guess I know most of the participants, if not all, from Twitter. So it was interesting for me to interact with them sort of live for the first time.

Justin set the mission with some questions to consider during the session:
– how do we know what we are doing is deep testing?
– what do we do differently (thought process, approach, techniques, etc.)?
– how to actually do deep testing (hint: staring at a feature longer doesn’t make it “deep”)?
– how do we know when enough is enough?

And he offered some ideas to start with:

1 – what is it
2 – how do we know when we are there
3 – how do we know we are not being shallow
4 – how does it feel
5 – what are we actually doing different
6 – how is our mental process different

I also want to mention that credit goes to Michael Bolton, who also attended, and James Bach: Justin got the idea for this topic while taking the Rapid Testing Intensive (RTI).

Justin chose the online collaborative editor titanpad.com as the SUT, and I think this was a good choice of software for trying deep testing. I guess everyone immediately had an idea of what that software was intended to do. Justin then gave three example areas to choose from for “deep testing”. Some participants formed teams to test the collaboration features together, and some, like me, were testing alone. Well, sort of. Being in a Skype chat with 12 other people who are exploring the same software is not really being alone.

Exploring / Hands on “deep” testing

I set my expectations for testing deep. Instead of wandering around and building up my model of the whole application layer by layer, I chose Export/Import and wanted to stay with one part of the feature as long as possible. I opened up XMind and started sketching the feature to test. I began with the Export feature, file format HTML. I soon realized that my private notebook is not yet set up to support testing. But at least I already had Firebug installed and had just downloaded Notepad++. Soon I started finding minor problems in the HTML structure, in how empty lines in enumerations were translated, and so on. Ideas about what to do next kept popping into my head. Then I moved on to the next export format, taking time, coming up with ideas and exploring further. But as I have to admit on reflection, the time I took for exploring each export format got shorter and shorter.
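
Just to illustrate the kind of check I was doing by hand in Notepad++, here is a minimal Python sketch that scans an exported HTML file for empty list items. The file name and the specific check are my own assumptions, not a statement about how TitanPad’s export actually behaves:

# Minimal sketch: count <li> elements in an exported HTML file
# that contain no visible text, e.g. empty lines inside an enumeration.
from html.parser import HTMLParser

class EmptyListItemFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_item = False
        self.item_has_text = False
        self.empty_items = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.item_has_text = False

    def handle_data(self, data):
        if self.in_item and data.strip():
            self.item_has_text = True

    def handle_endtag(self, tag):
        if tag == "li" and self.in_item:
            if not self.item_has_text:
                self.empty_items += 1
            self.in_item = False

# "export.html" is a placeholder for whatever file the Export feature produced.
with open("export.html", encoding="utf-8") as f:
    finder = EmptyListItemFinder()
    finder.feed(f.read())

print("Empty list items in the export:", finder.empty_items)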

And then on Skype problems kept popping up. I was trying to keep my focus on Export, but some interesting bugs in Import were mentioned, so I soon extended my focus a bit. Maybe too much. I wanted to test Import, too. I also wanted to see those errors. And I wanted to take the files I had exported and import them back.

At the end I started interacting a bit with the others, trying to better understand the situations they faced and trying to help. Yes, I know, funny. Why would they need my help?

Discussion

The hands-on part was over after nearly an hour. How time flies. So everyone gathered in Skype again for the discussion.

Justin took some interesting notes about his feelings during the session. I find this a good source of information, at least something that informs you and those who know you. I read something on a blog recently about capturing a tester’s feelings at the beginning and the end of a session. After seeing that list from Justin, I really have to try that in action.

An interesting comment came from Neil Studd, refining the initial question:
You can’t go deep until you know how deep deep is.

Amy wrote:
How about ‘going beyond the expected’. I felt like I was forcing myself to go back and take another look even when I thought I’d seen it all.

I found that idea rather close to what I tried myself, at least in the beginning. I tried to force myself to keep on digging. And I have to say, in the beginning it was rewarding; I found more issues on each return.

Richard and Amy came to the conclusion that they had hit some sort of wall, where gathering information by simply using the application stopped. They came up with a model of how to get to more information beyond that wall. I am pretty sure (and hope) that Richard will write something up about that model, so I will stop here.

I came up with the definition that it is “digging so deep that the information you encounter there is no longer the responsibility of the owner/creator/dev of the app”. But is it really helpful to test every feature until you arrive close to the hardware level? I don’t think so now.

I liked Neil’s addition: “if we use depth with an ocean analogy, the seabed is not flat – some investigations are likely to hit bedrock (i.e. non-issues outside our control) sooner than others”.

We then discussed a bit about what deep is, without coming to a conclusion that most were comfortable with.

Then Michael first brought in this definition from RST:
Here’s what we say in Rapid Testing: testing is “deep” to the degree that it reliably and COMPREHENSIVELY fulfills its mission AND to the degree that substantial skill, effort, preparation, time, or tooling is required to do so.

I keep stumbling over the term “fulfills its mission”. At my company the time for testing is more or less fixed, restricted to a ratio of the development effort (in many cases, not all). That time sets the main part of my daily mission. So I can only test as deep as possible in the given amount of time, which means testing is kind of shallow in most cases, by definition of the time box. Everything that goes beyond that is deep testing for me. So every feature on which I decide to spend more time than was planned by someone else, someone who does not know everything that is possible or maybe necessary, is testing deep.

I asked Michael whether depth is something he would want to “measure” in some way in order to report on it, because, as in Neil’s example earlier, for some features the seabed is not as deep as for others. Michael’s response was that the extent of the mind map one created while exploring might show the depth of the investigation. That answer is mostly okay for me, because it shows that you invested time there and dug up lots of information. Whether you really hit the ground, or how deep you got, is still hard to tell. But define “ground”; therein also lies the solution to the question we were working on.

Michael brought in this list with regard to SFDIPOT:
For a given feature or function…
– to focus on that feature or function
– to consider a wide variety of risks
– to use and/or develop a very detailed structural diagram
– to break the function down into a detailed set of sub-functions, and to test each one
– to use highly diverse and extensive data sets
– to identify and exercise as many interfaces as are there
– to test on a wide variety of platforms
– to consider and work on a wide variety of operational models
– to consider and test for lots of interactions with time

SFDIPOT is hanging on my office wall to remind me whenever I need it. But I didn’t recognize it at first; maybe there was too much to read in the Skype chat.

Justin then brought up this idea: if you are testing and discovering / creating the model as you go, you are always at the “bottom” of the model. So are you always doing deep testing when this is happening?

I was not completely happy with that definition, because it would mean that you are at that stage from the very beginning. But something of this idea is still revolving in my head.

Conclusion / Summary

In the last 5 minutes we were asked to share a definition of what “deep testing” is. Some gave it an immediate try, and some held back, to come back with the ultimate answer later. Among the definitions that came up, I found none that satisfied my view on the topic, which I had only just started to think about over the past two hours.

Michael gave a good summary of what is needed to go deeper, thereby completing the picture he set up earlier with his definition and SFDIPOT:
Time: All this takes time to develop, maintain, and perform.
Determination: It’s hard to blunder into deep testing. You have to want it.
Skills: You need to know how to model products, identify testable conditions, and design experiments to evaluate them.
Learning: You need a rich and detailed model of the product and its risks (may be a mental model, a formal model, or both).
Requisite Variety of Test Activities: You need to work out a pattern of test activities that will find the obscure, yet important bugs, based on a good theory of risk.
Tooling: You may need tools to help you cover large areas or to reach otherwise inaccessible areas of the product.
Environments for Testing: You need a requisite variety of test platforms configured and available for tester use.
Data for Testing: You need a requisite variety of test data so you can trigger the important bugs.
Team Support: You may need lots of eyes and minds poring over it. Developers can help immensely by exposing the code.
Testability: You may need special features in the product that help you observe and control it.

After sleeping on the session for a night, I tried to come up with a perfect definition of “deep testing” for myself. But the idea is still so vague and tacit that I am not able to write it down now. If I ever am, you will read it here…

Michael explained how you can structure the width of the hole, SFDIPOT, and also the abilities you need to explore each of those areas in depth. The initial definition, “testing is ‘deep’ to the degree that it reliably and COMPREHENSIVELY fulfills its mission”, sounds to me like the ultimate depth of testing something. OK, then everything above that level is shallow to a certain degree. But even that is still vague and depends on context.

Now comes the hard part: on the job, you need the budget to get the time to do all this, so you should be able to estimate and sell this strategy. But how do you know how deep deep is and how much time it takes? That’s something to sleep on over the next night(s).

Thank you, Weekend Testing America, that was an interesting and inspiring session. Thanks to all who attended and enriched the discussion.