A Lesson about Quality Balance and Systems Thinking or “Zen and the Art of Bicycle Maintenance”

Trying to get my bike fixed taught me an interesting lesson about balancing quality aspects and keeping the greater system in mind when optimizing things. The manufacturer changed their wheel design, but forgot the maintenance aspect.

This last week I learned a rather interesting lesson about quality and systems thinking. To set the context:

At the end of June I bought myself a new bicycle, a road/race bike, for the first time in 20 years. When looking for potential models I focused on one set of components (brakes, groupsets and all that). Regarding frame and wheel set I had no preference. Buying a bike in 2021 is a rather hopeless endeavor anyway. I was very lucky to find a shop in Ingolstadt (about 85 km away) that had one in my frame size in stock.

While riding through the “outback” last week, I hit a bump and heard a crack. A spoke broke and my tour that day was done. I called my wife to pick me up.
As someone with an aspiration to be a maker, I wanted to fix it myself. I had done it before (truing more often, replacing a spoke ages ago, but okay), so why not on this high-tech thingy of a wheel? While waiting, I searched on my phone for replacement parts. Newmen actually offers a maintenance set with spokes, nipples and washers. Only problem: it was sold out in every shop I could find.

To save some time (how funny that sentence is only dawned on me 7 days later), I decided to bring the wheel to one of the bike repair shops in the vicinity. On Saturday I tried one in my town that also sells race bikes. They didn’t have spokes of the right length in stock and were about to go on vacation. The one in the next town didn’t have exactly the right size either, but wanted to give it a try. On Tuesday he called to tell me it hadn’t worked. Ordering replacement parts was not in his interest, as he would have to order them by the hundred. I called a few more bike shops, without success.

So I went back to searching online. I found spokes of the right length (in a pack of 20), and under “What other customers also bought” I was offered the tools I would need. Three days later the parcel arrived, exactly one week after the incident. I took the Friday afternoon off to fix the wheel.

I removed the rim tape, managed to shake out the nipple with the broken-off spoke thread still in it, and put the wheel on the truing stand. I checked the packs of spokes, and the longer ones seemed to be the right ones. Only the nipples looked different: the ones that came with the new spokes had a slit on the end that sits against the rim. That slit fit one of the fancy special tools I had ordered, but with the nipples mounted the other way around, that tool was useless. So I tried the spoke wrench. That one didn’t fit either. WTF? Never blindly trust an algorithm!

I was grumpy (for a change), took one nipple as a sample and drove to a couple of bike shops (5!) to find a matching spoke wrench. I hadn’t seen so many confused faces in a while. One even mansplained to me that I was wrong, that this is not how these things are supposed to work, and that the tool I was looking for doesn’t even exist. At the last shop on that round I had a nice chat with the owner about the shortage of materials on the market and other things. He didn’t know about that way of mounting nipples either, but he gave me the tip to call the manufacturer.

On my way home, after driving around the county for over 90 minutes, I decided to call the shop in Ingolstadt first. Luckily, the guy who had sold me the bike picked up the phone. A true bike freak. I told him my problem; he immediately understood and told me that the wheel set had to be sent back to the manufacturer. I should come by and they would arrange a replacement wheel set until mine came back from the manufacturer. So, back into the car for a one-hour drive north to Ingolstadt, and finally all the pieces of the puzzle came together.

Someone had an idea

Someone at Newmen had a fancy idea. To achieve a cleaner look on the rims, they basically turned the nipples around to fix the spokes from inside the rim.

Here is a rough sketch to illustrate the idea. Newmen produced nipples with a clean head (no slit) to avoid shaving material off the rim when the spoke is tightened. To turn them, you need a long tool with a slim head and a 3 mm square profile that reaches the nipples inside the hollow chamber of the rim (an inside nipple wrench). The holes in the rim are also much smaller, as only a 2 mm spoke has to fit through them. Nice idea!

The Rest of the World (ROW) basically does it differently: they drill a bigger hole in the rim and fiddle the nipple through it. The nipple’s square profile then sits outside the rim, within reach of standard nipple wrenches. That also explained the slit on the nipples I received and the special tool to fiddle them in.

And this design change explained everything I had experienced over the past week: the stunned faces, the material shortage, the inability to find someone who could help me.

The Balance of Quality Criteria

What the manufacturer got wrong was the balance between two quality criteria. The two competing aspects in this case were “Look/Attractiveness” and “Maintainability”.

Yes, hiding the nipples inside the rim created a cleaner, crisper look.

BUT! That means trouble for maintenance. And this is where systems thinking comes into the picture. When you look at one thing to improve its quality, you have to keep in mind the environment/system it operates in. In Newmen’s case, they forgot about the bike repair shops and the people who maintain their bikes themselves.

https://twitter.com/allenholub/status/1431659048093556736

Not only to replace a spoke, but even just to true a wheel, you need to remove the tire, tube, and rim tape. Truing a wheel while it is mounted on the bike is basically impossible that way. You also need special tools that many shops don’t even know about (5 out of 6 in my vicinity), and special spare parts, as the nipples must not have a slit.

In manufacturing this is probably not much of a change: you have a bare rim on a truing stand in front of you, and the side from which you apply the tool to adjust spoke tension is just a matter of practice and training.

As long as the wheel is intact and functioning 100%, all is fine. But as soon as you face an issue, all that beauty gets in the way of quality!

Newmen has accepted that fact and will most probably (hopefully) redrill my rims and re-equip them with standard nipples, the kind that stick out of the rim. The salesman told me that they noticed the problems quite quickly once the wheels were out in the field, and changed their design in the middle of the production year, which rarely happens in the bicycle industry.
As a side note, I don’t expect the maintenance set for that rim to come back into stock.

When I buy my next bike, I’ll know one more quality aspect to look for. For now, I’m waiting for my wheel set to return in a few weeks.

Oh, this stupid pyramid thingy…

Another blog post on the testing pyramid? RUN!!! But wait, it’s actually about an alternative. Maybe read it first, and then decide to run.

It seems to be an unwritten law that at testing conferences there needs to be at least:
– one Jerry Weinberg quote
– one mention of the 5-letter acronym in a non-positive context
– and there has to be at least one testing pyramid

Whenever test strategies are discussed these days, you will probably also find the testing pyramid being referenced. It is one of the most widely used models in testing that I’m aware of. And yet, in my personal opinion, too many people don’t understand the actual intention behind it, or are unable to communicate it properly. And I blame the model for this!

All models are wrong, but this one isn’t even helpful!

Patrick Prill

The basic testing pyramid (or triangle for some) is mostly mentioned in the context of how many automated test cases to have: a whole lot of unit tests, a bunch of integration tests, and as few end-to-end tests as possible, especially when it comes to this dreadful UI thing. But what does this mean when it comes to actually designing and writing the tests? This is the point where I feel that the pyramid has some severe shortcomings. At least, the key aspects I have in mind are rarely mentioned together with the pyramid.

Yes, we got it, people misunderstand the pyramid thingy. What’s your solution?

Well, to start with, it’s not my idea. The model that has helped me most over the past four years is Noah Sussman’s band-pass filter model.

The band-pass filter model by @noahsussman

What this model basically states is that at every testing stage you should find the issues that can be found at that stage. That means you can probably find the most issues at the unit test level. As a result, this is probably where you will have the most test cases, focused on functional testing.

You don’t want to find basic functionality issues with an end-to-end-UI test!

At the unit test level you won’t be able to catch integration-level issues. These stay in the system, to be discovered (hopefully) at the integration stage.
End-to-end tests should then find the issues that only show up when looking at the system as a whole, viewed through the business process band-pass filter.

And this model is easily extensible by adding new band-pass filters for security testing, accessibility testing, load & performance testing, and so on. All these kinds of issues should be found in their respective test stages, however you implement them.

And, of course, for most systems the optimal distribution of tests will probably, in the end, look like a pyramid/triangle. But it doesn’t have to! It depends on your system’s architecture, design, testability, influence, and so on. Of course, when a system cannot reach a testing pyramid, the root cause usually lies in problems of some kind. But that’s no reason not to have tests!

The band-pass filter model helps you with that: you find the issues at the earliest stage possible. At a recent client of mine I was analyzing the unit test set. Most of the tests weren’t actually unit tests but integration tests, because most of the methods under test needed DB access. The reason is that the MVC pattern is mostly not applied properly. But does this project have to refactor all its code before it can start testing properly? NO! They should start with integration tests and find ways to filter out all these functional bugs at the integration test layer, which runs after basically every commit. That’s much better than waiting for the end-to-end tests that only come at the end of each release cycle. As a next step they will find better ways to structure their code, and with that come more unit-testable code and more unit tests. Or not: they stick with their way of mixing up MVC and just drastically increase the number of integration tests. Fine with me!
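To make the client example more concrete, here is a minimal Python sketch of such an integration-level filter. Everything in it is hypothetical: the `count_open_orders` function (business logic with the SQL mixed right in, as in the project described above), the `orders` table, and the data. An in-memory SQLite database stands in for the real database, which is an assumption that only holds if the queries are portable. The point is that a bug in the counting logic gets caught after every commit instead of at the end of the release cycle.

```python
import sqlite3
import unittest


def count_open_orders(conn: sqlite3.Connection, customer_id: int) -> int:
    """Hypothetical function under test: business logic mixed with DB access."""
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE customer_id = ? AND status = 'open'",
        (customer_id,),
    ).fetchone()
    return row[0]


class OpenOrdersIntegrationTest(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh in-memory database per test keeps the tests fast and isolated.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)"
        )
        # Only as much data as the behavior under test needs.
        self.conn.executemany(
            "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
            [(1, "open"), (1, "closed"), (1, "open"), (2, "open")],
        )

    def tearDown(self) -> None:
        self.conn.close()

    def test_counts_only_open_orders_of_the_customer(self) -> None:
        self.assertEqual(count_open_orders(self.conn, 1), 2)

    def test_unknown_customer_has_no_open_orders(self) -> None:
        self.assertEqual(count_open_orders(self.conn, 99), 0)
```

Such a file can be run with `python -m unittest` on every commit. If the logic is later separated from the data access, the same test cases can move down one stage to the unit level.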

I gave the developers and testers of my last client an exercise to experience the power of the band-pass filter:

For the next bug report that lands on their desk, they should find a way to reproduce the issue at the earliest possible testing stage. And if that is not the lowest test level (unit test = low, UI test = high, yes, I know, pyramid lingo, duh!), they should check again whether it can be reproduced one stage earlier.

If they need data, is there a way to simplify the data needed to reproduce the issue? At the integration level or higher, how do you select or create proper test data? If you made it down to the unit test level, you will probably be able to simply define the data or mock it appropriately.

Now that you have a failing test case at the earliest stage possible to find the issue, fix the bug and watch the test case turn green.
While you are at it, you may add a few more tests on that level. Just to be sure.
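The exercise can be sketched in Python, assuming a hypothetical bug report: the product page crashes for products that have no ratings yet. Instead of reproducing it through the UI, the data is reduced to the one case that matters, and the (hypothetical) repository dependency is replaced with a `unittest.mock.Mock`. The names `average_rating` and `ratings_for` are made up for illustration. The first test is the one that would have failed against the buggy version, which divided by `len(ratings)` unconditionally; with the guard in place it turns green.

```python
import unittest
from typing import Optional
from unittest import mock


def average_rating(repository, product_id: int) -> Optional[float]:
    """Hypothetical function under test.

    The buggy version returned sum(ratings) / len(ratings) unconditionally and
    raised ZeroDivisionError for products without ratings. The guard is the fix.
    """
    ratings = repository.ratings_for(product_id)
    if not ratings:
        return None
    return sum(ratings) / len(ratings)


class AverageRatingTest(unittest.TestCase):
    def test_product_without_ratings_reproduces_bug_report(self) -> None:
        # No database, no UI: the mock returns exactly the data from the bug report.
        repo = mock.Mock()
        repo.ratings_for.return_value = []
        self.assertIsNone(average_rating(repo, product_id=42))
        repo.ratings_for.assert_called_once_with(42)

    def test_average_of_existing_ratings(self) -> None:
        # One of the "few more tests on that level", just to be sure.
        repo = mock.Mock()
        repo.ratings_for.return_value = [4, 5, 3]
        self.assertEqual(average_rating(repo, product_id=7), 4.0)
```

Whether returning `None` is the right fix is a product decision; the sketch only shows how the test pins the reported behavior at the lowest stage.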

Use the power of the band-pass filter model!

PS: The idea for this blog post came during TestBash Home 2021, when the testing pyramid appeared for the first time in the first talk of the day, and the chat went wild. I stated that I prefer the band-pass filter model and got a lot of question marks in return. So, here is the explanation of what it is, and why I find it more useful than the pyramid!

In Test We Trust

Have you ever thought about how much trust testing gets in your projects?

Isn’t it strange that in IT projects you have a group of people that you don’t seem to trust? Developers! I mean, why else would you have one or more dedicated test stages to check and double-check the work of the developers? Developers often don’t even trust their own work, so they add tests to validate that the code they write actually does what it should.

And of course you trust your developers to do a good job, but you don’t trust your system to remain stable. That’s even better! So you create a product that you don’t trust unless you have tested it thoroughly. And only then do you probably have enough trust to ship it to a customer.

Testing is all about trust. We don’t trust our production process without evaluating it every now and then. Take manufacturing industries: they have many decades, even centuries, more experience than IT. They create processes, tools and machines to produce many pieces of the same kind, expecting each piece to have the specified properties and meet the expected quality criteria every time. Depending on the product, they check every piece (like automobiles) or do only rare, random spot checks (for example, screws and bolts). They trust their machines, usually to a high degree, to reproduce a product thousands or even millions of times.

In IT, we don’t reproduce the same thing over and over again.
Can you imagine doing only some random spot checks for your project, checking just a handful of criteria each time? If your answer is ‘yes’, I wonder whether you usually test more than that, and if so, why. If your answer is ‘no’, you belong to what seems to be the standard in IT.

So, what we have established is that we don’t really trust the outcome of our development process unless we have some kind of testing in place.

Have you ever realized how much decision makers rely on their trust in test results? If you are a developer, BA, PO, or tester who is part of the testing that happens in your delivery process, have you ever felt the trust that is put into your testing? Decision makers rely on your evaluation, or on the evaluation of the test automation you implemented!

Does your project have automated tests? Do you trust the results of your tests? Always? Do you run them after every check-in, every night, or at least before every delivery? Do you double-check the results of your automated tests? When you implement new features, refactor existing code, or change existing functionality, you re-run those tests and let them evaluate whether something deviates from their expectations. You trust your automated tests to warn you if something has been messed up: by the last code change, a dependency upgrade, a config change, or the refactoring of an old method.

Do you put enough care into your automated tests that you can really rely on them to do what you want them to do? Why do you have that trust in your tests, but probably not in your production code? And I’m not even asking “who tests the tests?”

Of course we do some exploratory testing in addition to our test automation. And sure, sometimes this reveals gaps in our test coverage, but above all, exploratory testing is there to cover and uncover additional risks that automation cannot address. So, when you establish exploratory testing in some form alongside your change detection (a.k.a. test automation), you add another layer of trust, or rather distrust, to some parts of your system.

This is not about distrust, we just want to be sure that it works!

In one of my recent consulting gigs for QualityMinds, I was assigned to analyze the unit tests of a small product company and suggest improvements. The unit test set barely existed, and many of the tests I checked rarely did anything useful. That wasn’t a big problem for the delivery process, as they have a big QA team that does lots of (end-to-end) testing before deployment, and even the developers help out in the last week before delivery.

Yet they have a big problem: they don’t trust their tests and test coverage enough to refactor and modernize their existing code base. So my main message for the developers was to start writing unit tests that they trust. If you have to extend a method, change functionality, refactor it, debug it, or fix something in it, you want a small test set in place that you can trust! I don’t care about code coverage, path coverage, or whatever metric. The most important metric is that the developers trust the test set enough to make changes, get fast feedback on those changes, and trust that feedback.

I could add more text here about false negatives, false positives, flaky tests, UI tests, and many more topics that put at risk the trust we place in our change detectors.
There are also risks in what is often referred to as “manual testing”, for example when it is based on age-old predefined test cases or outdated acceptance criteria. Even when you do exploratory testing and use your brain, which oracles do you trust? You don’t want to discuss with your colleagues, all the time, whether every tiny bit of the software makes sense or not.

We can only trust our tests if we design them with the necessary reliability. The next time you design and implement an automated test, think about the trust you put into it, the trust that you hand over to this piece of code. Does it reliably detect all the changes you want it to detect? When it detects a change, is its report helpful? If you run the test 1000 times without changing the underlying code, does it always return the same result? Have you seen your test fail when the underlying code changes?
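The last two questions can be tried out with a few lines of plain Python. This is a minimal sketch under stated assumptions, not a feature of any real test framework: `run_repeatedly`, `normalize_username`, the deliberately flaky test and the mutant are all hypothetical. The first check runs a test many times against unchanged code and tallies the outcomes; a trustworthy test shows exactly one kind of outcome. The second check swaps in a deliberately broken implementation (a hand-made “mutant”) to confirm that the test actually fails when the code changes.

```python
import random
from collections import Counter
from typing import Callable


def run_repeatedly(test: Callable[[], None], runs: int = 1000) -> Counter:
    """Run a test many times without changing any code and tally the outcomes."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            test()
            outcomes["pass"] += 1
        except AssertionError:
            outcomes["fail"] += 1
    return outcomes


def is_deterministic(outcomes: Counter) -> bool:
    # Exactly one kind of outcome (all passes or all failures): no flakiness observed.
    return len(outcomes) == 1


# Hypothetical code under test.
def normalize_username(name: str) -> str:
    return name.strip().lower()


# Hypothetical mutant: the same function with strip() "accidentally" removed.
def normalize_username_mutant(name: str) -> str:
    return name.lower()


def check_normalize(impl: Callable[[str], str]) -> None:
    assert impl("  Alice ") == "alice"


# A seeded generator simulates a hidden dependency (timing, ordering, shared state)
# that makes a test fail roughly 5% of the time.
_rng = random.Random(2021)


def flaky_test() -> None:
    assert _rng.random() > 0.05 and normalize_username("Bob") == "bob"


if __name__ == "__main__":
    print("stable:", run_repeatedly(lambda: check_normalize(normalize_username)))
    print("flaky: ", run_repeatedly(flaky_test))
    print("mutant:", run_repeatedly(lambda: check_normalize(normalize_username_mutant), runs=1))
```

The stable test should report only passes, the flaky one a mix of passes and failures, and the mutant a failure. If a mutant passes, the test would not have warned you about that change.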

PS: This blog post was inspired by a rejected conference talk proposal that I submitted for TestBash Philly 2017. Ever since then, I have wanted to write it up. Now was the time!

Testing, Quality, and my inability to teach

Hi, I’m Patrick and I’ve been a tester for 18 years now. And I have a problem: I don’t care about testing! I care about Quality! Yet people see and treat me as a tester.

I have to add that I don’t like testing the way many people see it, and I don’t do testing the way most people do it. Most colleagues I have worked with over the past two decades see testing as verifying functional correctness, and sometimes conformity with some non-functional requirements, such as load and performance.

My understanding of quality starts where most colleagues’ understanding ends: when explicit requirements are met. I see quality more the way Joseph Juran defined it: “Fitness for use”. And the additions that have been made to Jerry Weinberg’s “Quality is value to someone who matters at some point in time” are very helpful for understanding the flow and the continuous need for adaptation when thinking about quality from my point of view. More on that in a later blog post.

Testing as an activity and my role as a tester, especially as a test automator, are for me the best means and position to influence the quality of a project and product. That is, as long as you don’t put me in the waterfall-ish slot after development has finished and before delivery/operations and expect me to stay there! I’m usually all over the place, wherever people let me.

I couldn’t care less about approaches like decision matrices, boundary value analysis, path coverage, etc. Probably either because I learned them formally back in 2004 and have internalized them by now (I never use them explicitly!), or because they are the formal explanations of what I usually call the “common sense of a tester”.

Tasks like “Test Analysis”, “Test Design”, or “Creating test cases based on requirements” go against my personal nature. I’m so driven by the problems in front of me, the context I’m in, the problems that need to be solved, and the things I’ve seen, explored and sensed, that coming up with “a full test set” up front is just not my style of working.

Quality is relative, quality is in a constant flow, quality is highly subjective, quality is everywhere and nowhere, quality can’t be predicted, quality cannot be put in numbers. And that’s why my style of “testing” behaves the same way. I just cannot reduce my work to writing and executing test cases. I just can’t!

When I look into a bugfix and see that the commit is a one-liner that fixes exactly the problem at hand, maybe I have even seen the code in that area myself before, I have had a short chat with the developer, and I have concluded that we are in what Cynefin calls the obvious domain, then I’m perfectly fine with closing the issue without testing it any further. Some colleagues would describe this as “not testing”. For me that is a lot of testing, even if creating a test idea or test case and executing the code against some predefined scenario never happened.

Don’t get me wrong, I’m not against test documentation and all that, which is required and necessary in several contexts. But writing 95% of the documentation up front, before a single line of code is written, and then just recording that I did what is written there, was never my thing.

My “testing” is a complex and unpredictable process that draws on a lot of experience, common sense, systems thinking, domain knowledge, and many more aspects. And that actually causes a big issue for me: so far I have been unable to teach other testers, especially the younger generation, to mimic my approaches. Except for one. But teaching is sort of expected from someone in a senior role. Here is an attempt to describe why I am not able to do that.

For me, being the tester in a team means supporting my team in establishing and maintaining trust in the code we build and deliver, helping us optimize the way we work, and helping to come up with solutions for the problems our clients and we are facing. I try to remove bottlenecks, never be a bottleneck myself, and enable the team to act fast. Most of all, I help uncover potential risks, so that we are at least aware of them, have talked about them and, where relevant, have included mitigations for them in the solution we come up with. I cannot describe what I’m doing any better or more precisely than that, simply because I don’t know where I can help next.

Tomorrow I might:

  • write test designs when I have to,
  • automate some test cases,
  • improve existing test scripts,
  • pair up with a developer,
  • refactor the test automation framework,
  • pimp the Jenkins pipeline,
  • explore,
  • step in for the product owner,
  • participate actively in refinement meetings, and by actively I mean I don’t only ask clarifying questions, I also propose actual solutions for the problems at hand,
  • pick up a story to implement,
  • do a code review,
  • discuss with the business architect,
  • help my tester colleagues,
  • suggest architecture improvements,
  • analyze test failures after the latest pipeline run,
  • discuss how we can reorganize the team structure to become better,
  • update dependencies,
  • sit in a meeting and just listen,
  • step in for the scrum master,
  • take care of the test infrastructure,
  • or do any other task that has to be done to deliver value to the customer, improve the quality of our own work, or just help future me have a better day in a few weeks or months.

How many of these tasks do you read or expect in actual tester job descriptions? How many of them would you expect to be part of a tester training syllabus? And before the suggestion comes: I don’t see myself as a coach. I’m a hands-on person, I eat my own dog food, and I want to stand behind the things I propose. I lead by example, and hope that others are able to understand what I’m doing and mimic my behavior to become better. Whenever I see behavior that is worth mimicking, I try to do that.

Quality, value, improvements, reducing waste, and making an impact drive my daily actions. I think, despite the Impostor Syndrome I suffer from, that I’m doing a good job and having a big impact on teams. At least that’s the feedback I get sometimes. I don’t even want to teach developers HOW to test. I’m rather good at helping developers WANT to test.
But please don’t assign me any rookie and expect me to teach them how to test. In about 17 of 18 cases I will most probably fail miserably.

Something that just came to mind when reviewing the post one last time:
I try to positively influence systems to heal, maintain, and improve. And as an embedded tester I have the chance to do that from within the system. But what I am doing to achieve that is so much more than just “testing”.

How your personal understanding of “Quality” influences your way of testing

I want to offer you a hypothesis, a proposition that I don’t have proof for. But I believe I’m on the right track.

“Your personal definition of quality influences the way you test.”

Quality is a seven-letter word. I think that’s the only statement we can all agree on. Quality is a very complex matter and I don’t want to go into too much detail on it this time.

To make my point, I’ll base this post on three common definitions of quality, on which your personal understanding might be more or less based.

Q1: “Quality is conformance to requirements.” – Philip B. Crosby

Q2: “Quality is value to someone who matters at some point in time.” – Jerry Weinberg, extended by James Bach, Michael Bolton and Anne-Marie Charrett

Q3: “Quality is fitness for use.” – Joseph M. Juran

I want to describe the types of testers that I see behind those definitions. This is based on my mental model, my 16 years of experience, and the people I work with and talk to. I know that this sample is way too small to be representative. But maybe it is helpful for some; at least it helps me understand people, their motivations and their way of working, how projects are set up, and so on.

Type “Quality is conformance to requirements.”

Frameworks like ISTQB and PMBOK are based on variations of Q1. And this totally makes sense from their point of view. That way testing is plannable and controllable, and based on that definition you might even come up with metrics to make quality measurable. It’s a good way to put price tags on testing, which enables schemes like outsourcing testing.
The other definitions would not be able to serve that purpose in a similar way.

Testers with an understanding of quality like Q1 might tend towards test cases and test coverage metrics based on requirements. Waterfall-like approaches (including those disguised as Agile) make them feel comfortable, and standard test case derivation methods are their daily tools.

Projects and general product development with very specific lists of requirements, standards to adhere to and processes to follow would need a quality understanding like this.
Also people with a background in model-based testing might feel comfortable with this definition.
For concrete implementation projects they need to rely on customers that are able to express their requirements.

Type “Quality is value to someone who matters at some point in time.”

In my opinion, proper exploratory testing tends to be based more on definitions similar to Q2. Testers of this type explore the system under test from different angles and exercise it based on the findings they consider most important. Their test reports should inform decisions and tell a story rather than produce numbers. They are aware that they are not the customer or end user of the system, but they try to emulate them as well as possible.
I could imagine that people who see themselves as context-driven testers might have a quality definition based on Q2.

They understand that context matters most and that usefulness can change over time. In my opinion, this type of tester is more aware of potential risks, and for them trying to detect potential risks is more important than covering every possible edge case. They also understand that quality is different for different stakeholders and users.

Type “Quality is fitness for use.”

Approaches like observation, monitoring, testing in production, data analytics and the like align more with quality definitions like Q3. Testers of this type want to see that the implementation works in the field. Testing before releasing to production is mostly used to minimize the risk of massive failure. Carefully releasing software into the wild and rolling back in case of failure is their preferred way of checking code changes.

I’d assume a trend towards incorporating parts of definition Q2 in their understanding of quality.

Background story

I came to this hypothesis recently when I switched teams within the same context, even the same room: from a customization implementation team to the product development team. I didn’t feel too comfortable at first after the change, and I wanted to understand why. It wasn’t the people or the domain. It was the way of working, or rather the definition of quality you need to apply, that defines the context. In my case I had to switch from a Q2 context to a Q1 context. Guess what I prefer.

Summary

I believe that your personal definition of quality is a fundamental piece of the puzzle of how you subconsciously work, how you test, how you design test strategies, how you would set up a testing project, and so forth.

Of course people can adapt to their current context, fulfill the requirements of the job, and do it well. But I assume they won’t feel as comfortable as they could. At least that’s how it is for me.

I need your help!

You made it this far, thanks for staying. I need your help! I would like to know if my hypothesis is worth following up on.
If you have a personal definition of quality, maybe it roughly fits one of the three examples provided. And maybe you are aware of what kind of context you mostly enjoy working in.
Please let me know whether my generalized description above fits your personal situation or not.
Thank you!