03/07/2024

Bug hunters in the code jungle

DETECTIVE WORK IN SOFTWARE

Most people have already experienced it: a computer programme, a mobile phone app or a video game simply doesn't do what it's supposed to do. Instead, an error code appears on the display or nothing works at all. It's annoying and often leads to the question: How can this actually be?

People like Frank Krahl are there to identify and minimise such problems in advance. He is the Testing Coordinator at KIX. Together with his team, he runs through a wide variety of test scenarios on a daily basis - from automated processes to regression tests. In our interview, he reveals why there is no such thing as error-free software, why testers often do detective work and how a bug can sometimes become a feature.

Writing code and developing programmes is certainly a dream job for many people. But how do you actually become a tester?

There are different ways. Often it's developers who have studied computer science in the traditional way and want to try out something new. The other option, as was the case for me, is a computer science-related apprenticeship with a more or less direct route into testing. I initially worked in support at a software company, but my area of responsibility overlapped with testing from time to time.

I moved to KIX in 2017, and today I coordinate this area. Testers don't necessarily have to know a piece of software down to the last detail; that's more the job of the developers. But they do need to look at it through the eyes of a customer, and often think outside the box as well.

So you don't have a typical day-to-day routine?

We do have fixed structures, processes and individual steps, whether we are testing a new function, checking a bug or carrying out RC, release or regression tests. The usual daily routine consists of putting newly developed functions and fixed errors to the test and checking that they behave as expected.

What is exciting and varied, however, is that we are confronted with new topics and tasks every day, which means we are constantly expanding our technical and professional knowledge. It never gets boring with us. And only when we have completed all the steps do we give the go-ahead.

How exactly do you go about this?

First, we try to reproduce the problem on a current system according to the reported reproduction steps. If this is not successful, we try again on a system with the same version as the customer's. If this is also unsuccessful, we take a closer look at the customer's system itself. The specific configuration is often an important factor in causing the error.

So it's a mixture of precise mapping of the steps taken by the customer and analytical detective work. Sometimes I feel like an investigator (laughs). But of course I prefer to catch the culprit or the bug in advance. There are already plenty of examples where a faulty programme has caused damage running into millions.
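The escalation described above can be sketched in a few lines of Python. This is only an illustration, not KIX's actual tooling: the environment labels, the function name `first_reproducing_environment` and the `reproduces` callback are all assumptions made for the example.

```python
from typing import Callable, Optional

# Escalation order as described in the interview; the labels are illustrative.
ENVIRONMENTS = [
    "current system",
    "system with the customer's version",
    "customer's own system",
]


def first_reproducing_environment(
    reproduces: Callable[[str], bool],
) -> Optional[str]:
    """Return the first environment in which the reported steps trigger the bug."""
    for environment in ENVIRONMENTS:
        if reproduces(environment):
            return environment
    return None  # not reproducible anywhere: the report needs more detail


# Hypothetical bug that only appears with the customer's specific configuration.
def config_dependent_bug(environment: str) -> bool:
    return environment == "customer's own system"


print(first_reproducing_environment(config_dependent_bug))
```

The point of the ordering is cost: the cheapest environment is tried first, and the customer's own system is only examined once the simpler options are exhausted.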

Behind every investigator is a resourceful team. As a digital detective, do you also use digital assistance?

Yes, of course. We use test automation for this, in other words a tool that carries out regularly repeated test steps, and I can only recommend it to all colleagues. This relieves us of some of our work, but our job is primarily to feed this tool with requirements.

This means that we define the test scenarios and steps depending on the requirements, and the tool usually runs them once a day. If the tool finds an error, we are automatically notified. This allows us to make much more efficient progress.
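As a rough sketch of that idea, the following Python snippet defines scenarios as lists of steps, runs them, and sends a notification for each failure. The interview does not name the tool KIX uses, so `Scenario`, `check`, `run_suite` and the example scenarios are hypothetical, and the daily trigger is assumed to come from an external scheduler.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    name: str
    steps: List[Callable[[], None]]  # each step raises AssertionError on failure


def check(condition: bool, message: str) -> None:
    """Fail the current step with a readable message."""
    if not condition:
        raise AssertionError(message)


def run_suite(scenarios: List[Scenario], notify: Callable[[str], None]) -> List[str]:
    """Run every scenario, notify on each failure, return the failed names."""
    failed = []
    for scenario in scenarios:
        try:
            for step in scenario.steps:
                step()
        except AssertionError as exc:
            failed.append(scenario.name)
            notify(f"{scenario.name} failed: {exc}")
    return failed


# Hypothetical scenarios; a real suite would drive the application under test.
suite = [
    Scenario("create ticket", [lambda: check(True, "ticket was not created")]),
    Scenario("search ticket", [lambda: check(False, "search returned no results")]),
]

notifications: List[str] = []
failed = run_suite(suite, notifications.append)  # a scheduler would call this daily
print(failed)
print(notifications)
```

Here `notify` simply appends to a list; in practice it might send an e-mail or a chat message, which is again an assumption and not something the interview specifies.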

With KIX, you are working with an ITSM system based on open source. Does it put you under particular pressure that all users can view the source code and discover errors that you have overlooked?

On the contrary, the advantages of open source code simply outweigh the disadvantages. Possible sources of error can be recognised by a much larger number of people and often even rectified directly. With thousands of users, a problem simply becomes apparent more quickly than in the internal testing and development area.

Sometimes it can be a bit of a challenge for professional honour, but completely error-free software is simply utopian. Even the software for the space shuttle contained bugs.

HOW CAN IT BE THAT BUGS KEEP SLIPPING THROUGH?

The natural limits of testing usually lie in economic efficiency. It is always necessary to find a compromise between error costs and error prevention costs. In turn, this means that there is always a so-called grey area of possible errors that are still contained in the software. Before a release, tests are carried out that cover all important functions whose malfunction could have a serious impact on the customer's business.

At KIX, we work with five error classes from A to E. A class A error for us would be, for example, that users are unable to create tickets. In other words, a serious problem that we have to rectify immediately. Class E, on the other hand, includes more trivial things. If an icon is misaligned or contains a spelling mistake, the probability that this will lead to a business-critical situation is very low.
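One way to picture such a classification is as an ordered enumeration, so that reports can be sorted by urgency. The sketch below is an assumption-laden illustration: only classes A and E are described in the interview, so B to D are left without a meaning, and the names `ErrorClass` and `by_urgency` are invented for the example.

```python
from enum import IntEnum


class ErrorClass(IntEnum):
    A = 1  # e.g. users cannot create tickets: rectify immediately
    B = 2  # classes B to D are not characterised in the interview
    C = 3
    D = 4
    E = 5  # e.g. a misaligned icon or a spelling mistake


def by_urgency(reports):
    """Sort (description, ErrorClass) pairs so the most serious come first."""
    return sorted(reports, key=lambda report: report[1])


reports = [
    ("Icon is misaligned", ErrorClass.E),
    ("Users cannot create tickets", ErrorClass.A),
]

for description, error_class in by_urgency(reports):
    print(error_class.name, description)
```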

The most important functions are then part of the regression test. The less important test cases are only executed if the time frame allows. To take up the example of the space shuttle again: at NASA, there was less than one error for every 420,000 lines of code. NASA managed this because it is not a commercial organisation but has a large, state-funded budget. A written, tested and documented line of code cost NASA around 1,000 US dollars in 1973. Adjusted for inflation, we would be looking at around 7,000 dollars per line today. Or to put it another way: if the project were to be realised in the same way today, almost three billion dollars would be needed for the software alone. And that is before a single rocket has been built.
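The "almost three billion" figure can be checked with a quick calculation. It assumes that the 420,000 lines mentioned above are the total size of the code base, which the interview implies but does not state outright; the variable names are made up for the example.

```python
LINES_OF_CODE = 420_000            # assumed total size of the shuttle software
COST_PER_LINE_TODAY_USD = 7_000    # inflation-adjusted figure from the interview

total_cost_usd = LINES_OF_CODE * COST_PER_LINE_TODAY_USD
print(f"{total_cost_usd:,} USD")   # 2,940,000,000 USD
```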

Has it ever happened that a bug ended up being useful?

Sure, it happens from time to time. When we get an error message, we always check first to see if it really is a bug. The phrase 'That's not a bug, it's a feature' comes up occasionally (laughs).

Thanks to our feature list, we can quickly determine whether this is intentional or not. And even if it's a new bug, it doesn't necessarily have to be bad. Sometimes there are moments when we integrate a supposed problem into our solution - a happy accident, so to speak. This happens from time to time in the industry. A well-known example of this is the almost 50-year-old Space Invaders bug.

In a nutshell: the more spaceships the player shot down, the less computing power was required. This meant that the remaining enemies moved faster towards the player. The developers had not planned for this, but left it in the game to increase the difficulty.
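The effect can be illustrated with a deliberately simplified model. It assumes that the game moves one alien per frame, so that the whole formation advances one step only after every surviving alien has been handled. This is a sketch of the principle, not the original game code, and the function name and the frame rate of 60 are assumptions.

```python
def formation_steps_per_second(aliens_alive: int, frames_per_second: int = 60) -> float:
    """How often the whole formation advances, given one alien moved per frame."""
    if aliens_alive < 1:
        raise ValueError("at least one alien must be alive")
    frames_per_formation_step = aliens_alive  # assumed: one alien handled per frame
    return frames_per_second / frames_per_formation_step


# Fewer aliens means less work per formation step, so the rest move faster.
for aliens in (55, 11, 1):
    print(aliens, "aliens:", round(formation_steps_per_second(aliens), 2), "steps per second")
```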

And have you ever had such strange bugs that you could only laugh about them?

We have funny experiences from time to time. At least, we can often laugh about them afterwards, provided the effects weren't too serious. One example is what we call the Monday bug.

One of our customers had a very strange time-recording error that only ever occurred on Mondays. It was an odyssey to find and fix the problem, precisely because we could only observe it on this one day of the week. In the end, it was a format error in a database field that led to hundreds of error messages.

We had another funny error during contact synchronisation in an LDAP system, where random first and last names were created. Here, however, we realised relatively quickly that the system had loaded our demo data instead of the customer data. In any case, there's a bit of everything here in software testing - sometimes something to laugh about, sometimes something new and sometimes we're detectives.

It's always exciting, and sooner or later we find out about all the bugs.

KIX software testing engineer, Frank Krahl
