ML Tool Trains on Old Code to Spot Bugs in New Code
How do you spot bugs in new code? No one likes bugs, but they’re especially unwelcome when they pop up in an application that you’ve already deployed and now need to fix. How can you make sure your software doesn’t contain any nasty mistakes before you release it? We need better ways to answer this question, and machine learning may be part of the solution. In a paper published on arXiv this week, researchers from MIT demonstrate how AI can spot problems in legacy software by learning from old code with known issues.
How do modern tools find bugs?
Many modern tools rely heavily on machine learning. An ML model scans old code and learns the patterns that code follows, then uses that knowledge to predict how new code will behave. ML tools can also be used for integration and functional testing, automating those tasks, detecting security threats, and more. These services have become necessary because so much code is written each day, and the scale of large platforms shows why. For example, Facebook’s 2 billion users create around 10 billion posts every day, and the company reportedly has over 15,000 people working to keep the site secure through manual review. Even so, humans cannot review everything, and they often miss things. Machine learning offers a way forward because it can flag likely bugs much faster than humans can.
The software world will never be bug-free
Programming is a difficult skill, and not one that you can master overnight. Almost every nontrivial piece of code has errors in it. A programming error may not show up the first time the code is run, or it may be so subtle that no one notices it. It might take months or years before the problem shows up.
Test-driven development (TDD) helps here. Because you write the tests before the code, TDD pushes you to fix bugs as they occur rather than waiting for them to surface as failures later, or when your customer sees them. A failing test also shows where a piece of code needs to be fixed, so you spend less time guessing what’s wrong with it.
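As a minimal sketch of the TDD idea, assume a hypothetical helper called safe_divide (it is not from the paper or from any tool mentioned here). The two tests are written first and describe the behavior we want; the function is then written to make them pass.

```python
def safe_divide(numerator, denominator):
    """Hypothetical function under test: return None instead of raising on division by zero."""
    if denominator == 0:
        return None
    return numerator / denominator


# Tests written before the implementation; each one pins down a single behavior.
def test_regular_division():
    assert safe_divide(6, 3) == 2


def test_division_by_zero_returns_none():
    assert safe_divide(1, 0) is None
```

A test runner such as pytest would collect and run the two test functions. If a later change breaks the zero check, the second test fails immediately and points at the function that needs fixing.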
Why scan old code?
Software developers regularly make changes and additions to existing code. The longer code goes without review, the more likely it is that bugs are hiding in it undetected. As software grows, it becomes increasingly difficult for developers to work out what needs fixing without referring back to old code. A recent study looked at how machine learning (ML) can help. ML is already used in many forms of software development and data analysis around the world, and it is relatively simple for an ML tool to compare one body of code (the old) against another (the new). This provides a useful overview for fixing errors, and it also teaches the ML tool about anomalies that exist in older blocks of code, which helps keep them out of future versions.
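To make the old-versus-new comparison concrete, here is a deliberately simple sketch. It is not the method from the study; it assumes a toy model that counts pairs of adjacent tokens (bigrams) in old code and then scores each new line by how many of its bigrams were never seen before. All function names and the 0.4 threshold are illustrative choices.

```python
import re
from collections import Counter

# Identifiers, numbers, a few two-character operators, then any other single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|[^\s\w]")


def tokenize(line):
    return TOKEN_RE.findall(line)


def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))


def learn_bigram_counts(old_lines):
    """Count every adjacent token pair that appears in the old code."""
    counts = Counter()
    for line in old_lines:
        counts.update(bigrams(tokenize(line)))
    return counts


def unfamiliarity(line, counts):
    """Fraction of the line's bigrams never seen in old code (0.0 to 1.0)."""
    pairs = bigrams(tokenize(line))
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if counts[pair] == 0)
    return unseen / len(pairs)


def flag_unusual_lines(new_lines, counts, threshold=0.4):
    """Return the new lines whose unfamiliarity score exceeds the threshold."""
    return [line for line in new_lines if unfamiliarity(line, counts) > threshold]


old_code = ["if x == 0:", "if y == 1:", "total = total + x"]
new_code = ["if x == 1:", "if x = 0:"]

counts = learn_bigram_counts(old_code)
flagged = flag_unusual_lines(new_code, counts)
```

In this toy run, the line that uses = where the old code always used == is the one that gets flagged. A real tool would need a proper parser and far more training code, and an unfamiliar line is only a hint, not proof of a bug.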
How a team developed their own ML solution
One of the challenges we had with learning this new technology was that nobody could help us, not even Google’s developers. On top of that, given our time and budget constraints, it would have been impossible for us to go away, learn it all, and come back as experts. To tackle the problem and get back up to speed quickly, we first had to migrate our old code. That way, when a bug from the old code carries over into new code, you can spot the mistake immediately without having to read through both versions of the program. Once the old code was transferred, we were able to implement a simple ML solution in Jupyter Notebooks, which can easily be updated to provide more features down the line. For example, if you’re debugging a function and want execution to stop as soon as it reaches an error, the machine learning model needs access to the function’s inputs and outputs. The model can also act as an early warning system by flagging likely errors before they occur.
Final thoughts
New bugs are created every day, partly because of the variety of devices that are now connected, and traditional testing methods can only do so much. Enter machine learning, a technology that can find bugs without being told in advance what a specific bug looks like. The method is based on Bayesian analysis, a statistical approach that builds on Bayes’ theorem from the 18th century. The process uses two important factors: the likelihood of a bug occurring and the frequency with which it has occurred in old code. As long as developers have properly catalogued the code they have written, the ML tool can analyze new code and see where potential problems lie, without knowing ahead of time what those problems are. The one downside? Training the tool takes a lot of processing power and vast amounts of data. But with more machines in the world than people, there should be plenty of data at its disposal.
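The Bayesian idea can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's actual algorithm: it assumes old code has been catalogued into snippets labelled buggy or clean, treats a "pattern" as a plain substring, and applies Bayes' rule, P(bug | pattern) = P(pattern | bug) × P(bug) / P(pattern). The function name and the add-one smoothing are choices made for this example.

```python
def bug_probability(pattern, buggy_snippets, clean_snippets):
    """Estimate P(bug | pattern) with Bayes' rule from labelled old code."""
    total = len(buggy_snippets) + len(clean_snippets)
    if total == 0:
        raise ValueError("need at least one labelled snippet")

    # Prior: how often old snippets were buggy at all.
    prior_bug = len(buggy_snippets) / total
    prior_clean = 1 - prior_bug

    # Likelihoods: how often the pattern appears in each group.
    # Add-one smoothing keeps unseen patterns from producing zero probabilities.
    hits_bug = sum(pattern in snippet for snippet in buggy_snippets)
    hits_clean = sum(pattern in snippet for snippet in clean_snippets)
    p_pattern_given_bug = (hits_bug + 1) / (len(buggy_snippets) + 2)
    p_pattern_given_clean = (hits_clean + 1) / (len(clean_snippets) + 2)

    # Evidence: overall probability of seeing the pattern.
    evidence = p_pattern_given_bug * prior_bug + p_pattern_given_clean * prior_clean
    return p_pattern_given_bug * prior_bug / evidence


buggy = ["for i in range(len(xs) + 1):", "items[len(items)]"]
clean = ["for i in range(len(xs)):", "items[-1]", "total += x", "return total"]

posterior = bug_probability("len(xs) + 1", buggy, clean)
```

With this made-up catalogue, a third of the old snippets are buggy, and seeing the off-by-one pattern raises the estimated bug probability to about 0.6. The numbers only show the mechanics; real estimates depend entirely on how much labelled old code is available.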