Markus Fischer
Architect
In our first two articles, Daniel introduced you to the concept of Software Supply Chain Security (SSCS) (click here if you need a refresher) and Nicolas showed you how to up your SSCS game without spending a cent by leveraging open-source tooling (click here if you need a refresher).
In this third post we want to show you how we approach a comparison of the various available solutions. We will go over defining criteria upfront and then refining them during the evaluation, as we learn more about the features actually offered and their limitations. This will empower you to reach a well-founded conclusion in a structured way, even if you haven't spent years daily-driving each solution.
The Plan
Before looking at any tools, try to write down what problems you see right now and how a great tool might solve them for you.
In the case of SSCS we started with an Excel sheet (of course) listing the major topics:
- General: Always good to have a catch-all category. We’ll add things like usability, price, documentation here.
- SBOM support: In our initial research we quickly noticed there is a standard for listing dependencies. It would be great if the tool supported it.
- Vulnerability scanning: One of the major features we need is to be told when there are vulnerable packages or binaries in our software.
- Secret detection: There are way too many incidents to list where access was lost because secrets were uploaded by mistake.
- License detection: We want to be warned if one of our dependencies (or a transitively included one!) has a “toxic license” – meaning we would have to open-source our own code because we’re using it.
To find these, we tapped into our field experience and looked at the big flashy feature boxes that vendors show on their product pages.
Game Day
Now it was time to get our hands dirty. We budgeted quite a bit for buying software licenses but were very pleasantly surprised that many vendors offer trial versions, and their sales departments will happily assist you in evaluating the tools.
Learning #1: Just ask for a limited-time license – no need to pull out the black Amex.
With that, we set up sample projects that included security issues in our respective categories. Nothing big, mind you – just a hello-world with some outdated libraries, secrets and toxic licenses. We used representative tech stacks we see every day at our customers and ended up with three small sample projects. On top of that, we also used an internal project as a “real-world” case. This turned out to be very revealing, as some tools worked great with 20 files but noticeably choked when we threw a couple of hundred megabytes at them.
Learning #2: Use toy examples to test individual features, use a real-world application for testing the overall experience.
During our tests, we noticed that some tools offered very convenient features. For example, some would automatically raise pull requests to update a library version if the one currently in use contains a vulnerability. Some allowed you to collect multiple issues in a “shopping basket for vulnerabilities” and then bulk-fix them. Some allowed you to define exceptions in code, some only in the UI, some not at all.
For all the features we picked up along the way, we added sub-items under our major categories. We also gave each sub-item a weight to reflect how important we deemed that feature, so we could compute a weighted average per category at the end.
Do watch out though: some issues were knock-out (K.O.) criteria for us. For instance, SBOM import support is great; however, if the upload fails silently and nothing happens once the file is sent, that is a K.O. for the category – no matter how well the button is integrated into the workflow.
Learning #3: Use weights to reflect the subjective importance you assign to individual features. Also, have K.O. handling in place for deal-breaking issues.
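To make the weights and K.O. handling concrete, here is a minimal sketch in Python of how such a scoring model could look. Our evaluation actually lived in Excel, and the feature names and numbers below are invented for illustration:

```python
# Minimal sketch of a scoring model: each category has weighted sub-items
# (0-4 points each) plus an optional K.O. flag that overrides everything else.
from dataclasses import dataclass

@dataclass
class SubItem:
    name: str
    weight: float     # subjective importance of this feature
    score: int        # 0-4 points awarded during evaluation
    ko: bool = False  # True if we hit a deal breaker for this feature

def category_score(sub_items: list[SubItem]) -> float:
    """Weighted average of sub-item scores; a K.O. zeroes the whole category."""
    if any(item.ko for item in sub_items):
        return 0.0
    total_weight = sum(item.weight for item in sub_items)
    return sum(item.weight * item.score for item in sub_items) / total_weight

# Hypothetical example: SBOM support category for some "Tool X"
sbom = [
    SubItem("SBOM export", weight=2.0, score=3),
    SubItem("SBOM import", weight=1.0, score=4, ko=True),  # silent upload failure
]
print(category_score(sbom))  # 0.0 despite good individual scores
```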
Of course, if you’re picking up new items along the way, you will have to go back occasionally and re-evaluate tools. At this point, however, iterative processes should delight you, as they give you a chance to recalibrate your findings (#agile). We were working in a team of four and assigned work on a per-tool basis, meaning specific tools were assigned to specific people. This greatly enhanced our throughput, but be sure to check in regularly to synchronize on a common grading schema. We explicitly wrote down how many points (0-4 in our case) a given piece of functionality would earn.
For example: Validation of Pull Requests
1 = PRs are scanned automatically.
2 = PR builds can be failed.
3 = A good description of why the build failed is displayed inside the build step.
4 = Option to only scan new code and ignore existing code; findings are commented directly inside the PR.
For other features, it might be easier to simply count the sub-features a tool offers, as we did here:
Simple and auditable way of suppressing issues:
+1 suppression rules overview
+1 time limited suppressions
+1 central suppression list per repo
+1 suppression across multiple repos
We didn’t go into that much detail everywhere, but you will notice which topics need discussion when talking to your colleagues. Writing things down in this level of detail noticeably improved the uniformity of our individual ratings. You can even have two different people evaluate the same tool to make sure of that. Again, just iterate until you hit the mark.
Learning #4: Specify the exact number of points to award for specific features so that different people give comparable scores when evaluating different tools.
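As a hypothetical illustration of Learning #4, the grading schema can be written down as plain data so every evaluator awards points against the same definitions. The criteria are the ones quoted above; the encoding itself is just a sketch:

```python
# Hypothetical encoding of a grading schema as plain data, so every
# evaluator consults the same definitions before awarding points.

# Graduated scale: award the highest level whose criterion is met.
PR_VALIDATION = {
    1: "PRs are scanned automatically",
    2: "PR builds can be failed",
    3: "Good description of why a build failed is shown inside the build step",
    4: "Only new code can be scanned; findings are commented inside the PR",
}

# Additive scale: one point per sub-feature the tool offers.
SUPPRESSION_FEATURES = [
    "suppression rules overview",
    "time limited suppressions",
    "central suppression list per repo",
    "suppression across multiple repos",
]

def pr_validation_score(criteria_met: set[int]) -> int:
    """Highest rubric level the tool satisfies (0 if none)."""
    return max((level for level in PR_VALIDATION if level in criteria_met), default=0)

def suppression_score(features_present: set[str]) -> int:
    """One point per suppression sub-feature that is actually offered."""
    return sum(1 for feature in SUPPRESSION_FEATURES if feature in features_present)

print(pr_validation_score({1, 2}))                       # 2
print(suppression_score({"suppression rules overview",
                         "time limited suppressions"}))  # 2
```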
After math
Now that you hopefully have a giant Excel sheet full of tools and scores, it’s time to make sense of it all. Using weighted averages and subcategories, we can paint a good picture of where different tools shine and where you would rather avoid them (i.e. combine them with another tool that shines in that respect). Of course, everyone wants to break it down to a 0-4 star total rating (and of course we obliged), but be sure to answer the question “Which tool is the best?” with a big asterisk and a consultant’s favorite line: “It depends”.
Do note that we only started adding up the numbers at the very end. This was done on purpose to avoid biasing our evaluations.
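Folding the per-category scores into that (heavily caveated) overall star rating is the same weighted-average trick one level up. A small sketch, with all weights and scores invented for illustration:

```python
# Sketch of the final roll-up: weighted average of category scores
# into a single 0-4 "star" rating. All numbers are invented.
CATEGORY_WEIGHTS = {
    "General": 1.0,
    "SBOM support": 2.0,
    "Vulnerability scanning": 3.0,
    "Secret detection": 2.0,
    "License detection": 2.0,
}

def overall_rating(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category scores (each already 0-4)."""
    total_weight = sum(CATEGORY_WEIGHTS[c] for c in category_scores)
    weighted = sum(CATEGORY_WEIGHTS[c] * s for c, s in category_scores.items())
    return weighted / total_weight

tool_x = {
    "General": 3.0,
    "SBOM support": 0.0,   # K.O. in this category
    "Vulnerability scanning": 3.5,
    "Secret detection": 2.5,
    "License detection": 4.0,
}
print(round(overall_rating(tool_x), 2))  # 2.65
```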
At this point you might be in for some surprises – at least we were. Some tools that looked very promising in the beginning ended up with rather low scores. If you see this, you know you avoided bias, and the structured approach just made your evaluation more objective!
We did our evaluation in the context of working for heavily regulated industries, which puts a big emphasis on auditability and compliance. Therefore, our rankings may not apply to your use case (of course we will happily check this for you and send you a quote for a tailored evaluation 🤝).
Learning #5: The best tool for SSCS really – and I never get tired of saying it – depends on your use case.
If your goal is to quickly find good and free tooling, read our second post in this series.
However, if you invest the time and go through the steps outlined above, you will end up with a good foundation for making educated decisions. This will help you improve in the categories you defined above (in our case, SSCS). Additionally, you will save money by knowing exactly which tool to use for which job and by not falling prey to some fancy marketing slides.