Software Asset Inventory

The Software Asset Inventory keeps track of all the software running in an organisation's environment. These assets range from business applications used across the organisation to personal applications installed on end-user devices.

These software assets include:

  • Code repositories - to identify vulnerabilities in project dependencies.
  • Software installed on servers - to identify vulnerabilities in packages.
  • Software packaged inside Docker container images - to identify vulnerabilities in packages.
  • End-user devices - to identify vulnerabilities in third-party software.

TL;DR

  • A complete software inventory is a critical foundation for preventing attacks
  • We built a solution to quickly find vulnerabilities in software components and Linux servers
  • The solution has been open sourced to support the community

On 9 December 2021, a severe remote code execution vulnerability in Apache’s log4j was announced to the world and tracked as CVE-2021-44228. It took Vinted engineers about two days to determine all the software components that contained vulnerable log4j libraries. This was an eye-opener for us. We realised that we needed a solution to quickly identify software components in the infrastructure, and so the Software Asset Inventory project began.

A complete software inventory is a critical foundation for preventing attacks, which have become more and more prevalent since the Log4Shell attacks. At Vinted, we need to monitor our software for vulnerable libraries and unacceptable licences. We also need this information to be readily available, updated automatically, and easy to digest. This capability is especially important for patch management when new vulnerabilities are published.

Challenges

To collect dependencies from a given project, we first had to figure out which package manager it uses. Different projects use different package managers and build systems:

  • JVM (Java Virtual Machine) based projects are often built and managed with Gradle or Maven.
  • JavaScript and TypeScript projects typically use npm, pnpm or yarn for package management.
  • Python projects typically use pip, conda or poetry.
  • The Apple community uses three different package managers: CocoaPods, Carthage and SPM (Swift Package Manager). Only a limited number of tools support SBOM (Software Bill of Materials) collection from these package managers.
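One simple way to approach this detection step is to look for well-known manifest and lock files in the repository. The sketch below is only an illustration, not the actual sbomsftw implementation: the `MANIFESTS` mapping and the `detect_package_managers` function are hypothetical names, and the file list is deliberately incomplete.

```python
import tempfile
from pathlib import Path

# Hypothetical, incomplete mapping from manifest/lock file names
# to the package manager or build system that owns them.
MANIFESTS = {
    "build.gradle": "gradle",
    "build.gradle.kts": "gradle",
    "pom.xml": "maven",
    "package-lock.json": "npm",
    "pnpm-lock.yaml": "pnpm",
    "yarn.lock": "yarn",
    "requirements.txt": "pip",
    "poetry.lock": "poetry",
    "environment.yml": "conda",
    "Podfile": "cocoapods",
    "Cartfile": "carthage",
    "Package.swift": "swift-package-manager",
}


def detect_package_managers(repo_root):
    """Return a sorted list of package managers whose manifests exist under repo_root."""
    found = set()
    for path in Path(repo_root).rglob("*"):
        if path.is_file() and path.name in MANIFESTS:
            found.add(MANIFESTS[path.name])
    return sorted(found)


if __name__ == "__main__":
    # Build a throwaway repository layout to demonstrate the lookup.
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "ios").mkdir()
        (Path(root) / "ios" / "Podfile").write_text("")
        (Path(root) / "package-lock.json").write_text("{}")
        print(detect_package_managers(root))
```

A single repository can match several entries (for example a backend built with Maven and a frontend managed with yarn), which is why the function returns a list rather than a single value.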

Many tools are required to fully build the project and resolve transitive dependencies. Additional tools are needed to carve out dependencies and their licences.

Packaging all the build systems required to generate SBOMs resulted in a huge Docker image. We package the JDK (Java Development Kit) for Gradle and Maven builds, NodeJS to run CDXGen, the Android SDK (Software Development Kit) to build Android projects, bundler to extract Ruby dependencies, and the list goes on. The resulting image didn’t conform to the containerisation philosophy, so we had to try harder.

How we built SBOM collection

Most of the requirements for this solution were adapted from the OWASP Software Component Verification Standard.

Vinted’s software inventory solution consists of the following steps:

  1. An SBOM is generated for each software component. An SBOM is a nested list of the ingredients that make up a software component. Optionally, it also contains the licences that govern those ingredients. At Vinted we use the OWASP (Open Worldwide Application Security Project) CycloneDX standard for SBOMs.
  2. To achieve the best possible result, we combine three tools to produce an SBOM:
    • Cdxgen is used as the first SBOM generator. Apart from identifying libraries, it also does a great job of identifying licences.
    • Anchore syft is used as another layer on top of cdxgen for even more coverage. Syft is also used to collect SBOMs from Linux filesystems.
    • RetireJS is used to detect vulnerable JavaScript libraries.
  3. Filling an SBOM with components from three different collectors (cdxgen, syft, RetireJS) gives us a very rich SBOM, but one with some duplicates. Duplicate components are then merged into one, which enriches the results and provides us with a unified SBOM that contains the best of three worlds.
  4. Finally, SBOMs are uploaded to OWASP Dependency Track, which is used as centralised storage for software components. This open source solution also has great analysis capabilities, automatically identifying known vulnerabilities and licence violations for each library in every single project.
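The merging in step 3 could look roughly like the sketch below. It is an illustration under stated assumptions rather than the logic sbomsftw actually uses: the `merge_components` function is a hypothetical name, components are treated as duplicates when they share a package URL (`purl`), and licences are unioned while other missing fields are filled in from later collectors.

```python
import copy


def merge_components(*sboms):
    """Merge the components of several CycloneDX-style SBOM dicts, de-duplicating by purl."""
    merged = {}
    for sbom in sboms:
        for component in sbom.get("components", []):
            # Assumption: purl identifies a component; fall back to name and version.
            key = component.get("purl") or (component.get("name"), component.get("version"))
            if key not in merged:
                merged[key] = copy.deepcopy(component)
                continue
            existing = merged[key]
            for field, value in component.items():
                if field == "licenses":
                    # Keep every distinct licence entry reported by any collector.
                    known = existing.setdefault("licenses", [])
                    for licence in value:
                        if licence not in known:
                            known.append(copy.deepcopy(licence))
                elif not existing.get(field):
                    # Fill in fields that earlier collectors left empty.
                    existing[field] = copy.deepcopy(value)
    return list(merged.values())


if __name__ == "__main__":
    from_cdxgen = {"components": [{
        "name": "lodash", "version": "4.17.21",
        "purl": "pkg:npm/lodash@4.17.21",
        "licenses": [{"license": {"id": "MIT"}}],
    }]}
    from_syft = {"components": [{
        "name": "lodash", "version": "4.17.21",
        "purl": "pkg:npm/lodash@4.17.21",
        "cpe": "cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*",
    }]}
    for component in merge_components(from_cdxgen, from_syft):
        print(component["name"], sorted(component))
```

Keying on `purl` keeps the merge independent of which collector found a component first; the first collector's values win when two collectors disagree on a field.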

Despite its size, we still include a Docker image in our repository so that users can run the SBOM collector with ease, without worrying about whether all the tools are present on their host filesystems.

Solution workflow

Language                 Package manager/build system
JavaScript/TypeScript    yarn, npm, pnpm, bower
JVM Family               gradle, maven, sbt
Python                   pip, poetry, conda
Ruby                     Bundler
Golang                   go.mod, go.sum, Gopkg
Rust                     Cargo

SBOMs from Linux servers

To collect SBOMs from Linux servers we made use of the syft library, as it is very efficient in extracting package information from filesystems and Docker images. The Software Asset Collection solution we developed integrates syft as a Go module and then uses it to collect all RPM packages present on the server.

Once the RPM package SBOM is collected, it is uploaded to Dependency Track for further analysis.
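For illustration, an upload like this can be made through Dependency Track's REST API, which accepts a base64-encoded BOM via `PUT /api/v1/bom` authenticated with an `X-Api-Key` header. The sketch below only builds the request; the `build_upload_request` function, the host name, the project name and the API key are hypothetical placeholders, and this is not necessarily how sbomsftw performs the upload.

```python
import base64
import json
import urllib.request


def build_upload_request(base_url, api_key, project_name, project_version, sbom):
    """Build (but do not send) a Dependency Track BOM upload request."""
    payload = {
        "projectName": project_name,
        "projectVersion": project_version,
        # Ask Dependency Track to create the project if it does not exist yet.
        "autoCreate": True,
        # The BOM itself travels as a base64-encoded string.
        "bom": base64.b64encode(json.dumps(sbom).encode("utf-8")).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/bom",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="PUT",
    )


if __name__ == "__main__":
    sbom = {"bomFormat": "CycloneDX", "specVersion": "1.4", "components": []}
    request = build_upload_request(
        "https://dtrack.example.com",  # hypothetical host
        "REPLACE_ME",                  # hypothetical API key
        "server-01", "weekly", sbom,
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```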

It’s important to note that we configured Dependency Track not to scan these server SBOMs. We noticed that Dependency Track scanners don’t work well for RPM packages and produce a lot of false positives. To address this issue, we decided to download the raw SBOMs from Dependency Track and scan them for vulnerabilities with Grype. The Grype scanner produced more accurate results and allowed us to quickly identify vulnerable server packages.
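A scan of a downloaded SBOM can be sketched as follows. Grype accepts an SBOM file through the `sbom:` source prefix and can emit JSON with `-o json`. The `run_grype` and `findings_at_or_above` helpers are hypothetical, and the report field names used here (`matches`, `vulnerability`, `artifact`) are our reading of Grype's JSON output, so they should be checked against the installed version.

```python
import json
import subprocess

# Severity labels ordered from least to most severe (assumed Grype labels).
SEVERITY_ORDER = ["Unknown", "Negligible", "Low", "Medium", "High", "Critical"]


def run_grype(sbom_path):
    """Scan an SBOM file with the grype CLI and return the parsed JSON report."""
    result = subprocess.run(
        ["grype", f"sbom:{sbom_path}", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def findings_at_or_above(report, threshold="High"):
    """Return (package, version, vulnerability id, severity) tuples at or above threshold."""
    floor = SEVERITY_ORDER.index(threshold)
    findings = []
    for match in report.get("matches", []):
        vulnerability = match.get("vulnerability", {})
        artifact = match.get("artifact", {})
        severity = vulnerability.get("severity", "Unknown")
        # Unrecognised labels are treated like "Unknown".
        rank = SEVERITY_ORDER.index(severity) if severity in SEVERITY_ORDER else 0
        if rank >= floor:
            findings.append(
                (artifact.get("name"), artifact.get("version"), vulnerability.get("id"), severity)
            )
    return findings


if __name__ == "__main__":
    # Requires grype on PATH and an SBOM previously downloaded from Dependency Track.
    for finding in findings_at_or_above(run_grype("server-sbom.json")):
        print(*finding)
```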

SBOMs from Linux servers are collected on a weekly schedule. After they are uploaded to Dependency Track, we retrieve them and run Grype scans every Thursday. With this setup, we created a convenient feedback loop that is backed by metrics.

Source code

We have open sourced our Software Asset Inventory solution - https://github.com/vinted/sbomsftw. Please feel free to provide suggestions, open issues or pull requests to make this even better!

Improvements and future work

  1. Integrate Dependency Track with Backstage to enrich component information with service owner and location data.
  2. Create capabilities to generate SBOMs for:
    • Infrastructure as Code
    • More complicated cases, e.g. iOS and macOS apps
    • Cloud infrastructure provisioned using Terraform, Kubernetes, Helm charts.
  3. Maintain continuous development and active support for the codebase.