The vulnerability information in the National Vulnerability Database (“NVD”) and in security forums typically contains neither the exact URLs of software products nor the names that programmers use for them in dependency management systems, although both sides usually share commonly used English names and version numbers. As a result, programmers or security officers typically have to match the names and versions of their software dependencies against the vulnerability information themselves. To build automatic vulnerability scanning services, we need to collect and combine information from separate, independent sources. Our system uses various keyword matching and natural language processing techniques to home in on the relevant entries and match them against each database.
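As a rough illustration of the keyword matching idea, the sketch below normalizes a dependency name and a vulnerability record's product name into token sets and compares them with Jaccard similarity. It is a minimal sketch under stated assumptions, not the system's actual matching logic: the record layout, the identifiers in `RECORDS`, the function names, and the 0.5 threshold are all hypothetical.

```python
import re

# Hypothetical vulnerability records; the field names and IDs are
# illustrative assumptions, not the NVD schema.
RECORDS = [
    {"id": "VULN-0001", "product": "Apache Commons Collections", "versions": ["3.2.1"]},
    {"id": "VULN-0002", "product": "Example Image Library", "versions": ["1.0"]},
]


def tokens(name):
    """Lowercase a product name and split it into alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a, b):
    """Return the Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_package(package_name, version, records, threshold=0.5):
    """Return IDs of records whose product name resembles the package name
    and whose affected versions include the given version, best match first."""
    pkg_tokens = tokens(package_name)
    hits = []
    for record in records:
        score = jaccard(pkg_tokens, tokens(record["product"]))
        if score >= threshold and version in record["versions"]:
            hits.append((score, record["id"]))
    return [record_id for _, record_id in sorted(hits, reverse=True)]


matches = match_package("commons-collections", "3.2.1", RECORDS)
```

Here the dependency name `commons-collections` shares two of the three tokens in "Apache Commons Collections", so the first record passes the assumed threshold, while the second shares none. A production matcher would need more than token overlap, for example handling of vendor prefixes, abbreviations, and version ranges.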
To scale to billions of records, our system uses big data (MongoDB) and search engine (Apache Solr) technologies to store and index our data. In addition, the system processes each software package independently of the others, which makes the workload well suited to parallelization.
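Because no package depends on the result of another, the per-package work can be handed to a pool of workers. The following is a minimal sketch of that pattern using Python's standard `concurrent.futures` module; the `scan_package` step, its `KNOWN_VULNERABLE` lookup set, and the worker count are hypothetical stand-ins, and the text above does not say which parallelization mechanism the system actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup set standing in for the real vulnerability matching step.
KNOWN_VULNERABLE = {("example-lib", "1.0")}


def scan_package(package):
    """Scan one (name, version) pair independently of all other packages.

    Returns the package name and the number of findings (0 or 1 here).
    """
    name, version = package
    return name, int((name, version) in KNOWN_VULNERABLE)


def scan_all(packages, max_workers=4):
    """Scan every package in parallel and collect the results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and runs scan_package concurrently.
        return dict(pool.map(scan_package, packages))


results = scan_all([("example-lib", "1.0"), ("other-lib", "2.3")])
```

A thread pool suits work dominated by database or search queries; for CPU-heavy matching, a process pool or a distributed job queue would be the more natural choice.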