The vulnerability information in the National Vulnerability Database (NVD) and in security forums often lacks the exact URLs of the affected software products, as well as the names programmers use to refer to those products in dependency management systems. Programmers and security officers therefore have to manually match the names and versions of their software dependencies against the reported vulnerabilities. To build automatic vulnerability scanning services, we need to collect and combine information from separate, independent sources. Our system uses keyword matching and natural language processing techniques to home in on candidate entries and match them across databases.
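To illustrate the kind of keyword matching mentioned above, the following is a minimal sketch and not the system's actual implementation. It assumes that names are compared by token overlap (Jaccard similarity); the function names `tokenize`, `jaccard`, and `best_match`, the threshold of 0.5, and the sample names are all hypothetical.

```python
from __future__ import annotations

import re


def tokenize(name: str) -> set[str]:
    """Lowercase a name and split it into alphanumeric keyword tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token overlap: size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(dependency: str, candidates: list[str],
               threshold: float = 0.5) -> str | None:
    """Return the candidate product name that best matches a dependency name.

    The threshold is an assumed cut-off; below it, no match is reported.
    """
    dep_tokens = tokenize(dependency)
    scored = [(jaccard(dep_tokens, tokenize(c)), c) for c in candidates]
    score, name = max(scored, default=(0.0, None))
    return name if score >= threshold else None


# Illustrative names only: a dependency identifier and vulnerability-side product names.
products = ["fasterxml jackson-databind", "apache tomcat"]
match = best_match("jackson-databind", products)
```

Token overlap is only one plausible choice; a real matcher would likely also weigh vendor names, version strings, and aliases.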
To scale to billions of records, our system stores data in MongoDB and indexes it with the Apache Solr search engine. The system processes each software package independently, so the workload parallelizes naturally.
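Because each package is processed independently, the work can be distributed across workers without coordination. The sketch below shows one way this could look, assuming a thread pool; `scan_package`, `scan_all`, the record layout, and the sample data are hypothetical placeholders, not the system's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for a vulnerability index; a real system would query its database.
KNOWN_VULNERABLE = {("example-lib", "1.0.0")}


def scan_package(package: dict) -> dict:
    """Hypothetical per-package step: look up one (name, version) pair."""
    key = (package["name"], package["version"])
    return {"name": package["name"], "vulnerable": key in KNOWN_VULNERABLE}


def scan_all(packages: list, workers: int = 4) -> list:
    """Scan packages concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan_package, packages))


results = scan_all([
    {"name": "example-lib", "version": "1.0.0"},
    {"name": "other-lib", "version": "2.3.1"},
])
```

No package's result depends on another's, so the same pattern extends to multiple processes or machines by partitioning the package list.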