In the broad domain of security, analysts and policymakers need knowledge about the state of the world to make timely, critical decisions at the operational, tactical and strategic levels. This knowledge has to be extracted from many different sources and then represented in a form that supports further analysis and decision making. Some of the underlying data sits in textual sources traditionally associated with Open-source Intelligence (OSINT). OSINT is intelligence gathered from publicly available, overt sources such as newspapers, magazines, social networking sites, video sharing sites, wikis and blogs. In the cybersecurity domain, information available through OSINT can complement data obtained from traditional security systems and monitoring tools such as Intrusion Detection and Prevention Systems (IDPS). Cybersecurity information sources fall into two broad groups: formal sources such as NIST’s National Vulnerability Database (NVD) and the United States Computer Emergency Readiness Team (US-CERT), and informal sources such as blogs, developer forums, chat rooms and social media platforms like Twitter, Reddit and Stack Overflow. Both groups provide information about security vulnerabilities, threats and attacks. So much is published on these sources every day that it is nearly impossible for a human analyst to comb through it manually, extract the relevant information, and then understand the contextual scenarios in which an attack might take place.
Twitter as an OSINT source
Over the past decade, Twitter has become a vital source of open-source intelligence. Researchers have used the site’s data to gather intelligence about the impact of natural disasters, terrorist attacks and government elections, and to predict stock markets. In our work, we are interested in using Twitter as a source of information to study cybersecurity events. As and when new vulnerabilities are made public, Twitter users tweet about them (Figures) to spread the information across the network so that others can use it to secure their systems. Reputed security experts such as Brian Krebs (an investigative journalist who writes about cybercrime) can be valuable resources for cybersecurity incidents. Established companies like @web security or @intersecww disseminate news, tips and the latest information on web security, web application protection, hacker incidents, data breaches, penetration testing results, etc.
CyberTwitter Framework
We develop CyberTwitter, a framework to automatically issue cybersecurity vulnerability alerts to users (Figure). CyberTwitter begins by collecting relevant tweets by querying the Twitter API. The Tweet Collection module collects, cleans and stores the tweets returned by the API. Every tweet is then processed by the Security Vulnerability Concept Extractor (SVCE), which extracts terms and concepts related to security vulnerabilities. Intelligence from these terms and concepts is converted to RDF statements using our intelligence ontology. We use the Unified Cybersecurity Ontology (UCO) to provide our system with cybersecurity domain information. The RDF Linked Data representation is stored in our “Cybersecurity Knowledge Base”, which allows our alert system to reason over the data. Finally, we issue alerts to the end user based on a “User System Profile”. The next few subsections explain the details and sub-modules of the system.
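The flow described above (extract concepts, turn them into RDF statements, store them, and alert against a profile) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CyberTwitter implementation: the `example.org` namespaces, the property names `hasVulnerability` and `affectsProduct`, and the substring-based extractor standing in for the SVCE are all hypothetical. Only the `rdf:type` IRI is a real W3C identifier.

```python
import re

# Hypothetical namespaces: the real UCO and intelligence ontology IRIs are not given in the text.
UCO = "http://example.org/uco#"
INTEL = "http://example.org/intel#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # real W3C IRI


def slug(name):
    """Turn a free-text name into an IRI-friendly local name."""
    return re.sub(r"\W+", "_", name.strip().lower())


def extract_concepts(text, products, vuln_terms):
    """Toy stand-in for the SVCE: substring match against known names."""
    lowered = text.lower()
    return {
        "products": [p for p in products if p.lower() in lowered],
        "vulnerabilities": [v for v in vuln_terms if v.lower() in lowered],
    }


def to_triples(tweet_id, concepts):
    """Convert extracted concepts into (subject, predicate, object) IRI triples."""
    subject = f"{INTEL}tweet_{tweet_id}"
    triples = [(subject, RDF_TYPE, INTEL + "Intelligence")]
    for v in concepts["vulnerabilities"]:
        triples.append((subject, UCO + "hasVulnerability", UCO + slug(v)))
    for p in concepts["products"]:
        triples.append((subject, UCO + "affectsProduct", UCO + slug(p)))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def alerts(knowledge_base, user_system_profile):
    """Return subjects whose affected product appears in the user's profile."""
    wanted = {UCO + slug(p) for p in user_system_profile}
    return sorted(
        {s for s, p, o in knowledge_base
         if p == UCO + "affectsProduct" and o in wanted}
    )


# Usage: one tweet flows through extraction, conversion, storage and alerting.
concepts = extract_concepts(
    "Critical buffer overflow found in Google Chrome, patch now",
    products=["Google Chrome", "OpenSSL"],
    vuln_terms=["buffer overflow", "sql injection"],
)
knowledge_base = set(to_triples(42, concepts))
print(to_ntriples(sorted(knowledge_base)))
print(alerts(knowledge_base, ["Google Chrome"]))
```

A set of tuples is used here only to keep the sketch dependency-free; a real knowledge base would sit in a triple store that supports reasoning over the ontology.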
Tweet Collection
CyberTwitter collects data through the Twitter Stream API based on a set of keywords. These keywords are derived from the “User System Profile” and a list of cybersecurity terms (see Figure). For our system, we limit ourselves to tweets in the English language. Once enough tweets have been collected, we clean the data using WordNet, a large lexical database of English.
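The keyword step can be illustrated with a small sketch. It assumes each tweet arrives as a dictionary with `text` and `lang` fields, and the profile entries and security terms below are made-up examples rather than the lists CyberTwitter actually uses. No call to the Twitter API is made; the sketch only shows how the two keyword sources might be merged and applied as an English-only filter.

```python
import re

# Hypothetical user system profile: software the user runs (illustrative only).
USER_SYSTEM_PROFILE = ["Adobe Flash Player", "Google Chrome", "OpenSSL"]

# Hypothetical list of general cybersecurity terms (illustrative only).
SECURITY_TERMS = ["vulnerability", "exploit", "buffer overflow", "ddos", "zero-day"]


def build_keywords(profile, terms):
    """Merge profile entries and security terms into one lowercase keyword set."""
    return {k.lower() for k in list(profile) + list(terms)}


def is_relevant(tweet, keywords):
    """Keep English tweets whose text mentions at least one keyword as a whole word."""
    if tweet.get("lang") != "en":
        return False
    text = tweet.get("text", "").lower()
    return any(
        re.search(r"\b" + re.escape(k) + r"\b", text) for k in keywords
    )


def collect(tweets, keywords):
    """Filter an iterable of tweet dictionaries down to the relevant ones."""
    return [t for t in tweets if is_relevant(t, keywords)]


# Usage with three hand-written tweets.
keywords = build_keywords(USER_SYSTEM_PROFILE, SECURITY_TERMS)
sample = [
    {"lang": "en", "text": "New OpenSSL vulnerability disclosed today"},
    {"lang": "fr", "text": "Nouvelle vulnerability dans OpenSSL"},
    {"lang": "en", "text": "Lovely weather in Baltimore"},
]
kept = collect(sample, keywords)
print([t["text"] for t in kept])
```

Whole-word matching is a design choice in this sketch: it avoids counting a keyword that merely appears inside a longer word, at the cost of missing inflected forms such as plurals.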
Cybersecurity Ontologies and Knowledge Bases
A data feed sent through the Twitter Stream API essentially consists of a stream of strings that computers can process. In the real world, however, strings stand for terms and concepts that are sometimes ambiguous, and computers do not handle ambiguity on their own. For example, the string “Apple” can refer to the company Apple Inc. or to the fruit. Semantic Web technologies help with this task by representing the real world as concepts, each associated with a Uniform Resource Identifier (URI). These concepts can also have attributes and relations to other concepts.
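To make the “Apple” example concrete, the sketch below maps an ambiguous string to one of several concept URIs by counting how many context words from the surrounding sentence each candidate shares. The `example.org` URIs, the context word lists and the overlap heuristic are all assumptions chosen for illustration; they are not taken from UCO or from any particular knowledge base.

```python
import re

# Hypothetical concept table: each surface string maps to candidate URIs,
# and each candidate carries a few words that typically appear near it.
CONCEPTS = {
    "apple": [
        {
            "uri": "http://example.org/resource/Apple_Inc",
            "context": {"iphone", "ios", "macos", "company", "patch", "safari"},
        },
        {
            "uri": "http://example.org/resource/Apple_fruit",
            "context": {"fruit", "pie", "orchard", "juice", "tree"},
        },
    ]
}


def resolve(term, sentence):
    """Pick the candidate URI whose context words overlap most with the sentence.

    Returns None when the term is unknown or no candidate shares any context word,
    so the caller can treat the mention as unresolved rather than guess.
    """
    candidates = CONCEPTS.get(term.lower(), [])
    if not candidates:
        return None
    words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    best = max(candidates, key=lambda c: len(c["context"] & words))
    if not best["context"] & words:
        return None
    return best["uri"]


# Usage: the same string resolves to different concepts in different sentences.
print(resolve("Apple", "Apple releases iOS patch for Safari flaw"))
print(resolve("Apple", "She baked an apple pie"))
print(resolve("Apple", "Apple"))
```

Once a string is tied to a URI, attributes and relations can be attached to the concept rather than to the raw text, which is what lets later stages reason over the data.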
By: Anjan Neema
(Tech Intern, WCSF)