
An international team of researchers has developed a new algorithm that analyzes Twitter data to identify smaller disaster-related events, known as sub-events, and generate highly accurate, real-time summaries that can be used to guide response activities.

The team from Penn State, the Indian Institute of Technology Kharagpur, and the Qatar Computing Research Institute created an algorithm that analyzes a crisis as it unfolds. Prasenjit Mitra, associate dean for research in Penn State’s College of Information Sciences and Technology and a contributor to the study, said: “The best source to get timely information during a disaster is social media, particularly microblogs like Twitter.” Analyzing this data and using it to generate reports on a sub-topic of a disaster, such as infrastructure damage or shelter needs, could help humanitarian organizations better respond to the varying needs of individuals in an affected area.

One challenge is the sheer volume of data produced. Sorting through it manually in the immediate aftermath of a crisis is not always practical.

In the study, the group collected more than 2.5 million tweets posted during three major catastrophes: Typhoon Hagupit, which hit the Philippines in 2014, the 2014 flood in Pakistan, and the 2015 earthquake in Nepal. Then, volunteers from the United Nations Office for the Coordination of Humanitarian Affairs trained a machine learning system by manually categorizing the tweets into different sub-events, such as food, medicine and infrastructure, according to homelandsecuritynewswire.com.

Once the system can categorize tweets with a high level of accuracy, the researchers let it process large amounts of data quickly without human intervention. As events develop, however, new categories of content appear, and the training process has to be restarted.

The team's algorithm identified noun-verb pairs representing sub-topics, such as “bridge collapse” or “person trapped,” and ranked them by how frequently they appeared in tweets. Then, the researchers created an algorithm to write summaries of the broad event and the identified sub-events. Finally, human evaluators rated the usefulness and accuracy of the sub-events and auto-generated summaries against those produced by other existing methods.
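
To make the idea of ranking noun-verb pairs by frequency concrete, here is a minimal sketch in Python. It is not the researchers' implementation: the tiny word lists stand in for a real part-of-speech tagger, pairing every noun with every verb in the same tweet is a simplifying assumption, and all names (`NOUNS`, `VERB_FORMS`, `noun_verb_pairs`, `rank_sub_events`) are hypothetical.

```python
import string
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; a real system would use a part-of-speech tagger.
NOUNS = {"bridge", "person", "road", "house"}
# Maps verb forms seen in tweets to one label so variants are counted together.
VERB_FORMS = {
    "collapse": "collapse",
    "collapsed": "collapse",
    "collapses": "collapse",
    "trapped": "trapped",
    "blocked": "blocked",
}


def noun_verb_pairs(tweet):
    """Return the set of (noun, verb) pairs that co-occur in one tweet."""
    tokens = [t.strip(string.punctuation).lower() for t in tweet.split()]
    nouns = {t for t in tokens if t in NOUNS}
    verbs = {VERB_FORMS[t] for t in tokens if t in VERB_FORMS}
    # Assumption: any noun and verb in the same tweet form a candidate pair.
    return set(product(nouns, verbs))


def rank_sub_events(tweets):
    """Count each pair once per tweet and return pairs ordered by frequency."""
    counts = Counter()
    for tweet in tweets:
        counts.update(noun_verb_pairs(tweet))
    return counts.most_common()


# Made-up example tweets, for illustration only.
SAMPLE_TWEETS = [
    "Bridge collapsed near the river, avoid the area",
    "The old bridge collapsed this morning",
    "Person trapped under rubble, need help!",
    "Praying for everyone affected",
]

if __name__ == "__main__":
    for (noun, verb), count in rank_sub_events(SAMPLE_TWEETS):
        print(f"{noun} {verb}: {count}")
```

On these made-up tweets the most frequent pair is "bridge collapse," which would surface as the leading candidate sub-event. A production system would also need to handle misspellings, hashtags and multilingual text, none of which this sketch attempts.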

The evaluators found the research group’s method to be more relevant, useful and understandable than other leading algorithms.

In the future, the researchers hope to apply their work to specialized situations, such as summarizing information on missing people, and pulling specific information from tweets that could create a more thorough description and visualization of an event.