Social Media Crawler

Software / service definition

The social media crawler can provide an effective content management tool for COVID-related tweets in Italy. The tool monitors streams of Twitter content that mention the hashtags #COVID19Italia and #COVID19Italy in their short text. Locations mentioned in the text are extracted as named entities so that tweets can be geo-localised on the Italian map. The tweets are clustered by topic, communities of (anonymised) users are visualised, and key players in the social network are presented.
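As an illustration of the hashtag-based monitoring step, the following is a minimal Python sketch of how a tweet's text could be checked against the two tracked hashtags. The function and constant names are hypothetical and the matching rule (case-insensitive, exact hashtag) is an assumption; the actual service may filter differently, for example through the Twitter streaming API's own track parameters.

```python
import re

# The two hashtags named in this description, stored in lower case.
TRACKED_HASHTAGS = {"#covid19italia", "#covid19italy"}

# A hashtag is taken here to be '#' followed by word characters (an assumption).
HASHTAG_PATTERN = re.compile(r"#\w+")


def mentions_tracked_hashtag(text):
    """Return True if the text contains at least one tracked hashtag."""
    found = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    return bool(found & TRACKED_HASHTAGS)


if __name__ == "__main__":
    print(mentions_tracked_hashtag("Nuovi casi a Milano #COVID19Italia"))
    print(mentions_tracked_hashtag("General update #COVID19"))
```

Matching whole hashtags rather than substrings keeps a shorter tag such as #COVID19 from being counted as one of the tracked tags.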

Usage in COVID19

The tool can enable civil protection agencies to monitor large streams of Twitter content effectively, especially when location information appears in the text but is not provided by Twitter in its indexed tweet representation. It also helps health authorities and law enforcement agencies (LEAs) to monitor the regions with the most social media activity, which could reveal hidden interdependencies with traffic in hospitals or on roads.

Access

Online web service.

TRL

7

Security and access protocols

The service is protected by username and password; the database (MongoDB) is protected by IP-based access restrictions.

GDPR compliance

The collection of social media data, i.e. Twitter posts (tweets), is carried out in full compliance with the Twitter Developer Agreement and Policy and the EU General Data Protection Regulation (GDPR). The collected posts are public and the names of the authors are pseudonymised, so that a tweet cannot be traced back to a natural person. In addition, the geoinformation of each post is derived from an NLP analysis of its content and is not linked to the user account that published it. Finally, all collected tweets are stored in a secure database, preventing unauthorised access by third parties.
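The pseudonymisation of author names can be illustrated with a short Python sketch. It uses a keyed hash (HMAC-SHA256) so that the same account always maps to the same pseudonym, which keeps community and key-player analysis possible, while the original name cannot be recovered without the key. This is only one common approach and an assumption; the description does not state which method the service uses, and the key and function names below are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be loaded from secure
# configuration and never stored alongside the pseudonymised data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymise(username, key=SECRET_KEY):
    """Map a Twitter username to a stable, non-reversible pseudonym."""
    # Normalise so that '@Mario_Rossi' and 'mario_rossi' get the same pseudonym.
    normalised = username.strip().lstrip("@").lower()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    print(pseudonymise("@Mario_Rossi"))
```

A keyed hash is preferred over a plain hash in this sketch because usernames are public and short, so an unkeyed hash could be reversed by hashing a list of known account names.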

Support point of contact

Ilias Gialampoukidis – heliasgj@iti.gr

Stefanos Vrochidis – stefanos@iti.gr

Previous application scenario

This social media monitoring tool has been used in the beAWARE project to collect tweets relevant to fires, heatwaves and flood events. A modified version, developed in the context of the EOPEN project, focusses mainly on floods and snow cover observations. Beyond collecting relevant information, it supports the geographical annotation of locations mentioned in the text, topic detection and clustering of tweets, community detection among users, and key-player identification, which highlights the most central Twitter accounts active in the social network of user-to-user mentions.
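The key-player identification on the user-to-user mention network can be sketched as follows. This minimal Python example builds mention edges from tweet texts and ranks accounts by how many distinct users mention them (in-degree). In-degree is used purely as an assumed stand-in; the description does not say which centrality measure the tool applies. The function names and the input format (pairs of author and text) are also hypothetical.

```python
import re
from collections import Counter

# A mention is taken here to be '@' followed by word characters (an assumption).
MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Build (author, mentioned_user) edges from (author, text) pairs."""
    edges = []
    for author, text in tweets:
        for mentioned in MENTION_PATTERN.findall(text):
            # Ignore self-mentions, which say nothing about network position.
            if mentioned.lower() != author.lower():
                edges.append((author.lower(), mentioned.lower()))
    return edges


def top_mentioned(tweets, k=3):
    """Rank accounts by the number of distinct users who mention them."""
    distinct_edges = set(mention_edges(tweets))
    counts = Counter(target for _, target in distinct_edges)
    return counts.most_common(k)


if __name__ == "__main__":
    sample = [
        ("user_a", "Thanks @user_b and @user_c for the update"),
        ("user_d", "@user_b any news from Bergamo?"),
    ]
    print(top_mentioned(sample, k=1))
```

In a pseudonymised dataset the same ranking would be computed over pseudonyms rather than real account names.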

Licence

Free to share, copy and redistribute the material in any medium or format until the end of the pandemic, as declared by the WHO, under the Creative Commons licence CC BY-NC-ND 2.0.

Resources