NASTAC - Work Package 4

Our data generation and processing efforts have focused on collecting and integrating data on state borders and capitals, settlement areas of ethnic groups, conflict patterns, and railroad networks.

To support these activities, we purchased two servers with sufficient computational power (24 cores each), memory (1.5 TB and 768 GB), and storage (more than 40 TB in aggregate). We installed Linux and macOS on them so that our results can be replicated under different conditions. We developed a data pipeline using a combination of R, Java, and SQL procedures, which stores intermediate and final datasets in a geospatial relational database (PostgreSQL/PostGIS). Access to the two servers is protected by a firewall and secured through strong encryption with public-key authentication. All code is kept under revision control for strict reproducibility and is archived alongside the data in three redundant backup systems.
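
As an illustration of how a Java step in such a pipeline might hand a spatial query to PostgreSQL/PostGIS, the following minimal sketch builds a parameterized statement that pairs each capital with the state polygon containing it in a given year. The table names (state_borders, capitals), the column names (state_id, capital_id, geom, start_year, end_year), and the class name are hypothetical; the report does not describe the actual schema or code. Only ST_Contains is a real PostGIS function.

```java
// Illustrative sketch only: the schema below is assumed, not taken from the project.
final class CapitalInStateQuery {

    // Hypothetical table names for state polygons and capital points.
    static final String STATES = "state_borders";
    static final String CAPITALS = "capitals";

    /**
     * Builds a parameterized SQL query that joins capitals to the state
     * polygon containing them. Both "?" placeholders take the same year,
     * which restricts each table to rows valid in that year.
     */
    static String buildSql() {
        return String.join("\n",
            "SELECT s.state_id, c.capital_id",
            "FROM " + STATES + " AS s",
            "JOIN " + CAPITALS + " AS c",
            "  ON ST_Contains(s.geom, c.geom)",
            "WHERE ? BETWEEN s.start_year AND s.end_year",
            "  AND ? BETWEEN c.start_year AND c.end_year");
    }

    public static void main(String[] args) {
        // Print the statement; a real pipeline step would pass it to a
        // JDBC PreparedStatement and bind the year to both placeholders.
        System.out.println(buildSql());
    }
}
```

In practice the returned string would be passed to a JDBC PreparedStatement with the year bound to both placeholders. Keeping the year as a parameter rather than concatenating it into the SQL means the same statement can be reused across years.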