Improving the quality of survey data on forcibly displaced populations

Background and context

In line with the efforts of the World Bank-UNHCR Joint Data Center on Forced Displacement (JDC) to strengthen data systems and fill data gaps, this activity addresses microdata quality in surveys of Forcibly Displaced Populations (FDPs) and their host communities. It also speaks directly to the JDC’s mission: “to leverage partnerships and innovation in its focus on the collection, analysis, and dissemination of primary microdata that enables policy-making and programming.”

The quality of survey data is therefore paramount. It determines the reliability of the information collected today. It also determines the quality of future information, such as remote sensing data, that must be validated against primary data. Errors introduced while collecting primary data are therefore likely to carry over into these derived sources. The overall quality of survey data depends on the sampling frame and strategy, the questionnaire design, and errors arising during fieldwork.

A sampling frame is the backbone of any probability-based sampling design and is therefore one of the main determinants of a survey’s quality and reliability. In some low- and middle-income countries, existing sampling frames are of questionable quality because they do not meet the basic criteria of being current, complete, and sufficiently informative on the variable(s) of interest in the target population. The problem is even more severe when the target population is small and/or hard to reach, as is very often the case with refugee or internally displaced person (IDP) populations. In such cases, constructing the sampling frame and selecting an appropriate sampling design are crucial for producing statistically valid, representative estimates, and are thus critical for evidence-based decision making.

The recent survey environment has been shaped by three trends: a dramatic decline in the cost of mobile data collection, the rapid spread of global connectivity, and the need to reduce in-person interaction during the COVID-19 pandemic. Large-scale remote support can (and must) therefore be made available to National Statistical Offices (NSOs) and other data producers with considerably fewer obstacles than in the past. This is possible even with smaller budgets and on a more sustainable basis. Support can go beyond manuals and training to include applications (computational tools) that implement the recommendations directly and on a continuous basis.

Introducing electronic questionnaires changes how data are collected. It also widens the set of options available for managing a survey. Computer Assisted Survey Systems (CASSs) such as Survey Solutions are critical for ensuring survey quality at every stage of data collection and management, as well as for ensuring comparability between different

Activity description

Through its three core areas, this activity produces tools to assure quality in the different phases of survey design and implementation: sampling design and frame creation, questionnaire design, and data collection. Each of these phases is critical for improving the quality of the data collected and managed on populations affected by forced displacement. To guarantee quality across these stages, the project proposes tools and methods for:

  1. Sampling and sampling frame: this area of the activity will develop open source tools in R/Python and Shiny. The tools will give data collection teams user-friendly access to advanced sampling approaches for forcibly displaced and host community populations while requiring minimal programming skills. To cover the target populations in different settings (e.g. returnees vs. IDPs, host vs. camp populations) with the most effective sample design, the tool will support standard sampling approaches (e.g. a two-stage stratified area sample), which can be used to sample households in host communities. It will also support:
    • Sampling from a spatial grid in the absence of any existing area- or list-frame;
    • Use of gridded population data like WorldPop (https://www.worldpop.org/ );
    • Sampling based on other satellite-based data.
  2. Questionnaire design: this area of the activity covers the creation, sharing and administration of questionnaire templates. The proposed solution will enable users planning a data collection to easily create questionnaires tailored to populations affected by forced displacement, with only basic programming knowledge required. It can later be extended to the advanced validation and smart routing that mobile data collection devices make possible. The templates will incorporate the checks and verifications of questions and response options recommended by the JDC. They will also be able to incorporate some of the sampling applications proposed in the first area (e.g. display of geographic boundaries, preloading of survey unit locations), enabling a complete and comprehensive data collection process. A repository of questionnaire templates will widen access to the templates and encourage their use by different users. It will also allow users to build on the templates and improve the quality of survey instruments over time.
  3. Fieldwork and data quality assurance: this area of the activity will develop guidelines and an application to facilitate quality control. Quality control will rest on the analysis and use of the survey paradata produced by the Survey Solutions CASS and on global validations applied to survey data collected on FDPs and host communities. Every electronic survey produces paradata, yet they remain a relatively new and underused source for real-time quality monitoring. Because of their novelty and sheer volume, they often pose challenges for survey practitioners. Processing paradata helps practitioners understand the behavior of respondents and interviewers and improve the questionnaire. This area will also develop training materials and guidelines for implementing multiple interviewing modes during a survey.
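The grid-based sampling described in the first area can be illustrated with a standard technique: systematic selection of grid cells with probability proportional to size (PPS). Here, size is an estimated population count per cell, as might be aggregated from a gridded population product. This is a minimal sketch under stated assumptions, not the tool this activity will deliver. The cell identifiers and counts are hypothetical, no real raster is read, and the helper names `systematic_pps` and `household_weight` are illustrative only.

```python
"""Systematic PPS selection of grid cells (generic sketch, hypothetical data).

Assumption: `cell_pop` maps hypothetical cell ids to estimated populations,
e.g. aggregated from a gridded population raster; no dataset is read here.
"""
import bisect
import itertools
import random
from typing import Dict, List, Optional, Tuple


def systematic_pps(cell_pop: Dict[str, float], n: int,
                   seed: Optional[int] = None) -> List[Tuple[str, float, float]]:
    """Return (cell_id, inclusion_probability, design_weight) for n PPS draws.

    Cells larger than the sampling interval can be hit more than once; a real
    design would usually set them aside as certainty units first.
    """
    if n <= 0 or not cell_pop:
        raise ValueError("need n > 0 and at least one cell")
    ids = list(cell_pop)
    cumulative = list(itertools.accumulate(cell_pop[c] for c in ids))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("total population must be positive")
    interval = total / n                  # step on the cumulative-size scale
    rng = random.Random(seed)
    start = rng.random() * interval       # random start in [0, interval)

    draws = []
    for k in range(n):
        point = start + k * interval
        # First cell whose cumulative size exceeds the point; the clamp guards
        # against floating-point overshoot at the end of the range.
        idx = min(bisect.bisect_right(cumulative, point), len(ids) - 1)
        cell = ids[idx]
        prob = min(1.0, n * cell_pop[cell] / total)  # first-stage inclusion probability
        draws.append((cell, prob, 1.0 / prob))
    return draws


def household_weight(first_stage_prob: float, listed: int, sampled: int) -> float:
    """Two-stage design weight when `sampled` of `listed` households in a
    selected cell are drawn by simple random sampling."""
    return (1.0 / first_stage_prob) * (listed / sampled)


if __name__ == "__main__":
    cells = {"c01": 120, "c02": 0, "c03": 450, "c04": 80, "c05": 350}  # hypothetical
    for cell, p, w in systematic_pps(cells, n=2, seed=7):
        print(cell, round(p, 3), round(w, 2))
```

Cells with zero estimated population can never be selected. This is one reason a frame built purely from gridded estimates should be checked on the ground. The inclusion probabilities returned by the first stage carry directly into household weights once households have been listed and sampled within each selected cell.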

All applications and workflows created under this activity will initially be developed for the World Bank’s Survey Solutions CASS. They may subsequently be extended to other data collection platforms or CASSs, or be built with provisions for such an extension.
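One simple paradata check of the kind the fieldwork quality component targets is flagging interviews that are much shorter than typical. The check below works on event-level paradata of the sort Survey Solutions produces. This is a minimal sketch under stated assumptions. Each event is treated as a hypothetical `(interview_id, timestamp, event)` tuple, and the actual Survey Solutions paradata export has its own columns that would need to be mapped onto this shape first. The 0.5 × median cut-off is an illustrative choice, not a JDC or Survey Solutions recommendation.

```python
"""Flag unusually short interviews from event-level paradata (generic sketch).

Assumption: events are hypothetical (interview_id, ISO 8601 timestamp, event)
tuples; a real paradata export must be mapped onto this shape first.
"""
from collections import defaultdict
from datetime import datetime
from statistics import median
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str]  # (interview_id, ISO 8601 timestamp, event name)


def interview_durations(events: Iterable[Event]) -> Dict[str, float]:
    """Minutes between the first and last recorded event of each interview."""
    stamps: Dict[str, List[datetime]] = defaultdict(list)
    for interview_id, ts, _event in events:
        stamps[interview_id].append(datetime.fromisoformat(ts))
    return {
        iid: (max(times) - min(times)).total_seconds() / 60.0
        for iid, times in stamps.items()
    }


def flag_short(durations: Dict[str, float], ratio: float = 0.5) -> List[str]:
    """Interviews shorter than `ratio` times the median duration.

    The default ratio is an illustrative assumption, not an official threshold.
    """
    if not durations:
        return []
    cutoff = ratio * median(durations.values())
    return sorted(iid for iid, d in durations.items() if d < cutoff)


if __name__ == "__main__":
    events = [  # hypothetical paradata for three interviews
        ("a", "2021-03-01T09:00:00", "AnswerSet"),
        ("a", "2021-03-01T09:40:00", "Completed"),
        ("b", "2021-03-01T10:00:00", "AnswerSet"),
        ("b", "2021-03-01T10:36:00", "Completed"),
        ("c", "2021-03-01T11:00:00", "AnswerSet"),
        ("c", "2021-03-01T11:08:00", "Completed"),
    ]
    print(flag_short(interview_durations(events)))
```

Elapsed time from first to last event overstates duration when an interview is paused and resumed. A production check would more likely sum the gaps between consecutive events and discard long breaks. A flag of this kind is a prompt for supervisor review, not proof of a problem.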

Overall objectives

The objective of this activity is to develop and provide tools, guidelines and methods for improving data quality for surveys on FDPs and host communities, as well as build capacity around their use. The proposed tools will be built around free or open source solutions and designed to operate in capacity-constrained environments.

Using a flexible CASS such as Survey Solutions and widely used data processing languages such as R and Python, this activity will produce the tools required to implement advanced survey designs. The tools will help ensure high quality at each stage of survey data production while keeping skill requirements moderate. The outputs can be adopted in ongoing and planned JDC data collection activities. They will equip data producers with a set of hands-on, user-friendly and affordable tools that can easily be adapted to different contexts and future needs. The tools will cover all aspects of data production on populations affected by forced displacement, with data quality at the core of each stage.

Engagement with partners

The core methodological and software development will be carried out by the World Bank (WB) team. JDC colleagues will continue to play an active linking role and to provide feedback during pilot implementations. The WB team will ensure effective and efficient knowledge exchange with all relevant stakeholders, both within the scope of the activity and across the wider survey data collection agenda.

Contact

For further details on this activity, please contact: