StarApple AI | Dr. Shirley Budall | November 24, 2025

The Missing Data Problem: Gender Data Gaps and AI Policy Failure in the Caribbean

Before we can build AI systems that serve Caribbean women, we must confront the uncomfortable fact that our data collection systems were never designed to see them clearly.

Every AI system is, at its foundation, a reflection of the data it was trained on. If the training data is incomplete, skewed, or systematically biased in how it represents a population, the AI system's outputs will be incomplete, skewed, or systematically biased for that population.

My hypothesis is specific: Caribbean AI systems trained on existing datasets will systematically underserve women because the data they depend on was collected within institutional systems that undercounted, misclassified, or simply ignored women's economic activity, health experiences, and social participation.

The consequences will manifest as AI health tools that make more errors for Caribbean women, credit scoring systems that deny women loans that their financial behaviour would justify, social protection AI that fails to identify women in need, and agricultural advisory systems that overlook female farmers.

Understanding the Gender Data Gap

The gender data gap arises when data collection systems treat "male" as the default human and fail to collect, analyse, or report data that specifically captures women's experiences. Economic data has historically focused on formal employment, formal business registration, and formal financial transactions, all of which underrepresent women's economic activity.

In the Caribbean, these global patterns are compounded by specific regional characteristics. The informal economy is a significant component of Caribbean economic life, and women are disproportionately represented within it. The Statistical Institute of Jamaica (STATIN) and its equivalents across CARICOM collect labour force data that captures formal employment more reliably than informal activity, so much of women's economic activity is thinly recorded or missing from the datasets that AI systems are later trained on.

The Informal Economy and Its Data Shadow

Consider what happens when an AI credit scoring system is trained on data from a Caribbean financial institution. The system will learn to associate creditworthiness with features it can measure: salary records, formal tax contributions, documented business transactions, credit history within the formal banking system.

A woman who sells produce at Coronation Market in Kingston every Saturday morning, who has done so successfully for fifteen years, who manages her household finances with considerable skill, and who has never defaulted on an informal credit arrangement, has almost none of these measurable features. An AI credit scoring system will classify her as a poor credit risk, not because she is, but because the data available about her does not capture the economic reality of her life.
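The mechanism can be illustrated with a deliberately simplified sketch. It is not any institution's actual scoring model: the `Applicant` fields, the caps, and the equal weighting in `formal_only_score` are all assumptions made for illustration. The point is only that a score built from formally recorded features cannot see fifteen years of successful informal trading.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    salaried_months: int           # months of documented salary records
    tax_filings: int               # formal tax contributions on record
    bank_credit_years: float       # credit history inside the formal banking system
    informal_trading_years: float  # real, but not recorded by the lender in this sketch
    informal_defaults: int         # real, but not recorded by the lender in this sketch


def formal_only_score(a: Applicant) -> float:
    """Toy score in [0, 1] that uses only formally recorded features (hypothetical weights)."""
    salary = min(a.salaried_months / 60, 1.0)
    tax = min(a.tax_filings / 5, 1.0)
    credit = min(a.bank_credit_years / 5, 1.0)
    # The informal fields are never read: the scoring rule has no way to use them.
    return round((salary + tax + credit) / 3, 2)


# Hypothetical applicants, invented for illustration.
vendor = Applicant(salaried_months=0, tax_filings=0, bank_credit_years=0.0,
                   informal_trading_years=15.0, informal_defaults=0)
clerk = Applicant(salaried_months=24, tax_filings=2, bank_credit_years=1.0,
                  informal_trading_years=0.0, informal_defaults=0)

print(formal_only_score(vendor))  # 0.0
print(formal_only_score(clerk))   # 0.33
```

In this sketch the vendor receives the lowest possible score even though her informal record is spotless, while a recently hired clerk with two years of salary records scores higher. The gap comes entirely from which features exist in the data, not from repayment behaviour.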

Health Data Gaps and Their Consequences

The most widely documented global health data gap affecting women is the historical underrepresentation of women in clinical drug trials. Clinical datasets built on that research carry the same skew into the AI tools trained on them. An AI diagnostic tool trained primarily on data from male patients will tend to have lower diagnostic accuracy for equivalent conditions in female patients. This is a documented phenomenon, not a theoretical risk.
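A short sketch shows why accuracy has to be reported separately by sex before a gap like this becomes visible. The records below are synthetic and invented purely for illustration, and `accuracy_by_group` is a hypothetical helper, not part of any particular evaluation toolkit.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        correct[group] += int(truth == predicted)
    return {group: correct[group] / total[group] for group in total}


# Synthetic, illustrative records only; not real patient data.
records = (
    [("male", 1, 1)] * 45 + [("male", 1, 0)] * 5
    + [("female", 1, 1)] * 7 + [("female", 1, 0)] * 3
)

overall = sum(truth == predicted for _, truth, predicted in records) / len(records)
by_sex = accuracy_by_group(records)

print(round(overall, 2))  # 0.87
print(by_sex)             # {'male': 0.9, 'female': 0.7}
```

Because female patients make up only a small share of the synthetic records, the headline figure of roughly 87 percent sits close to the male accuracy and hides a twenty-point shortfall for women.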

Caribbean-specific health data gaps add further dimensions. Non-communicable disease data across CARICOM is often not fully disaggregated by sex, age, and socioeconomic status simultaneously. Mental health data for Caribbean women is particularly sparse.
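The word "simultaneously" matters. A dataset can look adequate when counted by sex alone and still contain almost no records for a specific combination such as older, low-income women. The sketch below counts records per joint cell and flags the thin ones. The dimension names, level labels, counts, and the `min_count` threshold of 20 are all assumptions chosen for illustration, not a published standard.

```python
from collections import Counter
from itertools import product


def sparse_cells(rows, levels, min_count):
    """Return joint cells (one level per dimension) holding fewer than min_count rows."""
    dims = list(levels)
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    return {
        cell: counts[cell]  # a Counter returns 0 for cells that never occur
        for cell in product(*(levels[d] for d in dims))
        if counts[cell] < min_count
    }


def make_rows(sex, age_band, ses, n):
    """Build n identical synthetic records (illustrative only)."""
    return [{"sex": sex, "age_band": age_band, "ses": ses}] * n


levels = {"sex": ["female", "male"], "age_band": ["18-44", "45+"], "ses": ["low", "high"]}

rows = (
    make_rows("male", "18-44", "high", 40) + make_rows("male", "18-44", "low", 35)
    + make_rows("male", "45+", "high", 30) + make_rows("male", "45+", "low", 30)
    + make_rows("female", "18-44", "high", 30) + make_rows("female", "18-44", "low", 12)
    + make_rows("female", "45+", "high", 25)  # no records at all for female, 45+, low
)

flagged = sparse_cells(rows, levels, min_count=20)
print(flagged)
# {('female', '18-44', 'low'): 12, ('female', '45+', 'low'): 0}
```

Counted by sex alone, the synthetic dataset holds 67 records for women, which would pass the same threshold. Only the joint breakdown reveals that low-income women are barely present.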

The Policy Governance Failure

Gender data gaps in the Caribbean are not primarily a technical problem; they are a governance problem. Governments have the authority to require that data collection systems be designed to capture women's economic activity, health experiences, and social participation. Most Caribbean governments have not exercised this authority in relation to AI.

The EU AI Act's high-risk AI provisions include explicit data governance requirements for training datasets, including requirements to address known biases. The EU framework therefore already treats training data quality as a compliance requirement rather than a design preference. Caribbean AI governance frameworks should incorporate equivalent requirements.

Recommendations

  1. Commission a Caribbean Gender Data Audit within 12 months. CARICOM governments should jointly commission an audit of the major datasets used in Caribbean AI applications across financial services, health, education, agriculture, and social protection.
  2. Expand STATIN's mandate and capacity to collect gender-disaggregated informal economy data. The Statistical Institute of Jamaica should be given explicit statutory responsibility for collecting data on informal economic activity disaggregated by gender, with dedicated funding for survey methodology development.
  3. Establish minimum data representational adequacy standards for public sector AI procurement. Systems unable to demonstrate that their training data adequately represents Jamaican women should not be approved for public sector deployment.
  4. Require CARPHA to lead a regional health data standards initiative. The Caribbean Public Health Agency should develop and publish minimum standards for sex and gender disaggregation in health data used to train or validate AI systems deployed in Caribbean health settings.
  5. Fund partnerships between Caribbean national statistics offices and mobile network operators. Under appropriately designed privacy-protective frameworks, mobile network operator data could supplement national statistical collection and provide more accurate training data for AI systems serving Caribbean women.
  6. Incorporate gender data adequacy into Jamaica Data Protection Act enforcement. The Office of the Information Commissioner should publish guidance clarifying that the accuracy principle of the Jamaica Data Protection Act 2020 applies to AI training datasets.
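As one possible reading of the representational adequacy standard in recommendation 3, a procurement check could compare each group's share of the training data with its share of the population the system will serve. The sketch below is an assumption about how such a check might look: the function names, the `min_ratio` threshold of 0.8, and all figures are hypothetical placeholders, not census data or an agreed standard.

```python
def representation_ratios(dataset_counts, population_shares):
    """Ratio of each group's share of the dataset to its share of the population."""
    total = sum(dataset_counts.values())
    return {
        group: (dataset_counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }


def underrepresented(dataset_counts, population_shares, min_ratio=0.8):
    """Groups whose representation ratio falls below min_ratio (hypothetical threshold)."""
    ratios = representation_ratios(dataset_counts, population_shares)
    return sorted(group for group, ratio in ratios.items() if ratio < min_ratio)


# Placeholder figures for illustration only.
dataset_counts = {"women": 300, "men": 700}
population_shares = {"women": 0.5, "men": 0.5}

ratios = representation_ratios(dataset_counts, population_shares)
print(underrepresented(dataset_counts, population_shares))  # ['women']
```

A ratio of 1.0 means a group appears in the data in proportion to its population share. A share-based check of this kind is only a first screen: it says nothing about whether the recorded features describe women's lives accurately, which is the deeper problem the audit in recommendation 1 is meant to address.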

Conclusion

The gender data gaps that pervade Caribbean statistical systems are not passive omissions. They are active sources of harm, producing policy decisions, resource allocations, and now AI systems that systematically underserve women.

You cannot build AI systems that serve Caribbean women from datasets that were not designed to see them. Closing the gender data gap is therefore not a peripheral equity concern; it is a foundational technical requirement for any Caribbean AI strategy that claims to serve the entire population rather than only those whose lives have historically been visible in the data.

About the Author

Dr. Shirley Budall is a Caribbean expert in gender, inclusion, and AI governance with demonstrated experience in the ethical, legal, social and governance dimensions of artificial intelligence and digital technologies. Contact: insights@starapple.ai