The short answer is yes, though "mispricing" is not quite the right word. The deeper problem is that these models were never really designed to price African credit risk at all. They were built on the assumption that creditworthiness leaves a formal paper trail: credit bureau records, mortgage histories, salaried employment. In South Africa, that assumption fails for a large majority of the adult population. What follows is not bias in the technical sense of a flawed parameter; it is a fundamental data mismatch between the economy these models were built for and the economy they are being deployed in.[1]
Standard credit AI relies on features that describe the minority of African consumers, not the majority. In many emerging markets, formal credit bureaus cover less than 20% of the population.[2] When a model encounters a South African borrower with no credit bureau record, it reads absence of data as elevated risk, rather than as evidence of participation in a cash-based informal economy. The result is systematic false-positive risk flags: creditworthy people are declined or offered worse terms, not because their finances are precarious, but because the model cannot see them.[1]
This is worth sitting with for a moment. The model is not making a prediction about the person. It is making a prediction about people who look like the people it was trained on, and in this context "looking like" means having a formal banking footprint. Someone who saves through a stokvel, buys airtime in small top-ups, and pays rent in cash is essentially invisible to a Western-trained credit model.[3]
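To make that concrete, here is a deliberately simplified scorecard sketch. The features, point bands, and numbers are invented for illustration, not any lender's actual model; the structural point is that a missing bureau field is scored as the worst case rather than as "unknown".

```python
# Hypothetical simplified scorecard (invented features and point values):
# each bureau-derived feature contributes points, and a missing field falls
# through to the floor. An applicant with no bureau record at all therefore
# lands at the minimum score regardless of their real finances.

BANDS = {
    "months_on_bureau":    [(60, 120), (24, 80), (0, 40)],  # (threshold, points)
    "mortgage_history":    [(1, 100), (0, 50)],
    "salaried_employment": [(1, 90), (0, 45)],
}

MISSING_POINTS = 0  # absence of data is treated as worst case, not as "unknown"

def score(applicant: dict) -> int:
    total = 0
    for feature, bands in BANDS.items():
        value = applicant.get(feature)
        if value is None:
            total += MISSING_POINTS  # this line is where the exclusion happens
            continue
        for threshold, points in bands:
            if value >= threshold:
                total += points
                break
    return total

formal = {"months_on_bureau": 72, "mortgage_history": 1, "salaried_employment": 1}
informal = {}  # cash economy: steady income, zero bureau footprint

print(score(formal))    # 310
print(score(informal))  # 0 -> indistinguishable from the highest-risk band
```

The informal applicant lands at zero not because of anything in their finances, but because the scorecard has no representation for them at all.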
South Africa is a useful lens because it combines a relatively sophisticated financial sector with extreme economic stratification. The formal and informal economies coexist at scale, which makes the consequences of this mismatch unusually visible.
The unscorable population. Roughly 40% of South African adults cannot be scored by conventional AI credit tools, not because they represent elevated default risk, but because they have no formal financial footprint these systems can read.[2] Many of them turn to mashonisas, informal neighbourhood moneylenders operating outside the National Credit Regulator's oversight, who commonly charge between 30% and 50% per month.[4] That is what credit exclusion costs at the individual level.
Banks acknowledging the gap. Major institutions including Capitec and Standard Bank are now explicitly pairing AI outputs with traditional credit judgment, rather than relying on AI models alone. The stated rationale is local calibration: pure AI stacks lack the contextual sensitivity needed to differentiate risk accurately across South Africa's economic diversity.[5]
Academic and regulatory acknowledgment. A 2024 paper published in the South African Journal of Economic and Management Sciences proposed using Shapley values (SHAP) to improve the explainability of ML credit models, directly addressing the problem of black-box outputs that neither regulators nor borrowers can interrogate.[6] The November 2025 SARB/FSCA joint AI report called on financial institutions to ensure AI systems do not erode fairness, transparency, or accountability in consumer-facing decisions.[7]
The bias does not distribute evenly. Across sub-Saharan Africa, women face loan approval rates roughly 15-20% lower than male applicants with comparable profiles.[8] The Dzreke & Dzreke (2025) audit tested 10 credit scoring algorithms used by digital lenders across South Africa, Kenya, and Nigeria, using 1,200 synthetic business profiles with identical financial metrics but varying gender signals. Female-coded profiles faced a 37 percentage-point lower approval rate than their male equivalents, and received loan amounts 15-30% smaller when approved.[9] It is worth being precise about the methodology: these were synthetic profiles, not real loan applications. But the mechanism the audit exposes, sector coding and networking-pattern proxies encoding gender, is consistent with how proxy discrimination operates in production systems.
The same audit found something counterintuitive: digital-only fintechs, which market themselves as bias-free alternatives to human loan officers, actually showed a wider discrimination gap (around 29%) than traditional banks (around 18%).[9] Human underwriters, for all their acknowledged flaws, appear to partially compensate for baseline model bias when they have enough context to override it. The AI cannot.
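For readers who want the shape of that methodology, here is a minimal sketch of a matched-pair audit. The scoring stub is a deliberately biased stand-in and every field name is invented; this is not any audited lender's system.

```python
import random

random.seed(0)

# Sketch of a matched-pair audit in the style described above: 1,200 profile
# pairs identical on every financial metric, differing only in gender-coded
# signals. The scoring stub is a deliberately biased stand-in, not any
# audited lender's model; every field name here is invented.

def make_pair():
    base = {
        "monthly_revenue": random.randint(20_000, 200_000),
        "years_trading": random.randint(1, 15),
    }
    return {**base, "gender_signal": "male"}, {**base, "gender_signal": "female"}

def stub_model(profile) -> bool:
    score = profile["monthly_revenue"] / 10_000 + profile["years_trading"]
    # In production systems the penalty is rarely this explicit: it leaks in
    # through proxies such as sector coding and networking patterns.
    if profile["gender_signal"] == "female":
        score -= 5
    return score >= 12

pairs = [make_pair() for _ in range(1_200)]
male_rate = sum(stub_model(m) for m, _ in pairs) / len(pairs)
female_rate = sum(stub_model(f) for _, f in pairs) / len(pairs)
print(f"approval gap: {100 * (male_rate - female_rate):.1f} percentage points")
```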
Three mechanisms drive the mispricing:
Proxy inversion. Variables like residential postcode, device type, or social network density work as creditworthiness proxies in the US and EU because they correlate with income stability in those economies. In South African township and peri-urban contexts, those same proxies can correlate differently, or inversely. A low-end Android handset, for instance, might index poverty in a US training set, while in Soweto it might simply reflect normal consumer behaviour.[1]
Feedback loop amplification. AI models trained on historical lending decisions inherit the exclusionary patterns of those decisions. When those models then generate new rejections, that rejection data gets added to future training sets. The discriminatory boundary does not drift toward fairness over time. It gets reinforced.[10] The toy simulation after this list shows the dynamic.
Accountability gaps. Many African DFIs and fintechs run on foreign-built AI platforms. The underlying models are not retrained on local data, and the developers carry no accountability to local regulators. Errors that would surface quickly in a domestic context can persist indefinitely when the model owner has no local presence.[1]
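Here is the promised toy simulation of the feedback loop, under assumptions invented purely for illustration: informal-economy applicants score zero on bureau-derived features even though they mostly repay, and the lender retrains only on the outcomes of loans it actually made.

```python
import random

random.seed(1)

# Toy simulation with invented parameters: informal-economy applicants have
# no bureau footprint (score 0.0) but mostly repay. The lender observes
# repayment only for applicants it approved, so retraining never surfaces
# the evidence that would move the boundary toward the excluded group.

def applicant():
    informal = random.random() < 0.4              # 40% have no bureau footprint
    score = 0.0 if informal else random.uniform(0.5, 1.0)
    repays = random.random() < 0.9                # both groups mostly repay
    return informal, score, repays

threshold = 0.4
for rnd in range(5):
    approved, informal_approved = [], 0
    for _ in range(10_000):
        informal, score, repays = applicant()
        if score >= threshold:
            approved.append((score, repays))
            informal_approved += informal
    # "Retrain": derive the new cutoff from approved outcomes only. Rejections
    # contribute no repayment data, so the boundary can only ratchet inward.
    threshold = min(s for s, repaid in approved if repaid)
    print(f"round {rnd}: threshold={threshold:.2f}, "
          f"informal approved: {informal_approved}")
```

Because rejected applicants never generate repayment data, the boundary never moves toward them: five rounds of "retraining" approve zero informal borrowers.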
Alternative data. The most direct response is building data from the informal economy that these models cannot see. In June 2025, TransUnion Africa, MTN, and Chenosis launched the TransUnion Telco Data Score, using call data records as a proxy for financial reliability for new-to-credit consumers. Pre-launch validation showed a 25-35% improvement in predictive accuracy over previous alternative data models.[11] CredoLab, operating in SA, builds creditworthiness profiles from smartphone behavioural metadata (app usage patterns, charging habits) for gig workers and thin-file borrowers.[12] Experian's Sigma Transcend score combines non-traditional data and ML to produce scores for consumers that bureau-based models cannot process at all.[13]
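As a sketch of what such scoring inputs might look like, consider features derived from call data records. The field names and feature choices below are hypothetical, not TransUnion's or CredoLab's actual variables; the point is that payment regularity and network stability are observable even for consumers with no bureau record.

```python
from datetime import datetime
from statistics import pstdev

# Hypothetical telco-derived features (illustrative names, not any vendor's
# actual variables). Small but steady top-ups can signal income regularity
# rather than poverty; a stable contact network can signal rootedness.

def telco_features(events: list[dict]) -> dict:
    topups = [e for e in events if e["type"] == "topup"]
    calls = [e for e in events if e["type"] == "call"]
    amounts = [e["amount"] for e in topups]
    days_between = [(b["ts"] - a["ts"]).days for a, b in zip(topups, topups[1:])]
    return {
        "topup_regularity": pstdev(days_between) if len(days_between) > 1 else None,
        "mean_topup": sum(amounts) / len(amounts) if amounts else None,
        "contact_breadth": len({e["peer"] for e in calls}),
        "tenure_days": (events[-1]["ts"] - events[0]["ts"]).days if events else 0,
    }

events = [
    {"type": "topup", "ts": datetime(2025, 1, 3), "amount": 30},
    {"type": "call", "ts": datetime(2025, 1, 5), "peer": "A"},
    {"type": "topup", "ts": datetime(2025, 1, 10), "amount": 30},
    {"type": "call", "ts": datetime(2025, 1, 12), "peer": "B"},
    {"type": "topup", "ts": datetime(2025, 1, 17), "amount": 25},
]
print(telco_features(events))  # weekly top-ups -> regularity 0.0 (perfectly steady)
```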
Rejection sampling. Some SA fintechs now deliberately approve samples of algorithmically declined applicants to generate actual repayment data, building the feedback loop in reverse and creating training signal from the population the model previously ignored.[14]
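A minimal sketch of that loop, with invented names and rates:

```python
import random

random.seed(2)

# Rejection-sampling sketch (rates and names invented): a small, budgeted
# slice of model-declined applicants is approved anyway, and their real
# repayment outcomes become labelled training data for the population the
# model currently cannot see.

EXPLORATION_RATE = 0.02  # fraction of declines to approve for data collection

def decide(model_approves: bool) -> tuple[str, bool]:
    """Return (decision, is_exploration_loan)."""
    if model_approves:
        return "approve", False
    if random.random() < EXPLORATION_RATE:
        return "approve", True  # overridden decline: will generate outcome data
    return "decline", False

training_rows = []
for _ in range(50_000):
    decision, exploring = decide(model_approves=random.random() < 0.55)
    if decision == "approve":
        repaid = random.random() < 0.85  # in reality, observed months later
        training_rows.append((exploring, repaid))

explored = [r for r in training_rows if r[0]]
print(f"{len(explored)} exploration loans, "
      f"{sum(r[1] for r in explored) / len(explored):.0%} repaid")
```

The exploration budget is the design choice: each overridden decline costs expected losses today in exchange for label coverage the historical data never contained.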
Explainable AI frameworks. SHAP- and LIME-based interpretability tools are being deployed by SA banks to let internal teams and regulators trace which variables are driving rejections and to identify when a proxy is encoding demographic exclusion.[15]
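As an illustration of what that tracing looks like, here is a small SHAP example on a synthetic model. The feature names and data are invented, and this is not any bank's production setup; it requires the open-source shap and scikit-learn packages.

```python
# Requires: pip install shap scikit-learn
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
features = ["bureau_tenure", "device_tier", "postcode_income_index", "topup_regularity"]
X = rng.normal(size=(1_000, 4))
# Deliberately leak the label through a postcode proxy, so the audit has
# something to find (synthetic data, invented feature names).
y = (1.5 * X[:, 2] + 0.3 * X[:, 0] + rng.normal(0, 0.5, 1_000) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# Mean absolute SHAP value per feature: which variables drive decisions
# overall. A dominant postcode_income_index is exactly the kind of proxy a
# reviewer would flag as potentially encoding demographic exclusion.
for name, importance in zip(features, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {importance:.3f}")
```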
Regulatory pressure. The SARB/FSCA November 2025 report requires financial institutions to disclose when AI is used in consumer-impacting decisions like credit assessments, and mandates enhanced oversight to mitigate bias and consumer harm.[7] South Africa's existing legal framework, including the National Credit Act, POPIA, and Treating Customers Fairly principles, provides partial grounds for challenging automated adverse decisions, though enforcement remains largely court-dependent and the regulatory approach is principle-based rather than prescriptive.[16]
The gap between SA's framework and harder regulatory regimes like the EU AI Act (which classifies credit scoring as high-risk AI and mandates pre-deployment compliance audits, human oversight, and standardised documentation) is significant.[17] SA regulators and lenders argue that direct adoption of EU-style rules would constrain the use of alternative data, since those rules were designed for formalised economies and may prohibit the unstructured data types (mobile money flows, telco records) that are most useful for scoring the unbanked.[18] There is a real tension here, not just regulatory foot-dragging.
Most discussions of algorithmic bias in credit focus on exclusion: people unfairly denied access. But South Africa illustrates a second failure mode that receives less attention: predatory inclusion.
The UP Law Clinic's research on South African microlending documented how informal and semi-formal lenders identified low-income formal-sector workers (mineworkers, factory employees) with predictable pay-cheques and low financial literacy, and targeted them with high-interest products.[19] The mechanism was deliberate. Contemporary data-driven lending systems can automate and scale the same logic: late-night mobile data usage, a depleted airtime balance, and delayed digital payments can all function as signals of financial distress, which an algorithm calibrated for revenue maximisation can use to identify who is desperate enough to accept exploitative terms.[20] The CIPIT (2024) report on automated decision-making in Africa specifically flagged how apartheid-era financial exclusion data, fed into current models, encodes historical redlining as an ostensibly neutral computational output.[21]
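One implication for auditors is that distress-targeted pricing is testable. Below is a sketch, with entirely invented data and variable names, of checking whether offered rates load on a distress proxy after controlling for the model's own risk estimate; a high partial correlation means pricing is tracking desperation rather than risk.

```python
import numpy as np

# Audit-side sketch (all names and data invented): simulate a lender whose
# offered APR loads on a distress proxy beyond predicted default risk, then
# recover that loading as a partial correlation via residualisation.

rng = np.random.default_rng(3)
n = 5_000
default_risk = rng.uniform(0, 1, n)   # the model's own risk estimate
distress = rng.uniform(0, 1, n)       # e.g. an airtime-depletion index
offered_apr = 0.20 + 0.30 * default_risk + 0.25 * distress + rng.normal(0, 0.02, n)

def residualise(y, x):
    """Remove the linear-in-x component of y."""
    slope = np.polyfit(x, y, 1)[0]
    return y - slope * x

apr_resid = residualise(offered_apr, default_risk)
dist_resid = residualise(distress, default_risk)
partial_r = np.corrcoef(apr_resid, dist_resid)[0, 1]
print(f"distress-pricing partial correlation: {partial_r:.2f}")  # high here by construction
```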
The broader point is this: a model that systematically excludes viable borrowers and a model that systematically targets vulnerable ones are both mispricers. They fail in opposite directions, but from the same root cause: training data that does not reflect the population being scored, deployed in a context where accountability is weak.
Credit AI built for formal economies treats informality as noise. In South Africa, informality is the signal. Until models are trained on local behavioural and alternative data, validated against SA-specific default experience, and subject to lifecycle fairness auditing, they will continue to misclassify viable borrowers as risks, and in some cases identify the most financially stressed consumers not to help them, but to extract maximum value before default.[1][15]
The institutions doing the most interesting work (TransUnion, CredoLab, Capitec's hybrid model teams) are not importing fixes to US/EU model limitations. They are building around those limitations using locally grounded data. That is probably the right approach. What is less clear is whether it will move fast enough, and at sufficient scale, to close the gap before the next generation of AI credit tools simply re-embeds the same exclusions in a more sophisticated statistical wrapper.
Have a perspective on this piece? Reach out — the best writing comes from good conversation.