Impact measurement frameworks: IRIS+, IMP, and Theory of Change in practice
A practitioner's guide to selecting, combining, and implementing frameworks that produce credible evidence
Key takeaways
- IRIS+ provides 571 standardised metrics across 17 sectors, but only 12-18 core metrics typically matter for any single programme; selection discipline prevents measurement fatigue
- IMP's five dimensions (What, Who, How Much, Contribution, Risk) create comparable impact statements across asset classes, essential when families deploy capital through grants, PRIs, and direct investments simultaneously
- Theory of Change mapping identifies causal assumptions early; 40% of implementation failures stem from confusing activities with outcomes in initial design
- SROI calculations produce credible evidence only when monetisation uses conservative multipliers and excludes non-material impacts; ratios above 8:1 warrant scepticism
- Effective impact reporting to family members requires a three-tier architecture: quarterly dashboards (6-8 metrics), annual narrative reports, and triennial evaluations with external validation
- Most families over-report outputs (scholarships awarded, trees planted) and under-report outcomes (employment rates, carbon sequestered); frameworks exist to prevent this reversal
- Combining frameworks strategically (IMP for portfolio view, IRIS+ for programme metrics, ToC for validation) produces more credible evidence than any single system
The measurement paradox: why 63% of family foundations track the wrong metrics
A European family foundation committed €12 million over five years to education programmes across three countries. Their quarterly reports documented 847 students enrolled, 156 teacher training sessions completed, and 23 schools equipped with libraries. In year four, an external evaluation revealed that reading proficiency among participating students had declined relative to control groups. The foundation had measured activity with precision but missed the outcome entirely.
This pattern appears across the sector. Campden Wealth's 2023 Family Office Philanthropy Survey found that 63% of families track primarily output metrics (beneficiaries served, dollars deployed, programmes launched) while only 28% systematically measure outcomes (behaviour change, capability development, systems influence). The measurement gap stems not from ignorance but from poor framework selection and lapses in implementation discipline.
Four frameworks dominate professional practice: IRIS+ for standardised metrics, IMP for portfolio-level comparability, Theory of Change for causal validation, and SROI for monetised cost-benefit analysis. Each serves distinct purposes. Families that combine them strategically produce credible evidence. Families that adopt one in isolation typically produce either vanity metrics or measurement paralysis.
IRIS+: metric standardisation for portfolio comparability
The Impact Reporting and Investment Standards, now operated by the Global Impact Investing Network, provide 571 quantitative metrics organised across 17 thematic sectors and five impact categories. IRIS+ differs from earlier catalogues by specifying core metric sets—the minimum viable measurement for any given sector—and alignment to UN Sustainable Development Goals.
Core metric sets and selection discipline
IRIS+ defines core metrics as those that enable comparability across organisations working toward similar outcomes. For climate mitigation, the core set includes greenhouse gas emissions reduced (metric PI9468), energy saved (PI4885), and renewable energy capacity installed (PI3597). For financial inclusion, core metrics include clients served below poverty line (PI1756), loan default rates (PI1139), and average loan size as percentage of GNI per capita (PI3885).
Selection discipline matters more than catalogue comprehensiveness. We observe families tracking 40-60 IRIS+ metrics across portfolios that would function with 15-20. The discipline: identify core metrics for each programme, add no more than five supplementary metrics for context, and resist expansion unless strategic questions demand it. A Swiss family office reduced their education portfolio metrics from 38 to 14 without information loss; reporting burden decreased 60%, enabling quarterly rather than semi-annual measurement.
SDG alignment and impact thesis connection
IRIS+ maps metrics to SDG targets, enabling families to report aggregate contribution across goals. This proves particularly valuable for families making commitments to specific goals—SDG 13 (climate action), SDG 4 (quality education), SDG 3 (health and wellbeing)—and needing portfolio-level evidence.
The connection between impact thesis and metric selection often breaks. A Gulf-based family committed to SDG 8 (decent work and economic growth) but measured only job creation numbers without wage levels, working conditions, or employment duration. Their impact thesis emphasised quality employment; their metrics measured only quantity. Effective IRIS+ implementation requires explicit mapping: state the impact thesis, identify the outcome it predicts, select core metrics that validate that outcome, then add context metrics that explain variance.
IMP's five dimensions: creating portfolio comparability
The Impact Management Project, now integrated into the Impact Management Platform, established five dimensions for describing impact: What outcome the enterprise is contributing to, Who experiences the outcome, How Much of the outcome occurs, Contribution (whether the outcome would have happened anyway), and Risk (the likelihood that expectations are not met). These dimensions enable comparison across grants, programme-related investments, and direct investments—critical for families deploying capital across multiple instruments.
The What dimension and outcome specification
What describes the specific outcome, not the activity that produces it. "Providing job training" is an activity. "Increased earnings for unemployed youth" is an outcome. The distinction seems obvious but proves difficult in practice. The UBS Family Office Report 2023 found that 71% of family foundations describe their impact in activity terms.
Outcome specification requires outcome categories (IMP identifies nine: employment, health, education, environment, civic engagement, housing, financial stability, basic services, equity). Within employment, outcomes differ: job creation, wage increases, working conditions improvement, career advancement. A family foundation supporting workforce development must specify which employment outcome they target and select metrics accordingly.
Who and How Much: scope and scale discipline
Who describes stakeholders experiencing the outcome: their number, characteristics, and depth of disadvantage. How Much captures duration, depth, and change relative to baseline. Together they prevent the common failure of counting beneficiaries without understanding significance.
A North American family foundation funds scholarships for first-generation college students. Early measurement counted scholarship recipients (Who = 140 annually). Refined measurement tracked completion rates by income quartile, employment outcomes within 18 months of graduation, and earnings relative to local median wage. The Who dimension revealed that students from the bottom income quartile achieved completion rates 22 percentage points higher than the national average for similar students—evidence that would not surface from counting recipients alone.
Contribution and counterfactual reasoning
Contribution assesses whether the outcome would occur without the intervention. This dimension surfaces the hardest measurement question: what is the counterfactual? Three approaches dominate practice: comparison groups (measuring similar populations without intervention), before-after analysis with trend adjustment, and expert estimation using established research.
Rigorous counterfactual analysis typically requires external evaluation capacity. A European family foundation supporting youth mental health services engaged an academic research team to establish control groups and measure outcomes using validated clinical instruments. Cost: 8% of total programme budget over five years. The evidence enabled confident claims of contribution—participants showed 34% greater improvement in wellbeing scores than control groups—and attracted co-funding from government agencies that required proof of efficacy.
Theory of Change: causal validation and assumption testing
Theory of Change (ToC) methodology maps the causal chain from inputs through activities, outputs, outcomes, and impact. The value lies not in producing logic model diagrams but in surfacing assumptions that can be tested. Effective ToC development identifies what must be true for the intervention to succeed, then designs measurement to validate those assumptions.
Mapping causal chains and identifying assumptions
Consider a climate adaptation programme providing drought-resistant seeds to smallholder farmers. The causal chain: seeds distributed → farmers plant seeds → crops survive drought → yields increase → incomes rise → food security improves. Each arrow represents an assumption that might fail. Farmers might receive seeds but lack irrigation equipment. Crops might survive but market prices collapse. Incomes might rise but household food allocation patterns remain unchanged.
ToC discipline requires writing explicit if-then statements: "If farmers receive drought-resistant seeds and extension training, then adoption rates will reach 70% within two growing seasons." "If yields increase by 30%, then household incomes will rise by at least 15% assuming stable market prices." Each assumption becomes measurable. Data collection focuses on validating or refuting these statements rather than documenting activities.
Common ToC failures and measurement implications
The Bridgespan Group's analysis of 200 social programmes identified that 40% of implementation failures stemmed from incorrect causal assumptions that measurement could have surfaced earlier. Three patterns appear repeatedly: confusing outputs with outcomes, underestimating time lags between intervention and impact, and ignoring external factors that determine success.
A family foundation supporting early childhood education assumed that improved pre-school quality would increase school readiness. Their ToC showed: funding → teacher training → classroom quality improves → children gain cognitive skills → school readiness increases. Measurement focused on teacher training completion and classroom quality scores. In year three, school readiness assessments showed minimal change. External evaluation revealed the missing link: family engagement. High-quality classrooms produced outcomes only when paired with parent education programmes. The ToC was incomplete; measurement focused on intermediate outputs rather than testing causal assumptions.
SROI: monetised cost-benefit analysis and credibility thresholds
Social Return on Investment attempts to monetise social and environmental outcomes, expressing impact as a financial ratio (e.g., £4.50 of social value created per £1 invested). SROI produces compelling numbers for board reports and fundraising materials. It also produces misleading numbers when monetisation uses aspirational rather than conservative multipliers.
Monetisation methodology and multiplier discipline
SROI calculation requires assigning monetary values to outcomes that lack market prices: wellbeing improvements, environmental preservation, community cohesion. Practitioners use three valuation methods: revealed preference (what people actually pay), stated preference (what people say they would pay in surveys), and cost savings (avoided expenditure).
Cost savings provides the most defensible approach. A homelessness intervention that provides stable housing might monetise outcomes by calculating avoided costs: emergency room visits, police interactions, shelter bed-nights, and social services. These have established unit costs and can be validated through administrative data. Attempting to monetise "improved dignity" or "stronger community belonging" introduces subjectivity that undermines credibility.
We observe that SROI ratios above 8:1 typically warrant scepticism. The UK government's HM Treasury Green Book suggests discount rates of 3.5% for public sector appraisals. Social Finance UK, which developed many early SROI methodologies, now recommends conservative monetisation and transparent sensitivity analysis showing how the ratio changes under different assumptions.
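Under these cautions, the arithmetic of a cost-savings SROI is straightforward: discount the stream of avoided costs (here at the Green Book's 3.5% rate) and divide by the investment, then show how the ratio moves under different assumptions. A sketch with hypothetical figures:

```python
def npv(cashflows, rate=0.035):
    """Present value of annual values, discounted at `rate`
    (3.5% is the HM Treasury Green Book rate for public sector appraisals)."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(cashflows, start=1))

def sroi(avoided_costs_per_year, years, investment, rate=0.035):
    """SROI ratio using only cost-savings valuation:
    discounted avoided costs divided by the capital invested."""
    return npv([avoided_costs_per_year] * years, rate) / investment

# Hypothetical programme: 450k avoided costs per year for 5 years, 400k invested
base = sroi(avoided_costs_per_year=450_000, years=5, investment=400_000)

# Sensitivity analysis: vary the avoided-cost estimate +/-30%
low  = sroi(450_000 * 0.7, 5, 400_000)
high = sroi(450_000 * 1.3, 5, 400_000)
print(f"SROI {base:.1f}:1 (sensitivity range {low:.1f}:1 to {high:.1f}:1)")
```

Reporting the range rather than a single point estimate is the transparency the methodology recommends: a reader can see how much of the headline ratio depends on the avoided-cost assumption.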
When SROI adds value and when it misleads
SROI proves valuable when outcomes have clear cost-saving equivalents and when funders demand financial framing. A preventive health programme reducing hospital admissions can credibly calculate monetised impact. A cultural preservation programme protecting indigenous languages cannot, and attempts to do so diminish rather than enhance credibility.
A Singapore-based family office funds mental health support in schools. Their SROI calculation valued outcomes using three components: reduced special education costs (using actual district expenditure data), increased lifetime earnings (using labour market research on mental health and employment), and avoided treatment costs (using health system data). Ratio: 5.2:1 with clearly documented assumptions. This proved credible to family members and enabled confident allocation decisions. Attempts to additionally monetise "reduced family stress" or "improved classroom climate" were excluded as too speculative.
Framework selection matrix: matching tools to decisions
Effective measurement combines frameworks rather than selecting one exclusively. The selection logic: use IMP dimensions for portfolio-level overview and comparability, use IRIS+ for programme-level metrics, use ToC for causal validation, use SROI selectively when outcomes have defensible monetary equivalents.
Portfolio-level reporting architecture
Family offices managing diverse philanthropic portfolios need three reporting layers. Strategic layer (annual) uses IMP dimensions to describe aggregate impact: which outcomes across which populations at what scale. Programme layer (quarterly) uses IRIS+ core metrics to track progress toward specific targets. Validation layer (triennial) uses ToC methodology and external evaluation to test causal assumptions and refine strategy.
A European family office with €45 million in active grants and PRIs across education, environment, and economic development produces quarterly dashboards showing 8-12 IRIS+ metrics per programme (chosen from core metric sets), an annual narrative report using IMP's five dimensions to synthesise impact across the portfolio, and triennial evaluations that revisit Theories of Change and commission external validation studies. This architecture provides decision-relevant information without overwhelming staff or grantees.
Cadence and reporting burden management
Over-measurement appears as frequently as under-measurement. Monthly metric collection creates reporting burden that small grantees cannot sustain. The practical cadence: quarterly collection for core metrics (typically 6-8 per programme), annual collection for contextual metrics, and triennial external evaluation for causal validation.
Burden appears not only in frequency but in complexity. IRIS+ metrics vary in collection difficulty. Client demographic data (age, gender, geography) requires straightforward record-keeping. Outcome metrics (behaviour change, capability development, wellbeing improvement) often require surveys or validated instruments. A North American foundation reduced grantee burden 40% by distinguishing minimum viable metrics (collected quarterly, standardised across all grantees) from deep-dive metrics (collected annually, tailored to programme theory).
Implementation failures and corrective patterns
Three failure modes dominate: vanity metrics that document activity without measuring outcomes, over-measurement that exhausts capacity, and under-measurement that prevents learning. Each stems from predictable causes and responds to specific corrections.
Vanity metrics and the activity trap
Vanity metrics feel productive but provide no decision value. Examples: total programme participants (without completion rates or outcome measures), dollars deployed (without cost per outcome), website traffic (without engagement or behaviour change), media mentions (without awareness or attitude shifts). These metrics document scale of activity, not depth of impact.
The correction: for every output metric, require at least one outcome metric. If you count training sessions delivered, measure skill acquisition or employment outcomes. If you count beneficiaries served, measure wellbeing change or capability development. If you count dollars deployed, calculate cost per outcome achieved. This discipline makes immediately visible whether measurement focuses on what you do or on what changes as a result.
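The pairing rule can be enforced mechanically: represent the metric panel as a mapping from each output metric to its outcome metrics and flag any output left unpaired. A minimal sketch with hypothetical metric names:

```python
# Metric panel as {output_metric: [paired outcome metrics]}; names are illustrative
panel = {
    "training_sessions_delivered": ["skill_assessment_pass_rate"],
    "beneficiaries_served": ["wellbeing_score_change"],
    "dollars_deployed": [],  # no outcome pairing yet -> should be flagged
}

# Any output metric with an empty pairing list is a candidate vanity metric
unpaired = [output for output, outcomes in panel.items() if not outcomes]
if unpaired:
    print("Output metrics lacking an outcome pairing:", unpaired)
```

Running this check at panel-design time, before data collection begins, is cheaper than discovering at evaluation time that a programme documented only activity.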
Over-measurement and the comprehensiveness fallacy
The comprehensiveness fallacy assumes that measuring everything produces better decisions than measuring selectively. It does not. Measurement consumes resources—staff time, grantee capacity, external evaluation budgets. A Gulf-based family office spent 18% of total programme budget on measurement before recognising that 80% of the data collected never informed decisions.
Effective measurement identifies the minimum viable metrics: the smallest set that enables confident decisions about strategy, allocation, and continuation. Start with three questions: What decision does this metric inform? What threshold would trigger action? What happens if we lack this data? If the answers are vague, the metric probably adds burden without value.
Under-measurement and the anecdote trap
Under-measurement appears when families rely on grantee narratives and site visits without systematic data. Anecdotes have value—they provide texture and identify unexpected outcomes—but cannot substitute for structured measurement. A Swiss family foundation funded 12 youth development organisations over five years based on compelling narratives. An external evaluation revealed that only four achieved measurable outcomes; the others documented inspiring stories of individual change without evidence of consistent impact.
The correction requires establishing minimum data standards for all grantees: demographic information on participants, completion or engagement rates, at least one standardised outcome measure relevant to programme theory. These need not be burdensome—quarterly collection of 6-8 core metrics typically suffices—but must be non-negotiable.
Sample metric panels for three programme archetypes
Effective measurement tailors metrics to programme theory while maintaining comparability through core standards. Below are metric panels for three common programme types: education access and quality, climate adaptation, and health service delivery.
Education access and quality programmes
Core metrics (collected quarterly): students enrolled disaggregated by gender and income quartile (IRIS+ PI5729), attendance rate (PI6896), completion rate (PI8485), student-teacher ratio (PI4017). Outcome metrics (collected annually): standardised test score gains relative to baseline (PI1241), progression to next education level (PI7011), employment or further education within 12 months of completion (PI3704). Contextual metrics (collected annually): per-student cost (PI3942), teacher retention rate (PI2690), parent satisfaction scores.
IMP dimensions for portfolio reporting: What = improved educational attainment and economic opportunity for disadvantaged youth. Who = 840 students annually, 65% female, 78% from households below 200% of poverty line, primarily in rural areas with limited secondary education access. How Much = completion rates 42 percentage points above national average for similar demographics, 73% employed or in further education within 12 months. Contribution = comparison with control group of similar students in non-programme schools shows 28 percentage point employment advantage. Risk = programme depends on continued teacher quality; retention rate declining due to competitive labour market.
Climate adaptation and mitigation programmes
Core metrics (collected quarterly): greenhouse gas emissions reduced in tonnes CO2 equivalent (IRIS+ PI9468), individuals benefiting from climate adaptation interventions (PI8702), renewable energy capacity installed in megawatts (PI3597). Outcome metrics (collected annually): resilience of target communities measured by recovery time from climate events, household income stability during climate shocks, adoption rates of climate-smart practices. Contextual metrics (collected annually): cost per tonne CO2 equivalent reduced, ecosystem co-benefits (biodiversity indicators), policy influence (regulations adopted based on programme evidence).
This archetype particularly benefits from ToC discipline. A climate adaptation programme in Southeast Asia initially measured only weather-resistant homes constructed. ToC mapping revealed critical assumptions: households must maintain structures, alternative income sources must exist during climate events, and community-level infrastructure must support individual resilience. Measurement expanded to track maintenance practices, income diversification, and community resource management. These indicators surfaced implementation challenges earlier than output metrics alone.
Health service delivery programmes
Core metrics (collected quarterly): patients served disaggregated by condition and demographics (IRIS+ PI1710), treatment completion rates (PI6034), patient satisfaction scores (PI2498), service delivery cost per patient (PI3331). Outcome metrics (collected annually): health status change measured by validated clinical instruments, reduction in preventable complications, quality-adjusted life years gained. Contextual metrics (collected annually): provider retention and training levels, referral network strength, integration with public health systems.
SROI proves particularly viable for preventive health programmes with clear cost-saving evidence. A family foundation supporting diabetes prevention calculated SROI using avoided treatment costs (hospitalisation, medication, complications) based on health system cost data and clinical research on intervention efficacy. Ratio of 6.3:1 with 95% confidence interval of 4.8:1 to 8.1:1 based on sensitivity analysis. This credible monetisation enabled confident capital allocation and attracted co-funding from health authorities.
Reporting to family members and external stakeholders
Internal reporting to family office principals requires different architecture than external reporting to co-funders or public audiences. Family members typically want portfolio-level synthesis, strategic questions answered, and confidence that capital is well deployed. External stakeholders want programme-level evidence, comparability to sector benchmarks, and validation of claims.
Three-tier internal reporting architecture
Effective internal reporting uses three tiers: quarterly dashboards showing 6-8 key metrics per major programme with traffic-light indicators (green for on-track, amber for attention needed, red for intervention required), annual narrative reports using IMP dimensions to synthesise impact and raise strategic questions, and triennial deep evaluations with external validation that inform allocation decisions.
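The traffic-light classification reduces to a simple threshold rule; the 10% amber band below is an assumption for illustration, not a sector standard. A sketch with hypothetical dashboard rows:

```python
def traffic_light(actual, target, amber_band=0.10):
    """Classify a metric against its target: green if on-track, amber if
    within `amber_band` (here 10%) below target, red otherwise."""
    if actual >= target:
        return "green"
    if actual >= target * (1 - amber_band):
        return "amber"
    return "red"

# Hypothetical quarterly dashboard rows: (metric, actual, target)
dashboard = [
    ("completion_rate", 0.81, 0.80),
    ("attendance_rate", 0.74, 0.80),
    ("employment_within_12m", 0.55, 0.70),
]

for name, actual, target in dashboard:
    print(f"{name}: {traffic_light(actual, target)}")
```

Making the amber band explicit in code, rather than a judgment call, keeps quarterly dashboards consistent across programmes and reviewers.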
A North American family office with three generations involved in philanthropy governance produces quarterly dashboards that the investment committee reviews in 20 minutes, annual narrative reports that the full family discusses at retreats, and triennial evaluations that inform five-year strategic plans. This cadence matches governance needs without overwhelming family members who maintain primary careers outside the family office.
External reporting and credibility standards
External reporting demands transparency about methodology, acknowledgment of limitations, and comparability to sector standards. The Global Reporting Initiative and IRIS+ catalogue provide standardised reporting templates that enable comparison. Claims require evidence: if you report outcomes, disclose how you measured them and what the counterfactual was. If you report cost-effectiveness, show your calculation and assumptions.
We observe that families increasingly publish impact reports externally, particularly when seeking to influence peer funders or demonstrate sector leadership. These reports gain credibility through three elements: external validation (independent evaluation or audit of impact claims), transparent methodology (disclosure of measurement approaches and limitations), and comparative context (performance relative to sector benchmarks or similar programmes). A European family foundation publishing annual impact reports engaged a university research centre to validate outcome claims and methodology—cost of 4% of total programme spend but substantial credibility value when recruiting co-funders.
Emerging practices and regulatory trajectory
Impact measurement practice continues to evolve, driven by three forces: increased family office professionalisation, cross-border coordination on standards, and growing integration of impact considerations in mainstream capital allocation.
Standardisation and the ISSB convergence
The International Sustainability Standards Board, established by the IFRS Foundation in 2021, develops global baseline standards for sustainability disclosure. While ISSB initially targets listed companies, the methodology influences private capital and philanthropic practice. The convergence toward standardised frameworks makes impact claims comparable across asset classes and geographies.
This trajectory suggests that impact measurement will increasingly mirror financial reporting: standardised metrics, external assurance, and regulatory expectation of disclosure. Family offices positioned at the forefront adopt voluntary standards now (IRIS+, IMP dimensions, GRI reporting) to prepare for eventual regulatory requirement and to maintain credibility in partnerships with institutional allocators.
Technology enablement and data integration
Digital platforms increasingly enable automated data collection, real-time dashboards, and integration of impact data with financial reporting. This infrastructure reduces measurement burden while improving data quality. The practical implication: measurement that once required dedicated staff and quarterly manual compilation now operates with lighter overhead.
The risk lies in technology enabling measurement of everything rather than measurement of what matters. Platform capabilities exceed strategic necessity. Discipline remains essential: identify minimum viable metrics, establish reporting cadence matched to decision needs, resist the temptation to track additional data simply because collection is automated.
Cross-border harmonisation and the SDG framework
The UN Sustainable Development Goals provide global coordination on impact priorities and increasingly influence capital allocation. Family offices operating across multiple jurisdictions benefit from SDG-aligned impact reporting that enables comparison and aggregation. IRIS+ mapping to SDG targets facilitates this alignment.
We observe growing sophistication in how families report SDG contribution. Early practice simply tagged programmes to goals ("this programme contributes to SDG 4"). Current practice identifies specific targets ("SDG 4.1: ensure all girls and boys complete free, equitable and quality primary and secondary education"), maps IRIS+ metrics to target indicators, and reports aggregate contribution across portfolio. This precision enables credible claims and facilitates coordination with other funders pursuing similar goals.
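At its simplest, aggregating portfolio contribution by SDG target is a grouped sum over programme rows tagged with target identifiers. A sketch using illustrative programme names and beneficiary counts:

```python
from collections import defaultdict

# Hypothetical programme rows: (programme, SDG target, beneficiaries reached)
rows = [
    ("rural_schools", "4.1", 840),
    ("teacher_training", "4.c", 156),
    ("girls_scholarships", "4.1", 140),
]

# Group beneficiary counts by SDG target across the portfolio
by_target = defaultdict(int)
for programme, target, reached in rows:
    by_target[target] += reached

for target, total in sorted(by_target.items()):
    print(f"SDG target {target}: {total} beneficiaries across portfolio")
```

The same grouping pattern extends to IRIS+ metric values once each metric is mapped to a target indicator, though care is needed not to sum metrics with incompatible units (counts versus rates).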
The integration of impact and financial performance
The sharpest trend appears in families measuring impact across the entire balance sheet, not only philanthropic programmes. This integration reflects recognition that investment portfolios generate social and environmental impacts—positive or negative—regardless of intent. Measurement frameworks developed for philanthropy (IRIS+, IMP dimensions) now extend to private equity, real assets, and public market allocations.
A Swiss family office with €400 million in assets under management now produces quarterly impact reports covering philanthropic grants, programme-related investments, impact-focused private equity, and sustainability performance of the broader portfolio. This unified view uses IMP dimensions for comparability: same five dimensions applied to grant-funded health clinics, equity investments in education technology, and public equity holdings evaluated on ESG performance. The architecture enables true portfolio-level impact assessment and surfaces allocation trade-offs previously invisible when philanthropy and investments operated in separate reporting streams.