The engagement
The Alliance of Bioversity International and CIAT is a CGIAR research centre focused on agricultural biodiversity and tropical agriculture. I joined as an operations intern for six months with a specific brief: reduce the manual overhead in operational workflows that had accumulated over years of organic growth.
The Nairobi office runs field data collection, reporting, procurement, and partner coordination across East Africa. Most of it was running on spreadsheets, email threads, and institutional knowledge held by individual staff members. The risk was obvious — any staff transition meant knowledge loss, and the manual processes were consuming hours that should have gone to research.
What I found
Four problems drove the work:
Data aggregation was entirely manual. Research staff spent significant time each week pulling data from multiple sources, normalising it, and consolidating it into reports. The process was error-prone and slow — what should have taken minutes took days.
No internal tooling for operational workflows. Requests, approvals, and status tracking happened over email and messaging apps. There was no single source of truth, no audit trail, and no way to see the state of anything without asking someone.
Report queries were slow. Several of the most-used report queries were taking 8–12 seconds to run. At that latency, staff had stopped using them and were doing the work manually instead.
Nothing was documented. Internal APIs and data contracts existed only in the heads of the people who built them. Onboarding a new staff member meant weeks of knowledge transfer that could have been a day of reading.
What I built
Python/Flask RESTful APIs
A suite of RESTful APIs automating the recurring operational workflows. Each endpoint corresponded to a specific manual task — data ingestion from field sources, report generation, status aggregation.
The application structure used Flask's factory pattern with create_app() and environment-based configuration, making the application testable without mocking the entire application context. Blueprints separated the logical areas: data, reports, and operations. Request validation ran at the blueprint level before any business logic executed.
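A minimal sketch of that structure, assuming a single reports blueprint. The config classes, the /reports/summary route, and the JSON-only rule are hypothetical illustrations, not the project's actual code:

```python
from flask import Blueprint, Flask, jsonify, request


class Config:
    TESTING = False


class TestConfig(Config):
    TESTING = True


# Hypothetical mapping from environment name to configuration class.
CONFIGS = {"default": Config, "testing": TestConfig}

reports_bp = Blueprint("reports", __name__, url_prefix="/reports")


@reports_bp.before_request
def require_json_body():
    # Blueprint-level validation: runs before every view in this blueprint,
    # so malformed requests never reach business logic.
    if request.method == "POST" and not request.is_json:
        return jsonify(error="Request body must be JSON"), 415


@reports_bp.route("/summary", methods=["POST"])
def create_summary():
    # Placeholder for real business logic: echo the validated payload.
    return jsonify(received=request.get_json()), 201


def create_app(config_name="default"):
    # Application factory: each call builds a fresh, independently
    # configured app, which is what makes isolated tests straightforward.
    app = Flask(__name__)
    app.config.from_object(CONFIGS[config_name])
    app.register_blueprint(reports_bp)
    return app
```

In a test, create_app("testing").test_client() gives an isolated client without starting a server or touching shared state.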
One design decision that paid off: the APIs were built for two audiences simultaneously — the internal web applications and technical staff using Postman or curl directly. This meant clear human-readable field names, predictable pagination, and informative error messages rather than terse codes. Both audiences benefited.
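As an illustration of what "predictable pagination" and "informative error messages" can look like in practice, here is a small sketch. The envelope field names and the error shape are assumptions made for this example, not the actual response contract:

```python
import math


def paginate(items, page=1, per_page=20):
    """Wrap a list in a pagination envelope with self-explanatory field names."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive integers")
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "per_page": per_page,
        "total_items": len(items),
        "total_pages": math.ceil(len(items) / per_page),
    }


def error_body(field, problem, hint):
    """Build an error payload a person can act on without a lookup table."""
    return {"error": {"field": field, "problem": problem, "hint": hint}}
```

Someone calling the API from curl sees the same envelope on every list endpoint, and an error that names the field and suggests a fix.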
PostgreSQL integration and query optimisation
The data validation pipeline checked incoming records at the API boundary, before they could reach the tables that feed reports. Each record was validated against schema rules (type checks, range validation, referential integrity). Records that failed were written to a separate rejected_records table, along with the rejection reasons, for later inspection. This prevented data quality issues from contaminating downstream reports.
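The flow can be sketched as follows. This uses an in-memory SQLite database as a stand-in for PostgreSQL so the example runs on its own; the observations table, the field names, the site codes, and the range limit are all hypothetical:

```python
import json
import sqlite3

KNOWN_SITES = {"KE-01", "UG-02"}  # hypothetical reference data
MAX_YIELD_KG = 100_000            # hypothetical upper bound


def validate(record):
    """Return a list of rejection reasons; an empty list means the record is clean."""
    reasons = []
    value = record.get("yield_kg")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        reasons.append("yield_kg must be a number")       # type check
    elif not 0 <= value <= MAX_YIELD_KG:
        reasons.append("yield_kg is out of range")        # range check
    if record.get("site_code") not in KNOWN_SITES:
        reasons.append("unknown site_code")               # referential check
    return reasons


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (site_code TEXT NOT NULL, yield_kg REAL NOT NULL)")
    conn.execute("CREATE TABLE rejected_records (payload TEXT NOT NULL, reasons TEXT NOT NULL)")
    return conn


def ingest(conn, records):
    """Insert clean records; divert failures to rejected_records with reasons."""
    accepted = 0
    for record in records:
        reasons = validate(record)
        if reasons:
            conn.execute(
                "INSERT INTO rejected_records (payload, reasons) VALUES (?, ?)",
                (json.dumps(record), "; ".join(reasons)),
            )
        else:
            conn.execute(
                "INSERT INTO observations (site_code, yield_kg) VALUES (?, ?)",
                (record["site_code"], record["yield_kg"]),
            )
            accepted += 1
    conn.commit()
    return accepted
```

Storing the original payload next to the reasons is what lets staff see exactly what failed and why.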
Fixing the slow queries had the most immediate impact. Three queries that had been taking 8–12 seconds were reduced to under 500ms by adding targeted indexes, removing N+1 query patterns, and rewriting subqueries as joins. Staff who had stopped using the reports started using them again within a week of the fix going live.
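To make the N+1 fix concrete, here is a small sketch contrasting the two shapes of query. It uses in-memory SQLite as a stand-in for PostgreSQL, and the schema, index name, and sample rows are invented for illustration; they are not the real report queries:

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE observations (
            id INTEGER PRIMARY KEY,
            site_id INTEGER NOT NULL REFERENCES sites(id),
            yield_kg REAL NOT NULL
        );
        -- Targeted index on the column used to join and filter.
        CREATE INDEX idx_observations_site_id ON observations (site_id);
        INSERT INTO sites VALUES (1, 'Site A'), (2, 'Site B');
        INSERT INTO observations (site_id, yield_kg)
        VALUES (1, 10.0), (1, 30.0), (2, 5.0);
    """)
    return conn


def totals_n_plus_one(conn):
    # N+1 pattern: one query for the sites, then one more query per site.
    totals = {}
    for site_id, name in conn.execute("SELECT id, name FROM sites").fetchall():
        row = conn.execute(
            "SELECT COALESCE(SUM(yield_kg), 0) FROM observations WHERE site_id = ?",
            (site_id,),
        ).fetchone()
        totals[name] = row[0]
    return totals


def totals_single_join(conn):
    # Same result in a single round trip: join and aggregate in the database.
    rows = conn.execute("""
        SELECT s.name, COALESCE(SUM(o.yield_kg), 0)
        FROM sites AS s
        LEFT JOIN observations AS o ON o.site_id = s.id
        GROUP BY s.id, s.name
    """)
    return dict(rows.fetchall())
```

Both functions return the same totals; the difference is the number of round trips, which grows with the number of sites in the first version and stays at one in the second.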
Schema changes were managed via versioned migration scripts committed to the repository. No manual schema changes were applied directly to production — a discipline that prevented the kind of drift that had made the existing system hard to reason about.
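The idea behind versioned migrations can be sketched with a tiny runner. This is an illustration only: the project may well have used an existing migration tool, SQLite stands in for PostgreSQL here, and the migration names, SQL, and schema_migrations table are hypothetical:

```python
import sqlite3

# Hypothetical migrations; in a repository each would be its own versioned file.
MIGRATIONS = [
    ("001_create_observations",
     "CREATE TABLE observations (id INTEGER PRIMARY KEY, yield_kg REAL NOT NULL)"),
    ("002_add_site_code",
     "ALTER TABLE observations ADD COLUMN site_code TEXT"),
]


def apply_migrations(conn, migrations=MIGRATIONS):
    """Apply pending migrations in version order and record each one."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already run against this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied
```

Because the database records which versions it has seen, running the same scripts twice is harmless, and every environment converges on the same schema.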
Internal web applications
The web applications were built for non-technical users. The UX decisions that drove adoption:
- Forms over tables for data entry — validation feedback inline, not after submission
- Status visibility on every operational item — current state and last editor always visible
- Organisation SSO integration — no separate login, no password to forget
The APIs were functional within the first month. The web applications took longer to reach consistent use, not because of technical issues but because changing how people work requires more than shipping a feature. Iterative UX refinements based on staff feedback over the following months moved usage from occasional to daily.
API documentation
A complete OpenAPI specification documented every endpoint: request and response shapes, error codes, example payloads. This became the team's primary technical reference and the document I'm most proud of from the engagement. It outlived my time there and reduced onboarding time for new staff in a way that no amount of working code could have done on its own.
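For readers unfamiliar with OpenAPI, the fragment below shows the kind of detail such a specification captures for one endpoint. It is written as a Python dictionary so it can be dumped to JSON; the path, fields, and example values are hypothetical and do not come from the real specification:

```python
import json

# Hypothetical fragment of an OpenAPI 3.0 document for a single endpoint.
OPENAPI_FRAGMENT = {
    "openapi": "3.0.3",
    "info": {"title": "Operations API (illustrative)", "version": "1.0.0"},
    "paths": {
        "/reports/summary": {
            "post": {
                "summary": "Generate a summary report",
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "required": ["region"],
                                "properties": {"region": {"type": "string"}},
                            },
                            # Example payload shown to readers of the docs.
                            "example": {"region": "east-africa"},
                        }
                    },
                },
                "responses": {
                    "201": {"description": "Report created"},
                    "415": {"description": "Request body must be JSON"},
                },
            }
        }
    },
}


def render_spec(spec=OPENAPI_FRAGMENT):
    """Serialise the fragment to indented JSON for publishing or review."""
    return json.dumps(spec, indent=2)
```

Request shape, example payload, and error codes sit side by side, which is what lets a new staff member understand an endpoint without reading its implementation.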
What I learned
Documentation is a first-class deliverable in research organisations. Working code without documentation doesn't survive staff turnover. The code will eventually be replaced; the OpenAPI spec's record of what the system does and why will be referenced long after.
Validation at ingestion is worth the upfront cost. Rejecting malformed data at the API boundary prevented data quality issues from propagating into reports. The rejected_records table also gave staff visibility into what was failing and why — which turned out to be as valuable as the validation itself.
Non-technical adoption requires UX investment. The APIs were functional within the first month. The web applications reached consistent daily use only after iterative refinements driven by staff feedback. Shipping a technically correct solution is not the same as shipping something people will use.
Institutional processes are harder to change than codebases. Some automation changes required extended stakeholder alignment before they could be deployed. Delivering the technical solution was half the work. The other half was organisational.
For repository and implementation details: github.com/edogola4