Services

Speech, audio and audio-visual datasets for AI and research

FYI Africa collects and processes African speech, audio and video datasets, and supports them with transcription, annotation, metadata and quality control.

Speech (Collected)

Audio: Interviews, mobile recordings, noisy environments (Structured)

Video: Speaker video, UX tasks, interaction recordings (Reviewed)

Text: Transcripts, translations, labels, timestamps (Annotated)

Delivery: Files, metadata, consent tracking and QC report (Ready)

What we do

Four core service areas

Each project is structured around the client’s use case, target languages, data type, consent requirements, metadata needs and delivery specification.

01 / Speech Data Collection

Speech datasets for systems that need to understand African speakers, languages and accents.

FYI Africa collects speech data for voice AI, speech recognition, conversational systems, call-centre AI, voice assistants and multilingual model evaluation.

ASR, TTS, Voice assistants, Call-centre AI, Accent adaptation

Includes

Read/scripted speech
Spontaneous/natural speech
Conversational speech
Call-centre/telephony-style audio
Command and control speech
Wake-word recordings
Code-switching speech

Use cases

Automatic speech recognition
Text-to-speech support
Conversational AI
Speech analytics
Voice search
Automotive voice systems
Low-resource language modelling

02 / Audio Data Collection

Audio datasets where sound, speech behaviour, environment or acoustic context matters.

FYI Africa collects audio data in controlled, semi-controlled and real-world environments, depending on the project specification.

Human voice, Multi-speaker audio, Mobile recordings, Noisy environments

Includes

Human voice recordings
Multi-speaker audio
Interview recordings
Group discussions
Product feedback recordings
Task-based user recordings
Mobile-device recordings
Noisy-environment audio

Use cases

Speech model robustness
AI model training and evaluation
Acoustic testing
User research
Product testing
Localisation research

03 / Audio-Visual Data Collection

Audio-visual datasets for multimodal AI, user research and real-world evaluation.

FYI Africa collects consented audio-visual data where spoken content, visual context, task behaviour or interaction environment matters.

Video interviews, UX recordings, Multimodal AI, Interaction data

Includes

Video interviews
Speaker videos paired with audio
Product interaction recordings
Customer service simulations
User experience research recordings
Instruction-following tasks
Screen-and-camera recordings
Multilingual video responses
Code-switching video responses

Use cases

Multimodal AI
Speech-plus-video model testing
Human interaction datasets
Behaviour and context-aware model evaluation
User experience research
Localisation testing

04 / Dataset Processing and Delivery

Turning raw recordings into structured, usable datasets.

FYI Africa can support the processing layer that makes collected data usable for training, testing, evaluation, localisation and research.

Transcription, Translation, Annotation, Metadata, QC

Includes

Verbatim transcription
Clean transcription
Local-language transcription
Translation into English
Speaker-labelled transcription
Timestamped transcription
Code-switching transcription

Dataset outputs

Annotation and labelling
Metadata structuring
Quality-control reporting
Structured file delivery
Consent tracking sheets
Delivery summary

Dataset delivery

Complete, usable datasets — not just recordings

FYI Africa structures delivery around the client’s technical requirements, consent needs, metadata fields and quality criteria.

Files

Audio, video or audio-visual files delivered in agreed technical formats.

Text

Transcripts, translations, speaker labels, timestamps and text outputs.

Metadata

Structured fields for language, accent, region, speaker profile, device and environment.

QC

Quality-control reporting, consent tracking and delivery summaries where required.
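
As an illustration only, the metadata fields and QC step described above could be represented along the following lines. This is a minimal sketch, not FYI Africa's actual delivery schema: the class name, field names, the consent_id link and the qc_issues helper are all assumptions made for the example, and the sample values are placeholders. Real field definitions would follow the client's delivery specification.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class RecordingMetadata:
    """Hypothetical per-recording metadata record (field names are assumptions)."""
    file_name: str
    language: str
    accent: str
    region: str
    speaker_profile: str
    device: str
    environment: str
    # Assumed reference to a row in a consent tracking sheet.
    consent_id: Optional[str] = None


def qc_issues(record: RecordingMetadata) -> List[str]:
    """Return one message per field that is missing or blank."""
    issues = []
    for name, value in asdict(record).items():
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing {name}")
    return issues


# Placeholder values, for illustration only.
example = RecordingMetadata(
    file_name="rec_0001.wav",
    language="example-language",
    accent="example-accent",
    region="example-region",
    speaker_profile="adult, first-language speaker",
    device="mobile phone",
    environment="noisy, outdoor",
    consent_id="consent-0001",
)

# The same record without a consent reference fails the check.
incomplete = RecordingMetadata(**{**asdict(example), "consent_id": None})

print(qc_issues(example))     # prints []
print(qc_issues(incomplete))  # prints ['missing consent_id']
```

A check like this could feed a QC report by listing, per file, which agreed fields are still missing before delivery.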

Start a project

Need a custom African dataset?

Tell us the data type, use case, target languages, sample design and delivery requirements. We’ll help define the right project scope.

Scope a Dataset Project