Disclaimer

The primary data collection and verification activities conducted under this project focus on the Embu and Mbeere communities, where cultural data has been collected directly through oral histories, interviews with community elders, cultural practitioners, and local historians, as well as field-based documentation of cultural practices. This primary data forms the core knowledge base used in training and validating the CIS-ETHN conversational AI system.

For the Tharaka Nithi community, the cultural materials currently presented within the system are derived from secondary data sources, including published literature, archival records, and existing scholarly works. Due to funding scope limitations at the current phase, comprehensive primary data collection for Tharaka Nithi has not yet been undertaken. These materials are nonetheless included to ensure regional representation and comparative cultural context.

The project is designed to be iterative and expandable. Subject to the realisation of additional funding support from the National Research Fund (NRF), future phases of CIS-ETHN will incorporate primary field-based data collection for the Tharaka Nithi community, including direct engagement with elders, cultural custodians, and local institutions. This will allow for the enrichment, validation, and updating of existing materials.

The Large Language Model (LLM) underpinning the CIS-ETHN conversational interface is trained primarily on Embu and Mbeere cultural datasets, and the system therefore serves as a pilot study for the application of AI-driven cultural heritage systems in Kenya. Insights gained from this pilot will inform the expansion of the model to additional communities, ensuring methodological rigour, ethical compliance, and cultural accuracy.
