
A DATA-DRIVEN APPROACH: YOUTUBE AS A RESOURCE FOR AI, ML AND ROBOTIC DEVELOPMENT

Abstract

YouTube has become one of the largest repositories of human knowledge, offering multimodal content across fields such as education, software development, robotics, and design. Unlike traditional machine learning (ML) systems that rely on curated datasets, YouTube provides a scalable, context-rich alternative by combining speech, visuals, and on-screen text in real-world instructional settings. This study presents a modular, data-driven framework for extracting and transforming YouTube content into structured training material for intelligent systems. While the framework is broadly applicable, Python programming is used as a demonstrative case. YouTube explicitly permits the use of publicly available content for research purposes under its Terms of Service, provided that attribution and ethical-use guidelines are followed. In some cases, accessing YouTube’s extended datasets (e.g., via the YouTube Data API or Research Program) requires registration as a verified academic or developer, ensuring compliance with responsible data practices. The methodology follows a mixed-methods, modular systems-engineering approach, combining automated video data mining with empirical system design and lightweight deployment testing. Key techniques include transcript mining, audio signal analysis, and frame-based image extraction. Videos are programmatically accessed using Selenium, and aligned audiovisual segments are extracted using FFmpeg. Natural language processing (NLP), automatic speech recognition (ASR), and community-based relevance scoring help structure and refine the data. In the case study, a chatbot is trained on Python tutorials, with cleaned and aligned multimodal data used to fine-tune a transformer model (e.g., CodeT5 or GPT). Performance metrics show high effectiveness: 100% transcript retrieval where subtitles exist, 93% metadata alignment via NLP, 88.5% command extraction precision, and 92% sentiment/intent classification accuracy. The system processes a 5-minute video in roughly 2.5 minutes on a Raspberry Pi and stores each structured dataset in just 1.5 MB. Despite these promising results, limitations include missing transcripts in some videos, variable content quality, reliance on accurate transcription, and moderate computational demands. Nonetheless, the framework demonstrates YouTube’s potential as a cost-effective, scalable resource for training intelligent, domain-specific systems, particularly in low-resource environments.
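
To make the extraction steps concrete, the Python sketch below illustrates how a public video page could be opened programmatically with Selenium and how an aligned audiovisual segment, an ASR-ready audio track, and per-second frames could be cut with FFmpeg. This is a minimal illustration under stated assumptions, not the authors' implementation: the URL, file names, and segment boundaries are placeholders, FFmpeg and a Chrome driver are assumed to be installed, and the source video is assumed to have already been saved locally.

    # Illustrative pipeline sketch (not the paper's code). Assumes ffmpeg is on PATH,
    # a Chrome driver is available to Selenium, and the tutorial video is stored
    # locally as tutorial.mp4. All names below are hypothetical placeholders.
    import os
    import subprocess
    from selenium import webdriver

    VIDEO_URL = "https://www.youtube.com/watch?v=EXAMPLE_ID"  # placeholder video ID

    def fetch_page_title(url: str) -> str:
        """Open the public video page headlessly and return its title as basic metadata."""
        options = webdriver.ChromeOptions()
        options.add_argument("--headless=new")
        driver = webdriver.Chrome(options=options)
        try:
            driver.get(url)
            return driver.title
        finally:
            driver.quit()

    def extract_segment(src: str, start: str, end: str, out_video: str, out_audio: str) -> None:
        """Cut an aligned audiovisual segment and a 16 kHz mono WAV suitable for ASR."""
        # Stream-copy the segment so audio and video stay time-aligned without re-encoding.
        subprocess.run(["ffmpeg", "-y", "-i", src, "-ss", start, "-to", end,
                        "-c", "copy", out_video], check=True)
        # Export the same span as mono 16 kHz audio for speech recognition.
        subprocess.run(["ffmpeg", "-y", "-i", src, "-ss", start, "-to", end,
                        "-vn", "-ac", "1", "-ar", "16000", out_audio], check=True)

    def sample_frames(src: str, out_pattern: str, fps: int = 1) -> None:
        """Export one frame per second for on-screen code and text analysis."""
        subprocess.run(["ffmpeg", "-y", "-i", src, "-vf", f"fps={fps}", out_pattern],
                       check=True)

    if __name__ == "__main__":
        print(fetch_page_title(VIDEO_URL))
        os.makedirs("frames", exist_ok=True)
        extract_segment("tutorial.mp4", "00:01:00", "00:01:30", "segment.mp4", "segment.wav")
        sample_frames("segment.mp4", "frames/frame_%04d.png")

In the framework described above, transcript mining, NLP-based metadata alignment, and transformer fine-tuning would follow these extraction steps; the sketch covers only the acquisition and segmentation stage.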

Keywords

Multimodal Data Mining, YouTube Video Analytics, Artificial Intelligence, Machine Learning, Decentralized Data Frameworks, Natural Language Processing, Open-Source Intelligence
