Volume 15, Issue 4, April 2026
This work is licensed under a Creative Commons Attribution 4.0 International License.
LARGE SCALE DATA AUDIT THROUGH BIG DATA TECHNOLOGIES
Abstract: This paper presents a Big Data analytics pipeline for YouTube trending video data built on Apache Spark, TextBlob NLP, MySQL, and Streamlit. The system integrates five components: the YouTube Data API v3 for automated collection of up to 200 trending videos and 4,000 user comments per pipeline run; a MySQL 8.0 relational database for structured storage; Apache Spark 3.4 (PySpark) for distributed data transformation; TextBlob for lexicon-based sentiment analysis, deployed as Spark User Defined Functions; and a multi-page interactive dashboard built with Streamlit and Plotly for analytics visualization. In a representative pipeline run, the Music and Entertainment categories dominated India's trending landscape, Gaming audiences showed the highest engagement score (4.12%), and comments were classified as 52% Neutral, 33% Positive, and 15% Negative. The system demonstrates how Big Data engineering principles apply to social media analytics, providing insight into content trends, viewer engagement patterns, and audience sentiment in the Indian YouTube market.
Keywords: YouTube Analytics, Apache Spark, PySpark, TextBlob, Sentiment Analysis, Big Data Pipeline, MySQL, Streamlit, Engagement Score, Natural Language Processing, India Trending, Data Engineering
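The abstract names two derived quantities, a three-way sentiment label and an engagement score, without stating how either is computed. The following is a minimal sketch of how they might be computed, not the authors' implementation. The 0.05 polarity threshold, the engagement formula (likes plus comments per view, as a percentage), and all function names are assumptions made for illustration.

```python
def label_sentiment(polarity: float, threshold: float = 0.05) -> str:
    """Map a polarity score in [-1.0, 1.0] to a three-way label.

    The threshold is an assumed cut-off, not a value taken from the paper.
    """
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def engagement_score(views: int, likes: int, comments: int) -> float:
    """Assumed definition: interactions per view, expressed as a percentage."""
    if views <= 0:
        return 0.0  # avoid division by zero for videos with no recorded views
    return 100.0 * (likes + comments) / views


def textblob_polarity(text: str) -> float:
    """Polarity from TextBlob's default lexicon-based analyzer.

    TextBlob is imported lazily so the helpers above run without it installed.
    """
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


# With PySpark available, the labeling step could be wrapped as a UDF, e.g.:
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import StringType
#   sentiment_udf = udf(lambda t: label_sentiment(textblob_polarity(t or "")),
#                       StringType())
```

Under these assumptions, a video with 1,000 views, 40 likes, and 10 comments would score 5.0%. Keeping the labeling rule as a plain function, separate from the Spark wrapper, lets it be unit-tested without a cluster.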
How to Cite:
[1] Anbarasi N, Prasanna P, Praveen Kumar M, Surendar N D, Venkadesh K, "LARGE SCALE DATA AUDIT THROUGH BIG DATA TECHNOLOGIES," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.154236
