International Journal of Advanced Research in Computer and Communication Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
IJARCCE adheres to the suggestive parameters outlined by the University Grants Commission (UGC) for peer-reviewed journals, upholding high standards of research quality, ethical publishing, and academic excellence.
← Back to VOLUME 15, ISSUE 4, APRIL 2026

LARGE SCALE DATA AUDIT THROUGH BIG DATA TECHNOLOGIES

Anbarasi N, Prasanna P, Praveen Kumar M, Surendar N D, Venkadesh K

Abstract: This paper presents a Big Data analytics pipeline for YouTube trending video data built on Apache Spark, TextBlob, MySQL, and Streamlit. The system integrates five components: the YouTube Data API v3, for automated collection of up to 200 trending videos and 4,000 user comments per pipeline run; a MySQL 8.0 relational database, for structured storage; Apache Spark 3.4 (PySpark), for distributed data transformation; TextBlob, for lexicon-based sentiment analysis deployed as Spark User Defined Functions; and a multi-page interactive dashboard built with Streamlit and Plotly, for analytics visualization. Results from a representative pipeline run show that the Music and Entertainment categories dominate India's trending landscape, that Gaming audiences exhibit the highest engagement score (4.12%), and that 52% of comments are Neutral, 33% Positive, and 15% Negative. The system demonstrates the practical application of Big Data engineering principles to social media analytics, delivering actionable intelligence on content trends, viewer engagement patterns, and audience sentiment for the Indian YouTube market.

Keywords: YouTube Analytics, Apache Spark, PySpark, TextBlob, Sentiment Analysis, Big Data Pipeline, MySQL, Streamlit, Engagement Score, Natural Language Processing, India Trending, Data Engineering
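
The abstract reports two derived metrics, an engagement score per video and a Positive/Neutral/Negative split over comments, without stating how either is computed. The following is a minimal Python sketch of one plausible way to compute them; it is not the authors' implementation. The engagement formula (likes plus comments per view, as a percentage), the polarity threshold of 0.1, and all function names are assumptions made for illustration. The polarity values are assumed to come from a lexicon-based scorer such as TextBlob's `TextBlob(text).sentiment.polarity`, which returns a float in [-1, 1]; that call is left out here so the sketch has no dependencies.

```python
from collections import Counter


def engagement_score(views: int, likes: int, comments: int) -> float:
    """Interactions per view, as a percentage (assumed definition)."""
    if views <= 0:
        return 0.0  # avoid division by zero for videos with no recorded views
    return 100.0 * (likes + comments) / views


def sentiment_label(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity in [-1, 1] to a label (the threshold is an assumption)."""
    if polarity > threshold:
        return "Positive"
    if polarity < -threshold:
        return "Negative"
    return "Neutral"


def sentiment_shares(polarities: list[float]) -> dict[str, float]:
    """Percentage of comments carrying each sentiment label."""
    counts = Counter(sentiment_label(p) for p in polarities)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper's dataset.
    print(engagement_score(views=1000, likes=30, comments=10))
    print(sentiment_shares([0.5, 0.0, 0.0, -0.4]))
```

In a PySpark pipeline of the kind the abstract describes, functions like these would typically be wrapped with `pyspark.sql.functions.udf` and applied column-wise to the video and comment DataFrames.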

How to Cite:

[1] Anbarasi N, Prasanna P, Praveen Kumar M, Surendar N D, Venkadesh K, “LARGE SCALE DATA AUDIT THROUGH BIG DATA TECHNOLOGIES,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.154236

This work is licensed under a Creative Commons Attribution 4.0 International License.