Niruthiha S.


CV | LinkedIn | GitHub | Medium Blogs
"I design and build models from data, create and work on algorithms, and train models to predict and achieve business goals."

About Me

Welcome to my professional portfolio!

I am a Doctoral Researcher at École de technologie supérieure, where I apply AI to automate software maintenance and MLOps, working at the intersection of AI safety and software engineering. My current work investigates technical debt in LLMs: I develop neuro-symbolic tools to detect hidden bugs and security risks in AI-generated code, combining NLP with program analysis to build QA systems for the AI era.

My background combines a strong foundation in Data Science and Machine Learning Systems from Northeastern University with a B.S. in Mathematics and Statistics from the University of Toronto.

This repository showcases my work on real-world problems across Natural Language Processing, MLOps, predictive modeling, and AI system deployment.


Education


Professional Experience


Publications & Presentations

Highlighted Projects

Below are some of my key projects demonstrating my skills and experience. Each project is structured to highlight the challenge, my solution, and the impact/results achieved.

1. Named Entity Recognition for Restaurant Search Queries

Open in GitHub | View the Model on Hugging Face (1000+ model downloads) | Read Full Paper

Challenge: Developed an accurate Named Entity Recognition (NER) system for restaurant search queries, a critical component for enhancing search and recommendation systems.

Solution: Fine-tuned a DistilBERT model leveraging transfer learning to accurately extract structured information (ratings, cuisines, locations, amenities) from free-form text.
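
A token-classification model of this kind typically emits one tag per token, and those tags still have to be merged into entity spans. The sketch below shows that merging step only. It assumes a BIO tagging scheme and illustrative label names (`Rating`, `Cuisine`, `Location`); neither is confirmed by the project, and `group_entities` is a hypothetical helper, not part of the released model.

```python
def group_entities(tokens, tags):
    """Merge per-token BIO tags into (label, text) spans."""
    entities = []
    current_label, current_tokens = None, []

    def flush():
        # Close the span being built, if any, and reset the state.
        nonlocal current_label, current_tokens
        if current_label is not None:
            entities.append((current_label, " ".join(current_tokens)))
        current_label, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag continues the open span of the same label.
            current_tokens.append(token)
        else:
            # "O" tags and stray I- tags close any open span.
            flush()
    flush()
    return entities


# Hand-written tags for illustration; a real run would take them from the model.
tokens = ["find", "a", "four", "star", "italian", "place", "in", "boston"]
tags = ["O", "O", "B-Rating", "I-Rating", "B-Cuisine", "O", "O", "B-Location"]
entities = group_entities(tokens, tags)
```

Here `entities` holds one `(label, text)` pair per span, which is the structured form a search or recommendation backend could filter on.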

Impact & Results:



2. Hybrid Graph Neural Network for Financial Fraud Detection

View Project Code

Challenge: Built a production-scale fraud detection system to identify fraudulent transactions in a large financial dataset, the IEEE-CIS Fraud Detection dataset from Kaggle. The data comprises 590,540 transactions with extreme class imbalance (3.5% fraud rate), and the task required both tabular feature learning and network relationship modeling to surpass industry-standard gradient boosting methods.

Solution: Developed an innovative Hybrid Graph Neural Network combining GraphSAGE layers with deep tabular networks and cross-attention fusion. Engineered 200+ advanced features including temporal patterns, network connectivity metrics, and multi-dimensional risk scoring. Implemented memory-optimized graph construction handling 1.5M+ edges with fraud-aware weighting.
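
To make the GraphSAGE part concrete: a GraphSAGE layer updates each node by combining its own features with an aggregate of its neighbors' features. The sketch below shows only the mean-aggregate-and-concatenate step in plain Python, without the learned weight matrix or nonlinearity that a real layer applies afterwards. The toy graph, the feature values, and the name `sage_mean_step` are all illustrative assumptions, not taken from the project code.

```python
def sage_mean_step(features, neighbors):
    """One GraphSAGE-style step: concat(own features, mean of neighbor features)."""
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Element-wise mean over the neighbors' feature vectors.
            mean = [
                sum(features[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(own))
            ]
        else:
            # Isolated node: use a zero vector as the neighborhood summary.
            mean = [0.0] * len(own)
        updated[node] = own + mean  # list concatenation, not addition
    return updated


# Three hypothetical transactions linked through shared attributes.
features = {"t1": [1.0, 0.0], "t2": [0.0, 2.0], "t3": [4.0, 2.0]}
neighbors = {"t1": ["t2", "t3"], "t2": ["t1"], "t3": ["t1"]}
hidden = sage_mean_step(features, neighbors)
```

Each output vector is twice as long as the input: the first half is the transaction's own features and the second half summarizes its neighborhood, which is where the network signal comes from.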

Impact & Results:



3. California Renewable Energy Forecasting & Emissions Optimization System

Open in Google Colab | Read Full Paper

Challenge: Developed a system to forecast renewable energy generation and optimize the energy mix to minimize CO2 emissions in California while addressing grid stability concerns.

Solution: Engineered a robust ETL pipeline for 43,800 hourly observations (2018-2023) from EIA’s Grid Monitor. Developed predictive models and a linear programming optimization framework (using PuLP) to balance renewable integration with emissions reduction.
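
In its simplest form, the emissions-minimizing mix is a linear program: minimize the sum of `emission_factor * energy` over sources, subject to total energy meeting demand and each source staying within its capacity. With only that single demand constraint, the LP optimum coincides with a greedy fill from the cleanest source upward, so the sketch below uses that greedy rule with the standard library instead of PuLP. The capacities and emission factors are invented placeholders, not EIA values, and the project's actual formulation may include further constraints.

```python
def min_emission_dispatch(sources, demand_mwh):
    """Fill demand from the lowest-emission source upward.

    sources maps name -> (capacity_mwh, kg_co2_per_mwh).
    Returns (dispatch per source, total emissions in kg CO2).
    """
    dispatch = {name: 0.0 for name in sources}
    remaining = demand_mwh
    # Sort by emission factor so the cleanest sources are used first.
    for name, (capacity, _factor) in sorted(sources.items(), key=lambda kv: kv[1][1]):
        used = min(capacity, remaining)
        dispatch[name] = used
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("demand exceeds total capacity")
    emissions = sum(dispatch[name] * sources[name][1] for name in sources)
    return dispatch, emissions


# Placeholder numbers for one hour of operation.
sources = {
    "solar": (300.0, 0.0),
    "wind": (200.0, 0.0),
    "hydro": (150.0, 10.0),
    "gas": (1000.0, 400.0),
}
dispatch, emissions = min_emission_dispatch(sources, 800.0)
```

Once constraints such as ramp limits or minimum reserve margins are added, the greedy rule no longer holds and a solver-based formulation like the PuLP one described above becomes necessary.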

Impact & Results:

Technical Stack: Python, SQL, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, PuLP, Git

Data Source: U.S. Energy Information Administration (EIA) “Hourly Electric Grid Monitor” dataset


4. Climate Change Chatbot with RAG

Open in GitHub | Click Here to Watch Demo Video

🏆 2nd Place Winner - Climate Resiliency Hackathon 2024 (400+ participants, 10 Northeastern University campuses across North America)

Challenge: Developed a sophisticated information retrieval and natural language processing system to enable accurate semantic search, context-aware document retrieval, and real-time information validation across vast climate science datasets, specifically focused on Canada.

Solution: Engineered a robust document processing pipeline for diverse sources (IPCC Reports, ECCC Climate Data, University Research Papers). Implemented advanced text preprocessing, custom tokenization, and domain-specific entity recognition, achieving 95% retrieval accuracy for relevant documents. The system uses a Retrieval-Augmented Generation (RAG) architecture with a vector database to provide precise, data-driven LLM responses.
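
The retrieval step of a RAG pipeline can be sketched as a nearest-neighbor search over embedding vectors followed by prompt assembly. The example below is a minimal illustration under stated assumptions: the three-dimensional vectors are hand-written stand-ins for real embeddings, the in-memory list stands in for the vector database, and `retrieve` and `build_prompt` are hypothetical names rather than the project's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the texts of the k passages most similar to the query vector."""
    scored = sorted(
        index, key=lambda item: cosine(query_vec, item["vector"]), reverse=True
    )
    return [item["text"] for item in scored[:k]]


def build_prompt(question, passages):
    """Assemble retrieved passages and the question into one LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"


# Placeholder passages and vectors; a real index would hold document chunks.
index = [
    {"text": "passage about sea ice", "vector": [0.9, 0.1, 0.0]},
    {"text": "passage about wildfires", "vector": [0.1, 0.9, 0.1]},
    {"text": "passage about permafrost", "vector": [0.8, 0.2, 0.1]},
]
top = retrieve([1.0, 0.0, 0.0], index, k=2)
prompt = build_prompt("How is Arctic ice changing?", top)
```

Grounding the prompt in retrieved passages is what lets the LLM answer from the source documents rather than from its own parametric memory.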

Impact & Results:



5. MAHD: Conservative Multi-Agent System for Contextual Hateful Meme Detection Using GPT-4

Read Full Paper

Project Overview: Developed MAHD (Multi-Agent Hate Detection), a novel dual-agent system built on GPT-4 for robust hateful meme detection. MAHD employs a conservative classification approach, achieving high precision while effectively capturing subtle forms of harmful content.

Key Features & Impact:



6. Multivariate Analysis of Language Impairment Patterns Using PCA and Clustering

Open in Google Colab | Read Full Paper

Project Overview: Applied advanced data science techniques (PCA and K-means clustering) to analyze patterns in language impairment using a dataset of 1,163 participants with 64 linguistic features.
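
As an illustration of the clustering half of this pipeline, the sketch below implements K-means (Lloyd's algorithm) in plain Python. It assumes the inputs are already low-dimensional points, as they would be after a PCA projection, and omits the PCA step itself. The six toy points and the deterministic first-k initialization are choices made for the example only; the notebook may rely on a library implementation with different initialization.

```python
def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    # Deterministic initialization for the sketch: the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two visibly separated groups of toy points.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
```

On data like the 64-feature set described above, running the clustering on the leading principal components rather than the raw features keeps the distance computation from being dominated by correlated or noisy dimensions.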

Key Contributions & Impact:



7. Maternal Health Risk Prediction (Course Project)

Project Context: Developed a machine learning system to identify high-risk pregnancies in rural Bangladesh during a graduate-level Data Science course (Prof. Junwei Huang), addressing critical healthcare challenges in resource-limited settings with incomplete and imbalanced data.

Challenges Addressed:

Solution & Impact:



8. Direct Preference Optimization (DPO)

Open in Google Colab

Project Overview: Generated a preference dataset using PairRM and fine-tuned the Mistral-7B-Instruct model with Direct Preference Optimization (DPO), which trains directly on chosen/rejected response pairs without a separate reward model.
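
For reference, the per-pair DPO objective is `-log(sigmoid(beta * margin))`, where the margin is how much more the policy prefers the chosen response over the rejected one, relative to a frozen reference model. The sketch below computes that loss for a single pair from summed sequence log-probabilities. The numeric inputs are made up, and `beta=0.1` is a commonly used default assumed here rather than the value used in the notebook.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities."""
    # How far the policy has moved toward the chosen response and away
    # from the rejected one, measured against the reference model.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); computed in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: margin is 0 and the loss is log(2).
neutral = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy raises the chosen log-prob and lowers the rejected one: loss drops.
improved = dpo_loss(-10.0, -16.0, -12.0, -15.0)
```

The loss falls as the policy widens the gap between chosen and rejected responses, while `beta` controls how strongly it is allowed to drift from the reference model.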

Key Contributions & Learning:



9. Web Scraping Project: Financial Data Collection from Yahoo Finance

Technologies Used: Python, BeautifulSoup4, Pandas, Requests

Project Overview: Developed an automated web scraping system to collect financial metrics from Yahoo Finance for major S&P 500 companies (e.g., Apple, Google, Microsoft).
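
The parsing side of such a scraper can be sketched as extracting label/value pairs from table rows. To stay dependency-free, the example below uses the standard library's `html.parser` rather than BeautifulSoup4, which the project itself uses. The HTML snippet, the metric values, and the `MetricTableParser` class are all hypothetical; the real Yahoo Finance markup is different and changes over time.

```python
from html.parser import HTMLParser


class MetricTableParser(HTMLParser):
    """Collect (label, value) pairs from two-cell table rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None      # cells of the row being read, or None
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        # Only text inside a <td> is kept; whitespace between tags is ignored.
        if self._in_cell:
            self._cells[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            # Keep only rows shaped like "label, value".
            if len(self._cells) == 2:
                self.rows.append(tuple(self._cells))
            self._cells = None


# Made-up markup and values standing in for a fetched page.
SAMPLE_HTML = """
<table>
  <tr><td>Market Cap</td><td>1.00T</td></tr>
  <tr><td>PE Ratio</td><td>25.0</td></tr>
</table>
"""
parser = MetricTableParser()
parser.feed(SAMPLE_HTML)
metrics = dict(parser.rows)
```

The resulting dictionary maps each metric label to its raw string value, ready to be loaded into a Pandas DataFrame with one row per company.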

Key Features & Impact:



10. Sales Analysis Dashboard in Power BI

Power BI Dashboard

Project Overview: Developed a comprehensive Power BI dashboard for FY21 sales analysis, tracking and visualizing key performance metrics.

Key Features & Impact:

Key Technologies: Power BI, DAX, Power Query, SQL, Excel
