Research & Projects

Literature Based Discovery In Self Driving Cars (Master Project)

Please click here to go to the github repository

This master’s project was conducted between May 2025 and September 2025 at University College London and focused on the design and implementation of an automated literature analysis pipeline for large-scale academic research. The work was carried out independently with the objective of improving the efficiency, depth, and scalability of literature reviews by leveraging Large Language Models (LLMs) and structured knowledge representations. My interest in natural language processing, machine learning systems, and research automation motivated the project, particularly the challenge of extracting meaningful insights from rapidly growing bodies of academic literature.

The project combined LLM-based information extraction with knowledge graph construction to analyse citation networks and semantic relationships between papers. Emphasis was placed on systematic pipeline design, robust evaluation, and assessing the effectiveness of automated methods in identifying research gaps and emerging themes, with a particular focus on the autonomous vehicle domain.

Objectives

The primary aim of this project was to develop and evaluate an automated framework for analysing academic literature at scale. A key challenge was accurately extracting high-level research insights , such as research questions, findings, and limitations , while preserving contextual and citation-based relationships across papers.

Key objectives included:

Designing and implementing an LLM-driven pipeline to extract research questions, results, and limitations from academic publications.

Constructing and deploying a citation-aware knowledge graph to model relationships between papers and research themes.

Enhancing literature discovery by integrating semantic search with knowledge graph-based navigation.

Evaluating the accuracy of the system in identifying research gaps and emerging trends within the autonomous vehicle research domain.

Essential considerations included ensuring extraction reliability, minimizing hallucinations from language models, and validating the usefulness of automated insights compared to traditional manual literature reviews.

Results

The project demonstrated that LLM-powered literature analysis, when combined with knowledge graph techniques, can significantly improve the efficiency and quality of academic literature exploration. The automated pipeline successfully extracted structured research insights and enabled intuitive navigation across interconnected research areas.

Key findings and outcomes included:

The literature analysis pipeline effectively identified research questions, core findings, and limitations across a diverse set of academic papers.

The knowledge graph enhanced discovery of interdisciplinary connections by mapping citation-based and semantic relationships between studies.

Evaluation results showed strong performance in detecting research gaps and emerging themes within the autonomous vehicle domain.

The integration of semantic search with graph-based representations substantially reduced the time required for comprehensive literature reviews.

Through this project, I gained practical experience in natural language processing, LLM evaluation, knowledge graph design, and research-oriented system development. The work strengthened my ability to build scalable machine learning pipelines and reinforced my interest in automating complex knowledge-intensive tasks within scientific research.

Weakly-Supervised Neural Networks For Image Segmentation

Please click here to go to the github repository

This project was conducted between March 2025 and April 2025 and focused on the design and evaluation of weakly-supervised neural networks for image segmentation. The work was carried out independently with the objective of reducing annotation costs while maintaining high segmentation performance. My interest in computer vision, scalable machine learning, and data-efficient training methods motivated the project, particularly the challenge of achieving accurate segmentation with limited labelled data.

The project leveraged PyTorch and TensorFlow to engineer and train weakly-supervised learning models capable of learning from coarse, partial, or weak annotations. Emphasis was placed on systematic experimentation, rigorous evaluation, and understanding the trade-offs between annotation effort and model performance.

Objectives

The primary aim of this project was to evaluate the effectiveness of weakly-supervised learning approaches for image segmentation tasks. A key challenge was designing models and training strategies that could extract meaningful spatial representations while relying on minimal supervision.

Key objectives included:

Engineering weakly-supervised neural network architectures using PyTorch and TensorFlow for image segmentation tasks.

Reducing dependence on pixel-level annotations by leveraging weak labels and constrained supervision strategies.

Conducting comparative experiments and ablation studies to assess the impact of architectural choices, loss functions, and supervision signals.

Evaluating segmentation performance using appropriate metrics and statistical analysis.

Essential considerations included ensuring experimental reproducibility, preventing overfitting under limited supervision, and fairly comparing weakly-supervised models against stronger baselines.

Results

The project demonstrated that weakly-supervised neural networks can achieve competitive segmentation performance while significantly reducing labeling requirements. Carefully designed training strategies and architectural choices enabled the models to learn meaningful spatial features despite limited annotation granularity.

Key findings and outcomes included:

Weakly-supervised models achieved improved segmentation performance compared to baseline approaches trained with minimal labels.

Ablation studies revealed which components contributed most to performance gains, providing insights into effective supervision mechanisms.

Statistical analysis of evaluation metrics supported the reliability of observed improvements and highlighted trade-offs between accuracy and annotation cost.

The results illustrated the practicality of weakly-supervised learning for real-world computer vision tasks where dense annotations are expensive or infeasible.

Through this project, I gained hands-on experience in computer vision, data-efficient learning, experimental design, and model evaluation. The work strengthened my ability to translate empirical results into actionable insights and reinforced my interest in scalable, annotation-efficient machine learning systems.

Formal Verification Of Python Programs With CrossHair

Please click here to go to the github repository

This project focused on the application of symbolic execution techniques to Python programs in order to formally reason about program correctness and identify subtle logical flaws. The work involved analysing programs with branching logic, recursion, and mutable data structures, using precise pre-conditions and post-conditions to specify expected behaviour. My motivation for this project stemmed from an interest in formal methods, software correctness, and the limitations of traditional testing approaches when dealing with complex execution paths and edge cases.

The project made use of CrossHair, a symbolic execution tool for Python, to automatically explore feasible execution paths and verify correctness properties. Particular emphasis was placed on interpreting tool output critically and understanding where symbolic execution succeeds, where it falls short, and how it can be complemented by other testing techniques in real-world software systems.

Objectives

The primary aim of this project was to evaluate the effectiveness of symbolic execution for verifying correctness properties and detecting logical errors in Python programs. A key challenge was applying formal reasoning techniques to realistic code constructs while managing the inherent limitations of symbolic analysis.

Key objectives included:

Applying symbolic execution with CrossHair to Python programs involving branching, recursion, and mutable state.

Defining and validating formal pre-conditions and post-conditions to specify correctness properties.

Analysing symbolic execution outputs to detect logical flaws, violated invariants, and unexpected behaviours.

Exploring complex edge-case scenarios that are difficult to uncover through conventional testing.

Identifying verification limitations such as partial coverage, path explosion, and false positives, and proposing suitable mitigation strategies.

Essential considerations included balancing analysis depth with tractability, interpreting solver results accurately, and ensuring that reported issues reflected genuine correctness concerns.

Results

The project demonstrated that symbolic execution can be highly effective for uncovering logical errors and edge cases that traditional testing may overlook, particularly in programs with complex control flow. CrossHair successfully identified violations of specified post-conditions and revealed scenarios where implicit assumptions in the code did not hold.

Key findings and outcomes included:

Symbolic execution exposed subtle logical flaws related to boundary conditions, recursive termination, and mutable data handling.

Formal specifications proved essential in guiding analysis and clarifying intended program behaviour.

Tool limitations were observed in the form of incomplete path coverage and false positives, particularly in more complex or state-heavy code.

Practical mitigation strategies were identified, including refining specifications, constraining input domains, and combining symbolic execution with complementary testing approaches such as fuzzing.

Through this project, I developed a strong understanding of formal verification concepts, symbolic analysis, and correctness reasoning, as well as the practical challenges of applying these techniques to real-world codebases. The work reinforced the importance of combining formal methods with dynamic testing to improve overall software reliability and robustness.

Fuzz Testing With Sanitizers

Please click here to go to the github repository

This project was conducted between January 2025 and February 2025 and focused on fuzz testing and vulnerability detection using runtime sanitizers. The work was carried out independently with the goal of systematically evaluating software robustness against malformed and adversarial inputs. My interest in software security, low-level program behaviour, and defensive testing methodologies motivated this project, particularly the use of modern fuzzing tools to uncover memory safety issues and latent vulnerabilities.

The project leveraged AFL++ to orchestrate multiple fuzzing campaigns across both sanitised and unsanitised builds of target programs. Emphasis was placed on understanding how different compiler sanitizers affect crash detection, coverage, and vulnerability discovery, as well as on analysing the security implications of the identified failures.

Objectives

The primary aim of this project was to assess the effectiveness of coverage-guided fuzz testing when combined with compiler-based sanitizers in identifying software defects and security vulnerabilities. A key challenge was designing fair and reproducible fuzzing experiments while accurately diagnosing the root causes of observed crashes.

Key objectives included:

Orchestrating multiple fuzzing campaigns using AFL++ across sanitised and unsanitised builds to compare crash discovery and code coverage.

Identifying and analysing input-triggered crashes by inspecting memory safety violations such as buffer overflows, use-after-free errors, and invalid memory accesses.

Evaluating differences in fuzzing effectiveness by comparing execution coverage, crash frequency, and sanitizer diagnostics.

Analysing potential CVE exposure by mapping discovered failure patterns to known vulnerability classes.

Essential considerations included experiment reproducibility, accurate crash deduplication, and ensuring that sanitizer overhead did not bias comparative results.

Results

The project successfully delivered a fully automated, cloud-based distributed pipeline capable of processing large protein structure datasets with minimal manual oversight. The system demonstrated reliable execution across multiple runs, producing consistent outputs that could be easily shared with collaborators.

Key findings and outcomes included:

The project successfully demonstrated the value of combining fuzz testing with sanitizers for uncovering subtle and security-critical software defects. Sanitised builds consistently exposed a wider range of memory violations and provided clearer diagnostic information compared to unsanitised binaries.

Sanitizers significantly improved crash interpretability by pinpointing precise memory violations and failure locations.

Differences in campaign coverage highlighted how instrumentation affects execution paths and vulnerability discovery.

Several crashes were traced to well-known vulnerability classes, underscoring realistic security risks and potential CVE relevance.

Comprehensive crash statistics and coverage metrics enabled evidence-based recommendations for improving software resilience.

The project culminated in a technical report presenting fuzzing coverage data, crash analysis, and actionable recommendations for enhancing software robustness and security posture. Through this work, I gained practical experience in software security testing, dynamic analysis, and vulnerability assessment, reinforcing the importance of automated testing techniques in modern secure software development.

Cloud-Based Distrbuted Pipeline For Image-Based Human Emotion Analysis Using YOLO

Please click here to go to the github repository

This project was conducted between December 2024 and January 2025 and focused on the design and implementation of a distributed cloud-based pipeline for large-scale image-based emotion analysis. The work was carried out independently, with the objective of building a scalable and reproducible system capable of efficiently processing large image datasets. My interest in cloud-native architectures, distributed systems, and machine learning motivated this project, particularly the challenge of performing image classification at scale using pretrained models.

The project utilised a dataset of 17,000 images (23.73GB) sourced from Kaggle, integrating a pretrained YOLO (You Only Look Once) model for feature extraction and classification. Emphasis was placed on automation, scalability, and reproducibility, ensuring that the pipeline could reliably process large volumes of image data while producing consistent and structured outputs for evaluation and analysis.

Objectives

The primary aim of this project was to design and deploy a cloud-based distributed pipeline capable of processing large-scale image datasets efficiently and reproducibly. A key challenge was orchestrating data preprocessing, model inference, and output generation across distributed cloud resources while maintaining performance and data consistency.

Key objectives included:

Successfully engineering a cloud-based distributed pipeline to process a 17,000-image dataset (23.73GB) sourced from Kaggle.

Preprocessing data by resizing images and preparing inputs for a pretrained YOLO model for feature extraction and classification.

Automating deployment, configuration, and execution workflows using Ansible to reduce manual intervention and improve reliability.

Integrating resource orchestration, logging, and cloud storage management to support scalable execution and monitoring.

Implementing image classification into six emotion categories (anger, disgust, fear, happiness, sadness, and surprise) and recording predictions for evaluation.

Producing reproducible outputs, including structured summary CSV reports displaying model predictions for user review and analysis.

Essential considerations included scalability, efficient resource utilisation, reproducibility of results, and clear logging for debugging and performance monitoring.

Results

The project successfully delivered a fully automated, cloud-based distributed pipeline capable of processing large-scale image datasets with minimal manual intervention. The system demonstrated reliable performance across multiple runs, producing consistent and reproducible outputs suitable for evaluation.

Key findings and outcomes included:

The pipeline efficiently processed the Kaggle image dataset, performing preprocessing and emotion classification using a pretrained YOLO model.

Automation through Ansible significantly reduced setup complexity and configuration errors, enabling rapid deployment and repeatable execution.

The classification system accurately categorised images into six emotion classes, with predictions recorded for further evaluation.

Structured outputs, including summary CSV files, enabled easy review of model predictions and supported downstream analysis.

Integrated logging and orchestration mechanisms improved transparency, allowing effective monitoring and identification of potential performance bottlenecks.

Through this project, I gained practical experience in distributed systems, cloud infrastructure, and machine learning workflows. The work strengthened my understanding of scalable data pipelines and reinforced my interest in applying cloud and data engineering techniques to real-world AI and data-intensive problems.

Natural Language Processing Undergraduate Final Year Project

Please click here to go to the github repository

This project was part of my final-year undergraduate research at Queen Mary, University of London, and was conducted between September 2023 and May 2024. I worked independently under the supervision of [Supervisor’s Name, if applicable], with a focus on transfer learning in Natural Language Processing (NLP) between English and Tamil. My interest in multilingual NLP and the linguistic complexities of low-resource languages like Tamil inspired me to explore how transfer learning models could bridge language-specific challenges. The project leveraged real-world data—over 42,000 YouTube comments in both languages—to investigate model accuracy, bias, and performance limitations in cross-lingual sentiment and text classification tasks.

Objectives

The primary aim of this project was to evaluate the effectiveness of transfer learning when applied between a high-resource language (English) and a low-resource language (Tamil). The initial challenge was the significant disparity in available NLP tools, datasets, and annotated corpora for Tamil compared to English.

Key objectives included:

Applying and fine-tuning pre-trained language models to perform sentiment analysis and classification in both English and Tamil.

Identifying and addressing challenges unique to Tamil, such as morphological richness, complex script, and limited annotated data.

Comparing model performance across both languages using key evaluation metrics (precision, recall, F1-score).

Improving performance in Tamil tasks through dataset manipulation, data augmentation, and multi-dataset training using Python-based NLP frameworks.

Essential considerations included handling noisy user-generated text, ensuring fairness in cross-lingual evaluation, and maintaining reproducibility of experiments.

Results

The project revealed notable performance differences between English and Tamil tasks. While transfer learning showed promising results in English, the models struggled with Tamil due to its lower resource support and syntactic complexity. Specifically:

Tamil models consistently underperformed in comparison, with lower precision and F1 scores.

Model performance was improved marginally through targeted data cleaning, bilingual training strategies, and custom preprocessing scripts.

The best-performing model architecture achieved reasonable accuracy on both tasks but highlighted the ongoing need for better Tamil-specific resources and tokenization techniques.

Unexpectedly, I observed that even small variations in script representation (e.g., Unicode normalization inconsistencies) significantly impacted model outcomes in Tamil. This underscored the importance of linguistic sensitivity in multilingual NLP.

Through this project, I gained hands-on experience in working with real-world multilingual datasets, fine-tuning transformer models, and performing error analysis. The findings not only contributed to my technical growth but also sparked an ongoing interest in low-resource NLP, which I hope to explore further in future academic or industry research.

Cryptocurrency Wallet app for Mobiles

Please click here to go to the github repository

This project was completed as part of a group Software engineering module at Queen Mary, University of London, between January and April 2023. Working collaboratively in a team of seven students, we designed and prototyped a functional cryptocurrency wallet mobile application for Android. The project was developed using Android Studio with Kotlin for the front end and an SQL database for managing secure data on the back end.

Our group aimed to explore the emerging fintech space while tackling the technical and UX challenges of digital currency management. The project gained significant recognition and was awarded Runner-Up for Best Cryptocurrency Application, highlighting both our technical execution and innovative approach. I contributed extensively to both the application’s development and its presentation, with a focus on front-end architecture and integration.

Objectives

The primary objectives of the project were to:

Develop a working prototype of a mobile cryptocurrency wallet that allows users to simulate key functions such as viewing balances, sending and receiving digital currency, and tracking transaction history.

Enhance security and data integrity by designing a reliable back-end architecture using SQL, while maintaining a smooth and responsive front-end interface.

Understand and apply key mobile development practices, including Android-specific UI/UX design, activity lifecycle management, and local data storage.

Collaborate effectively as a team, using version control tools (e.g., Git), task boards, and peer reviews to ensure alignment and progress throughout the development cycle.

Essential considerations included usability, performance, clarity in transaction flows, and addressing security concerns inherent in handling financial data—even within a simulated environment

Results

The final prototype successfully met all core functional requirements:

Users could log in, view their balance, initiate simulated transfers, and see a history of transactions through an intuitive interface.

The application integrated multiple UI components and state management strategies within Kotlin to deliver a user-friendly experience.

The SQL-based back end stored user data securely, and basic encryption measures were implemented for demonstration purposes.

The project was recognized in a university-wide showcase, where it earned Runner-Up status for its innovation, design, and attention to real-world fintech challenges.

Throughout the process, we encountered and addressed challenges such as managing group workflows, implementing secure authentication, and balancing feature complexity with time constraints. Personally, I deepened my skills in mobile development, UI design, and cross-functional teamwork—experience I plan to build on in future fintech or mobile-focused projects.

This project provided a strong foundation for understanding the intersection of mobile development and blockchain-inspired applications, and has inspired future exploration in digital finance and secure app development.

Weather app for tourists

Please click here to go to the github repository

This project was developed between January and April 2023 as part of a collaborative software development module at Queen Mary, University of London. Our team of four set out to create a weather application specifically designed for tourists, integrating real-time weather forecasting with location-based tourist hotspot recommendations. I worked primarily on the front-end development using React, along with HTML and CSS, contributing to the user interface, responsiveness, and API integration.

The project was inspired by the practical needs of travelers who want not only weather updates but also suggestions for places to visit nearby, all in one intuitive interface. It served as a strong opportunity to apply web development skills while working with external APIs and geolocation technologies.

Objectives

The main goals of the project were to:

Create a web-based application that provides real-time weather forecasts tailored to the user’s current location.

Recommend nearby tourist attractions based on geographic data, offering added value for users who may be unfamiliar with the area.

Implement geolocation and external APIs, combining weather data and points of interest in a cohesive, user-friendly platform.

Design and develop the front end using React, ensuring responsiveness, usability, and performance across devices.

Key considerations throughout the project included API reliability, UI clarity, geolocation permissions, and the seamless integration of multiple data sources.

Results

The final application successfully achieved its intended purpose:

Users were able to detect their location, view the current and upcoming weather forecast, and receive a curated list of nearby tourist destinations.

The app utilized APIs for both weather data retrieval (e.g., OpenWeatherMap) and location-based recommendations, integrating them into a clean, functional React front end.

Our team delivered a fully responsive web app with a modern interface and smooth navigation, suitable for use on both desktop and mobile browsers.

One of the key lessons from the project was handling real-time data from multiple APIs and ensuring synchronization across components in React. We also gained practical experience with state management, error handling, and group coordination using Git and project planning tools.

The project was well received within our course and helped me solidify my skills in React-based front-end engineering, API integration, and geo-aware application design. It laid the groundwork for further work in travel tech, user-focused web apps, and data-driven interface design.

Page updated

Google Sites

Report abuse