Mahim's Portfolio

Ongoing and Previous Research Papers:

Working on neural program synthesis with Dr. Tilevich since Sept 2021.
Working on program repair with Dr. Chris Brown since Sept 2021.
Paper titled “CoDesc: A Large Code-Description Parallel Dataset” accepted in Findings of ACL 2021. (Official code link)
Paper titled “Review4Repair: Code Review Aided Automatic Program Repairing” accepted in IST. The paper addresses fixing code bugs using review comments and includes a comparative analysis with other methods such as SequenceR and Tufano et al. (Arxiv link) (Code associated with the paper)
Conference paper titled “A Comparative Analysis on Bangla Handwritten Digit Recognition with Data Augmentation and Non-Augmentation Process”, published in the International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), 2020. (link)
Paper submitted to ICANN 2021: “BERT2Code: Few-Shot Semantic Code Search with Pretrained Natural Language and Code Embeddings”. We also train on the CodeSearchNet (CSN) dataset for comparison. (Arxiv)

Ongoing and Previous Research Projects:

Paper in progress on SOTA language models for Bangla: We trained SOTA language models for natural language generation and understanding, using well-performing architectures such as GPT-2 (small) and ELECTRA-Base, on the largest Bangla dataset we crawled ourselves. Training used libraries such as Hugging Face Transformers and Simple Transformers, with mixed precision (NVIDIA Apex) and distributed data-parallel training on AWS p3.16xlarge for faster training. (Code)
Undergraduate thesis on “Content-Based Image Retrieval”: Image search based on ensembled hand-engineered features such as LBP, LTP, and LTrP, together with our own proposed method. (code)

Supervisor: Dr. Md. Monirul Islam, Professor, Department of CSE, BUET.

Code Summarizer: Code summary and docstring generation from raw code using our own implementation of the classic Transformer encoder-decoder model, trained with the NVIDIA Apex mixed-precision library for faster training. (Code)
Git Code Miner: Analyzes the usefulness of review comments by mining Gerrit and GitHub code repositories, built for Samsung Research (SRBD). (code)
Text Elaboration: Trained GPT-2 XL (1.5B) on 8 V100 GPUs (AWS p3.16xlarge) for news text elaboration using the Gigaword summarization dataset. (code) Sample generations are available here. (link)
Gender Debiasing: Studied language models on a stereotype dataset to mitigate dataset gender bias, with a comparison against a simple RNNLM. (code)
Sentiment Analysis: Used pretrained BERT on a Kaggle competition dataset. (code)
Distributed Data-Parallel (DDP) Training: Trained Bangla language models such as XLM-RoBERTa, BERT, and Transformer-XL using the Hugging Face library. (code)
Automatic Speech Recognition: Created an automatic speech recognition system for Bangla by training a convolutional recurrent neural network with CTC loss. (code)