Research
Full list on Google Scholar.
Publications
GupShup: Summarizing Open-Domain Code-Switched Conversations
EMNLP 2021 · Paper
First author. We introduce the task of abstractive summarization of Hindi-English code-switched conversations, along with GupShup — the first dataset of its kind, containing 6,800+ Hi-En conversations with human-annotated summaries. Multilingual mBART and multi-view seq2seq models achieve the best results.
LDKP: A Dataset for Identifying Keyphrases from Long Scientific Documents
CIKM 2022 (DL4SR Workshop) · Paper
First author. We release two large corpora of ~1.3M and ~100K scientific articles with keyphrase annotations, fully extracted body text, and metadata. Transformer models capable of processing long sequences (Longformer) outperform traditional approaches on keyphrase extraction from long documents.
Transformers on Sarcasm Detection with Context
ACL 2020 (FigLang Workshop) · Paper
First author. We study the effect of conversational context on sarcasm detection, extending BERT, RoBERTa, and SpanBERT with single sentence, sentence-pair, and LSTM-Transformer hybrid architectures on Twitter and Reddit threads.
Projects
Distributed LLM Inference Service
An efficient cloud-based inference service for LLMs on Kubernetes, using vLLM's paged attention to reduce memory consumption and latency.
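The core idea behind paged attention is that a sequence's KV cache is stored in fixed-size blocks allocated on demand, rather than one contiguous buffer. A minimal pure-Python sketch of the bookkeeping (all names and sizes here are illustrative, not vLLM's actual API):

```python
BLOCK_SIZE = 16  # tokens per physical cache block (illustrative)

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks  # shared pool of physical block ids
        self.blocks = []                # physical blocks owned by this sequence

    def append_token(self, position):
        # Allocate a new physical block only when the current one is full,
        # so memory grows in BLOCK_SIZE chunks with no large preallocation.
        if position % BLOCK_SIZE == 0:
            self.blocks.append(self.free_blocks.pop())

    def physical_slot(self, position):
        # Translate a logical position into (physical block id, offset).
        return self.blocks[position // BLOCK_SIZE], position % BLOCK_SIZE

free = list(range(100))
table = BlockTable(free)
for pos in range(40):  # cache 40 tokens -> only 3 blocks allocated
    table.append_token(pos)
```

Because blocks need not be contiguous, the scheduler can pack many sequences into the same GPU memory and reclaim blocks as soon as a sequence finishes, which is where the memory and latency savings come from.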
RLHF on Phi-2 for Dialogues
Fine-tuning Microsoft Phi-2 using Direct Preference Optimization (DPO) on the Anthropic HH dataset with LoRA adapters and 8-bit quantization — showing improved reward margins.
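The DPO objective itself is simple: it pushes the policy's log-probability margin between the chosen and rejected response above the reference model's margin. A minimal sketch of the per-pair loss (argument names are mine; in practice the log-probabilities come from Phi-2 with LoRA adapters and a frozen reference copy):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    policy or the frozen reference model; beta scales the implicit rewards.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen response
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, margin

# The loss falls as the policy's margin over the reference grows.
loss_good, margin_good = dpo_loss(-10.0, -14.0, -12.0, -12.0)
loss_bad, margin_bad = dpo_loss(-14.0, -10.0, -12.0, -12.0)
```

The "reward margin" reported for the fine-tuned model is exactly this `margin` quantity averaged over evaluation pairs.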
Conditional Diffusion Model for Next Frame Generation
A diffusion model conditioned on the preceding 11 frames for autoregressive generation of the next 11 frames. Achieves an MSE of 0.004 on next-frame prediction and a Jaccard score of 30.89 on semantic segmentation.
transformerkp
An open-source, transformer-based library for keyphrase extraction and generation from text documents. Supports benchmark datasets and evaluation metrics for both tasks.
t-CRF
A CRF head that sits on top of any transformer-based sequence tagger (NER, POS tagging, etc.) for improved structured prediction.
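At inference time, the structured-prediction gain comes from Viterbi decoding: the CRF's learned transition scores are combined with the transformer's per-token scores so that the best *sequence* of tags is chosen, not the best tag per token. A self-contained sketch (plain lists stand in for the model's emission logits and the learned transition matrix):

```python
def viterbi_decode(emissions, transitions):
    """Best tag sequence under a linear-chain CRF.

    emissions:   [seq_len][num_tags] per-token scores (e.g. transformer logits)
    transitions: [num_tags][num_tags] score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    # score[j] = best score of any path ending in tag j at the current token
    score = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        new_score, bp = [], []
        for j in range(num_tags):
            # pick the best previous tag for current tag j
            best_prev = max(range(num_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emission[j])
            bp.append(best_prev)
        score = new_score
        backpointers.append(bp)
    # backtrack from the best final tag
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))

# Two tags; strong penalty for switching tags overrides a noisy middle token:
# per-token argmax would give [0, 1, 0], the CRF decodes [0, 0, 0].
emissions = [[3, 0], [0, 1], [3, 0]]
transitions = [[0, -5], [-5, 0]]
tags = viterbi_decode(emissions, transitions)
```

This is why a CRF head helps taggers: transition scores can forbid locally plausible but globally inconsistent tag sequences (e.g. an `I-` tag with no preceding `B-` tag).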
SpanElectra
A language model combining SpanBERT's span boundary objective with ELECTRA's generator-discriminator training, aiming for SpanBERT-level accuracy at ELECTRA-level training efficiency.
Question Generation
Given a paragraph, generates all plausible questions; if an answer context is also provided, generates only questions whose answers correspond to that context.