HateTinyLLM: Hate Speech Detection Using Tiny Large Language Models

Tanmay Sen, Ansuman Das, Mrinmay Sen

Tanmay Sen is with Ericsson, Kolkata (e-mail: sentanmay518@gmail.com). Ansuman Das is with Ericsson, Kolkata (e-mail: ansumandasiiit@gmail.com). Mrinmay Sen is a joint research scholar in the Department of Artificial Intelligence, Indian Institute of Technology Hyderabad, and the Department of Computing Technologies, Swinburne University of Technology, Australia (e-mail: msen@swin.edu.au).

Abstract

Hate speech encompasses verbal, written, or behavioral communication that uses derogatory or discriminatory language against individuals or groups based on sensitive characteristics. Automated hate speech detection plays a crucial role in curbing its propagation, especially across social media platforms. Various methods, including recent advances in deep learning, have been devised to address this challenge. In this study, we introduce HateTinyLLM, a novel framework based on fine-tuned decoder-only tiny large language models (tiny LLMs) for efficient hate speech detection. Our experimental findings demonstrate that the fine-tuned HateTinyLLM models outperform the pretrained Mistral-7B model by a significant margin. We explored various tiny LLMs, including PY007/TinyLlama-1.1B-step-50K-105b, microsoft/phi-2, and facebook/opt-1.3b, and fine-tuned them using LoRA and adapter methods. Our observations indicate that all LoRA-based fine-tuned models achieve over 80% accuracy.

Index Terms:

Hate Speech Detection, tiny LLM, LoRA, Adapter

I Introduction

Hate speech detection [1, 2, 3] in text data has garnered significant attention in recent years, with researchers exploring various approaches to address this complex problem. The task refers to identifying and categorizing language that expresses hatred, prejudice, or hostility towards individuals or groups based on attributes such as race, ethnicity, religion, gender, sexual orientation, disability, or other protected characteristics. The goal of hate speech detection is to develop automated systems or algorithms that can analyze text data, such as social media posts, comments, or news articles, and identify instances of hate speech. Efficiently detecting and mitigating hate speech can help protect individuals and communities from negative consequences such as discrimination, violence, and social division. Various methodologies and datasets have been explored and generated for the hate speech detection problem. Previously proposed methodologies can be broadly categorized into three groups: traditional machine learning, deep learning methods that utilize word embeddings, and transformer-based encoder-only methods. Malik et al. [4] present a comparative study of fourteen different deep learning models, concluding that transformer-based hate speech detection models exhibit more promising results than classical and embedding-based deep learning models. Deep learning models such as LSTM, BiLSTM, and convolutional neural networks with Word2Vec embeddings have been employed by various researchers [5] for hate speech detection. Badjatiya et al. [6] conduct comprehensive experiments utilizing various deep learning models to acquire semantic word embeddings.

Transformer models [7], like BERT [8], ELECTRA [9], and BART [10], offer superior syntactic and semantic understanding of words within text compared to traditional Word2Vec or GloVe word embeddings. Mozafari et al. [11] explore BERT's capability to capture hateful context within social media content using novel fine-tuning methods. Aluru et al. [12] propose a fine-tuned multilingual BERT model for hate speech detection in low-resource languages. MTBullyGNN, a graph neural network (GNN) based multitask framework for cyberbullying detection, is proposed by Maity et al. [13] for code-mixed language; they utilize a sentence encoder and a BART model to represent text. Awal et al. [14] propose HateMAML, a novel meta-learning-based approach for hate speech detection in low-resource languages. Recently, researchers have explored various techniques to enhance the performance and robustness of large language model (LLM) based hate speech detection. Guo et al. [15] conduct a large-scale study on effective prompting methods for context-aware LLM-based hate speech detection to improve accuracy. Hate speech detection often suffers from a lack of labeled data; Pendzel et al. [16] mitigate this issue by generating large amounts of synthetic data using LLMs. The authors compare encoder-only and decoder-only transformer models and observe that zero-shot GPT-3.5 performs marginally better than fine-tuned BERT-family models.

Tiny LLMs are advantageous over large LLMs due to their reduced computational requirements, faster inference times, and lower memory footprint. Tiny LLMs are therefore often more suitable for niche applications where a smaller model may suffice without sacrificing much performance. This work introduces the use of various fine-tuned decoder-only tiny large language models (tiny LLMs) for hate speech detection. These tiny LLMs offer strong performance across various tasks while requiring fewer resources, making them ideal for practical applications. Notably, this is the first study to explore leveraging fine-tuned decoder-only tiny LLMs for hate speech detection.

TABLE I: Sample tweets from the DynaHate dataset.
Tweet | Class
dalits are lowlives | hate
gay people are a burden to our society | hate
Arabs are not welcome | hate
i wish corona didnt break this country | nothate
asian people are the sh*t | nothate
i hate wearing black in the summer! | nothate
TABLE II: Sample tweets from the HateEval dataset.
Tweet | Class
White women are stupid | hate
Any woman that likes me is a bitch. All women are bitches. | hate
Shut the f*ck up pls | hate
It must be amazing to be me | nothate
Only I could get kicked out of a club for just being | nothate
They're replaceable, your time is not | nothate

II Datasets

To begin, we reviewed the literature for existing hate speech detection datasets in English. We considered two English hate speech datasets, DynaHate [17] and HateEval [18], for our experiments. The DynaHate dataset comprises approximately 41,144 entries, as shown in Table III. This dataset was created through a collaborative human-and-model-in-the-loop process aimed at enhancing hate detection models; this approach facilitated the collection of four rounds of data specifically focused on hate speech. In the balanced DynaHate dataset, tweets are fairly evenly distributed, with 46% classified as 'NotHate' and 54% as 'Hate', ensuring robust representation across categories. The HateEval dataset targets hate speech aimed at women and immigrants on Twitter and contains about 9,000 entries; 58% of the tweets are not hateful, while 42% contain hate speech. Samples from the DynaHate and HateEval datasets are shown in Table I and Table II, respectively. Detailed class-wise distributions of both datasets are given in Table III.

TABLE III: Class-wise distribution of the DynaHate and HateEval datasets.
Class | DynaHate | HateEval
Hate | 22175 | 3783
NotHate | 18969 | 5217
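The class balance reported above can be reproduced with a few lines of pandas; the file and column names below are illustrative assumptions, not the datasets' actual release format.

```python
import pandas as pd

# Hypothetical file/column names -- adjust to the actual dataset files.
df = pd.read_csv("dynahate.csv")          # assumed columns: "text", "label"

counts = df["label"].value_counts()
print(counts)                             # absolute counts, e.g. hate: 22175, nothate: 18969
print((counts / len(df) * 100).round(1))  # percentage split, ~54% hate / ~46% nothate
```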

III Methodology

This section formulates the problem and presents a framework for hate speech detection with various tiny LLMs. Zhang et al. [19] introduce TinyLlama, a condensed language model comprising 1.1 billion parameters, trained on approximately 3 trillion tokens across three epochs. TinyLlama extends the architecture and tokenizer initially developed for Llama 2. Employing advancements such as FlashAttention, TinyLlama achieves superior computational efficiency and exhibits impressive performance across various downstream tasks, surpassing existing open-source models of comparable size. By training smaller models on larger datasets, the study explores the potential of optimizing performance within specific inference constraints, challenging the preference for larger models. The pretraining process effectively combines natural language and code data, resulting in competitive performance. Through extensive experimentation and optimization, including speed enhancements like Fully Sharded Data Parallelism and FlashAttention, TinyLlama showcases superior training efficiency and problem-solving capabilities. The paper underscores the significance of smaller, efficient models like TinyLlama in enhancing accessibility and promoting innovative research in language model development. TinyLlama consists of 22 layers and 16 attention heads, with an embedding size of 2048.

Li et al. [20] propose Phi, which represents a significant advancement in smaller-scale transformers, demonstrating impressive performance across various benchmarks without an extensive parameter count. Its use of diverse data sources, including synthetic texts and filtered websites, reflects a strategic approach to training that enriches its understanding of language and common sense. Notably, Phi-2's ability to achieve near-state-of-the-art performance with just 2.7 billion parameters underscores the importance of efficient model design and data augmentation techniques. Moreover, its focus on safety and educational value, reflected in the careful curation of data sources, speaks to a conscientious approach to AI development. The Phi model features 24 layers with 32 attention heads, each having a dimension of 32, and its context length is 2048.

Zhang et al. [21] present Open Pre-trained Transformers (OPT), a collection of decoder-only pre-trained transformers with parameter counts ranging from 125 million to 175 billion (in our study we used the 1.3-billion-parameter model), aiming to facilitate reproducible and responsible research in large language models (LLMs). The authors highlight the limited access to full model weights of existing LLMs and the significant computational cost of training such models. They present detailed architectural specifications and training methodologies, emphasizing transparency and efficiency. Evaluation results across various NLP tasks, dialogue datasets, and bias and toxicity benchmarks demonstrate the competitiveness of OPT-175B compared to existing models like GPT-3 Davinci and PaLM. While OPT-175B generally matches or outperforms existing models in NLP tasks and dialogue generation, it exhibits higher stereotypical bias and toxicity rates, indicating the need for further research on ethical considerations and model improvements. OPT-1.3B consists of 24 layers, each containing 32 attention heads, with an embedding size of 2048.
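All three backbones are available on the Hugging Face Hub. The following sketch shows how they can be loaded with the standard transformers API; the checkpoint identifiers are those cited in the abstract, and everything else (device placement, dtype) is left at library defaults for brevity.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = [
    "PY007/TinyLlama-1.1B-step-50K-105b",  # TinyLlama, ~1.1B parameters
    "microsoft/phi-2",                     # Phi-2, ~2.7B parameters
    "facebook/opt-1.3b",                   # OPT, ~1.3B parameters
]

models = {}
for name in CHECKPOINTS:
    tokenizer = AutoTokenizer.from_pretrained(name)
    # trust_remote_code is needed for some phi-2 revisions that ship custom modeling code
    model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True)
    models[name] = (tokenizer, model)
```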

[Fig. 1 and Fig. 2: architecture diagrams of the proposed HateTinyLLM fine-tuning framework.]

We used Low-Rank Adaptation (LoRA) and adapter methods to fine-tune the tiny LLMs. The LoRA paper [22] addresses the challenge of fine-tuning large pre-trained models like GPT-3 (175 billion parameters) for specific tasks, which can be prohibitively expensive due to the sheer size of the model. The authors acknowledge the paradigm of pretraining on general-domain data and adapting to particular tasks, but note that full fine-tuning becomes less feasible as models grow larger. To address this issue, LoRA freezes the weights of the pretrained model and introduces trainable rank-decomposition matrices into each layer of the Transformer architecture. This effectively reduces the number of trainable parameters for downstream tasks while still allowing adaptation. The proposed architecture can be found in Fig. 1.

TABLE IV: LoRA fine-tuning hyperparameters.
Hyperparameter | TinyLlama | phi-2 | opt-1.3B
Epochs | 3 | 3 | 3
Target modules | k_proj, v_proj | k_proj, v_proj | k_proj, v_proj
Trainable parameters | 0.01% | 0.02% | 0.03%
LoRA_alpha | 16 | 16 | 16
r | 2 | 2 | 2
LoRA_dropout | 0.05 | 0.05 | 0.05
Batch size | 8 | 8 | 8
Weight decay | 0.001 | 0.001 | 0.001
Training time | 1.05 h | 1.20 h | 1.10 h
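Under the hyperparameters in Table IV, the LoRA setup can be expressed with the Hugging Face peft library roughly as follows. This is a sketch under the stated settings, not the authors' exact code; the choice of opt-1.3b as the example backbone is arbitrary.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

lora_cfg = LoraConfig(
    r=2,                                  # rank of the low-rank update (Table IV)
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["k_proj", "v_proj"],  # only key/value projections are adapted
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # roughly 0.03% of weights trainable for opt-1.3b
```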

The adapter method, as introduced by Houlsby et al. (2019) [23], offers a parameter-efficient approach to enhancing large language models (LLMs) by adding adapter layers to transformer blocks. Unlike prefix tuning, which modifies embeddings, adapter layers are inserted at two positions within each transformer block. These adapter layers consist of relatively small fully connected layers with a bottleneck structure akin to autoencoders. This design significantly reduces the number of parameters required compared to traditional methods.
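A bottleneck adapter of the kind described above can be sketched in PyTorch as follows. The bottleneck width and the residual wiring are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, non-linearity, up-project, plus a residual connection."""
    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual path keeps the frozen backbone's representation intact.
        return x + self.up(self.act(self.down(x)))

# Example: an adapter for a 2048-dim hidden state (TinyLlama/OPT embedding size)
adapter = BottleneckAdapter(hidden_size=2048)
h = torch.randn(8, 128, 2048)   # (batch, seq_len, hidden)
out = adapter(h)                # same shape as h
```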

IV Experiments, Results and Analysis

This section describes the outcomes of the raw pretrained models and of our proposed fine-tuned models.

IV-A Baselines Setup

We evaluated classification results for all three tiny LLMs, both before and after fine-tuning, as well as for Mistral-7B. The experiments were conducted in a consistent computing environment. Notably, the baseline models demonstrated an average accuracy of about 0.5, and no quantization was applied to these models. All baseline model results are summarized in the tables below.
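A zero-shot baseline of this kind can be probed with a simple classification prompt. The prompt template below is an illustrative assumption, not the authors' exact wording.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-1.3b")

def classify(tweet: str) -> str:
    # Hypothetical prompt template for zero-shot hate/nothate labeling.
    prompt = (
        "Label the tweet as 'hate' or 'nothate'.\n"
        f"Tweet: {tweet}\n"
        "Label:"
    )
    out = generator(prompt, max_new_tokens=3, do_sample=False,
                    return_full_text=False)[0]["generated_text"]
    text = out.strip().lower()
    # Check 'nothate' first, since the substring 'hate' occurs inside it.
    if "nothate" in text:
        return "nothate"
    return "hate" if "hate" in text else "nothate"

print(classify("Arabs are not welcome"))  # a hateful DynaHate sample (Table I)
```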

IV-B Experimental Setup and Hyperparameters

The experiments were conducted sequentially, with each method executed separately by restarting the kernel to ensure independent runs. An NVIDIA P100 GPU with 16 GB of memory was used for all experiments. In both methods we updated the weights of the k_proj and v_proj layers of all three tiny LLMs: TinyLlama, phi-2, and opt-1.3B. For LoRA, epochs are set to 3 and the target modules are k_proj and v_proj, with a slight variation in the percentage of trainable parameters across models; common values for LoRA_alpha, r, LoRA_dropout, batch size, weight decay, and training time are listed in Table IV. Table V presents the parameters for the adapter method: the fine-tuning process involved 5 epochs for the adapter-based method versus 3 epochs for LoRA, with detailed hyperparameters specified for each. In both LoRA and adapter-based fine-tuning, the AdamW optimizer is used; additionally, in the adapter-based method, the negative log-likelihood loss is employed as the loss function.

TABLE V: Adapter fine-tuning hyperparameters.
Parameter | TinyLlama | phi-2 | opt-1.3B
Epochs | 5 | 5 | 5
Trainable parameters | 0.05% | 0.01% | 0.03%
Adapter layers added | 2 | 2 | 2
Training time | 1.05 h | 1.31 h | 1.05 h
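The optimization setup described above amounts to a standard supervised fine-tuning loop; a compact sketch follows. The dataloader, the learning rate, and the label encoding are illustrative assumptions (the paper does not report them), while the optimizer, loss, epoch counts, and weight decay come from the setup above.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW

# `model` is the LoRA- or adapter-augmented tiny LLM from the earlier sketches;
# `train_loader` yields (input_ids, attention_mask, labels) batches, where
# `labels` holds the token id of the class word ("hate"/"nothate") -- the
# loader and this label encoding are assumptions for illustration.
optimizer = AdamW(model.parameters(), lr=2e-4,          # lr is an assumed value
                  weight_decay=0.001)                   # weight decay from Table IV

model.train()
for epoch in range(3):  # 3 epochs for LoRA; 5 for the adapter method
    for input_ids, attention_mask, labels in train_loader:
        logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
        # Negative log-likelihood of the class token at the final position
        loss = F.cross_entropy(logits[:, -1, :], labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```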

IV-C Results and Discussion

In this work, we conducted a comparative analysis of four base models on two distinct datasets, DynaHate and HateEval, with metrics including accuracy and F1 score. From Table VI, among the models assessed, TinyLlama demonstrated moderate performance, achieving an accuracy of 0.50 and an F1 score of 0.61 on DynaHate, while on HateEval its accuracy decreased to 0.29 with an F1 score of 0.24. phi-2 exhibited slightly better results, with an accuracy of 0.52 and an F1 score of 0.66 on DynaHate, and a corresponding accuracy of 0.47 and F1 score of 0.28 on HateEval. opt-1.3b showed comparable performance across both datasets, with an accuracy of 0.53 and an F1 score of 0.54 on DynaHate, and an accuracy of 0.45 with an F1 score of 0.17 on HateEval. Mistral-7B-v0.1 achieved the best base-model accuracy, 0.58 on DynaHate with an F1 score of 0.52, and a notably higher accuracy of 0.73 on HateEval, although with a low F1 score of 0.16 there. Overall, while some models displayed consistency across datasets, others demonstrated varying degrees of performance.

TABLE VI: Performance of the pretrained base models.
Model | DynaHate Accuracy | DynaHate F1 | HateEval Accuracy | HateEval F1
TinyLlama | 0.50 | 0.61 | 0.29 | 0.24
phi-2 | 0.52 | 0.66 | 0.47 | 0.28
opt-1.3b | 0.53 | 0.54 | 0.45 | 0.17
Mistral-7B-v0.1 | 0.58 | 0.52 | 0.73 | 0.16
TABLE VII: Performance of the adapter-based fine-tuned models.
Model | DynaHate Accuracy | DynaHate F1 | HateEval Accuracy | HateEval F1
TinyLlama-1.1B | 0.71 | 0.75 | 0.69 | 0.70
phi-2 | 0.70 | 0.76 | 0.72 | 0.71
opt-1.3b | 0.71 | 0.71 | 0.72 | 0.74
TABLE VIII: Performance of the LoRA-based fine-tuned models.
Model | DynaHate Accuracy | DynaHate F1 | HateEval Accuracy | HateEval F1
TinyLlama-1.1B | 0.80 | 0.81 | 0.79 | 0.77
phi-2 | 0.80 | 0.83 | 0.79 | 0.78
opt-1.3b | 0.82 | 0.83 | 0.80 | 0.81
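The accuracy and F1 values in Tables VI-VIII follow the standard definitions. For reference, a minimal evaluation helper looks like the sketch below; the use of scikit-learn is an assumption about tooling, not the authors' stated setup.

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """y_true / y_pred are lists of 'hate' / 'nothate' string labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, pos_label="hate"),
    }

# Toy usage: two of three predictions are correct
print(evaluate(["hate", "nothate", "hate"], ["hate", "hate", "hate"]))
# {'accuracy': 0.666..., 'f1': 0.8}
```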

The fine-tuning process shows remarkable improvements across all models and methodologies compared to their respective base models. As observed from Tables VI, VII, and VIII, TinyLlama, although initially exhibiting moderate accuracy and F1 scores, underwent a substantial transformation after fine-tuning.

The adapter-based fine-tuned models consistently displayed improvements, as shown in Table VII. For instance, TinyLlama saw its accuracy rise to 0.71 on DynaHate and 0.69 on HateEval, with F1 scores reaching 0.75 and 0.70, respectively. Likewise, phi-2 achieved accuracies of 0.70 and 0.72 on DynaHate and HateEval, respectively, alongside F1 scores of 0.76 and 0.71. Meanwhile, opt-1.3b attained accuracies of 0.71 on DynaHate and 0.72 on HateEval, with F1 scores of 0.71 and 0.74, respectively. From Table VII, we note that phi-2 achieves a higher F1 score with slightly lower accuracy on DynaHate. However, on HateEval, opt-1.3b exhibits higher accuracy and an F1 score improvement of 3-4% compared to the other two models.

With the LoRA technique (see Table VIII), TinyLlama's accuracy surged from 0.50 to 0.80 on DynaHate and from 0.29 to 0.79 on HateEval, accompanied by notable F1 gains, rising from 0.61 to 0.81 and from 0.24 to 0.77, respectively. Similarly, LoRA fine-tuning significantly improved phi-2, raising its accuracy from 0.52 to 0.80 on DynaHate and from 0.47 to 0.79 on HateEval, with F1 scores jumping from 0.66 to 0.83 and from 0.28 to 0.78, respectively. opt-1.3b, the third model fine-tuned with LoRA, saw impressive accuracy increments from 0.53 to 0.82 on DynaHate and from 0.45 to 0.80 on HateEval, with F1 scores rising from 0.54 to 0.83 and from 0.17 to 0.81, respectively. Analysis of Table VIII reveals that opt-1.3b demonstrates about a 2% higher accuracy and a 1-2% higher F1 score on DynaHate, and about a 1% higher accuracy and a 3-4% higher F1 score on HateEval, compared to the other two models.

In general, fine-tuning, especially with the LoRA technique, significantly improved the performance of all models across both datasets. Notably, the opt-1.3b model consistently delivered strong performance, indicating its robustness for hate speech detection tasks. It is worth noting that opt-1.3b outperformed the larger phi-2 model and also performed better than the slightly smaller TinyLlama. Furthermore, the adapter-based fine-tuned models also exhibited consistent improvements, suggesting the effectiveness of this approach in enhancing model performance.

V Conclusion and Future Work

This study pioneers the use of various fine-tuned decoder-only tiny large language models (tiny LLMs) for hate speech detection. We explore two different fine-tuning approaches and demonstrate that fine-tuned tiny LLMs significantly outperform pretrained models. Overall, the results suggest that fine-tuning, particularly with the LoRA technique, is crucial for enhancing the performance of base models in hate speech detection tasks. Among the models evaluated, opt-1.3b consistently demonstrated strong performance across both datasets, indicating its robustness in this domain. Future work could focus on exploring additional fine-tuning techniques and conducting more extensive experiments to further improve the efficacy of hate speech detection models. Additionally, investigating the generalizability of these models across different languages and cultural contexts could be a promising direction for future research.

References

• [1] T. Davidson, D. Warmsley, M. Macy, and I. Weber, "Automated hate speech detection and the problem of offensive language," in Proceedings of the International AAAI Conference on Web and Social Media, vol. 11, no. 1, 2017, pp. 512-515.
• [2] P. Fortuna and S. Nunes, "A survey on automatic detection of hate speech in text," ACM Computing Surveys (CSUR), vol. 51, no. 4, pp. 1-30, 2018.
• [3] A. Schmidt and M. Wiegand, "A survey on hate speech detection using natural language processing," in Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media, 2017, pp. 1-10.
• [4] J. S. Malik, G. Pang, and A. v. d. Hengel, "Deep learning for hate speech detection: a comparative study," arXiv preprint arXiv:2202.09517, 2022.
• [5] Z. Zhang, D. Robinson, and J. Tepper, "Detecting hate speech on Twitter using a convolution-GRU based deep neural network," in The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3-7, 2018, Proceedings 15. Springer, 2018, pp. 745-760.
• [6] P. Badjatiya, S. Gupta, M. Gupta, and V. Varma, "Deep learning for hate speech detection in tweets," in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 759-760.
• [7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
• [8] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
• [9] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "ELECTRA: Pre-training text encoders as discriminators rather than generators," arXiv preprint arXiv:2003.10555, 2020.
• [10] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension," arXiv preprint arXiv:1910.13461, 2019.
• [11] M. Mozafari, R. Farahbakhsh, and N. Crespi, "A BERT-based transfer learning approach for hate speech detection in online social media," in Complex Networks and Their Applications VIII: Volume 1, Proceedings of the Eighth International Conference on Complex Networks and Their Applications COMPLEX NETWORKS 2019. Springer, 2020, pp. 928-940.
• [12] S. S. Aluru, B. Mathew, P. Saha, and A. Mukherjee, "A deep dive into multilingual hate speech classification," in Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14-18, 2020, Proceedings, Part V. Springer, 2021, pp. 423-439.
• [13] K. Maity, T. Sen, S. Saha, and P. Bhattacharyya, "MTBullyGNN: a graph neural network-based multitask framework for cyberbullying detection," IEEE Transactions on Computational Social Systems, 2022.
• [14] M. R. Awal, R. K.-W. Lee, E. Tanwar, T. Garg, and T. Chakraborty, "Model-agnostic meta-learning for multilingual hate speech detection," IEEE Transactions on Computational Social Systems, 2023.
• [15] K. Guo, A. Hu, J. Mu, Z. Shi, Z. Zhao, N. Vishwamitra, and H. Hu, "An investigation of large language models for real-world hate speech detection," in 2023 International Conference on Machine Learning and Applications (ICMLA). IEEE, 2023, pp. 1568-1573.
• [16] S. Pendzel, T. Wullach, A. Adler, and E. Minkov, "Generative AI for hate speech detection: Evaluation and findings," arXiv preprint arXiv:2311.09993, 2023.
• [17] B. Vidgen, T. Thrush, Z. Waseem, and D. Kiela, "Learning from the worst: Dynamically generated datasets to improve online hate detection," arXiv preprint arXiv:2012.15761, 2020.
• [18] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. R. Pardo, P. Rosso, and M. Sanguinetti, "SemEval-2019 Task 5: Multilingual detection of hate speech against immigrants and women in Twitter," in Proceedings of the 13th International Workshop on Semantic Evaluation, 2019, pp. 54-63.
• [19] P. Zhang, G. Zeng, T. Wang, and W. Lu, "TinyLlama: An open-source small language model," arXiv preprint arXiv:2401.02385, 2024.
• [20] Y. Li, S. Bubeck, R. Eldan, A. Del Giorno, S. Gunasekar, and Y. T. Lee, "Textbooks are all you need II: phi-1.5 technical report," arXiv preprint arXiv:2309.05463, 2023.
• [21] S. Zhang, S. Roller, N. Goyal, M. Artetxe, M. Chen, S. Chen, C. Dewan, M. Diab, X. Li, X. V. Lin et al., "OPT: Open pre-trained transformer language models," arXiv preprint arXiv:2205.01068, 2022.
• [22] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, "LoRA: Low-rank adaptation of large language models," arXiv preprint arXiv:2106.09685, 2021.
• [23] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, "Parameter-efficient transfer learning for NLP," in International Conference on Machine Learning. PMLR, 2019, pp. 2790-2799.
Tanmay Sen is a seasoned Lead Data Scientist at Ericsson, Kolkata. He earned his B.Sc. (Hons) and M.Sc. degrees in Mathematics from the University of Calcutta in 2009 and 2011, respectively. He pursued his M.Tech in Mathematics and Computing and later obtained his PhD in Statistics from the Indian Institute of Technology Patna in 2014 and 2019, respectively. His research spans various cutting-edge domains, including Deep Learning, Federated Learning, Meta Learning, Graph Neural Networks, NLP, Time Series, and Survival Analysis.
Ansuman Das is an experienced Lead Data Scientist with approximately 10 years of expertise in data science, analytics, and machine learning. He has held significant positions at prominent companies, presently serving as a Lead Data Scientist at Ericsson since 2021. He completed his B.Tech in Computer Science & Engineering from IIIT Bhubaneswar and pursued his M.Tech in Software Systems from BITS Pilani. Currently, he is enrolled in an MA program in Economics at IGNOU. His research interests include Deep Learning, Generative AI, Natural Language Processing, and Cloud Computing.
Mrinmay Sen is a joint research scholar in the Department of Artificial Intelligence, Indian Institute of Technology Hyderabad and the Department of Computing Technologies, Swinburne University of Technology, Australia. He completed his M.Tech from the Indian Institute of Technology Dhanbad and his B.E. from Jadavpur University, Kolkata. His research encompasses Federated Optimization and its applications in real-life scenarios, as well as Computer Vision and deep learning.