Accepted Papers


“Compassionately”: Increasing Plurality Awareness through Community-powered AI
Hala Sheta, Mohamed Ahmed, Syed Ishtiaque Ahmed
The prevalence of Islamophobia has resulted in the continual discrimination of Muslims in North America, where polls have found them to be viewed in a negative light, specifically as a rigid, violent and monolithic community. Furthermore, within the Muslim community, many are ignorant and intolerant of perspectives that differ from their own beliefs, viewing Islamic jurisprudence as a one-sided, static truth. However, recent work in this domain rightfully reframes it as a human-centered field and works to amplify minority voices (e.g. women) that are often undermined in textual interpretations. Our research is therefore centered on utilizing community-powered AI to increase plurality awareness and understanding, both within and outside of the Muslim community. The end goal is to develop a semi-automated hate-speech detection system that educates its users about the multiplicity of perspectives surrounding a topic of interest. This paper serves as foundational work towards this overarching goal by gauging the current climate surrounding the discussion of Islamic perspectives, both through a qualitative analysis of online discussions on Reddit and through a controlled user study.
Physics Informed Model Based Reinforcement Learning for Controlling Synchronization of Weakly Coupled Kuramoto System
Alif Bin Abdul Qayyum, A N M Nafiz Abeer
The Kuramoto network, as a representative of collective dynamics, presents a challenging control task: affecting the synchronization of the interacting oscillators. As the dynamics become harder to estimate, making use of a learned model for control purposes becomes difficult. Learning through interactions with the environment, enhanced by model-based reinforcement learning (MBRL) algorithms, can alleviate the lack of sample efficiency associated with model-free reinforcement learning (MFRL) methods. Given prior knowledge of the underlying dynamics of the system, physics-informed MBRL can achieve even higher efficiency. In this study, we compare the performance of physics-informed MBRL, MBRL, and MFRL in synchronizing the Kuramoto network. We assess the scalability of these three reinforcement learning methods in a naturally chaotic or unsynchronized network.
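For reference, the dynamics being controlled are those of the standard Kuramoto model (the paper's exact coupling topology and control input are not specified here): each oscillator i evolves as

    \frac{d\theta_i}{dt} = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i), \qquad i = 1, \dots, N,

and the degree of synchronization is commonly summarized by the order parameter

    r e^{i\psi} = \frac{1}{N} \sum_{j=1}^{N} e^{i\theta_j}, \qquad 0 \le r \le 1,

with r close to 1 indicating a synchronized network.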
CoMRIAD: A Novel Deep Learning-based Neuroimage Analysis Pipeline for Improved Alzheimer’s Disease Detection by Combining Magnetic Resonance Image Planes
Noushath Shaffi, Mufti Mahmud
Three-dimensional magnetic resonance images (MRI) have emerged as a valuable tool to diagnose and characterise Alzheimer's Disease (AD). Most current MRI analysis pipelines for AD detection focus on a single plane, limiting their ability to capture subtle changes associated with different stages of the disease. This paper proposes a novel deep learning-based pipeline called CoMRIAD that combines the three MRI planes (coronal, axial and sagittal, referred to as the combiplane) for enhanced AD detection and classification. Transfer learning architectures such as InceptionV3, InceptionResNetV2, Xception, DenseNet121, and a CNN were separately trained and tested on individual planes as well as the combiplane. Experimental results demonstrate that CoMRIAD outperforms single-plane MRI analysis, achieving a 6-8% increase in overall accuracy for two-way and four-way classification tasks. The heatmaps generated using GradCAM, and the Pearson's correlation coefficient computed between the original MRI and the heatmap, show high affinity to the predicted class. CoMRIAD enhances AD detection from 3D MRI, facilitating the monitoring of the disease and relevant interventions. The source code for the CoMRIAD implementation can be found at: https://github.com/brai-acslab/comriad.
Intelligent Departure Metering Advisory Tool (I-MATE) for Airport Airside Congestion Management
Hasnain Ali, Sameer Alam
Airport airside taxi delays significantly impact airlines, passengers, and the environment. Departure Metering (DM) is an effective approach to contain taxi delays by controlling departure pushback timings. In this work, we demonstrate the potential of a Deep Reinforcement Learning (DRL)-based DM method to reduce taxi delays by effectively transferring delays from taxiways to gates. This work casts the DM problem in a Markov decision process framework to train a DM policy over simulations generated using historical airport surface movement data. We further develop an Intelligent Departure Metering Advisory Tool (I-MATE) that employs the trained DM policy to recommend pushback advisories to Air Traffic Controllers (ATCOs). We conducted validation experiments to assess the efficacy and acceptability of I-MATE in assisting ATCOs to manage airside traffic. The results reveal a significant reduction in taxi delays (25.6%) with increased compliance with I-MATE recommendations, which may translate to improved efficiency, cost savings for airlines, and enhanced passenger experience. While increased compliance reduced taxi delays, a slight decrease in runway throughput (3.2%) was also observed. This suggests a potential trade-off between optimizing runway usage and minimizing delays. The study also reveals a spectrum of compliance among ATCOs, influenced by factors like experience and age. Qualitative feedback indicates high user satisfaction with I-MATE, suggesting its usefulness, reliability, and trustworthiness. This research underscores the value of AI-based decision support systems for air traffic control, thereby paving the way for further advancements in airside traffic management.
Say Less, Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation
Haris Riaz, Ellen Riloff, Mihai Surdeanu
We propose a simple, unsupervised method that injects pragmatic principles into retrieval-augmented generation (RAG) frameworks such as Dense Passage Retrieval (DPR). Our approach first identifies which sentences in a pool of documents retrieved by RAG are most relevant to the question at hand and cover all the topics addressed in the input question and no more, and then highlights these sentences in the documents before they are provided to the LLM. We show that this simple idea brings consistent improvements in experiments on three question answering tasks (ARC-Challenge, PubHealth and PopQA) using three different LLMs. It notably enhances accuracy by up to 19.7% compared to a conventional RAG system on PubHealth.
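A minimal sketch of the highlighting idea, using TF-IDF cosine similarity as a stand-in for the paper's relevance and topic-coverage criteria (the scoring function, threshold, and highlight markers below are illustrative assumptions, not the authors' implementation):

    # Illustrative sketch: mark question-relevant sentences in retrieved passages
    # before handing them to the LLM. Relevance is approximated with TF-IDF
    # cosine similarity; the paper's own selection criteria may differ.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def highlight_relevant_sentences(question, sentences, threshold=0.2):
        vec = TfidfVectorizer().fit([question] + sentences)
        q, s = vec.transform([question]), vec.transform(sentences)
        scores = cosine_similarity(q, s)[0]
        # Wrap sentences judged relevant in markers the LLM can attend to.
        return " ".join(f"<hl>{sent}</hl>" if sc >= threshold else sent
                        for sent, sc in zip(sentences, scores))

    passage = highlight_relevant_sentences(
        "What causes tides?",
        ["Tides are caused by the Moon's gravity.", "The author was born in 1901."])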
Adaptation of Kruskal’s Uniqueness Conditions to multiview CP
Suleiman A. Khan, Muhammad Irfan Khan
Several uniqueness conditions have been formulated for the Candecomp/Parafac (CP) model. Though a single condition that is both sufficient and necessary is yet to be discovered, there exist several necessary conditions and a few sufficient ones as well. Here we examine the adaptation of the most general known necessary as well as sufficient conditions of CP uniqueness to the multiview CP case.
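For context, the classical sufficient condition due to Kruskal for essential uniqueness of a three-way rank-R CP decomposition, stated in terms of the k-ranks of the factor matrices A, B, and C, is

    k_A + k_B + k_C \ge 2R + 2.

How this and the standard necessary conditions carry over when several views share factor matrices is the subject of the paper; the inequality above is only the single-view starting point.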
Deep Learning in the Wild for Industrial Scale Plastic Waste Sorting
Al Shafayet Haque Silvy, Kanishka Tyagi, Isha Kamleshbhai Maun, Bin Chen, Nalin Kumar
Efficient sorting of plastic waste remains a critical bottleneck in recycling systems, with current approaches relying on manual labor or semi-automated solutions that contribute to large amounts of plastics ending up in landfills. Despite the rapid growth of the global plastic recycling market, projected to reach $120 billion by 2030, existing sorting technologies struggle to meet demands for accuracy and throughput (MarketsandMarkets). While recent ML breakthroughs show promise in waste sorting, a complete industrial-scale pipeline has been overlooked. We propose a novel, low-cost machine learning system that addresses real-world challenges in plastic sorting: varying material types, inconsistent lighting conditions, and contaminated surfaces. Our key contributions include: (1) a scalable deep learning architecture featuring two adaptive pipelines, one for data collection and another for classification, optimized for industrial deployment, (2) curation of the world's first comprehensive industrial dataset of 40,000 plastic samples, and (3) an interpretable approach leveraging Grad-CAM and t-SNE visualizations to tackle challenging cases like dark and distorted plastics. The proposed sorting system demonstrates commercial viability by processing 200 samples per hour across five plastic types common in municipal solid waste (MSW), with potential earnings of $30 per ton.
Solving Kuramoto Oscillator Model using Physics Informed Neural Network
Alif Bin Abdul Qayyum, A N M Nafiz Abeer
Physics-informed machine learning has emerged as a powerful tool with the help of deep learning, as the latter has been instrumental as a data-driven function approximator. Many recent works focus on solving otherwise hard-to-solve differential equations with the help of the physics-informed neural network (PINN), a remarkably simple approach that blends physics and deep learning. We explore the application of PINN to solving the Kuramoto system of coupled differential equations as well as to the decision-making problem of determining the synchronization state of the system. The experimental results illustrate that PINN can not only be used to solve the coupled differential equations but can also be applied to determine the synchronization capability of the oscillator system under consideration.
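A minimal PyTorch-style sketch of a PINN residual for a Kuramoto system (the network size, coupling strength, and loss weighting are illustrative assumptions; the paper's exact setup may differ):

    # Illustrative PINN residual for d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i).
    # A small MLP maps time t to the N phases; autograd supplies d(theta)/dt for the physics loss.
    import torch

    N, K = 5, 1.0
    omega = torch.randn(N)
    net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                              torch.nn.Linear(64, N))

    def physics_residual(t):
        t = t.requires_grad_(True)
        theta = net(t)                                   # (batch, N)
        dtheta_dt = torch.stack([
            torch.autograd.grad(theta[:, i].sum(), t, create_graph=True)[0][:, 0]
            for i in range(N)], dim=1)                   # (batch, N)
        coupling = torch.sin(theta.unsqueeze(1) - theta.unsqueeze(2)).mean(dim=2)
        rhs = omega + K * coupling                       # Kuramoto right-hand side
        return ((dtheta_dt - rhs) ** 2).mean()           # physics loss term

    loss = physics_residual(torch.rand(32, 1))
    loss.backward()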
Network Inversion of Convolutional Neural Nets
Pirzada Suhail, Amit Sethi
Neural networks have emerged as powerful tools across various applications, yet their decision-making process often remains opaque, leading to them being perceived as "black boxes." This opacity raises concerns about their interpretability and reliability, especially in safety-critical scenarios. Network inversion techniques offer a solution by allowing us to peek inside these black boxes, revealing the features and patterns the networks have learned and base their decisions on, thereby providing valuable insights into how neural networks arrive at their conclusions and making them more interpretable and trustworthy. This paper presents a simple yet effective approach to network inversion using a meticulously conditioned generator that learns the data distribution in the input space of the trained neural network, enabling the reconstruction of inputs that would most likely lead to the desired outputs. To capture the diversity in the input space for a given output, instead of simply revealing the conditioning labels to the generator, we encode the conditioning label information into vectors and intermediate matrices, and further minimize the cosine similarity between features of the generated images. Additionally, we incorporate feature orthogonality as a regularization term to boost image diversity; it penalises deviations of the Gram matrix of the features from the identity matrix, ensuring orthogonality and promoting distinct, non-redundant representations for each label.
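A minimal sketch of the feature-orthogonality regularizer described above, penalising the deviation of the (normalised) feature Gram matrix from the identity; the feature normalisation and the weighting of this term relative to other losses are assumptions made for illustration:

    # Illustrative orthogonality penalty: push the Gram matrix of generated-sample
    # features toward the identity so representations stay distinct and non-redundant.
    import torch
    import torch.nn.functional as F

    def orthogonality_penalty(features):
        # features: (batch, dim) activations taken from the trained classifier
        f = F.normalize(features, dim=1)          # unit-norm rows
        gram = f @ f.t()                          # (batch, batch) cosine Gram matrix
        eye = torch.eye(f.size(0), device=f.device)
        return ((gram - eye) ** 2).sum()          # squared Frobenius deviation

    penalty = orthogonality_penalty(torch.randn(8, 128))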
G-RAG: Knowledge Expansion in Material Science
Radeen Mostafa, Mirza Nihal Baig, Mashaekh Tausif Ehsan, Jakir Hasan
In the field of Material Science, effective information retrieval systems are essential for facilitating research. Traditional Retrieval-Augmented Generation (RAG) approaches in Large Language Models (LLMs) often encounter challenges such as outdated information, hallucinations, limited interpretability due to context constraints, and inaccurate retrieval. To address these issues, Graph RAG integrates graph databases to enhance the retrieval process. Our proposed method processes Material Science documents by extracting key entities (referred to as MatIDs) from sentences, which are then utilized to query external Wikipedia knowledge bases (KBs) for additional relevant information. We implement an agent-based parsing technique to achieve a more detailed representation of the documents. Our improved version of Graph RAG called G-RAG further leverages a graph database to capture relationships between these entities, improving both retrieval accuracy and contextual understanding. This enhanced approach demonstrates significant improvements in performance for domains that require precise information retrieval, such as Material Science.
A Novel LLM-Based Approach for Automated Seerah-Hadith Mapping: Connecting Islamic Historical Narratives Through Vector Search and Semantic Analysis
Mushfiqur Rahman Talha, Mohammad Galib Shams, Riasat Islam, Nabil Mosharraf
Seerah and Hadith are essential sources of Islamic knowledge, but there has been limited research on systematically linking these two areas. This paper introduces the "Seerah-Hadith Mapping" project, which uses Large Language Models (LLMs) to map related passages between Seerah and Hadith. By adding new connections between these texts, this approach builds on existing scholarship and helps make Islamic knowledge more accessible to those without specialized knowledge in Islamic studies.
AI-Driven Demand-Oriented STEM Education Strategy for Our Muslim Community
Yan Sha, Zhao DONG, Shaokai Yang
In the context of rapidly advancing global technology, Science, Technology, Engineering, and Mathematics (STEM) education in the Middle East and Muslim-majority regions is essential for driving innovation and supporting economic diversification. However, significant gaps remain between current educational practices and our vision, particularly in learning methodologies, student motivation, employment market alignment, and educational equity. We propose a comprehensive strategic framework that leverages large language models (LLMs) and virtual reality (VR) to create an AI-supported, closed-loop skills training system offering immersive and personalized learning experiences. Additionally, it promotes a mutually beneficial, cross-regional educational cooperation model that fosters resource sharing between economically developed and underdeveloped areas to support the development of Muslim communities around the world. This framework aims to establish an inclusive and efficient global STEM education system within Muslim communities, empowering the younger generation to meet future challenges while ensuring sustainable returns for sponsors in cultivating global tech talent.
A Closer Look at Sparse Training in Deep Reinforcement Learning
Muhammad Athar Ganaie, Vincent Michalski, Samira Ebrahimi Kahou, Yani Ioannou
Deep neural networks have enabled remarkable progress in reinforcement learning across a variety of domains, yet advancements in model architecture, especially involving sparse training, remain under-explored. Sparse architectures hold potential for reducing computational overhead in deep reinforcement learning (DRL), where prior studies suggest that parameter under-utilization may create opportunities for efficiency gains. This work investigates the adaptation of sparse training methods from supervised learning to DRL, specifically examining pruning and the RigL algorithm in value-based agents like DQN. In our experiments across multiple Atari games, we study factors neglected in supervised sparse training that are relevant to DRL, such as the impact of the bias parameter in high-sparsity regimes and the dynamics of dormant neurons under sparse conditions. The results reveal that RigL, despite its adaptability in supervised contexts, under-performs relative to pruning in DRL. Strikingly, removing bias parameters enhances RigL's performance, reduces dormant neurons, and improves stability at high sparsity, while pruning suffers the opposite effect. These empirical observations underscore the need to re-evaluate sparse training methodologies, particularly within the context of DRL. The results also point to the need for further investigation into the applicability of sparse training techniques across larger architectures and more diverse environments.
3D localization and autofocus of the particle field based on deep learning and depth-from-defocus
Zhao DONG, Shaokai Yang, Yan Sha
Accurate three-dimensional positioning of particles is a critical task in microscopic particle research, with one of the main challenges being the measurement of particle depths. We present a novel approach for precise three-dimensional (3D) localization and autofocus of microscopic particles by integrating Depth-from-Defocus (DfD) techniques with deep learning. Our method combines You Only Look Once (YOLO) for lateral position detection with Generative Adversarial Networks (GANs) for autofocus, providing an efficient, noise-resistant, and real-time solution. Validated on synthetic datasets, static particle fields, and dynamic scenarios, the method achieved 99.9% accuracy on synthetic datasets and performed robustly on polystyrene particles, red blood cells, and plankton. Our algorithm can process a single multi-target image in 0.008 seconds, enabling real-time applications. Future work includes integrating Diffusion Models and the latest version of YOLO to enhance depth estimation and detection accuracy. Additionally, we are developing a user-friendly pipeline equipped with a graphical user interface (GUI) to make these advanced tools accessible to researchers across different disciplines, even those without prior deep learning expertise. This evolving pipeline will be continuously updated to improve precision and efficiency, making it a powerful and accessible tool for high-precision particle analysis in a wide range of scientific applications.
Bayesian Similarity-Weighted Aggregation for Federated Brain Tumor Segmentation
Muhammad Irfan Khan, Suleiman A. Khan, Elina Kontio, Mojtaba Jafaritadi
We propose a Bayesian generative approach, Bayesian Similarity-weighted Aggregation (SimAgg), for combining model weights from federated collaborators in brain lesion segmentation. This method effectively adapts to data variability and incorporates probabilistic modeling to handle uncertainty, enhancing robustness in federated learning (FL). Using a novel multi-armed bandit setup, it dynamically selects collaborators to improve aggregation quality. Simulation results on multi-parametric MRI data show that Bayesian SimAgg achieves high Dice scores across tumor regions and converges approximately twice as fast as non-Bayesian methods, providing an effective framework for federated brain tumor segmentation.
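A simplified, non-Bayesian sketch of similarity-weighted aggregation of collaborator updates (the similarity measure and temperature are illustrative assumptions; the paper's Bayesian formulation and bandit-based collaborator selection are not reproduced here):

    # Illustrative similarity-weighted aggregation: collaborators whose updates lie
    # closer to the average update receive larger aggregation weights.
    import numpy as np

    def simagg(client_updates, temperature=1.0):
        W = np.stack(client_updates)               # (num_clients, num_params)
        mean = W.mean(axis=0)
        sims = -np.linalg.norm(W - mean, axis=1) / temperature
        alphas = np.exp(sims - sims.max())
        alphas /= alphas.sum()                      # softmax over collaborators
        return (alphas[:, None] * W).sum(axis=0)    # aggregated parameters

    global_params = simagg([np.random.randn(10) for _ in range(4)])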
Explainable, Generalizable and Responsible AI Model to Triage Emergency Patients
Jemal A Fulli, Berihun T Alemayehu, Omer K Yasin, Abubeker S Ahmed, Muhammed A Sualih
Triage helps to deliver the right level of emergency healthcare at the right time for the right person using the right resources. However, triage is vulnerable to mis-triage, which causes delayed treatment, poor healthcare outcomes and ED overcrowding. This study therefore aimed to develop an explainable, generalizable and responsible AI model that assists triage nurses. We identify the most important predictors, measure the order, direction, and effects of important predictors across triage levels, and quantify the minimum information required to develop a generalizable triage model.
Long-Tail Learning with Language Model Guided Curricula
Mohammed Adnan, Rahul Krishnan, Yani Ioannou
Real-world datasets often have class imbalance and follow a long-tail distribution, in contrast to curated datasets such as CIFAR-10/100, MNIST, etc. Learning from long-tail distributed datasets is a challenging problem due to the few representative samples from the tail classes, which makes it difficult for the model to learn robust representations. We posit that curriculum learning presents a viable route to iteratively learn good predictive models that better capture predictive signals about rare classes. We propose a simple method to leverage label hierarchies to craft curricula for learning. For real-world datasets, where label hierarchy trees are typically not available and manually creating a hierarchy is tedious and expensive, we show that LLMs can be used to compose semantic information about the labels and generate label hierarchies to serve as curricula. We perform a thorough empirical evaluation of our method across different model architectures (ResNet, ViT, and ConvNeXt) and multiple datasets (ImageNet, Places365-LT, iNaturalist, etc.), showing that LLMs can be used to generate meaningful hierarchies. Our method improves performance on the long-tail classes and achieves state-of-the-art results on multiple large-scale datasets.
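A toy sketch of how an LLM-generated label hierarchy could be turned into a coarse-to-fine curriculum (the hierarchy and staging scheme below are illustrative, not the paper's construction):

    # Illustrative coarse-to-fine curriculum: early stages train on coarse ancestor
    # labels from a (possibly LLM-generated) hierarchy, later stages on the leaves.
    hierarchy = {
        "tabby_cat": ["animal", "cat", "tabby_cat"],
        "tiger":     ["animal", "cat", "tiger"],
        "sedan":     ["vehicle", "car", "sedan"],
    }

    def curriculum_label(fine_label, stage):
        path = hierarchy[fine_label]
        return path[min(stage, len(path) - 1)]   # deeper stage -> finer label

    print(curriculum_label("tiger", 0))  # "animal" (early, coarse stage)
    print(curriculum_label("tiger", 2))  # "tiger"  (final, fine-grained stage)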
Towards Privacy-Preserving Medical Imaging: Federated Learning with Differential Privacy and Secure Aggregation Using a Modified ResNet Architecture
Mohamad Haj Fares, Ahmed Mohamed Saad Emam Saad
With increasing concerns over privacy in healthcare, especially for sensitive medical data, this research introduces a federated learning framework that combines local differential privacy and secure aggregation using Secure Multi-Party Computation for medical image classification. Further, we propose DPResNet, a modified ResNet architecture optimized for differential privacy. Leveraging the BloodMNIST benchmark dataset, we simulate a realistic data-sharing environment across different hospitals, addressing the distinct privacy challenges posed by federated healthcare data. Experimental results indicate that our privacy-preserving federated model achieves accuracy levels close to non-private models, surpassing traditional approaches while maintaining strict data confidentiality. By enhancing the privacy, efficiency, and reliability of healthcare data management, our approach offers substantial benefits to patients, healthcare providers, and the broader healthcare ecosystem.
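A minimal sketch of the local differential-privacy step on each client, i.e. clipping an update and adding Gaussian noise before it leaves the client (the clip norm, noise multiplier, and whether noise is applied per update or per gradient are illustrative assumptions; the secure-aggregation protocol itself is not shown):

    # Illustrative local DP: clip each client's update to a fixed norm, then add
    # calibrated Gaussian noise before sharing it with the aggregator.
    import numpy as np

    def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
        rng = rng or np.random.default_rng()
        norm = np.linalg.norm(update)
        clipped = update * min(1.0, clip_norm / (norm + 1e-12))
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
        return clipped + noise                     # what leaves the client

    noisy = privatize_update(np.random.randn(1000))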
A Contextualized BERT model for Knowledge Graph Completion
Haji Gul, Abdul Ghani Haji Naim, Ajaz A Bhat
Knowledge graphs (KGs) are valuable for representing structured, interconnected information across domains, enabling tasks like semantic search, recommendation systems and inference. A pertinent challenge with KGs, however, is that many entities (i.e., heads, tails) or relationships are unknown. Knowledge Graph Completion (KGC) addresses this by predicting these missing nodes or links, enhancing the graph's informational depth and utility. Traditional methods like TransE and ComplEx predict tail entities but struggle with unseen entities. Textual-based models leverage additional semantics but come with high computational costs, semantic inconsistencies, and data imbalance issues. Recent LLM-based models show improvement but overlook contextual information and rely heavily on entity descriptions. In this study, we introduce a contextualized BERT model for KGC that overcomes these limitations by utilizing the contextual information from neighbouring entities and relationships to predict tail entities. Our model eliminates the need for entity descriptions and negative triplet sampling, reducing computational demands while improving performance. Our model outperforms state-of-the-art methods on standard datasets, improving Hit@1 by 5.3% and 4.88% on FB15k-237 and WN18RR respectively, setting a new benchmark in KGC.
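One plausible way to assemble a contextualised input for tail prediction from a triple's neighbourhood; the separator layout and verbalisation below are hypothetical, since the abstract does not spell out the exact input format:

    # Illustrative construction of a context-augmented input for a BERT-style
    # tail-entity predictor: head, query relation, and the head's neighbouring
    # (relation, entity) pairs are verbalised into one sequence.
    def build_kgc_input(head, relation, neighbours, max_neighbours=5):
        context = " ; ".join(f"{r} {e}" for r, e in neighbours[:max_neighbours])
        return f"[CLS] {head} [SEP] {relation} [SEP] {context} [SEP]"

    text = build_kgc_input(
        "Barack Obama", "born_in",
        [("profession", "politician"), ("spouse", "Michelle Obama")])
    # The resulting string would be tokenised and scored against candidate tails.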
Multi-Modal Pipeline Defect Localization
Mariam Manzoor, Zahra Arabi Narei, Henry Leung, Scott Miller
This study investigates the use of Laser and Magnetic Flux Leakage (MFL) pipeline data to develop a deep learning model for accurate detection and segmentation of pipeline defects. Laser images are used to precisely identify defect regions and provide labels for training a Mask R-CNN model for localizing and segmenting defects in MFL signals. Unlike conventional datasets where ground-truth labels are pixel-wise accurate, our labels are derived from a different sensor modality, resulting in misalignment and feature discrepancies between the laser and MFL data. These discrepancies lead to label noise and domain shift. Our experiments show that training advanced object detection and segmentation models using only laser-derived labels does not achieve accurate defect localization in MFL signals. This underscores the need for models capable of handling label discrepancies and adapting across domains to ensure robust and scalable performance in real-world pipeline defect detection.
Neural Machine Translators (NMTs) as Efficient Forward and Backward Arabic Transliterators
Toyib Ogunremi, Anthony Soronnadi, Olamide Teslim Shogbamu, Olubayo Adekanmbi
This study addresses the challenge of converting Romanized Arabic text back to its original Arabic script, a capability that remains largely unsupported by existing transliteration tools. We propose that both forward and backward transliteration tasks can be effectively approached as machine translation problems. To test this hypothesis, we fine-tune three HuggingFace transformer-based Neural Machine Translation (NMT) Pretrained Language Models (PLMs) on Arabic and Romanized script datasets. Experimental results demonstrate that these models perform well, achieving ROUGE scores of approximately 99 and BLEU scores of approximately 95. Our findings underscore the potential of NMT models to accurately handle transliteration, offering a valuable resource for improving Arabic language accessibility and communication.
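A minimal inference sketch with a fine-tuned Hugging Face seq2seq model; the checkpoint name is a placeholder, since the abstract does not name the three fine-tuned NMT models:

    # Illustrative use of a fine-tuned seq2seq transliteration model.
    # "your-org/romanized-to-arabic" is a placeholder checkpoint name.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("your-org/romanized-to-arabic")
    model = AutoModelForSeq2SeqLM.from_pretrained("your-org/romanized-to-arabic")

    inputs = tokenizer("assalamu alaikum", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))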
MIMIC: Multimodal Islamophobic Meme Identification and Classification
S M Jishanul Islam, Sahid Hossain Mustakim, Sadia Ahmmed, Md. Faiyaz Abdullah Sayeedi, Swapnil Khandoker, Syed Tasdid Azam Dhrubo, Nahid Hossain
Anti-Muslim hate speech has emerged within memes, characterized by context-dependent and rhetorical messages using text and images that seemingly mimic humor but convey Islamophobic sentiments. This work presents a novel dataset and proposes a classifier based on the Vision-and-Language Transformer (ViLT) specifically tailored to identify anti-Muslim hate within memes by integrating both visual and textual representations. Our model leverages joint modal embeddings between meme images and incorporated text to capture nuanced Islamophobic narratives that are unique to meme culture, providing both high detection accuracy and interpretability.
Oscar: The Generative AI student assistant
Mansur Ali Khan, Abdullah Khan
We introduce Oscar, a personalized educational assistant serving as a companion to young students to maximize the quality of education and enable them to learn at their own pace. Our formulation uses a comprehensive set of attributes that need to be modelled to provide individualized learning to students. We discover these attributes using an ensemble of three LLMs. We then use the attribute-based profile to build a GenAI solution for individualized learning. Through manual human annotation, we find that only 19% of attributes are provided by all three LLMs, while 31.5% are common across two LLMs, and the remaining 49.5% are specific to only one of them, demonstrating diverging understanding across LLMs. Utilizing a consolidated attribute profile, Oscar produces highly customized responses to individual student needs. We discuss the strengths and limitations of the approach and offer recommendations for educators, GenAI developers as well as policymakers to promote the integration of GenAI tools in childhood education.
ColFlor: Towards BERT-Size Vision-Language Document Retrieval Models
Ahmed Masry, Enamul Hoque
Traditional document retrieval systems for PDFs, charts, and infographics rely heavily on Optical Character Recognition (OCR) pipelines to extract textual content, a process that is both error-prone and resource-intensive. Recent advancements in multimodal models like ColPali have enabled OCR-free retrieval by processing documents directly as images, but their large size (three billion parameters) makes them computationally expensive and impractical for large-scale applications. To address this limitation, we introduce ColFlor, an efficient OCR-free visual document retrieval model with only 174 million parameters. ColFlor achieves comparable performance to ColPali on text-rich English documents—with only a 1.8% decrease in performance (measured by NDCG@5 metric)—while being significantly faster in image encoding (5.25 times faster) and query encoding (9.8 times faster). This makes OCR-free document retrieval systems more cost-effective for large-scale applications and more accessible to users with limited computational resources.
Instance weighting-based Knowledge Transfer Network for Seismic Fault Detection
Tiash Ghosh, Mohammed Fayiz Parappan, Mamata Jenamani, Aurobinda Routray
Geological fault detection is a crucial aspect of earthquake prediction and oil exploration. With the advancements in deep learning, the challenging task of accurate fault detection has gained popularity. While traditional deep learning methods struggle due to the labeling process, training a model solely on synthetic data may not yield satisfactory results due to the disparities between synthetic and real seismic data. To mitigate the impact of these differences, we propose employing instance weighting-based transfer learning, which allows the model to adapt only to the unique characteristics of the geological data. The proposed method yields satisfying results on the Indian Krishna Godavari Basin dataset.
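A minimal sketch of instance-weighted training, where each synthetic sample's contribution to the loss is scaled by a weight reflecting how target-like it is; how those weights are estimated is the substance of the paper and is not shown here:

    # Illustrative instance-weighted cross-entropy: per-sample weights rescale the
    # loss so samples resembling the real (target) domain dominate the gradient.
    import torch
    import torch.nn.functional as F

    def weighted_loss(logits, labels, instance_weights):
        per_sample = F.cross_entropy(logits, labels, reduction="none")
        return (instance_weights * per_sample).mean()

    loss = weighted_loss(torch.randn(4, 2), torch.tensor([0, 1, 1, 0]),
                         torch.tensor([0.9, 0.2, 1.0, 0.5]))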
QMorphVec: A Morphologically-Aware Embedding of Quranic Vocabulary
Doratossadat Dastgheib, Alireza Sahebi, Ehsan Khadangi, Ehsaneddin Asgari
Developing effective word representations that incorporate linguistic features and capture contextual information is an essential step in natural language processing (NLP) tasks. When working with a text corpus from a specific domain with profound meanings, such as the Holy Quran, deriving word representations based on domain-specific textual contexts is particularly valuable. In this research, we employ a context-masking approach to generate separate embedding spaces for Quranic roots, lemmas, and surface forms, and then project them into a common space through linear mapping. We demonstrate that our in-domain embeddings, trained solely on Quranic text and its morphological contexts, perform comparably to—and, in some cases, better than—OpenAI's large embeddings while surpassing the multilingual XLM-R embeddings. Additionally, through qualitative analysis, we illustrate their utility in Quranic word analogy tasks. The code and the embeddings are available at: [anonymized for the double-blinded review].
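A minimal sketch of projecting one embedding space into another with a least-squares linear map fitted on paired anchor items (the anchor choice and the absence of an orthogonality constraint are assumptions not taken from the paper):

    # Illustrative linear mapping between two embedding spaces: learn W such that
    # X @ W approximates Y on paired anchors, then apply W to the whole source space.
    import numpy as np

    def fit_linear_map(X_anchors, Y_anchors):
        W, *_ = np.linalg.lstsq(X_anchors, Y_anchors, rcond=None)
        return W

    X = np.random.randn(200, 100)   # e.g. root embeddings of anchor items
    Y = np.random.randn(200, 100)   # e.g. surface-form embeddings of the same items
    W = fit_linear_map(X, Y)
    projected = X @ W               # source space mapped into the common space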
Reducing Reasoning Costs - The Path of Optimization for Chain of Thought via Sparse Attention Mechanism
Libo Wang
To address the surge in inference cost caused by chain-of-thought reasoning in large language models, this research proposes a sparse attention mechanism that focuses only on a few relevant tokens. The researcher constructed a new attention mechanism and used GiantRabbit, trained with custom GPTs, as an experimental tool. The experiment tested and compared the reasoning time, correctness score and chain-of-thought length of this model and o1 Preview in solving linear algebra test questions from MIT OpenCourseWare. The results show that GiantRabbit's reasoning time and chain-of-thought length are significantly lower than those of o1 Preview, verifying the feasibility of the sparse attention mechanism for optimizing chain-of-thought reasoning. Detailed architectural details and the experimental process have been uploaded to GitHub; the link is: https://github.com/brucewang123456789/GeniusTrail.git.
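A minimal sketch of a top-k sparse attention step, in which each query attends only to its k highest-scoring tokens (the value of k and where sparsity is applied are illustrative assumptions; the paper's mechanism and the GiantRabbit setup are not reproduced here):

    # Illustrative top-k sparse attention: scores outside each query's top-k are
    # masked to -inf before the softmax, so only a few tokens contribute.
    import numpy as np

    def topk_sparse_attention(Q, K, V, k=4):
        scores = Q @ K.T / np.sqrt(Q.shape[-1])           # (n_q, n_k)
        kth = np.sort(scores, axis=-1)[:, -k][:, None]    # k-th largest per query
        masked = np.where(scores >= kth, scores, -np.inf)
        weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    out = topk_sparse_attention(np.random.randn(6, 16),
                                np.random.randn(10, 16),
                                np.random.randn(10, 16), k=4)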






