  1. Understanding Chain-of-Thought in LLMs through Information ...

    Jul 10, 2025 · Large Language Models (LLMs) have shown impressive performance in complex reasoning tasks through the use of Chain-of-Thought (CoT) reasoning, allowing models to break …

  2. Diffusion Glancing Transformer for Parallel Sequence to ...

    Nov 29, 2023 · Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

  3. MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal ...

    Feb 13, 2025 · Answering questions with Chain-of-Thought (CoT) has significantly enhanced the reasoning capabilities of Large Language Models (LLMs), yet its impact on Large …

  4. Classification Done Right for Vision-Language Pre-Training

    Nov 5, 2024 · We introduce SuperClass, a super simple classification method for vision-language pre-training on image-text data. Unlike its contrastive counterpart CLIP, which contrasts with a …

  5. Understanding Stragglers in Large Model Training Using What ...

    May 9, 2025 · Large language model (LLM) training is one of the most demanding distributed computations today, often requiring thousands of GPUs with frequent synchronization …

  6. MMaDA: Multimodal Large Diffusion Language Models

    May 21, 2025 · We introduce MMaDA, a novel class of multimodal diffusion foundation models designed to achieve superior performance across diverse domains such as textual reasoning, multimodal …

  7. ByteDance Seed

    Despite the widespread applications of machine learning force fields (MLFFs) in solids and small molecules, there is a notable gap in applying MLFFs to simulate liquid electrolytes—a critical …