SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models Paper • 2412.11605 • Published Dec 16, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 106
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 42
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval Paper • 2412.15443 • Published Dec 19, 2024 • 10
ProgCo: Program Helps Self-Correction of Large Language Models Paper • 2501.01264 • Published Jan 2 • 27
SDPO: Segment-Level Direct Preference Optimization for Social Agents Paper • 2501.01821 • Published Jan 3 • 20
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published Jan 5 • 11
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models Paper • 2501.03124 • Published Jan 6 • 14
Evaluating Sample Utility for Data Selection by Mimicking Model Weights Paper • 2501.06708 • Published Jan 12 • 5
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published Feb 10 • 61
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25 • 58
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following Paper • 2502.14494 • Published Feb 20 • 15
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26 • 22
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published Feb 26 • 28
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 57
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens Paper • 2502.18890 • Published Feb 26 • 30
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity Paper • 2503.01506 • Published Mar 3 • 9
General Reasoning Requires Learning to Reason from the Get-go Paper • 2502.19402 • Published Feb 26 • 5
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition Paper • 2503.00735 • Published Mar 2 • 22
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval Paper • 2503.04644 • Published Mar 6 • 21
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles Paper • 2502.18968 • Published Feb 26 • 3
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention Paper • 2503.10602 • Published Mar 13 • 4
Temporal Consistency for LLM Reasoning Process Error Identification Paper • 2503.14495 • Published Mar 18 • 11
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees Paper • 2503.08893 • Published Mar 11 • 5
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base Paper • 2503.23361 • Published Mar 30 • 6
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization Paper • 2503.20286 • Published Mar 26 • 4
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1 • 44
Heimdall: test-time scaling on the generative verification Paper • 2504.10337 • Published Apr 14 • 33
Efficient Process Reward Model Training via Active Learning Paper • 2504.10559 • Published Apr 14 • 13
AI-University: An LLM-based platform for instructional alignment to scientific classrooms Paper • 2504.08846 • Published Apr 11 • 10
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published May 1 • 26
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1 • 37
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published Apr 28 • 37
TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering Paper • 2504.20114 • Published Apr 28 • 4
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning Paper • 2504.19162 • Published Apr 27 • 18
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation Paper • 2503.12854 • Published Mar 17
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Paper • 2505.02625 • Published May 5 • 22
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Paper • 2504.21117 • Published Apr 29 • 26
CORG: Generating Answers from Complex, Interrelated Contexts Paper • 2505.00023 • Published Apr 25 • 9
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published May 5 • 28
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Paper • 2505.03981 • Published May 6 • 15
ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning Paper • 2505.15776 • Published May 21 • 10
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs Paper • 2505.13529 • Published May 18 • 11
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations Paper • 2505.18125 • Published May 23 • 112
QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23 • 44
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20 • 78
Can Large Language Models Infer Causal Relationships from Real-World Text? Paper • 2505.18931 • Published May 25 • 1
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists Paper • 2506.01241 • Published Jun 2 • 9
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models Paper • 2506.06485 • Published Jun 6 • 5
Cartridges: Lightweight and general-purpose long context representations via self-study Paper • 2506.06266 • Published Jun 6 • 5
Improving large language models with concept-aware fine-tuning Paper • 2506.07833 • Published Jun 9 • 3
HASHIRU: Hierarchical Agent System for Hybrid Intelligent Resource Utilization Paper • 2506.04255 • Published Jun 1 • 5
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 262