📅 2026-04-14

Daily Picks · arXiv · Hacker News · GitHub Trending

Computer Vision: 5

📄 arXiv Papers
Computer Vision 2604.10442
Relevance 90/100

ReContraster: Making Your Posters Stand Out with Regional Contrast

Peixuan Zhang, Zijian Jia, Ziqi Cai, Shuchen Weng, Si Li, et al. (6 authors)

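Key contribution: Proposes ReContraster, the first training-free model that uses the regional contrast principle to make posters more visually striking, with a multi-agent system that emulates a poster designer's cognitive behavior to refine the design.
Method: 1. A compositional multi-agent system identifies elements, organizes the layout, and evaluates candidate poster designs; 2. A hybrid denoising strategy is integrated into the diffusion process to ensure harmonious transitions across region boundaries; 3. A new benchmark dataset is built for comprehensive evaluation.
Key findings: Across seven quantitative metrics and four user studies, ReContraster outperforms existing state-of-the-art methods at producing visually striking, aesthetically appealing posters.
Original abstract: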

Effective poster design requires rapidly capturing attention and clearly conveying messages. Inspired by the "contrast effects" principle, we propose ReContraster, the first training-free model to leverage regional contrast to make posters stand out. By emulating the cognitive behaviors of a poster designer, ReContraster introduces the compositional multi-agent system to identify elements, organize layout, and evaluate generated poster candidates. To further ensure harmonious transitions across region boundaries, ReContraster integrates the hybrid denoising strategy during the diffusion process. We additionally contribute a new benchmark dataset for comprehensive evaluation. Seven quantitative metrics and four user studies confirm its superiority over relevant state-of-the-art methods, producing visually striking and aesthetically appealing posters.

Computer Vision 2604.11792
Relevance 85/100

LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

Junhao Chen, Kejun Gao, Yuehan Cui, Mingze Sun, Mingjin Chen, et al. (11 authors)

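Key contribution: Presents the first framework for tokenizing and autoregressively generating vector animations, and builds LottieAnimation-660K, the largest and most diverse vector animation dataset to date.
Method: Adopts Lottie, a widely deployed JSON-based animation standard, and designs a tailored Lottie Tokenizer that encodes layered geometric primitives, transforms, and keyframe-based motion into a compact, semantically aligned token sequence. On top of this, Qwen-VL is fine-tuned into LottieGPT, a multimodal model that generates coherent, editable vector animations directly from natural-language or visual prompts.
Key findings: The tokenizer dramatically reduces sequence length while preserving structural fidelity, which enables effective autoregressive learning of dynamic vector content. LottieGPT generalizes well across diverse animation styles and outperforms previous state-of-the-art models on SVG generation (a special case: single-frame vector animation).
Original abstract: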

Despite rapid progress in video generation, existing models are incapable of producing vector animation, a dominant and highly expressive form of multimedia on the Internet. Vector animations offer resolution-independence, compactness, semantic structure, and editable parametric motion representations, yet current generative models operate exclusively in raster space and thus cannot synthesize them. Meanwhile, recent advances in large multimodal models demonstrate strong capabilities in generating structured data such as slides, 3D meshes, LEGO sequences, and indoor layouts, suggesting that native vector animation generation may be achievable. In this work, we present the first framework for tokenizing and autoregressively generating vector animations. We adopt Lottie, a widely deployed JSON-based animation standard, and design a tailored Lottie Tokenizer that encodes layered geometric primitives, transforms, and keyframe-based motion into a compact and semantically aligned token sequence. To support large-scale training, we also construct LottieAnimation-660K, the largest and most diverse vector animation dataset to date, consisting of 660k real-world Lottie animation and 15M static Lottie image files curated from broad Internet sources. Building upon these components, we finetune Qwen-VL to create LottieGPT, a native multimodal model capable of generating coherent, editable vector animations directly from natural language or visual prompts. Experiments show that our tokenizer dramatically reduces sequence length while preserving structural fidelity, enabling effective autoregressive learning of dynamic vector content. LottieGPT exhibits strong generalization across diverse animation styles and outperforms previous state-of-the-art models on SVG generation (a special case of single-frame vector animation).
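
To make the tokenization idea concrete, here is a minimal sketch of flattening a Lottie-style JSON document into a token sequence. It is not the paper's Lottie Tokenizer: the token scheme (`k:`, `n:`, `s:`, `b:` prefixes), the `quantize` helper, and the 0.5 quantization step are all assumptions made for illustration. Only the field names in the sample (`fr`, `w`, `h`, `layers`, `ty`, `ks`, `p`, `k`) come from the Lottie format.

```python
def quantize(value, step=0.5):
    """Snap a number to a coarse grid so nearby values share one token (assumed step)."""
    return round(value / step) * step


def tokenize(node, out=None):
    """Depth-first flatten of a JSON value into a flat list of string tokens."""
    if out is None:
        out = []
    if isinstance(node, dict):
        out.append("{")
        for key in sorted(node):  # fixed key order keeps sequences deterministic
            out.append(f"k:{key}")
            tokenize(node[key], out)
        out.append("}")
    elif isinstance(node, list):
        out.append("[")
        for item in node:
            tokenize(item, out)
        out.append("]")
    elif isinstance(node, bool):  # must be checked before int: bool is an int subclass
        out.append(f"b:{node}")
    elif isinstance(node, (int, float)):
        out.append(f"n:{quantize(node):g}")
    else:
        out.append(f"s:{node}")
    return out


# A tiny Lottie-like document: frame rate, canvas size, one shape layer (ty=4)
# whose transform (ks) carries a position (p) with value k.
animation = {
    "fr": 30, "w": 100, "h": 100,
    "layers": [{"ty": 4, "ks": {"p": {"k": [50.2, 49.9]}}}],
}
tokens = tokenize(animation)
print(tokens)
```

Quantizing coordinates and emitting one token per key are two simple ways such a scheme can shorten sequences relative to raw JSON characters; the paper's actual design choices may differ.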

Computer Vision 2604.10940
Relevance 85/100

AmodalSVG: Amodal Image Vectorization via Semantic Layer Peeling

Juncheng Hu, Ziteng Xue, Guotao Liang, Anran Qi, Buyu Li, et al. (8 authors)

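Key contribution: Proposes AmodalSVG, a framework that produces semantically organized and geometrically complete SVG representations from natural images and supports direct object-level editing in the vector domain.
Method: 1. A two-stage framework: semantic decoupling and completion are performed in the raster domain to produce amodal semantic layers, and each layer is then vectorized independently. 2. Stage one introduces Semantic Layer Peeling (SLP), a VLM-guided strategy that uses hybrid inpainting to recover the complete appearance of occluded objects. 3. Stage two introduces Adaptive Layered Vectorization (ALV), which dynamically adjusts the number of primitives through an error-budget-driven mechanism.
Key findings: 1. Visual fidelity is significantly better than that of prior methods. 2. The resulting amodal layers support direct object-level editing in the vector domain, which existing methods do not. 3. Experiments confirm the effectiveness of semantic decoupling and geometric completion.
Original abstract: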

We introduce AmodalSVG, a new framework for amodal image vectorization that produces semantically organized and geometrically complete SVG representations from natural images. Existing vectorization methods operate under a modal paradigm: tracing only visible pixels and disregarding occlusion. Consequently, the resulting SVGs are semantically entangled and geometrically incomplete, limiting SVG's structural editability. In contrast, AmodalSVG reconstructs full object geometries, including occluded regions, into independent, editable vector layers. To achieve this, AmodalSVG reformulates image vectorization as a two-stage framework, performing semantic decoupling and completion in the raster domain to produce amodally complete semantic layers, which are then independently vectorized. In the first stage, we introduce Semantic Layer Peeling (SLP), a VLM-guided strategy that progressively decomposes an image into semantically coherent layers. By hybrid inpainting, SLP recovers complete object appearances under occlusions, enabling explicit semantic decoupling. To vectorize these layers efficiently, we propose Adaptive Layered Vectorization (ALV), which dynamically modulates the primitive budget via an error-budget-driven adjustment mechanism. Extensive experiments demonstrate that AmodalSVG significantly outperforms prior methods in visual fidelity. Moreover, the resulting amodal layers enable object-level editing directly in the vector domain, capabilities not supported by existing vectorization approaches. Code will be released upon acceptance.
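
The abstract does not spell out how ALV's error-budget-driven adjustment works, so the following is only a generic sketch of the idea of growing a primitive count until a reconstruction error fits a budget. Everything here is an assumption for illustration: the "primitives" are piecewise-constant segments over a 1-D signal rather than vector shapes, and `piecewise_constant_error`, `fit_within_budget`, the doubling schedule, and `max_primitives` are hypothetical.

```python
def piecewise_constant_error(samples, n_segments):
    """Mean squared error when samples are split into n runs, each replaced by its mean."""
    size = len(samples)
    total = 0.0
    for i in range(n_segments):
        start = i * size // n_segments
        end = (i + 1) * size // n_segments
        chunk = samples[start:end]
        if not chunk:  # more segments than samples: skip empty runs
            continue
        mean = sum(chunk) / len(chunk)
        total += sum((x - mean) ** 2 for x in chunk)
    return total / size


def fit_within_budget(samples, error_budget, max_primitives=64):
    """Double the primitive count until the error fits the budget or the cap is reached."""
    n = 1
    while n < max_primitives and piecewise_constant_error(samples, n) > error_budget:
        n *= 2
    return n, piecewise_constant_error(samples, n)


# A flat signal needs one primitive; a step edge needs two.
flat = [5.0] * 16
step = [0.0] * 8 + [10.0] * 8
print(fit_within_budget(flat, error_budget=0.1))
print(fit_within_budget(step, error_budget=0.1))
```

The point of a budget-driven loop is that simple layers stop early with few primitives while detailed layers are allowed more, instead of every layer receiving the same fixed count.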

Computer Vision 2604.10675
Relevance 85/100

HiddenObjects: Scalable Diffusion-Distilled Spatial Priors for Object Placement

Marco Schouten, Ioannis Siglidis, Serge Belongie, Dim P. Papadopoulos

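Key contribution: Proposes a method that learns explicit, class-conditioned spatial priors for object placement by distilling the implicit placement knowledge in text-conditioned diffusion models, and builds HiddenObjects, a large-scale dataset with 27M placement annotations.
Method: 1. A diffusion-based inpainting pipeline automatically evaluates dense object placements on high-quality real backgrounds. 2. The resulting dataset covers 27k distinct scenes with 27M annotations and provides ranked bounding-box insertions for different images and object categories. 3. The spatial priors are distilled into a lightweight model for fast inference.
Key findings: 1. The spatial priors outperform sparse human annotations on a downstream image editing task (VLM-Judge score 3.90 vs. 2.68). 2. They significantly surpass existing object placement baselines and zero-shot vision-language models. 3. The distilled lightweight model is 230,000x faster at inference.
Original abstract: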

We propose a method to learn explicit, class-conditioned spatial priors for object placement in natural scenes by distilling the implicit placement knowledge encoded in text-conditioned diffusion models. Prior work relies either on manually annotated data, which is inherently limited in scale, or on inpainting-based object-removal pipelines, whose artifacts promote shortcut learning. To address these limitations, we introduce a fully automated and scalable framework that evaluates dense object placements on high-quality real backgrounds using a diffusion-based inpainting pipeline. With this pipeline, we construct HiddenObjects, a large-scale dataset comprising 27M placement annotations, evaluated across 27k distinct scenes, with ranked bounding box insertions for different images and object categories. Experimental results show that our spatial priors outperform sparse human annotations on a downstream image editing task (3.90 vs. 2.68 VLM-Judge), and significantly surpass existing placement baselines and zero-shot Vision-Language Models for object placement. Furthermore, we distill these priors into a lightweight model for fast practical inference (230,000x faster).
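
The "dense placements, then ranked bounding boxes" structure can be sketched as follows. This is an illustrative assumption, not the paper's pipeline: in the paper the scores come from a diffusion-based inpainting pipeline, whereas `toy_score` here is a hypothetical placeholder that simply prefers boxes resting on an assumed ground line. `candidate_boxes` and `rank_placements` are likewise invented names.

```python
from itertools import product


def candidate_boxes(width, height, box_w, box_h, stride):
    """Dense grid of (x, y, w, h) boxes that fit entirely inside the image."""
    xs = range(0, width - box_w + 1, stride)
    ys = range(0, height - box_h + 1, stride)
    return [(x, y, box_w, box_h) for y, x in product(ys, xs)]


def rank_placements(boxes, score_fn, top_k=3):
    """Sort candidates by score, best first, and keep the top_k."""
    return sorted(boxes, key=score_fn, reverse=True)[:top_k]


def toy_score(box, ground_y=60):
    """Placeholder score: boxes whose bottom edge sits on ground_y score highest."""
    x, y, w, h = box
    return -abs((y + h) - ground_y)


boxes = candidate_boxes(width=100, height=80, box_w=20, box_h=20, stride=20)
top = rank_placements(boxes, toy_score)
print(len(boxes), top)
```

Swapping `toy_score` for a learned or model-based scorer leaves the enumeration and ranking code unchanged, which is what makes the dense-then-rank layout convenient.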

Computer Vision 2604.10268
Relevance 85/100

EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model

Kunho Kim, Sumin Seo, Yongjun Cho, Hyungjin Chung

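Key contribution: Proposes EditCrafter, a tuning-free high-resolution image editing method that uses pretrained text-to-image diffusion models to process images at resolutions far beyond the training resolution (e.g. 512x512 or 1024x1024), removing the limits existing methods face with arbitrary aspect ratios and high resolutions.
Method: 1. Tiled inversion preserves the original identity of the high-resolution input image; 2. A noise-damped manifold-constrained classifier-free guidance (NDCFG++) is proposed, tailored to high-resolution editing from the inverted latent; 3. The whole pipeline needs no fine-tuning or optimization and uses the pretrained model directly.
Key findings: Experiments show that EditCrafter produces high-quality edits across a range of resolutions and avoids the unrealistic structures and repetition caused by naive patch-wise editing, with no per-image tuning.
Original abstract: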

We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training. Leveraging the generative priors of large-scale T2I diffusion models enables the development of a wide array of novel generation and editing applications. Although numerous image editing methods have been proposed based on diffusion models and exhibit high-quality editing results, they are difficult to apply to images with arbitrary aspect ratios or higher resolutions since they only work at the training resolutions (512x512 or 1024x1024). Naively applying patch-wise editing fails with unrealistic object structures and repetition. To address these challenges, we introduce EditCrafter, a simple yet effective editing pipeline. EditCrafter operates by first performing tiled inversion, which preserves the original identity of the input high-resolution image. We further propose a noise-damped manifold-constrained classifier-free guidance (NDCFG++) that is tailored for high-resolution image editing from the inverted latent. Our experiments show that our EditCrafter can achieve impressive editing results across various resolutions without fine-tuning and optimization.
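
As a rough illustration of the tiling step only, the sketch below computes overlapping tile boxes that cover a large image with model-sized windows. The abstract does not describe the inversion itself or NDCFG++, and neither is shown here. The function names, the 512-pixel tile, and the 64-pixel overlap are assumptions, and the code assumes `overlap < tile`.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of windows of size `tile` covering [0, length).

    Consecutive windows share at least `overlap` pixels; assumes overlap < tile.
    """
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final window sits flush with the border
    return starts


def tile_boxes(width, height, tile=512, overlap=64):
    """(left, top, right, bottom) boxes in row-major order, clipped to the image."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


boxes_1024x768 = tile_boxes(1024, 768)
print(tile_starts(1024, 512, 64))
print(boxes_1024x768)
```

Each box could then be processed at the model's native resolution, with the overlapping strips blended when results are stitched back together; how EditCrafter actually merges tiles is not specified in the abstract.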

🔥 Hacker News
HN ▲ 215  💬 41
Recommendation 90/100

Introspective Diffusion Language Models

by zagwdt

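This post introduces Introspective Diffusion Language Models, which combine diffusion models with language models to improve text generation quality; the draw is the novel approach and its potential applications.
HN ▲ 114  💬 40
Recommendation 90/100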

Rust Threads on the GPU

by PaulHoule

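This post shows how to run Rust threads on the GPU for parallel computation, covering the performance benefits and the implementation approach.
HN ▲ 313  💬 199
Recommendation 85/100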

Claude Code Routines

by matthieu_bl

This post introduces the Routines feature of Claude Code and shows practical ways to speed up coding with automated workflows.
HN ▲ 69  💬 36
Recommendation 85/100

Turn your best AI prompts into one-click tools in Chrome

by xnx

This post covers a new Chrome feature that lets users turn frequently used AI prompts into one-click tools.
HN ▲ 102  💬 54
Recommendation 85/100

Multi-Agentic Software Development Is a Distributed Systems Problem

by tie-in

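This post argues that multi-agent software development resembles a distributed systems problem and analyzes the challenges of, and solutions for, logging and coordination when working with LLMs (large language models).
HN ▲ 37  💬 18
Recommendation 85/100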

Show HN: Kelet – Root Cause Analysis agent for your LLM apps

by almogbaku

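The developer presents Kelet, a tool that automatically analyzes the root causes of failures in LLM apps. It combines signals such as user feedback to locate error patterns and targets the problem of AI agents failing silently.
HN ▲ 26  💬 3
Recommendation 85/100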

ClawRun – Deploy and manage AI agents in seconds

by afshinmeh

ClawRun is a tool for deploying and managing AI agents in seconds, simplifying the workflow for running them.
HN ▲ 54  💬 25
Recommendation 85/100

Lumina – a statically typed web-native language for JavaScript and WASM

by light_ideas

Lumina is a statically typed, web-native language that compiles to JavaScript and WASM, aimed at improving development efficiency and performance.
🐙 GitHub Trending
Python ⭐ 84,039  +8282 today
Recommendation 70/100

NousResearch/hermes-agent

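The agent that grows with you

Hermes-Agent is an agent project that evolves with its user's needs. It is worth watching for its adaptive learning, which lets it dynamically improve how efficiently it handles tasks.
TypeScript ⭐ 17,266  +1165 today
Recommendation 60/100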

jamiepine/voicebox

The open-source voice synthesis studio

Voicebox is an open-source voice synthesis studio, worth watching for its flexible tooling and high-quality speech generation.
Jupyter Notebook ⭐ 40,199  +922 today
Recommendation 60/100

anthropics/claude-cookbooks

A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.

Anthropic's collection of Claude usage examples, presented as Jupyter notebooks that show fun and effective ways to use Claude. It is worth watching because it helps developers pick up practical Claude techniques quickly.
Python ⭐ 108,361  +1672 today
Recommendation 50/100

microsoft/markitdown

Python tool for converting files and office documents to Markdown.

Microsoft's markitdown is a Python tool that converts files and office documents to Markdown, well suited to users who want to simplify document processing.
TypeScript ⭐ 11,541  +769 today
Recommendation 40/100

pascalorg/editor

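Create and share 3D architectural projects.

A TypeScript-based editor for 3D architectural projects that supports creating and sharing 3D building designs. It is worth watching because it gives architects a convenient tool for online collaboration and visualization.
Python ⭐ 54,083  +1007 today
Recommendation 40/100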

virattt/ai-hedge-fund

An AI Hedge Fund Team

An AI hedge fund project implemented in Python that uses AI to make investment and trading decisions. It is worth watching as a showcase of AI applied to finance.
HTML ⭐ 43,656  +2569 today
Recommendation 40/100

shanraisshan/claude-code-best-practice

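from vibe coding to agentic engineering - practice makes claude perfect

This project collects best practices for programming with Claude, from vibe coding to agentic engineering. It is worth watching because it shows how practice can improve the efficiency and quality of AI-assisted programming.
TypeScript ⭐ 55,622  +2979 today
Recommendation 30/100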

thedotmack/claude-mem

A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.

This project is a Claude Code plugin that automatically records what happens during coding sessions, compresses it with AI, and injects it into later sessions as context to improve development efficiency.