Open-source maintainers now review more pull requests from AI agents than from humans. In 2026, automated coding tools have matured to the point where they can generate functional code from brief specifications. Jensen Huang of NVIDIA recently noted the jump from 30 million to one billion potential software creators worldwide. But this revolution creates new challenges for projects like transformers, which has over one billion downloads and thousands of contributors.
Most AI-generated pull requests fail to respect the implicit design contracts in transformers. The library prioritizes human readability, with flat hierarchies and model files that read top to bottom. Agents often suggest refactors that break these conventions, introducing subtle bugs or performance issues. Maintainers must still manually review every submission, despite a tenfold increase in volume. The problem extends beyond open source: app store reviewers also face a surge in submissions from new developers enabled by these tools.
The MLX community built a tool to address this gap. Their new Skill automates porting models from transformers to MLX, but it includes safeguards to match human standards. Instead of creating code from scratch, the tool uses transformers as the source of truth. It sets up virtual environments, downloads relevant models, and runs comparative tests between frameworks. Only when results align with expectations does it finalize the pull request.
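The comparative-testing step can be pictured with a small sketch. This is not the Skill's actual code: the function name `compare_logits`, the tolerances, and the report fields are all assumptions made for illustration. It treats one framework's output as the reference and checks whether the other framework's values agree within a tolerance and pick the same top token.

```python
import math


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def compare_logits(reference, candidate, rel_tol=1e-3, abs_tol=1e-4):
    """Hypothetical parity check between two lists of logits.

    `reference` stands in for the source-of-truth framework's output,
    `candidate` for the port's output. Tolerances are illustrative defaults.
    """
    if len(reference) != len(candidate) or not reference:
        return {"match": False, "max_abs_diff": None, "same_argmax": False}

    pairs = list(zip(reference, candidate))
    max_abs_diff = max(abs(r - c) for r, c in pairs)
    all_close = all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol) for r, c in pairs
    )
    same_argmax = argmax(reference) == argmax(candidate)
    return {
        "match": all_close and same_argmax,
        "max_abs_diff": max_abs_diff,
        "same_argmax": same_argmax,
    }


# Made-up numbers: a faithful port and a port with swapped outputs.
good = compare_logits([0.1, 2.0, -1.0], [0.1, 2.00001, -1.0])
bad = compare_logits([0.1, 2.0], [2.0, 0.1])
print(good["match"], bad["match"])
```

A gate like this would only allow a pull request to be finalized when `match` is true; what tolerance is acceptable depends on the model and data type, and is a judgment the sketch does not settle.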
For contributors, the Skill handles technical tasks like detecting RoPE configurations or inferring data types from safetensors metadata. For reviewers, it generates additional artifacts including generation examples and numerical comparisons. The goal is to produce pull requests that could have come from careful human contributors, not just faster AI outputs. When a new model lands in transformers, it should become available in MLX within hours, not weeks.
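To make the data type inference concrete, here is a minimal sketch that reads the header of a safetensors file. A safetensors file begins with an 8-byte little-endian length, followed by a JSON header that lists each tensor's `dtype`, `shape`, and `data_offsets`. The helper names and the "most common dtype wins" rule are assumptions for illustration, not the Skill's actual logic, and the tensor names in the demo are made up.

```python
import io
import json
import struct
from collections import Counter


def parse_safetensors_header(stream):
    """Read the JSON header from a binary stream in safetensors layout."""
    # First 8 bytes: unsigned little-endian 64-bit header length.
    (header_len,) = struct.unpack("<Q", stream.read(8))
    return json.loads(stream.read(header_len).decode("utf-8"))


def dominant_dtype(header):
    """Hypothetical heuristic: return the dtype used by the most tensors."""
    counts = Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
    return counts.most_common(1)[0][0] if counts else None


# Build a tiny in-memory file: header plus 16 zero bytes of tensor data.
demo_header = {
    "__metadata__": {"format": "pt"},
    "embed.weight": {"dtype": "BF16", "shape": [2, 2], "data_offsets": [0, 8]},
    "norm.weight": {"dtype": "BF16", "shape": [2], "data_offsets": [8, 12]},
    "rope.inv_freq": {"dtype": "F32", "shape": [1], "data_offsets": [12, 16]},
}
header_bytes = json.dumps(demo_header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(16)

parsed = parse_safetensors_header(io.BytesIO(blob))
print(dominant_dtype(parsed))  # prints BF16
```

Reading only the header avoids loading any weights, so a check like this stays cheap even for very large checkpoints.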
The team emphasizes that this is an aid for contributors and reviewers, not end-to-end automation. Maintainers still make final decisions about design direction and potential side effects. The tool reduces busywork but preserves the human judgment that keeps complex libraries stable.
Source: huggingface.co