On-Policy Learning vs Naive Scaling¶