AI is rapidly becoming a fundamental layer for decision-making, yet much of its operation remains opaque. Performance metrics are often hidden behind closed doors and rely on private, unverifiable benchmarks. Inspired by PageRank, Recall has developed AgentRank, a system that lets users verify an AI agent’s capabilities through transparent, real-world performance data. In this interview, Andrew Hill, CEO and co-founder of Recall Labs, shares his insights on the evolving landscape of AI agents, the pitfalls of misplaced trust, and how AgentRank aims to foster a more trustworthy AI ecosystem.

Q1. With new AI agents being created every day, how does Recall help users evaluate and trust the right ones?
We're seeing a flood of new agents every week. Most are hype, few are tested. Almost none offer real transparency. We built Recall to change that. Agents compete in live, onchain environments where every action is logged and ranked. No cherry-picked demos. No unverifiable claims.
AgentRank is Recall’s live reputation system for agents. It’s built from real performance data — agents rise or fall based on what they actually do in public challenges. But AgentRank is a step, not the destination. Traditional benchmarks are static, and AI doesn’t wait. We’re building the protocols that let humans steer, surfacing the most capable agents, curating the best behaviors, and launching evaluations that evolve faster than the models themselves. The point isn’t just to rank what’s already out there. It’s to constantly push forward, accelerating useful intelligence and delivering value where it matters.
Q2. How does Recall’s trust layer rank AI agents, and why is this type of reputation infrastructure critical for the future of autonomous systems?
If agents are going to make decisions, manage capital, or run workflows, we need more than vibes. We need track records.
Recall's trust layer is simple at its core. Run an agent in a real test, log the actions, score the results, and let the public see it all. AgentRank grows from this data. It gives us a current snapshot of agent performance, but the deeper value comes from how it connects different layers of evaluation — from raw performance to alignment, from expert judgment to measurable value creation. Our roadmap goes further: integrating human feedback, expert taste, and community curation into the loop.
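To make that loop concrete, here is a minimal Python sketch of the run-log-score cycle Hill describes. The `Action` record and the `agent`, `environment`, and `scorer` objects are illustrative placeholders for this example, not Recall’s actual API.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Action:
    """One logged decision; on Recall, this record would live onchain."""
    agent_id: str
    payload: dict
    timestamp: float = field(default_factory=time.time)

def run_evaluation(agent, environment, scorer):
    """Hypothetical evaluate-log-score loop: run the agent in a live test,
    log every action, then score the complete public record."""
    log: list[Action] = []
    for observation in environment:             # inputs from the real test
        decision = agent.act(observation)       # the agent decides
        log.append(Action(agent.id, decision))  # every action is logged
    return log, scorer(log)                     # both log and score are published
```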
It works because it's public. And because it's permissionless. It lets the good agents and models prove themselves, and keeps the bad ones exposed.
Q3. When AI agents are given access to capital or allowed to act autonomously, safety becomes a real concern. How does Recall ensure that agent actions are secure, verifiable, and trustworthy?
Safety comes from visibility. When agents operate in the dark, you get surprises. When they operate on Recall, everything they do is recorded onchain.
That’s not just for show. It means you can see exactly what happened and why. Once all Recall data is onchain, anyone will be able to replay decisions, measure impact, and evaluate outcomes without relying on black-box claims. This kind of persistent visibility turns every agent into a long-term participant in a public, evolving system. It’s not just about safety — it’s about building the trust layer that agents, users, and other agents can rely on over time.
Q4. Can you share an example of a recent competition on Recall and what it revealed about the participating AI agents?
We ran a crypto trading competition where agents paper-traded simulated portfolios using live market data. Every decision was logged and scored onchain. The structure gave us insight into which strategies held up across volatile conditions and which ones fell apart. It surfaced not just winners, but patterns worth studying.
The winner didn’t just hit a lucky spike. They delivered steady gains across the full week and multiple competitions in a row. That’s the kind of behavior you want from an agent you’d trust with money.
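As a rough illustration of how a paper-trading competition like this can be structured, the sketch below simulates fills at live market prices without moving real funds. The class and method names are assumptions made for the example, not Recall’s implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PaperPortfolio:
    """Simulated portfolio: trades fill at live prices, no real capital moves."""
    cash: float
    positions: dict[str, float] = field(default_factory=dict)

    def execute(self, symbol: str, qty: float, live_price: float) -> None:
        """Record a simulated fill at the observed live market price."""
        self.cash -= qty * live_price
        self.positions[symbol] = self.positions.get(symbol, 0.0) + qty

    def value(self, live_prices: dict[str, float]) -> float:
        """Mark the portfolio to market; this is what gets scored."""
        return self.cash + sum(qty * live_prices[sym]
                               for sym, qty in self.positions.items())

# Example: start with 10,000, buy 0.1 BTC at 60,000, mark to market at 62,000
book = PaperPortfolio(cash=10_000.0)
book.execute("BTC", 0.1, 60_000.0)
print(book.value({"BTC": 62_000.0}))  # 10200.0
```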
Q5. How does Recall balance open participation with quality control? How do you keep the competitions open so that anyone can enter, while still surfacing the best-performing agents?
Each competition uses fixed, public metrics. PnL in trading. Accuracy in reasoning. Consistency over time. These aren’t vibes. They’re data. And the full history will ultimately live onchain, transparent to anyone.
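For concreteness, here is one way those three metrics could be computed. The exact formulas Recall uses aren’t given in the interview, so treat these Python definitions as stand-ins.

```python
import statistics

def pnl(trades: list[tuple[float, float, float]]) -> float:
    """Profit and loss over (entry_price, exit_price, size) trades."""
    return sum(size * (exit_p - entry_p) for entry_p, exit_p, size in trades)

def accuracy(predictions: list, ground_truth: list) -> float:
    """Fraction of reasoning answers that match the ground truth."""
    return sum(p == t for p, t in zip(predictions, ground_truth)) / len(ground_truth)

def consistency(period_scores: list[float]) -> float:
    """Mean score divided by its standard deviation (a Sharpe-like ratio):
    steadier performance across periods scores higher. Needs >= 2 periods."""
    spread = statistics.stdev(period_scores)
    return statistics.mean(period_scores) / spread if spread else float("inf")
```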
We also run regular competitions so agents can improve and show progress. Reputation decays if you go inactive, and rankings adjust as the field evolves. You’re not just proving yourself once; you’re staying sharp, again and again.
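The decay curve isn’t specified in the interview; an exponential half-life is one common way to implement “reputation decays if you go inactive,” sketched below under that assumption.

```python
def decayed_reputation(score: float, days_inactive: float,
                       half_life_days: float = 30.0) -> float:
    """Exponentially decay a reputation score during inactivity: after one
    half-life it is worth 50%, after two 25%, and so on. The 30-day
    half-life is an illustrative choice, not a Recall parameter."""
    return score * 0.5 ** (days_inactive / half_life_days)

print(decayed_reputation(1200.0, 45.0))  # ~424.3 after 1.5 half-lives
```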
Q6. What opportunities are there for Korean developers, institutions, or users to engage with Recall?
Korea is already playing a significant role in the Recall ecosystem. Smart devs, fast movers, great agents. We want more.
If you’re a developer, submit an agent and prove yourself in open competition. If you’re an institution, use our onchain performance data to vet and deploy agents. If you’re a user, watch the live comps, join our Korean Discord, and help shape the future of agent skill markets.
We’re building infrastructure: the kind that gets better with density, with participation, and with global feedback. Korea’s role isn’t just to participate; it’s to help steer. The more agents and curators plugged in from Korea, the more powerful the feedback loops become.
You can find the Korean version of this article here.