The Reflection 70B model held huge promise for AI but now its creators are accused of fraud — here's what went wrong

Adobe Firefly AI image of a Llama looking in a mirror
(Image credit: Adobe Firefly AI image/Future)

The creators of Reflection 70B, a tuned-up version of Meta Llama 70B that was recently touted as the world’s top open-source AI model, have just opened up after being accused of fraud. 

Based on independent tests run by Artificial Analysis, the model fails to deliver on the promises made by Matt Shumer, CEO of OthersideAI and HyperWrite, the company behind Reflection 70B. Shumer, who initially attributed the discrepancies to an issue with the model’s upload process, has since admitted that he may have gotten ahead of himself in the claims he made.

But critics in the AI research community have gone as far as accusing Shumer of fraud, stating that the model is just a thin wrapper around Anthropic’s Claude, rather than a tuned-up version of Meta Llama.

Discrepancies emerge after third-party evaluation

Developed by New York startup HyperWrite AI, Reflection 70B was touted as "the world's top open-source model" by Matt Shumer, the company’s CEO. 

Yet on September 7, a day after Shumer’s announcement on X, Artificial Analysis reported that its evaluation of Reflection 70B yielded results significantly lower than Shumer's claims. Shumer attributed the gap to an upload error affecting the model's weights, which he said caused a discrepancy between his private API and the weights uploaded to Hugging Face’s model repository.

However, further analysis by the AI community on platforms like Reddit and GitHub suggested that Reflection 70B’s performance more closely resembles that of Meta Llama 3 than that of Llama 3.1, the base model Shumer had claimed. Suspicions grew when it was found that Shumer had an undisclosed vested interest in Glaive, the platform he claimed was used to generate the model's synthetic training data.

Some went on to suggest that Reflection 70B was merely a "wrapper" built on top of Anthropic's proprietary AI model, Claude 3. On September 8, X user Shin Megami Boson publicly accused Matt Shumer of “fraud in the AI research community.”

HyperWrite breaks silence following fraud accusations

After initially going silent as the controversy erupted, Shumer issued a public response through X on September 10, acknowledging the skepticism around the model’s performance. He claimed a team was working to understand what went wrong and promised transparency once they had the facts.

However, Shumer did not provide a clear explanation for the performance discrepancies. Sahil Chaudhary, founder of Glaive, the platform Shumer said was used to train Reflection 70B, also admitted uncertainty about the model's capabilities and acknowledged that the touted benchmark scores had not been reproducible.

Critics have remained unsatisfied with Shumer's response so far. "Shumer's explanations and apologies have failed to provide a satisfactory explanation for the discrepancies," reported analytics firm GlobalVillageSpace. Yuchen Jin, co-founder of Hyperbolic Labs, expressed disappointment in the lack of transparency and called for more thorough explanations from Shumer. 

Ritoban Mukherjee

Ritoban Mukherjee is a freelance journalist from West Bengal, India whose work on cloud storage, web hosting, and a range of other topics has been published on Tom's Guide, TechRadar, Creative Bloq, IT Pro, Gizmodo, Medium, and Mental Floss.