A study on “honesty pledges” became famous. Its data was fake
- A 2012 study, co-authored by Dan Ariely, suggested that placing honesty pledges at the beginning of forms makes people more honest when filling them out.
- However, an investigation by Data Colada showed that data within that study was almost certainly fabricated.
- Big Think covered the 2012 study shortly after it was published. We are now correcting the record.
There’s a very simple method people use to save money on things like insurance premiums and taxes: They lie on forms. In 2012, a paper published in PNAS offered a cheap and easy way for organizations to discourage that sort of thing: The results showed that you can get people to be more truthful by having them sign an honesty pledge — the kind that says something like, “I confirm that the information I’m reporting is accurate” — at the beginning of the form rather than at the end.
That 2012 paper has since been cited more than 500 times. Big Think published a video interview with Dan Ariely — a co-author of the 2012 paper and a high-profile name in the field of honesty research — in which he discussed the data. The paper was also mentioned in a 2016 report from the Obama administration’s Social and Behavioral Sciences Team as a way to incentivize honesty on tax forms.
But then came questions about the data reported in the study.
Honesty research is dishonest
Subsequent studies on “sign at the beginning” honesty pledges found no evidence that they increase honesty. That held true in a 2020 follow-up to the original study — conducted by the five authors of the 2012 paper along with two other researchers — which failed to replicate the findings. “The current paper updates the scientific record by showing that signing at the beginning is unlikely to be a simple solution for increasing honest reporting,” the authors wrote.
So, what happened in the 2012 paper? Some of the data was fabricated “beyond any shadow of a doubt,” according to an investigative article published in 2021 by Data Colada. The article detailed how evidence of fraud was uncovered by a team of anonymous researchers who analyzed data from the paper and found several anomalies.
The most glaring was an implausible data set. The 2012 paper involved a car insurance company that asked more than 13,000 policyholders to self-report the mileage of their cars on a form. Some of the forms asked policyholders to sign a pledge: “I promise that the information I am providing is true.” Half of these forms listed the pledge at the beginning of the document; the other half at the end. Because reporting more miles would result in a more expensive premium, the idea was that the insurance company could prime people to be honest by having them sign the pledge before reporting their mileage.
It seemed to work. The results showed that policyholders who signed the pledge at the beginning reported more miles than both the group that signed at the bottom of the form and a control group that did not sign any pledge. It was possible to measure this effect because all the policyholders had provided self-reported mileage data at an earlier date, giving the researchers a baseline against which to compare the new reports.
“Impossibly similar” data
But the baseline data looked bizarre. As Data Colada noted, if you were to examine any data set showing the number of miles that thousands of people drove over a given timeframe, you would expect to see a normal distribution (bell curve). In other words, the data should show that a small proportion of people drove a little bit, most drove a moderate amount, and a small group drove a lot. That’s not what the data showed. Rather, the distribution was unrealistically uniform, showing that roughly the same number of people drove 5,000 miles as did 10,000, 20,000, or 30,000 miles.
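For a sense of how stark that difference is, here is a minimal sketch in Python. The mean, spread, and mileage range are hypothetical, chosen only for illustration; it simply compares the bin counts of a bell-shaped sample against a uniform one:

```python
# Illustrative only: hypothetical parameters, not the study's actual data.
import numpy as np

rng = np.random.default_rng(0)
n = 13_000  # roughly the number of policyholders in the 2012 paper

# What real annual mileage might plausibly look like: clustered around a mean.
bell = rng.normal(loc=12_000, scale=4_000, size=n).clip(min=0)

# What the suspicious baseline data looked like: near-equal counts everywhere.
flat = rng.uniform(low=0, high=50_000, size=n)

bins = np.arange(0, 55_000, 5_000)  # 5,000-mile buckets
for label, miles in [("bell-shaped", bell), ("uniform", flat)]:
    counts, _ = np.histogram(miles, bins=bins)
    print(label, counts)
# The bell-shaped counts rise and then fall across the buckets; the uniform
# counts stay roughly flat, which is the anomaly described above.
```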
The Data Colada report also found evidence of data duplication. The 2012 paper included a set of thousands of odometer readings that were remarkably similar to another set; these seem to have been duplicated and then modified with random numbers between 1 and 1,000 to cover up the fabrication. To estimate how likely such similarity would be by chance, Data Colada ran one million simulations attempting to reproduce it. “Under the most generous assumptions imaginable, it didn’t happen once,” the article stated. “These data are not just excessively similar. They are impossibly similar.”
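Data Colada’s actual procedure is more involved, but the logic of such a simulation can be sketched as follows. Everything here is hypothetical — the population parameters, sample size, and the `share_close` similarity measure are stand-ins: generate a sample and a noise-shifted copy of it, then check how often two genuinely independent samples end up just as similar.

```python
# A toy version of the simulation logic; not Data Colada's actual code.
import numpy as np

rng = np.random.default_rng(42)
n_customers = 2_000
n_sims = 10_000  # far fewer than the one million runs Data Colada reported

def share_close(a, b, tol=1_000):
    """Share of sorted pairs whose values sit within `tol` miles of each other."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)) <= tol)

# A duplicated set: the original values plus a random offset of 1 to 1,000.
base = rng.normal(12_000, 4_000, n_customers)
duplicate = base + rng.integers(1, 1_001, n_customers)
observed = share_close(base, duplicate)  # 1.0: shifting every value by
# 1..1,000 keeps each sorted pair within the 1,000-mile tolerance

# Null model: two genuinely independent samples from the same population.
hits = sum(
    share_close(rng.normal(12_000, 4_000, n_customers),
                rng.normal(12_000, 4_000, n_customers)) >= observed
    for _ in range(n_sims)
)
print(f"independent pairs as similar as the duplicate: {hits} of {n_sims}")
# hits comes out at or near zero: independent data essentially never matches
# that tightly, which is the sense in which the real sets were
# "impossibly similar."
```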
In 2021, PNAS retracted the 2012 paper due to these anomalies. The authors — Lisa Shu, Nina Mazar, Francesca Gino, Dan Ariely, and Max Bazerman — had asked for a retraction, acknowledging that data included in the paper had been fabricated. All authors denied faking the data.
But what is clear from the Data Colada report is that Ariely was the person who created the spreadsheet that contained the fabricated data. He was also the last to modify it. In response to the Data Colada report, Ariely wrote: “The work was conducted over ten years ago by an insurance company with whom I partnered on this study. The data were collected, entered, merged, and anonymized by the company and then sent to me. This was the data file that was used for the analysis and then shared publicly.” (You can read his complete response here.)
Ariely, author of the 2012 book The (Honest) Truth About Dishonesty: How We Lie to Everyone — Especially Ourselves, said he was the only study author who was in contact with the insurance company, the name of which he has not made public. It is possible that the insurance company fabricated the data, but why it would do such a thing is unclear. A more plausible explanation is that a researcher fabricated the data to support their hypothesis and, by extension, their career.
In June 2023, Data Colada published a series of articles containing evidence of alleged academic fraud committed by Harvard Business School Professor Francesca Gino, a co-author of the 2012 paper. One article in the series strongly suggests that data in a separate study in the 2012 paper was fabricated by Gino, stating: “Gino, who was a professor at UNC prior to joining Harvard in 2010, was the only author involved in the data collection and analysis” of the study with the allegedly fabricated data.
“That’s right,” Data Colada wrote. “Two different people independently faked data for two different studies in a paper about dishonesty.” Harvard placed Gino on academic leave in June.
Other accusations of misconduct
The 2012 incident is not the only time Ariely’s professional conduct has been questioned. In 2021, an expression of concern was attached to a study Ariely co-authored in 2004. The notice described statistical errors in the study that, if corrected, would “substantively” alter the conclusions drawn from the original research. Ariely said he was unable to locate relevant data that might have cleared up the matter. (To be sure, more than a decade had passed since he had conducted the study in question.)
In 2018, researchers failed to replicate a widely cited study that Ariely co-authored in 2008. An Israeli investigative show called The Source contacted the researcher — Aimee Drolet Rossi, a professor at UCLA — who Ariely said had run the experiment for him. In email exchanges posted online, an account bearing Rossi’s name said she had no memory of conducting the study for Ariely, and that it could not have been conducted the way he had described.
Big Think, which emailed Ariely for comment but has not heard back, has published more than a dozen articles and videos that either feature Ariely or mention studies that list him as an author. Of all these pieces, only a small portion references research that is definitely or likely compromised, including this 2012 article and this 2012 video.