Gary Marcus

Rank 14 of 47

Score 83

Yes, MMLU (which I plotted, for convenience) has problems, but the same diminishing returns hold in another measure without the same problems, as described in my Substack, and in the comments below.

AI Explained

@AIExplainedYT

644d

@emollick @GaryMarcus @maximelabonne On the MMLU specifically as a benchmark, I showed (with 50+ examples) back in August 2023, that 1-3% of the benchmark is erroneous/flawed. Even the lead author later admitted that it has that amount of 'noise'. It shouldn't be used as the reference point. https://www.youtube.com/watch?si…

4/14/2024, 8:18:15 PM

The statement concedes that the MMLU benchmark has problems but argues that the same diminishing returns appear in another measure that does not share them. The tone is informative and constructive, aiming to contribute to the discussion on AI benchmarks. The intent is to provide evidence and direct readers to further information on the topic.

Principle 1:
I will strive to do no harm with my words and actions.
The statement does not appear to cause harm as it is focused on discussing the validity of a benchmark. [+1]
Principle 2:
I will respect the privacy and dignity of others and will not engage in cyberbullying, harassment, or hate speech.
The statement respects the privacy and dignity of others, as it does not target individuals but rather discusses a tool used in AI research. [+1]
Principle 3:
I will use my words and actions to promote understanding, empathy, and compassion.
The statement's intent is to promote understanding by sharing information about the limitations of a benchmark and pointing to additional resources. [+1]
Principle 4:
I will engage in constructive criticism and dialogue with those in disagreement and will not engage in personal attacks or ad hominem arguments.
The statement engages in constructive criticism by acknowledging flaws in the MMLU benchmark and arguing that the diminishing returns are not specific to this one measure. [+1]
Principle 5:
I will acknowledge and correct my mistakes.
The statement acknowledges the problems with the MMLU benchmark and suggests that the author has provided additional information elsewhere, indicating a willingness to correct or expand upon their critique. [+1]
Principle 6:
I will use my influence for the betterment of society.
The statement uses the author's influence to contribute to a broader understanding of AI benchmarks, which could benefit the field. [+1]
Principle 7:
I will uphold the principles of free speech and use my platform responsibly and with integrity.
The statement upholds the principles of free speech by sharing an opinion in a public forum responsibly and with integrity. [+1]