Gary Marcus

Rank 13 of 47 | Score 83 | +3

@AdanBecerraPhD @DFinsterwalder Very close to what I have *actually* been saying and will say.

12/22/2024, 2:40:41 PM

In reply to:

Adan Z Becerra, PhD (@AdanBecerraPhD) · 387d

@DFinsterwalder @GaryMarcus I don’t criticize them for doing that. I criticize them for not making this point explicit during the demo and acknowledging that a better model would be one that does well without being trained on the prize.

David Finsterwalder (@DFinsterwalder) · 387d

@AdanBecerraPhD @GaryMarcus Totally agree that it’s a valid datapoint! And I am also open for discussions that question the ARC challenge itself. I haven’t even read enough about it to have a strong opinion. But I still think it’s very flawed to criticize OpenAI for using the training data as intended.

Adan Z Becerra, PhD (@AdanBecerraPhD) · 387d

@DFinsterwalder @GaryMarcus The best thing OAI could’ve done is tested and released the score of the base model. Even Chollet thinks this would’ve been a valuable data point.

François Chollet (@fchollet) · 388d

@itsvarora @NielsRogge This would be a very valuable data point, but we didn't have access to the base model -- I suppose we'll have the answer when the model goes public

David Finsterwalder (@DFinsterwalder) · 387d

Oh no. I misunderstood the generalization in the ARC AGI challenge and now @GaryMarcus liked my reply.

For the record: I started to look into the debate and started reading the paper and from what I see here: Gary is most likely wrong. The test set seems very different.

The conversation concerns the validity of testing models on data they were trained on, in the context of the ARC challenge. The participants debate whether OpenAI should have tested and released the score of the base model without ARC training. The tone is analytical and focused on the technical aspects of model evaluation.

Principle 1:
I will strive to do no harm with my words and actions.
The discussion is technical and does not appear to cause harm. It focuses on improving understanding of model evaluation. [+1]
Principle 3:
I will use my words and actions to promote understanding, empathy, and compassion.
The conversation promotes understanding by engaging in a detailed discussion about model testing and generalization. [+1]
Principle 4:
I will engage in constructive criticism and dialogue with those in disagreement and will not engage in personal attacks or ad hominem arguments.
Participants engage in constructive criticism and dialogue, questioning each other's views without personal attacks. [+1]