Dataset to evaluate LLMs on code generation. The tasks here are algorithm-oriented.

Some argue that the tasks here are too simple to adequately represent real-world code generation tasks that utilizes extensive, external libraries.

There’s also the concern of contamination1 and overfitting, as LLMs are trained on the dataset itself.

Footnotes

  1. Model is trained on given data.