Correct by Construction

Motivation: It is hard to use LLMs to generate datasets when some property of the answer is hard to verify. For example, an API call generated from natural language can easily be tested for validity, but it is hard to know whether it semantically does what we want.

  • Start with correct API calls

  • Ask the LLM to generate a description for each one

  • More generally: start with the correct answer and have the LLM generate a response for it (here, the description). This lets you create an (answer, response) pair that is guaranteed to be correct

    • The assumption here is that this direction is easy, i.e. there are many correct responses for a given answer
  • Tips:

    • Set the temperature lower when you are generating a correct response; otherwise the model may hallucinate and include irrelevant information in the response
    • You can set it higher if you want a negative response
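
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `generate` callable, the `Example` record, the prompt wording, and the temperature values (0.2 and 1.0) are all assumptions made for the example, and `generate` stands in for whatever LLM client you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: (prompt, temperature) -> generated text.
# Wrap your real LLM client in a function with this shape.
Generate = Callable[[str, float], str]


@dataclass
class Example:
    description: str  # text produced by the LLM
    api_call: str     # known-correct answer we started from
    label: bool       # True = description matches the call, False = negative


def build_examples(
    api_calls: List[str],
    generate: Generate,
    pos_temp: float = 0.2,  # assumed value: low temperature for correct responses
    neg_temp: float = 1.0,  # assumed value: higher temperature for negatives
) -> List[Example]:
    """Start from correct API calls and generate descriptions for them."""
    examples = []
    for call in api_calls:
        # Positive example: the answer is fixed, only the description is generated.
        positive = generate(
            f"Describe in one sentence what this API call does:\n{call}",
            pos_temp,
        )
        examples.append(Example(positive, call, True))

        # Negative example: a request the call does not satisfy.
        negative = generate(
            f"Write a one-sentence request that this API call does NOT satisfy:\n{call}",
            neg_temp,
        )
        examples.append(Example(negative, call, False))
    return examples
```

Because the API call is never generated, its validity does not depend on the model; only the description does, which is the part assumed to be easy to get right.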

Resource Overboard

Can be used in tandem with Correct by Construction.

  • Use a better model to generate examples of desired outputs
  • Use a cheaper model for evaluations
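
One way to picture the split between the two models is the sketch below. It is only an illustration under stated assumptions: `strong_model` and `cheap_model` are hypothetical callables that take a prompt and return text, and the YES/NO judging prompt is an invented convention, not something prescribed by these notes.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: prompt -> generated text.
Model = Callable[[str], str]


def generate_then_evaluate(
    prompts: List[str],
    strong_model: Model,  # better, more expensive model: writes the outputs
    cheap_model: Model,   # cheaper model: only judges the outputs
) -> List[Tuple[str, str, bool]]:
    """Return (prompt, output, passed) triples."""
    results = []
    for prompt in prompts:
        # Spend the expensive calls on producing the desired output.
        output = strong_model(prompt)

        # Spend the cheap calls on a simple pass/fail judgement.
        verdict = cheap_model(
            f"Request:\n{prompt}\n\nResponse:\n{output}\n\n"
            "Does the response satisfy the request? Answer YES or NO."
        )
        passed = verdict.strip().upper().startswith("YES")
        results.append((prompt, output, passed))
    return results
```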