Correct by Construction
Motivation: It is hard to use LLMs to generate datasets when certain properties of the answer are hard to verify. For example, an API call generated from natural language is easy to test for validity, but it is hard to know whether it semantically does what we want.
- Start with correct API calls
- Ask the LLM to generate a description for each call
- Start with the correct answer, and create a generated response. This allows you to create a pair that's guaranteed to be correct
- The assumption here is that this is easy, or that there are many correct responses for a given answer
Tips:
- Set the temperature lower when you are generating a correct response; otherwise the model may hallucinate and include irrelevant information in the response
- You can set it higher if you want a negative response
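The construction and the temperature tips can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the `LLM` callable, the function names, the prompt wording, and both temperature values are hypothetical stand-ins for whatever model client and settings you actually use.

```python
from typing import Callable

# Hypothetical model interface: (prompt, temperature) -> generated text.
LLM = Callable[[str, float], str]

# Illustrative values only; tune them for your model.
LOW_TEMPERATURE = 0.2   # for responses that should stay correct and on-topic
HIGH_TEMPERATURE = 1.0  # for deliberately noisier, negative responses


def build_examples(api_calls: list[str], llm: LLM, negative: bool = False) -> list[dict]:
    """Start from known-valid API calls and generate a description for each.

    The API call is the trusted side of the pair; only the description is generated.
    """
    temperature = HIGH_TEMPERATURE if negative else LOW_TEMPERATURE
    examples = []
    for call in api_calls:
        prompt = f"Describe in one sentence what this API call does:\n{call}"
        examples.append({
            "api_call": call,
            "description": llm(prompt, temperature).strip(),
            "temperature": temperature,
        })
    return examples


def fake_llm(prompt: str, temperature: float) -> str:
    # Offline stand-in for a real model client so the sketch runs as-is.
    return f"Runs {prompt.splitlines()[-1]}"


examples = build_examples(['get_weather(city="Paris")'], fake_llm)
```

Because the API call comes first and is never rewritten, its validity does not depend on the model. Only the description needs spot-checking.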
Resource Overboard
Can be used in tandem with Correct by Construction.
- Use a better model to generate examples of desired outputs
- Use a cheaper model for evaluations
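One way to read the split above is sketched below. The notes do not say what the cheaper model evaluates, so it is an assumption here that it judges a candidate against the examples the better model wrote. The `Model` callable, the function names, the prompts, and the stub models are all hypothetical.

```python
from typing import Callable

# Hypothetical model interface: prompt in, generated text out.
Model = Callable[[str], str]


def generate_examples(task: str, strong_model: Model, n: int = 3) -> list[str]:
    """Spend the expensive model on a small number of desired outputs."""
    return [strong_model(f"Task: {task}\nWrite example output #{i + 1}.") for i in range(n)]


def evaluate(candidate: str, examples: list[str], cheap_model: Model) -> bool:
    """Ask the cheaper model whether a candidate matches the desired outputs."""
    shots = "\n".join(f"- {e}" for e in examples)
    prompt = (
        f"Examples of desired outputs:\n{shots}\n\n"
        f"Candidate:\n{candidate}\n\n"
        "Does the candidate match the desired outputs? Answer YES or NO."
    )
    return cheap_model(prompt).strip().upper().startswith("YES")


def strong_stub(prompt: str) -> str:
    # Offline stand-in for the better (more expensive) model.
    return "A polite, concise reply."


def cheap_stub(prompt: str) -> str:
    # Offline stand-in for the cheaper judge: inspects only the candidate text.
    candidate = prompt.split("Candidate:\n")[1].split("\n\n")[0]
    return "YES" if "polite" in candidate.lower() else "NO"


examples = generate_examples("reply to a customer", strong_stub, n=2)
accepted = evaluate("A polite answer.", examples, cheap_stub)
rejected = evaluate("Go away.", examples, cheap_stub)
```

The expensive model is called only `n` times, while the cheap model handles the high-volume evaluation calls.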