Using llama.cpp grammar to constrain LLMs' output format
Due to limitations in training data and model scale, most open-source LLMs, such as Llama2, Mistral, and Baichuan2, cannot support features like function calling or producing output in a specified format as effectively as GPT-4.
By modifying the sampling rules applied to the token probabilities predicted by the LLM, we can solve this reliably. For example, when numerical output is required, the sampling probabilities of non-numerical tokens are suppressed, forcing the desired output (a minimal sketch follows the list below). This can serve as a transitional approach while open-source LLMs lack these capabilities. Here are some projects built for this purpose:
- llamacpp grammar
- guidance
- localai
- functionary
- jsonformer (https://github.com/1rgs/jsonformer)
- lm-format-enforcer
- outlines
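To make the suppression idea concrete, here is a minimal, self-contained sketch of logit masking over a toy vocabulary. The vocabulary and logits are made up for illustration; real libraries like those above hook into the sampler of an actual model:
import math

# Hypothetical toy vocabulary: token id -> token text.
vocab = {0: "1", 1: "2", 2: "cat", 3: "dog", 4: "3"}

def mask_non_numeric(logits):
    # Suppress non-digit tokens by setting their logits to -inf,
    # so their probability becomes zero after softmax.
    return [l if vocab[i].isdigit() else -math.inf for i, l in enumerate(logits)]

# Fake next-token logits "predicted" by a model; "cat" scores highest.
logits = [1.2, 0.3, 2.5, 0.9, -0.4]
masked = mask_non_numeric(logits)

# After masking, only numeric tokens can be sampled.
best = max(range(len(masked)), key=lambda i: masked[i])
print(vocab[best])  # -> "1", even though "cat" had the highest raw logit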
🚀 Using llama.cpp grammar to enforce a specified format for LLM output
Llama.cpp grammar is based on its own syntax, but we don't need to learn it; there is a simple way to do this 😆:
- Define the data structure with TypeScript
- Go to intrinsiclabs and translate the TypeScript into llama.cpp grammar
For example, suppose our desired output format looks like this:
{
    "game_state": string,  # "game over" or "game on progress"
    "message": string,
    "active_player": string
}
First, define the data structure with TypeScript:
interface DM {
    game_state: GameState;
    active_player: string;
    message: string;
}
enum GameState {
    GameOver = "game over",
    GameOnProgress = "game on progress",
}
Then, translate it to llama.cpp grammar:
root ::= DM
GameState ::= "\"game over\"" | "\"game on progress\""
DM ::= "{" ws "\"game_state\":" ws GameState "," ws "\"active_player\":" ws string "," ws "\"message\":" ws string "}"
DMlist ::= "[]" | "[" ws DM ("," ws DM)* "]"
string ::= "\"" ([^"]*) "\""
boolean ::= "true" | "false"
ws ::= [ \t\n]*
number ::= [0-9]+ "."? [0-9]*
stringlist ::= "[" ws "]" | "[" ws string ("," ws string)* ws "]"
numberlist ::= "[" ws "]" | "[" ws number ("," ws number)* ws "]"
Finally, pass the grammar to the llama.cpp model, and the model output will match our definition perfectly.
Code example:
import json
from llama_cpp import Llama, LlamaGrammar

model = Llama("/data/hf/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf")
# Save the grammar string above into a file, then load it.
grammar = LlamaGrammar.from_file("test_grammar")

prompt = """Player1 and Player2 are playing a game named happy ending, following is the conversation between the two players:
Player1: hi
Player2: hello, how do you do recently?
Now respond with the game state, active player and message in JSON format. Message involves the environment description string."""

res = model.create_completion(prompt, max_tokens=1000, grammar=grammar)
print(json.loads(res["choices"][0]["text"]))
The output is: {'game_state': 'game over', 'active_player': 'Player2', 'message': 'This is a happy ending after Player1 says hi and Player2 replies hello.'}, which matches the format perfectly.
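If you'd rather not keep the grammar in a separate file, llama-cpp-python also provides LlamaGrammar.from_string, which builds the grammar from an inline string. A minimal sketch (the tiny root rule here is just an illustration, not the full DM grammar above):
from llama_cpp import LlamaGrammar

# The grammar embedded directly in the script instead of a file.
grammar_text = r'''
root ::= "\"game over\"" | "\"game on progress\""
'''
grammar = LlamaGrammar.from_string(grammar_text)
# Then pass it exactly as before:
# res = model.create_completion(prompt, max_tokens=1000, grammar=grammar)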
😆 A more convenient way for Python!
Instead of writing TypeScript code, we can use Pydantic:
from pydantic import BaseModel
from enum import Enum

class GameState(str, Enum):
    game_over = "game over"
    game_active = "game on progress"

class DM(BaseModel):
    game_state: GameState
    message: str
    active_player: str

# Build the grammar directly from the model's JSON schema (Pydantic v1 API).
grammar = LlamaGrammar.from_json_schema(DM.schema_json())
res = model.create_completion(prompt, max_tokens=1000, grammar=grammar)
print(json.loads(res["choices"][0]["text"]))
The output is: {'game_state': 'game on progress', 'active_player': 'Player1', 'message': "Player1 said 'hi', Player2 replied saying 'hello'"}. Great!
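Since the output is now guaranteed to match the schema, it can be parsed straight back into the Pydantic model for type-safe downstream use. A small sketch, reusing res from the snippet above (parse_raw is the Pydantic v1 API matching the schema_json() call):
# Validate and load the constrained output into the typed model.
dm = DM.parse_raw(res["choices"][0]["text"])
print(dm.game_state)     # GameState.game_active
print(dm.active_player)  # 'Player1'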