From 3efa5bbbfd5868695da4d5d9ad23d81f48f1e5a8 Mon Sep 17 00:00:00 2001
From: Nick Hill <nickhill@us.ibm.com>
Date: Fri, 30 Dec 2022 10:31:44 -0800
Subject: [PATCH] fix(router): Include special tokens when tokenizing (#14)

There's currently a discrepancy in the tokenization between the router
and python server code. The latter includes special tokens but former
does not.

This results in a token count mismatch for seq2seq models such as mt0
where the tokenizer emits an EOS token at the end.

This in turn results in some unexpected/incorrect output, in particular
when batch concatenation is involved, because the python code uses the
input length passed from the router for each row.

As far as I can tell, it is better to include this token in the encoder
`input_ids`, so I guess it's best to just adjust on the router side.
---
 router/src/validation.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/router/src/validation.rs b/router/src/validation.rs
index ff659b3..f6da191 100644
--- a/router/src/validation.rs
+++ b/router/src/validation.rs
@@ -131,7 +131,7 @@ fn validation_worker(
         }
 
         // Get the number of tokens in the input
-        match tokenizer.encode(request.inputs.clone(), false) {
+        match tokenizer.encode(request.inputs.clone(), true) {
             Ok(inputs) => {
                 let input_length = inputs.len();