* Fixing odd tokenization self-modifications on the Rust side (load and
re-save in Python).
* Fixing the builds?
* Fix the GitHub Action?
* Fixing the location?
* Validation is odd.
* Try a faster runner.
* Upgrade Python version.
* Remove sccache.
* No sccache.
* Getting libpython, maybe?
* List stuff.
* Monkey it up.
* Have no idea at this point.
* Tmp.
* Shot in the dark.
* Tmate the hell out of this.
* Desperation.
* WTF.
* -y.
* Apparently 3.10 is not available anymore.
* Updating the dockerfile to make libpython discoverable at runtime too.
* Put back rust tests.
* Why do we want MKL on AMD?
* Forcing 3.11?
* Refactor dead code.
* First working step.
* Remove a lot of duplicated code.
* More dead code.
* More cleanup.
* Fix Santacoder test.
* Fixing the simple tests.
* Fixing sharding.
* Fixes for VLM.
* Fixing santacoder (num_kv_heads hardcoded).
* Removing more dead code.
* Fixing `config.n_head`.
* Stopping earlier because of `<end_of_utterance>` in idefics2.
* Address comments.
* Removing the dead code.
* Fuse Mistral back into FlashCausalLM.
* Finish removal.
* Fixing docs + causal_lm `batch_class`.
* Fixing docs + causal_lm.
* Add default to Gemma Causality.
* Default value for gemma/gemma2.
* Wrong default.

The router will now send the input as chunks in addition to a single
string. This change modifies the server to process chunked input
rather than strings, which also lets us remove the image extraction
code from the server.
@njhill, @yk FYI
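
A minimal sketch of why chunked input removes the need for image extraction, assuming two chunk kinds (text and image). The names `TextChunk`, `ImageChunk`, `split_chunks` and the `<image>` placeholder are hypothetical stand-ins, not the server's actual protobuf types or prompt format: each image arrives as its own chunk, so the server only walks the list in order instead of parsing images out of a string.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical stand-ins for the chunk messages sent by the router.
@dataclass
class TextChunk:
    text: str


@dataclass
class ImageChunk:
    data: bytes    # raw image bytes
    mimetype: str  # e.g. "image/png"


Chunk = Union[TextChunk, ImageChunk]

# Hypothetical marker recording where each image sits in the prompt.
IMAGE_PLACEHOLDER = "<image>"


def split_chunks(chunks: List[Chunk]) -> Tuple[str, List[ImageChunk]]:
    """Build the text prompt and the ordered list of images from chunks.

    No string parsing is involved: images are already separate chunks.
    """
    parts: List[str] = []
    images: List[ImageChunk] = []
    for chunk in chunks:
        if isinstance(chunk, TextChunk):
            parts.append(chunk.text)
        elif isinstance(chunk, ImageChunk):
            parts.append(IMAGE_PLACEHOLDER)
            images.append(chunk)
        else:
            raise TypeError(f"unknown chunk type: {type(chunk).__name__}")
    return "".join(parts), images


if __name__ == "__main__":
    chunks = [
        TextChunk("Describe "),
        ImageChunk(b"\x89PNG", "image/png"),
        TextChunk(" please."),
    ]
    prompt, images = split_chunks(chunks)
    print(prompt)  # Describe <image> please.
```

Keeping images as typed chunks means the text never has to encode them (e.g. as inline markdown), so there is nothing to re-extract on the server side.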

`generated_text` was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour because we don't think it is useful, and it is
even detrimental to usability.
We also remove the unused Vec.