Hi Dinesh,
The benchmark measures the performance of raw LLMs on the task of writing WL code.
The notebook assistant uses RAG over the language documentation to boost the relevance of its answers, so including it would be like cheating!
While the assistant won't have its own row, in the future we could add a new column measuring how each model performs with the same extra knowledge.
Cheers