#P14793. [NERC 2025] LLM Training

    ID: 16620 · Remote judge · 3000ms · 1024MiB · Difficulty: 8 · Tags: math, 2025, Special Judge, trie, suffix array (SA), ICPC, Lagrange multipliers, NERC/NEERC


Problem Description

You are given a text dataset. Your task is to train an LLM (Large Language Model) and find the minimal possible loss. No kidding.

A text dataset is an array of texts $t_1, t_2, \ldots, t_n$. Each text $t_i$ is a sequence of tokens. We define the set of tokens $T$ as the set of all tokens that appear in at least one text $t_i$. Additionally, for each text $t_i$, there is a set of positions $L_i \subseteq \{1, 2, \ldots, |t_i|\}$. The token $t_i[j]$ is generated by the LLM if $j \in L_i$ and is written by the user if $j \notin L_i$.

Let us define an LLM with context size $k$ as a probabilistic model $P_k$ that defines the probability distribution of the next token of the sequence, depending on a context $w$ --- a sequence of length between $0$ and $k$ (inclusive) whose elements are from $T$. Thus the probabilistic model $P_k$ is a large table of probabilities $P_k(\text{next} \,|\, w)$, defined for any context $w \in T^{*}$, $0 \leq |w| \leq k$, and any token $\text{next} \in T$. The conditions $0 \leq P_k(\text{next} \,|\, w) \leq 1$ and $\sum\limits_{\text{next} \in T} P_k(\text{next} \,|\, w) = 1$ must be satisfied.

The loss function of an LLM with context size $k$ is the following function defined for $P_k$:

$$\mathcal{L}_k(P_k) \,\, = \,\, \sum_{i=1}^{n} \,\, \sum_{j\in L_i} \, -\log_2 P_k\!\left( \underbrace{t_i[j]}_{\text{next token}} \ \middle|\ \underbrace{t_i[\max(1,j-k)\,..\,j-1]}_{\text{context}} \right)$$

Here $t_i[l\,..\,r] = t_i[l]\, t_i[l+1] \ldots t_i[r]$ is the substring from the $l$-th to the $r$-th token, and $t_i[1\,..\,0]$ is the empty string. So, for each text and for each token that is generated by the LLM, we add to the loss the negative logarithm (base 2) of the probability of generating this token, conditioned on the substring of the previous $k$ tokens (or the whole prefix, if it is shorter than $k$). If the probability is zero, we take the negative logarithm to be $+\infty$. This loss function is known as the (base 2) Cross-Entropy Loss over the LLM-generated positions. The smaller the loss value $\mathcal{L}_k(P_k)$, the better the LLM $P_k$ is.

For each $0 \leq k < \max\limits_{i=1..n} |t_i|$, calculate the minimum possible loss $\mathcal{L}_k(P_k)$ that can be obtained by some $P_k$ --- an LLM with context size $k$. It can be proved that this minimum is attained and is finite.
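
Although the statement does not spell this out, the minimum has a closed form, which the samples below illustrate. Each distinct context $w$ contributes an independent term to $\mathcal{L}_k$, so each term can be minimized separately. Writing $c(w, x)$ for the number of LLM-generated positions with context $w$ and next token $x$ (notation introduced here, not part of the statement), the method of Lagrange multipliers on one context's term (matching the tag above) yields the empirical distribution as the minimizer:

$$\min_{P_k}\; \sum_{x \in T} -c(w, x) \log_2 P_k(x \,|\, w) \quad \text{subject to} \quad \sum_{x \in T} P_k(x \,|\, w) = 1 \quad\Longrightarrow\quad P_k^{*}(x \,|\, w) = \frac{c(w, x)}{c(w)},$$

where $c(w) = \sum_{x \in T} c(w, x)$. Summing over contexts, the minimum loss equals

$$\mathcal{L}_k^{\min} \;=\; \sum_{w} \sum_{x \in T} c(w, x) \log_2 \frac{c(w)}{c(w, x)}.$$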

Input Format

The first line contains a single integer $n$ ($1 \leq n \leq 10^5$) --- the number of texts in the dataset. Text descriptions follow.

The first line of the $i$-th text description contains a single integer $m_i$ ($1 \leq m_i \leq 3 \cdot 10^5$) --- the length of $t_i$ ($m_i = |t_i|$).

The next line contains $m_i$ strings $t_i[1], t_i[2], \ldots, t_i[m_i]$ ($1 \leq |t_i[j]| \leq 5$) --- the tokens of the text $t_i$. Each token consists of characters with ASCII codes from $33$ to $126$ (printable characters).

The next line contains a string $\ell_i$ of $m_i$ letters $\texttt{U}$ and $\texttt{L}$, which encodes the set $L_i$. All positions with the letter $\texttt{L}$ are generated by the LLM, and all positions with the letter $\texttt{U}$ are written by the user, so $L_i = \{\, j \mid \ell_i[j] = \texttt{L} \,\}$. It is guaranteed that the last token is generated by the LLM, i.e. $\ell_i[m_i] = \texttt{L}$.

It is guaranteed that the sum of $m_i$ over all $i$ ($1 \le i \le n$) does not exceed $3 \cdot 10^5$.

Output Format

Print $M = \max\limits_{i=1..n} m_i$ real numbers: for each $k = 0, 1, \ldots, M-1$, print the minimum possible loss $\mathcal{L}_k(P_k)$ over all possible $P_k$ --- LLMs with context size $k$.

Your answers will be accepted if their absolute or relative errors do not exceed $10^{-6}$; formally, if $p$ is your answer and $q$ is the jury's answer, the following should hold: $\frac{|p - q|}{\max\{1, |q|\}} \le 10^{-6}$.

Sample Input 1

4
5
1 + 1 = 2
UUUUL
5
1 + 2 = 3
UUUUL
5
2 + 1 = 3
UUUUL
5
2 + 2 = 4
UUUUL

Sample Output 1

6.000000000000
6.000000000000
4.000000000000
4.000000000000
0.000000000000
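
To see how these values arise under the closed form above: for $k = 0$ the four generated tokens are $2, 3, 3, 4$, so the loss is $1 \cdot \log_2 4 + 2 \cdot \log_2 2 + 1 \cdot \log_2 4 = 6$; for $k = 2$ the contexts $\texttt{1 =}$ and $\texttt{2 =}$ each occur twice with two distinct next tokens, giving $4 \cdot \log_2 2 = 4$; for $k = 4$ every generated token has a unique context, so the loss is $0$.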

Sample Input 2

4
4
N E F <EOS>
LLLL
5
N E R C <EOS>
LLLLL
6
N E E R C <EOS>
LLLLLL
5
I C P C <EOS>
LLLLL

Sample Output 2

55.683674395584
12.490224995673
8.000000000000
8.000000000000
8.000000000000
8.000000000000

Sample Input 3

1
16
a b a c a b a d b a b d a b a c
ULLULLLLLLULLLLL

Sample Output 3

22.595941331507
12.464393446710
5.245112497837
2.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000
0.000000000000

Sample Input 4

2
4
WA WA WA AC
LULL
4
AC AC WA AC
LLUL

Sample Output 4

5.509775004327
4.754887502163
4.000000000000
2.000000000000
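
For reference, here is a minimal brute-force sketch of the closed form above (an illustration, not the intended solution --- the tags hint at trie/suffix-array techniques for the full limits). For every $k$ it groups the LLM-generated positions by their context and sums $c(w,x)\log_2\frac{c(w)}{c(w,x)}$; it reproduces the sample outputs, but its $O(M \cdot \sum m_i)$ cost is far too slow for the actual constraints:

```python
import sys
from collections import defaultdict
from math import log2

def min_losses(data: str) -> list[float]:
    """Brute force: for each context size k, group the LLM-generated
    positions by context w and sum c(w,x) * log2(c(w) / c(w,x))."""
    it = iter(data.split())      # tokens contain no whitespace, so a flat split works
    n = int(next(it))
    texts, labels = [], []
    for _ in range(n):
        m = int(next(it))
        texts.append([next(it) for _ in range(m)])
        labels.append(next(it))  # the U/L string
    M = max(map(len, texts))
    out = []
    for k in range(M):
        counts = defaultdict(lambda: defaultdict(int))  # context -> next token -> count
        for t, lab in zip(texts, labels):
            for j, tok in enumerate(t):                 # j is 0-based here
                if lab[j] == 'L':                       # position generated by the LLM
                    counts[tuple(t[max(0, j - k):j])][tok] += 1
        loss = 0.0
        for nxt in counts.values():
            total = sum(nxt.values())                   # c(w)
            loss += sum(c * log2(total / c) for c in nxt.values())
        out.append(loss)
    return out

if __name__ == '__main__':
    print('\n'.join(f'{x:.12f}' for x in min_losses(sys.stdin.read())))
```

A full-score solution would presumably share context counts across all $k$ at once via suffix structures instead of rebuilding the table per $k$; this sketch is only for checking answers on small inputs.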