[北大集训 2021] 基因编辑

ID: 9524

远端评测题

1000ms

512MiB

尝试: 0

已通过: 0

难度: 7

上传者:

Hydro

标签>

2021

Background

CTT2021 D1T3

Problem Description

Humans have already developed many gene-editing techniques. One of the most traditional methods uses “restriction endonucleases” (restriction enzymes). Such enzymes can recognize specific nucleotide sequences and cut the phosphodiester bond connecting adjacent nucleotides at designated positions, producing sequence cuts called “ends”. Only matching ends can be ligated by DNA ligase.

Suppose we want to use gene fragment $A$ to replace gene fragment $B$ on a vector $V$ . In restriction-enzyme-based editing, the usual procedure is:

Choose one restriction enzyme recognition site at each end of gene $A$ . These two recognition sites should also exist correspondingly at the two ends of gene $B$ .
Use the restriction enzymes corresponding to the chosen recognition sites to process gene $A$ , so that the corresponding ends are generated at both ends. Purify the processed gene $A$ .
Use the same restriction enzymes to cut the recognition sites on vector $V$ , separating gene $B$ from vector $V$ . Purify the vector $V'$ with gene $B$ removed.
Mix vector $V'$ with gene $A$ , and with the help of DNA ligase, reconnect the broken phosphodiester bonds.

It is worth noting that if the two recognition sites produce the same ends after cutting, then in step 4 the vector $V'$ might ligate by itself, producing a vector that contains neither gene $A$ nor $B$ ; it is also possible that gene $A$ is inserted into $V'$ in reverse, also producing an incorrect vector. Therefore, in practice, restriction enzymes that produce different ends are usually chosen to process gene $A$ and vector $V$ .

In the year 3032, humans discovered an alien civilization HD1048576d that had mastered gene-editing technology. Of course, this technology only applies to the genes of alien organisms living on planet HD1048576d. The smallest unit that human gene-editing technology can recognize is a single base on a DNA sequence, while the alien civilization’s technology can recognize the smallest unit as a single noicleobase on their gene sequence. For convenience, we represent one noicleobase by a positive integer starting from $1$ . Then an alien gene sequence can be represented as a corresponding sequence of positive integers.

For an alien gene sequence of length $n$ (denote its positive-integer representation as $s_1, s_2, \cdots, s_n$ ), the gene-editing process of HD1048576d is as follows:

Choose an interval to be edited $[l, r]$ , i.e. replace in place the subsequence $s_l, s_{l+1}, \cdots, s_r$ in the original sequence.
Pick an ordered pair of indices $(i, j)$ that crosses the interval to be replaced (i.e. $1\le i<l$ and $r<j\le n$ ), and mass-produce the new sequence corresponding to the subsequence $s_i, \cdots, s_j$ after editing, namely $s_i, \cdots, s_{l-1}, t_1, \cdots, t_k, s_{r+1}, \cdots, s_j$.
Using the corresponding specific recognition tool, cut the subsequence $s_i, \cdots, s_j$ out of the original sequence, and ligate $s_i, \cdots, s_{l-1}, t_1, \cdots, t_k, s_{r+1}, \cdots, s_j$ into the sequence, obtaining the target gene sequence.

Note that in step 2, the chosen pair of indices must correspond to a unique noicleobase combination. That is, the ordered pairs $\left(i', j'\right)$ satisfying $s_{i'}=s_i, s_{j'} = s_j$ and $i'<j'$ must be unique (i.e. only $(i, j)$ ). Otherwise, the specific recognition tool may cut out other segments of the gene sequence. In addition, $s_i\ne s_j$ , otherwise the specific recognition tool may cut out only a single noicleobase.

Also, because producing the new gene sequence during replacement requires nontrivial cost, the alien civilization wants to minimize the length of the gene sequence that needs to be produced. Obviously, minimizing this length is equivalent to minimizing the length of the cut-out gene subsequence, so in practice the optimal solution is computed by minimizing the length of the cut-out gene subsequence.

Now they want to test the intelligence of human civilization, so you are selected from many high school students to solve this problem.

Input Format

The first line contains three positive integers $n, l, r$ , representing the length of the gene sequence to be edited, the left endpoint of the region to be edited, and the right endpoint of the region to be edited. It is guaranteed that $3\le n\le 10^6$ and $1<l\le r < n$ .

The second line contains $n$ positive integers $s_1, \cdots, s_n$ , representing the gene sequence to be edited in the form of positive integers. It is guaranteed that each noicleobase index $s_i$ lies in $[1, 10^6]$ .

Output Format

If there exists a gene-sequence cutting plan that satisfies the constraints of the alien civilization’s gene-editing technology, output a positive integer, meaning among all valid plans, the minimum length of the cut-out gene subsequence. Otherwise, output -1, meaning that no cutting plan satisfies the constraints.

10 4 6
2 1 4 7 4 8 3 6 4 8

Hint

The optimal plan is to cut out 1 4 7 4 8 3. It can be proved that there is no better cutting plan satisfying the technical constraints.

A plan with a shorter cut length is 4 7 4 8 3, but under this plan the specific recognition tool may cut out 4 8 3, causing unexpected mutations in the resulting target gene sequence. Therefore, this cutting plan does not satisfy the technical constraints.

For $100\%$ of the testdata, it is guaranteed that $3\le n \le 10^6$ , $\forall 1\le i\le n, 1\le s_i\le 10^6$ , and $1<l\le r<n$ .

This problem has 5 subtasks. You must pass all test points in a subtask to obtain the corresponding score for that subtask. The table below lists the data scale and properties for each subtask.

Subtask ID	Score	$n\le$	$s_i\le$	Special Property
$1$	$5$	$1000$	$1000$	$\times$
$2$	$10$	$1000$	$10^6$
$3$	$25$	$10^6$	$1000$
$4$	$30$		$10^6$	$\surd$
$5$	$30$		$10^6$	$\times$

The “Special Property” in the table means: $s_1, \cdots, s_{l - 1}$ are all distinct, and $s_{r + 1}, \cdots, s_n$ are all distinct.

Translated by ChatGPT 5

#P8986. [北大集训 2021] 基因编辑

Background

Problem Description

Input Format

Output Format

Hint

还没有账户？

登录