#P8986. [北大集训 2021] 基因编辑

[北大集训 2021] 基因编辑

Background

CTT2021 D1T3

Problem Description

Humans have already developed many gene-editing techniques. One of the most traditional methods uses “restriction endonucleases” (restriction enzymes). Such enzymes can recognize specific nucleotide sequences and cut the phosphodiester bond connecting adjacent nucleotides at designated positions, producing sequence cuts called “ends”. Only matching ends can be ligated by DNA ligase.

Suppose we want to use gene fragment AA to replace gene fragment BB on a vector VV. In restriction-enzyme-based editing, the usual procedure is:

  1. Choose one restriction enzyme recognition site at each end of gene AA. These two recognition sites should also exist correspondingly at the two ends of gene BB.
  2. Use the restriction enzymes corresponding to the chosen recognition sites to process gene AA, so that the corresponding ends are generated at both ends. Purify the processed gene AA.
  3. Use the same restriction enzymes to cut the recognition sites on vector VV, separating gene BB from vector VV. Purify the vector VV' with gene BB removed.
  4. Mix vector VV' with gene AA, and with the help of DNA ligase, reconnect the broken phosphodiester bonds.

It is worth noting that if the two recognition sites produce the same ends after cutting, then in step 4 the vector VV' might ligate by itself, producing a vector that contains neither gene AA nor BB; it is also possible that gene AA is inserted into VV' in reverse, also producing an incorrect vector. Therefore, in practice, restriction enzymes that produce different ends are usually chosen to process gene AA and vector VV.

In the year 3032, humans discovered an alien civilization HD1048576d that had mastered gene-editing technology. Of course, this technology only applies to the genes of alien organisms living on planet HD1048576d. The smallest unit that human gene-editing technology can recognize is a single base on a DNA sequence, while the alien civilization’s technology can recognize the smallest unit as a single noicleobase on their gene sequence. For convenience, we represent one noicleobase by a positive integer starting from 11. Then an alien gene sequence can be represented as a corresponding sequence of positive integers.

For an alien gene sequence of length nn (denote its positive-integer representation as s1,s2,,sns_1, s_2, \cdots, s_n), the gene-editing process of HD1048576d is as follows:

  1. Choose an interval to be edited [l,r][l, r], i.e. replace in place the subsequence sl,sl+1,,srs_l, s_{l+1}, \cdots, s_r in the original sequence.
  2. Pick an ordered pair of indices (i,j)(i, j) that crosses the interval to be replaced (i.e. 1i<l1\le i<l and r<jnr<j\le n), and mass-produce the new sequence corresponding to the subsequence si,,sjs_i, \cdots, s_j after editing, namely $s_i, \cdots, s_{l-1}, t_1, \cdots, t_k, s_{r+1}, \cdots, s_j$.
  3. Using the corresponding specific recognition tool, cut the subsequence si,,sjs_i, \cdots, s_j out of the original sequence, and ligate $s_i, \cdots, s_{l-1}, t_1, \cdots, t_k, s_{r+1}, \cdots, s_j$ into the sequence, obtaining the target gene sequence.

Note that in step 2, the chosen pair of indices must correspond to a unique noicleobase combination. That is, the ordered pairs (i,j)\left(i', j'\right) satisfying si=si,sj=sjs_{i'}=s_i, s_{j'} = s_j and i<ji'<j' must be unique (i.e. only (i,j)(i, j)). Otherwise, the specific recognition tool may cut out other segments of the gene sequence. In addition, sisjs_i\ne s_j, otherwise the specific recognition tool may cut out only a single noicleobase.

Also, because producing the new gene sequence during replacement requires nontrivial cost, the alien civilization wants to minimize the length of the gene sequence that needs to be produced. Obviously, minimizing this length is equivalent to minimizing the length of the cut-out gene subsequence, so in practice the optimal solution is computed by minimizing the length of the cut-out gene subsequence.

Now they want to test the intelligence of human civilization, so you are selected from many high school students to solve this problem.

Input Format

The first line contains three positive integers n,l,rn, l, r, representing the length of the gene sequence to be edited, the left endpoint of the region to be edited, and the right endpoint of the region to be edited. It is guaranteed that 3n1063\le n\le 10^6 and 1<lr<n1<l\le r < n.

The second line contains nn positive integers s1,,sns_1, \cdots, s_n, representing the gene sequence to be edited in the form of positive integers. It is guaranteed that each noicleobase index sis_i lies in [1,106][1, 10^6].

Output Format

If there exists a gene-sequence cutting plan that satisfies the constraints of the alien civilization’s gene-editing technology, output a positive integer, meaning among all valid plans, the minimum length of the cut-out gene subsequence. Otherwise, output -1, meaning that no cutting plan satisfies the constraints.

10 4 6
2 1 4 7 4 8 3 6 4 8

6

Hint

The optimal plan is to cut out 1 4 7 4 8 3. It can be proved that there is no better cutting plan satisfying the technical constraints.

A plan with a shorter cut length is 4 7 4 8 3, but under this plan the specific recognition tool may cut out 4 8 3, causing unexpected mutations in the resulting target gene sequence. Therefore, this cutting plan does not satisfy the technical constraints.


For 100%100\% of the testdata, it is guaranteed that 3n1063\le n \le 10^6, 1in,1si106\forall 1\le i\le n, 1\le s_i\le 10^6, and 1<lr<n1<l\le r<n.

This problem has 5 subtasks. You must pass all test points in a subtask to obtain the corresponding score for that subtask. The table below lists the data scale and properties for each subtask.

Subtask ID Score nn\le sis_i\le Special Property
11 55 10001000 10001000 ×\times
22 1010 10610^6
33 2525 10610^6 10001000
44 3030 10610^6 \surd
55 ×\times

The “Special Property” in the table means: s1,,sl1s_1, \cdots, s_{l - 1} are all distinct, and sr+1,,sns_{r + 1}, \cdots, s_n are all distinct.

Translated by ChatGPT 5