#P14095. [ICPC 2023 Seoul R] Tandem Copy
[ICPC 2023 Seoul R] Tandem Copy
题目描述
Tandem copy is an operation on a DNA where a consecutive sequence of one or more nucleotides is repeated, and the repetitions are directly adjacent to each other; in other words, the tandem copy operation makes a copy of a consecutive sequence of nucleotides and pastes the copy right after the copied sequence. For example, is resulted from the tandem copy of in . Furthermore, we can continue another tandem copy on the resulted sequence and obtain . The following example illustrates a series of tandem copies from , where the underlined sequence is copied at each step.
$$\texttt{\underline{ATC}G}\Rightarrow\texttt{ATCA\underline{T}CG}\Rightarrow\texttt{\underline{AT}CATTCG}\Rightarrow\texttt{ATATC\underline{ATT}CG}\Rightarrow\dots $$We say that produces all these sequences by tandem copy. It is easy to see that can produce different sequences by selecting a different portion of the sequence to tandem copy at each step. Furthermore, in principle, can produce infinitely many sequences by continuing tandem copies as many as it needs.
Usually, it is more expensive to tandem copy a longer portion. For instance,
$$\texttt{\underline{ATC}G}\Rightarrow\texttt{ATCATCG} $$is a tandem copy of three nucleotides and thus is more expensive than
$$\texttt{ATCA\underline{T}CG}\Rightarrow\texttt{ATCATTCG} , $$which is a tandem copy of one nucleotide. In other words, the length of a copied portion at each step is crucial to determine the tandem copy cost.
Since it is easy to tandem copy a single nucleotide, it is practical for the ICPC lab to store sequences such that two consecutive nucleotides in a sequence is always different; this helps the lab to reduce the storage space. For instance, since can be produced by tandem copying twice from , it is better for the lab to only store the shorter sequence instead of .
Because of a recent budget cut, the ICPC lab can only perform the tandem copy on at most two nucleotides at one time; namely, the length of a copied portion is at most two at each step. On the other hand, the lab can continue to repeat the tandem copy as many as it desires. For example, given a sequence , we can apply the tandem copy operation on and obtain , or apply it on the sequence and obtain . But we cannot tandem copy the consecutive sequence because its length is longer than two.
Given a source string and a target string , your task is to count the number of all valid substrings of , where one can obtain a string from by applying an appropriate number of the tandem copy operations such that contains as a substring. Please note that no two consecutive nucleotides in the source string are the same, whereas two consecutive nucleotides in the target string can be the same. For example, or cannot be source strings, but they can be target strings.
Now, given and , we take a substring of and run a series of tandem copies as follows:
$$s'=\texttt{\underline{C}ATGC}\Rightarrow\texttt{C\underline{CA}TGC}\Rightarrow\texttt{CCACA\underline{T}GC}\Rightarrow\texttt{CCACAT\underline{T}GC}\Rightarrow\texttt{CCACATTTGC}, $$which contains as its substring.
Here is another substring example. For ,
$$s'=\texttt{\underline{CA}T}\Rightarrow\texttt{\underline{C}ACAT}\Rightarrow\texttt{CCACA\underline{T}}\Rightarrow\texttt{CCACA\underline{T}T}\Rightarrow\texttt{CCACATTT}=t $$which shows that we can produce the target string from by a series of tandem copies.
It is easy to verify that the total number of valid substrings of is . Note that both the first and the second in are counted as different valid substrings. Thus, you need to consider all substrings of and count all valid substrings individually.
Here is another example. When and , you can take the substring and tandem copy . Then, the resulting string is , which contains as its substring. All other substrings of 𝑠𝑠 are unable to produce as a substring, and therefore the number of valid substrings is one.
Given a source string and a target string , where no two consecutive characters in are the same character, write a program that outputs the number of valid substrings of . is a valid substring of if a series of tandem copies on can produce a string that contains as its substring, where a tandem copy is restricted to at most two consecutive characters at each step.
输入格式
Your program is to read from standard input. The input consists of two lines. The first line is the source string , and the second line is the target string . Each input consists of uppercase letters A
to Z
, and .
输出格式
Your program is to write to standard output. Print exactly one line. The line should print the number of valid substrings in which a series of tandem copies can produce a string that contains the target string as its substring.
ACGCG
CCG
9
TATCGC
TTCCG
6
ABCABC
ABC
7
ABCABC
ABCABC
1