#P9206. 不死「徐福时空」

不死「徐福时空」

Background

Xu Fu was a local alchemist (fangshi, 方士) from the Qi region during the Qin dynasty.

By order of the First Emperor of Qin, Xu Fu led three thousand young boys and girls on a journey to find the legendary "Penglai medicine". But in the end, he never returned.

Where did Xu Fu finally go? Did he find the Penglai medicine? These questions are no longer important.

Problem Description

The passage of time can be abstracted as the time spent sorting a number sequence. Different sorting strategies take different amounts of time. Here we introduce a sorting algorithm that was a milestone in human exploration of sorting: Shell sort.

Shell sort can be seen as an optimization of insertion sort. To study the running efficiency of Shell sort, we want you to implement a simple Shell sort process. Before that, we will standardize the exact procedure of insertion sort and how to measure the "cost" of an insertion sort process.

Insertion Sort

For an array a=[a1,a2,,an]a=[a_1,a_2,\cdots,a_n] of length nn, the idea of insertion sort is to scan each element from left to right and insert it into the correct position:

The figure shows a typical insertion sort process. In round ii, we insert the element at index ii before the first element in the sorted part that is greater than ai\bm{a_i}. Suppose aia_i is finally inserted into position bib_i. Then we define the cost of this round as aibi+1|a_i-b_i|+1. The cost of the whole insertion sort process is the sum of the costs of all rounds.

Shell Sort

To reduce the cost of insertion sort, we introduce Shell sort. Shell sort splits the whole sorting process into several rounds. In each round, it groups elements by a fixed gap and sorts the elements within each group separately. In the last round, Shell sort performs a final insertion sort on the entire array.

The grouping method is as follows. Choose an integer dd, and divide the indices into the following groups:

  • Elements with indices 1,1+d,1+2d,1,1+d,1+2d,\cdots.
  • Elements with indices 2,2+d,2+2d,2,2+d,2+2d,\cdots.
  • Elements with indices 3,3+d,3+2d,3,3+d,3+2d,\cdots.
  • ……
  • Elements with indices d,2d,3d,d,2d,3d,\cdots.

Below is the process of one round of Shell sort. We choose d=3d=3.

In each round, Shell sort chooses a value of dd, and in the last round it uses d=1d=1. After performing such sorting in every round, it finally obtains a sorted array.

Although it looks like many insertion sorts are performed, the total cost of all insertion sorts across all rounds may be much smaller than the cost of performing insertion sort once on the whole array (this is not very obvious in the example above; you can refer to the examples in Samples 2,32,3).

In fact, Shell sort is the first sorting algorithm discovered by humans whose worst-case complexity is lower than Θ(n2)\Theta (n^2). For example, when choosing $d=2^k-1,\ k=\lfloor\log_2 n\rfloor,\lfloor\log_2 n\rfloor-1,\lfloor\log_2 n\rfloor-2,\cdots,1$, the worst-case time complexity of the whole process is Θ(n3/2)\mathcal \Theta(n^{3/2}).

Input Format

The first line contains two integers n,mn,m, representing the length of the array to be sorted and the number of rounds of Shell sort.

The second line contains nn integers a1,a2,,ana_1,a_2,\cdots,a_n, describing the elements in the array.

The third line contains mm integers d1,d2,,dmd_1,d_2,\cdots,d_m, describing the value of dd chosen in each round. The testdata guarantees that dm=1d_m=1.

Output Format

The first line outputs an integer cost\mathrm{cost}, which is the sum of the costs produced by insertion sort during the whole Shell sort process.

The second line outputs nn integers, representing the array aa after sorting.

10 1
3 2 6 4 1 1 3 8 7 3
1

27
1 1 2 3 3 3 4 6 7 8 

15 1
15 14 13 12 10 11 9 8 7 4 5 6 3 2 1
1

116
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

15 3
15 14 13 12 10 11 9 8 7 4 5 6 3 2 1      
9 3 1
68
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Hint

Constraints and Notes

For all testdata, it is guaranteed that 1n1051\le n\le 10^5, 1m1001\le m\le 100, cost5×107\mathrm{cost}\le 5\times 10^7, 1ai1091\le a_i\le 10^9, 1din1\le d_i\le n, and dm=1d_m=1.

Translated by ChatGPT 5