The exact germplasm sources of the HB and LN plantation populations
By combining the distribution of haplotypes among populations (Figures 1, 2, 6; Supplementary Tables S7, S8) with the phylogenetic relationships of all individuals and the population-level clustering (Figures 4, 5), the germplasm sources of the plantations could be inferred.
The inferred ancestral order among the five groups was SX*-HB*-HB-LN*-LN (Figure 4A), which showed that the SX* group held the core position in northern China, consistent with previous research (Zhou et al., 2022). Twenty-one haplotypes (nad5-1) could provide valuable information for tracing the germplasm of plantations, while the other 7 haplotypes were private to a single group (Supplementary Table S8-2).
Among the HB plantation populations, there was little difference in genetic structure (Figures 4, 5) and a similar haplotype distribution pattern (Figures 1, 2, 6), suggesting that the germplasm background of the HB plantations was relatively simple. SHB2 and SHB1* had almost the same haplotype distribution pattern (Supplementary Table S8). They were always placed in the same clade at a short genetic distance (Figures 4B, 5B) and always fell in the same subgroup (Figures 4E, 5E). The two populations were also geographically close (Figures 1, 2; Supplementary Table S1). On this basis, the germplasm of SHB2 very likely came from SHB1* (Figure 7), which also supports the accuracy of determining the germplasm sources of plantations from haplotype distribution, genetic distance, and ancestral composition. Hap-6 was the dominant haplotype of HTLZ1 and also made up a substantial share of GDS* (9/33), and the genetic distance between HTLZ1 and GDS* was short (Figure 5B), suggesting that the germplasm of HTLZ1 may have come from GDS* (Figure 7). Hap-4 was the dominant haplotype of HTLZ2-4 and of GDS* (17/33), ZTS* (7/21), LLS2* (5/12), SHB1* (17/33), DWP1* (9/17), DWP2* (9/23), HLH* (5/21), and BDS* (10/34), and it also made up a substantial share of GCS* (8/31) (Figure 2; Supplementary Table S8), suggesting that HTLZ2-4 might have come from these natural populations. According to our field investigation, however, the SHB1* and BDS* populations were too small to supply enough seed for the plantations. The stand age of DWP1-2* is about the same as that of HTLZ2-4, and DWP1-2* might have been mistaken for a natural forest, so DWP1-2* was unlikely to have provided seed for HTLZ2-4. Hap-6 was the secondary haplotype of HTLZ2-4 and also made up a substantial share of GDS* (9/33) (Figure 2; Supplementary Table S8), suggesting that the germplasm of HTLZ2-4 most likely came from GDS*.
GDS* and HTLZ2-4 were placed in the same clade, with GDS* occupying the ancestral position (Figure 5B), which provided strong evidence that the germplasm of HTLZ2-4 came from GDS*. Taking the above evidence together, we concluded that the germplasm of HTLZ1-4 most likely came from GDS* (Figure 7). HTLZ1-4 are geographically close to one another, so a shared germplasm source is to be expected, which supports the accuracy of our traceability method. Based on hap-4 and hap-11 (Figure 2; Supplementary Table S8), it was inferred that DWP3 most likely came from ZTS* (Figure 7), and the short genetic distance between them provided strong evidence for this (Figure 4B). Hap-4, hap-6, and hap-11 were the main haplotypes of DWP4, consistent with ZTS*. In addition, GDS* was adjacent to DWP4 (Figure 5B) and was the ancestral population of the clade in which DWP4 was located (Figure 4B). Both ZTS* and GDS* might therefore be germplasm sources of DWP4. Since the germplasm sources of DWP4 and DWP3 should be the same, the most likely germplasm source of DWP4 is also ZTS* (Figure 7). The haplotype distribution pattern of MJB was almost the same as that of ZTS* and THS* (Supplementary Table S8); the three populations were placed in the same clade at a short genetic distance (Figures 4B, 5B) and fell in the same subgroup (Figures 4E, 5E). Because it was hard to determine which of the two natural forests was genetically closer to MJB, we suggest that both ZTS* and THS* might be germplasm sources of MJB (Figure 7). XF, LH, and QG fell in the same subgroup with a consistent proportion of the dominant ancestral component, were placed in the same clade at a short genetic distance, and shared a similar haplotype distribution pattern, with the main haplotypes being, in descending order, hap-4, hap-11, hap-6, hap-13, and hap-7. GCS* showed a similar haplotype distribution pattern to these three populations.
Taking genetic distance and ancestral components as auxiliary information, GCS* occupied the ancestral position in the clade and subgroup containing QG, XF, and LH. It was therefore inferred that the germplasm of XF, LH, and QG most likely came from GCS* (Figure 7).
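The haplotype-frequency comparison used above for the HB plantations can be illustrated with a minimal Python sketch. It uses only the hap-4 counts quoted in the text for the natural populations; the function names `haplotype_frequencies` and `rank_by_frequency` are hypothetical helpers introduced for illustration and are not part of the analysis pipeline of this study.

```python
# Hap-4 counts (carriers, sampled individuals) for natural populations,
# copied from the text above. No other data are assumed.
HAP4_COUNTS = {
    "GDS*": (17, 33),
    "ZTS*": (7, 21),
    "LLS2*": (5, 12),
    "SHB1*": (17, 33),
    "DWP1*": (9, 17),
    "DWP2*": (9, 23),
    "HLH*": (5, 21),
    "BDS*": (10, 34),
    "GCS*": (8, 31),
}


def haplotype_frequencies(counts):
    """Return {population: frequency of the haplotype} as floats."""
    return {pop: carriers / total for pop, (carriers, total) in counts.items()}


def rank_by_frequency(counts):
    """Sort populations by descending haplotype frequency (illustrative helper)."""
    freqs = haplotype_frequencies(counts)
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for pop, freq in rank_by_frequency(HAP4_COUNTS):
        print(f"{pop:6s} {freq:.2f}")
```

Such a ranking is only a first filter. As in the reasoning above, candidates with a high hap-4 frequency (for example SHB1* or DWP1*) may still be excluded on other grounds such as population size, stand age, genetic distance, or ancestral composition.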
Among the LN plantation populations, there was a large difference in genetic structure (Figures 3, 4) and a diverse haplotype distribution pattern (Figures 1, 2, 6), suggesting that the germplasm background of the LN plantations was complex. HD1, HD2, WD1, WD2, and DCY shared the same haplotype distribution pattern, with hap-7 as the dominant haplotype; among the natural populations, only WF* matched this pattern (Figure 2; Supplementary Table S8). The close genetic relationship (Figures 4B, 5B) and similar genetic lineage between them (Figures 4E, 5E) provided strong evidence that the germplasm of the five plantations came from WF*. Given the large size, tall trees, convenient location, and long-term artificial management of the WF* population, we further speculated that WF* provided germplasm for these five populations (HD1, HD2, WD1, WD2, and DCY) (Figure 7). The close relationship of DB1, DB2, and ZJS* in the NJ tree (Figure 5B) and the consistency of their genetic lineages (Figure 5E) provided strong evidence for the hypothesis that the germplasm of these two plantations came from ZJS*. Owing to the limited sample size of the ZJS* population, we did not find clear evidence in the haplotype distribution pattern. Based on the distribution of haplotypes, we could not rule out that GDS* and GCS* provided germplasm for DB1 and DB2, but we found no support for this in the genetic distance or genetic lineage. The geographical distance between DB and ZJS* is relatively short (Figures 1, 2; Supplementary Table S1), which facilitates germplasm allocation. The convenient location and large size of ZJS* make it well suited to serve as a germplasm source population. We therefore speculated that the germplasm of DB1 and DB2 came from the local area and was provided by ZJS* (Figure 7).
ZZD1, ZZD2, LJG, and ZGT were placed in the same clade at a short genetic distance (Figures 4B, 5B) and fell in the same cluster with a similar lineage (Figure 5E), showing that their genetic backgrounds differed little. GDS* and WTG* were placed in the clade containing ZZD1, ZZD2, LJG, and ZGT and occupied the ancestral position (Figure 5B). ZZD1, ZZD2, LJG, and ZGT shared a similar haplotype distribution pattern with GDS* and WTG*, suggesting that GDS* and WTG* might be the germplasm sources of these four plantations. According to our investigation, however, WTG* is remotely located, small, poorly growing, and produces few seeds, so it is unlikely to have supplied germplasm for plantation establishment. LJG shared a similar haplotype structure with WF*, but the large genetic distance between them weakens the possibility that WF* provided germplasm for LJG. We tentatively judged, from genetic structure and genetic lineage, that their germplasm came from GDS*. Based on the above inference, we speculated that the germplasm of all four populations (ZZD1, ZZD2, LJG, and ZGT) came from GDS* (Figure 7). HQ was placed at the basal position of the NJ tree (Figures 4B, 5B), and it was difficult to judge the origin of its germplasm from its haplotype structure, genetic distance, and genetic lineage. We consulted the afforestation archives of HQ and determined that most of its germplasm came from the Xingcheng seed orchard (a seed orchard of Chinese pine in northern China), whose material was selected from superior trees across Liaoning Province. The complex germplasm background of HQ made traceability challenging, and its germplasm sources could be determined here only with difficulty (Figure 7).
In summary, we suggest that almost all HB plantation populations came from SX* (GDS*, ZTS*, GCS*, and THS*), which led to the homogeneous genetic background of the HB populations. Shanxi and Hebei Provinces are geographically close, which facilitates germplasm allocation. Most of the germplasm of the LN plantations came from LN* (ZJS*, WF*), and the remainder came from GDS* (SX*), which resulted in large differences in genetic structure within the LN group.