Massive volumes of real-time, fine-grained electricity consumption data have brought short-term load forecasting (STLF) into the era of big data in electricity, which leads to a high computing cost when training short-term load forecasting models (SLFMs) for large numbers of STLF scenarios. To address this issue, this paper proposes an SLFM transfer method (SLFMTM) that follows the principle of transfer learning and captures the characteristics of STLF. First, temporal and distribution similarity measurements are proposed, which guarantee an upper bound on the performance of the transferred model. Second, SLFMTM applies hierarchical clustering to group load sequences, using temporal similarity to measure the distance between load sequences and distribution similarity to determine the best number of clusters. Third, the intersection of the feature selection results from several STLF scenarios is used as the input set for all scenarios in the same cluster, which decreases computing time. Finally, SLFMTM uses a bagging method for fine-tuning to decrease generalization error. Evidence from an empirical study in Guiyang, China shows that 1) SLFMTM maintains SLFM performance without degradation, and 2) SLFMTM greatly reduces the SLFM computing cost, by at least 92.66%, 92.27%, and 90.9%, respectively.
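
The clustering and feature-intersection steps can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: Pearson correlation distance stands in for the temporal similarity measure, the number of clusters is fixed by hand rather than chosen by distribution similarity, the bagging fine-tuning step is omitted, and all function names, feeder names, and feature names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(a, b):
    """Return 1 - Pearson correlation (an assumed stand-in for temporal distance)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat sequence carries no shape information
    return 1.0 - cov / (var_a * var_b) ** 0.5


def cluster_loads(sequences, n_clusters):
    """Average-linkage agglomerative clustering over named load sequences."""
    clusters = [[name] for name in sequences]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return mean(pearson_distance(sequences[i], sequences[j])
                    for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Merge the closest pair; i < j always holds, so deleting j is safe.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def shared_features(selected, cluster):
    """Intersect the per-scenario feature selections within one cluster."""
    return set.intersection(*(set(selected[name]) for name in cluster))


# Hypothetical toy data: three load sequences and their selected features.
loads = {
    "feeder_a": [1.0, 2.0, 3.0, 2.0, 1.0],
    "feeder_b": [2.0, 4.1, 6.0, 4.0, 2.1],
    "feeder_c": [3.0, 2.0, 1.0, 2.0, 3.0],
}
selected = {
    "feeder_a": ["lag_1", "lag_24", "temperature"],
    "feeder_b": ["lag_1", "lag_24", "holiday"],
    "feeder_c": ["lag_1", "humidity"],
}

clusters = cluster_loads(loads, n_clusters=2)
inputs = [sorted(shared_features(selected, c)) for c in clusters]
```

In this toy setting, the two similarly shaped sequences fall into one cluster and share a single input set, so one model per cluster could be trained and then adapted to each scenario instead of running feature selection and training from scratch for every scenario.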