In our earlier papers, we investigated the parallelization and implementation of Gauss-Seidel (G-S) power flow analysis on both shared memory (SM) and distributed memory (DM) machines, and described the properties needed to maximize speedup, namely minimal communication overhead and a balanced computational load. In this paper, we investigate a two-stage parallelization scheme that achieves these properties on DM machines. In the first stage, we introduce a new, efficient heuristic clustering algorithm that reduces communication time and balances the computational load. In the second stage, we devise a coloring algorithm that aims to minimize synchronization overhead and coordinates the exchange of information among processors. We show that the scheme increases both the speedup of the G-S algorithm on the nCUBE2 machine and the associated upper bound on that speedup.
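To make the role of the coloring stage concrete, the following is a minimal sketch of a generic greedy graph coloring, not the algorithm devised in this paper. It assumes a hypothetical five-bus network in which an edge joins two buses that share a branch; buses that receive the same color have no direct coupling, so under this assumption their G-S updates could proceed in the same phase without exchanging values. The names `greedy_coloring`, `color_classes`, and `network` are illustrative only.

```python
def greedy_coloring(adjacency):
    """Assign each node the smallest color not used by an already-colored neighbor."""
    colors = {}
    # Visit high-degree nodes first; ties are broken by node id for determinism.
    for node in sorted(adjacency, key=lambda n: (-len(adjacency[n]), n)):
        used = {colors[nb] for nb in adjacency[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def color_classes(colors):
    """Group nodes by color; each group is treated as one update phase."""
    phases = {}
    for node, color in colors.items():
        phases.setdefault(color, []).append(node)
    return [sorted(phases[c]) for c in sorted(phases)]


# Hypothetical 5-bus network (assumption): an edge means two buses share a branch.
network = {
    0: {1, 2},
    1: {0, 2, 3},
    2: {0, 1, 4},
    3: {1, 4},
    4: {2, 3},
}

colors = greedy_coloring(network)
phases = color_classes(colors)

for index, phase in enumerate(phases):
    print(f"phase {index}: buses {phase}")
```

The number of colors gives the number of phases per sweep in this simplified picture, which is why a coloring with few colors would tend to reduce the number of synchronization points. The greedy heuristic shown here does not guarantee a minimum number of colors.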