15.3. Parallel Plans
Because each worker executes the parallel portion of the plan to completion, it is not possible to simply take an ordinary query plan and run it using multiple workers. Each worker would produce a full copy of the output result set, so the query would not run any faster than normal but would produce incorrect results. Instead, the parallel portion of the plan must be what is known internally to the query optimizer as a partial plan ; that is, it must be constructed so that each process that executes the plan will generate only a subset of the output rows in such a way that each required output row is guaranteed to be generated by exactly one of the cooperating processes. Generally, this means that the scan on the driving table of the query must be a parallel-aware scan.
15.3.1. Parallel Scans
The following types of parallel-aware table scans are currently supported.
- 
     In a parallel sequential scan , the table's blocks will be divided among the cooperating processes. Blocks are handed out one at a time, so that access to the table remains sequential. 
- 
     In a parallel bitmap heap scan , one process is chosen as the leader. That process performs a scan of one or more indexes and builds a bitmap indicating which table blocks need to be visited. These blocks are then divided among the cooperating processes as in a parallel sequential scan. In other words, the heap scan is performed in parallel, but the underlying index scan is not. 
- 
     In a parallel index scan or parallel index-only scan , the cooperating processes take turns reading data from the index. Currently, parallel index scans are supported only for btree indexes. Each process will claim a single index block and will scan and return all tuples referenced by that block; other processes can at the same time be returning tuples from a different index block. The results of a parallel btree scan are returned in sorted order within each worker process. 
Other scan types, such as scans of non-btree indexes, may support parallel scans in the future.
15.3.2. Parallel Joins
Just as in a non-parallel plan, the driving table may be joined to one or more other tables using a nested loop, hash join, or merge join. The inner side of the join may be any kind of non-parallel plan that is otherwise supported by the planner provided that it is safe to run within a parallel worker. Depending on the join type, the inner side may also be a parallel plan.
- 
     In a nested loop join , the inner side is always non-parallel. Although it is executed in full, this is efficient if the inner side is an index scan, because the outer tuples and thus the loops that look up values in the index are divided over the cooperating processes. 
- 
     In a merge join , the inner side is always a non-parallel plan and therefore executed in full. This may be inefficient, especially if a sort must be performed, because the work and resulting data are duplicated in every cooperating process. 
- 
     In a hash join (without the "parallel" prefix), the inner side is executed in full by every cooperating process to build identical copies of the hash table. This may be inefficient if the hash table is large or the plan is expensive. In a parallel hash join , the inner side is a parallel hash that divides the work of building a shared hash table over the cooperating processes. 
15.3.3. Parallel Aggregation
   
    PostgreSQL
   
   supports parallel aggregation by aggregating in
    two stages.  First, each process participating in the parallel portion of
    the query performs an aggregation step, producing a partial result for
    each group of which that process is aware.  This is reflected in the plan
    as a
   
    Partial Aggregate
   
   node.  Second, the partial results are
    transferred to the leader via
   
    Gather
   
   or
   
    Gather
    Merge
   
   .  Finally, the leader re-aggregates the results across all
    workers in order to produce the final result.  This is reflected in the
    plan as a
   
    Finalize Aggregate
   
   node.
  
   Because the
   
    Finalize Aggregate
   
   node runs on the leader
    process, queries that produce a relatively large number of groups in
    comparison to the number of input rows will appear less favorable to the
    query planner. For example, in the worst-case scenario the number of
    groups seen by the
   
    Finalize Aggregate
   
   node could be as many as
    the number of input rows that were seen by all worker processes in the
   
    Partial Aggregate
   
   stage. For such cases, there is clearly
    going to be no performance benefit to using parallel aggregation. The
    query planner takes this into account during the planning process and is
    unlikely to choose parallel aggregate in this scenario.
  
   Parallel aggregation is not supported in all situations.  Each aggregate
    must be
   
    safe
   
   for parallelism and must
    have a combine function.  If the aggregate has a transition state of type
   
    internal
   
   , it must have serialization and deserialization
    functions.  See
   
    
     CREATE AGGREGATE
    
   
   for more details.
    Parallel aggregation is not supported if any aggregate function call
    contains
   
    DISTINCT
   
   or
   
    ORDER BY
   
   clause and is also
    not supported for ordered set aggregates or when  the query involves
   
    GROUPING SETS
   
   .  It can only be used when all joins involved in
    the query are also part of the parallel portion of the plan.
  
15.3.4. Parallel Append
   Whenever
   
    PostgreSQL
   
   needs to combine rows
    from multiple sources into a single result set, it uses an
   
    Append
   
   or
   
    MergeAppend
   
   plan node.
    This commonly happens when implementing
   
    UNION ALL
   
   or
    when scanning a partitioned table.  Such nodes can be used in parallel
    plans just as they can in any other plan.  However, in a parallel plan,
    the planner may instead use a
   
    Parallel Append
   
   node.
  
   When an
   
    Append
   
   node is used in a parallel plan, each
    process will execute the child plans in the order in which they appear,
    so that all participating processes cooperate to execute the first child
    plan until it is complete and then move to the second plan at around the
    same time.  When a
   
    Parallel Append
   
   is used instead, the
    executor will instead spread out the participating processes as evenly as
    possible across its child plans, so that multiple child plans are executed
    simultaneously.  This avoids contention, and also avoids paying the startup
    cost of a child plan in those processes that never execute it.
  
   Also, unlike a regular
   
    Append
   
   node, which can only have
    partial children when used within a parallel plan, a
   
    Parallel
    Append
   
   node can have both partial and non-partial child plans.
    Non-partial children will be scanned by only a single process, since
    scanning them more than once would produce duplicate results.  Plans that
    involve appending multiple results sets can therefore achieve
    coarse-grained parallelism even when efficient partial plans are not
    available.  For example, consider a query against a partitioned table
    that can only be implemented efficiently by using an index that does
    not support parallel scans.  The planner might choose a
   
    Parallel
    Append
   
   of regular
   
    Index Scan
   
   plans; each
    individual index scan would have to be executed to completion by a single
    process, but different scans could be performed at the same time by
    different processes.
  
enable_parallel_append can be used to disable this feature.
15.3.5. Parallel Plan Tips
If a query that is expected to do so does not produce a parallel plan, you can try reducing parallel_setup_cost or parallel_tuple_cost . Of course, this plan may turn out to be slower than the serial plan that the planner preferred, but this will not always be the case. If you don't get a parallel plan even with very small values of these settings (e.g., after setting them both to zero), there may be some reason why the query planner is unable to generate a parallel plan for your query. See Section 15.2 and Section 15.4 for information on why this may be the case.
   When executing a parallel plan, you can use
   
    EXPLAIN (ANALYZE,
    VERBOSE)
   
   to display per-worker statistics for each plan node.
    This may be useful in determining whether the work is being evenly
    distributed between all plan nodes and more generally in understanding the
    performance characteristics of the plan.