Managing Spill Files Generated by Queries
Greenplum Database creates spill files, also known as workfiles, on disk if it does not have sufficient memory to run an SQL query in memory.
The maximum number of spill files for a given query is governed by the gp_workfile_limit_files_per_query server configuration parameter setting. The default value of 100,000 spill files is sufficient for the majority of queries.
If a query creates more than the configured number of spill files, Greenplum Database returns this error:
ERROR: number of workfiles per query limit exceeded
Greenplum Database may generate a large number of spill files when:
- Data skew is present in the queried data. To check for data skew, see Checking for Data Distribution Skew.
- The amount of memory allocated for the query is too low. You control the maximum amount of memory that can be used by a query with the Greenplum Database server configuration parameters max_statement_mem and statement_mem, or through resource group or resource queue configuration.
You might be able to run the query successfully by changing the query, changing the data distribution, or changing the system memory configuration. The
gp_toolkit gp_workfile_* views display spill file usage information. You can use this information to troubleshoot and tune queries. The gp_workfile_* views are described in Checking Query Disk Spill Space Usage.
Additional documentation resources:
- Memory Consumption Parameters identifies the memory-related spill file server configuration parameters.
- Using Resource Groups describes memory and spill considerations when resource group-based resource management is active.
- Using Resource Queues describes memory and spill considerations when resource queue-based resource management is active.
Parent topic:Querying Data