테스트 데이터 생성(테이블 생성 + 벌크 데이터)
CREATE TABLE public.test2m
(
id serial primary key,
timeslice timestamp ,
system_id VARCHAR(100),
group_id VARCHAR(100),
agent_name VARCHAR(100),
svc_name VARCHAR(100),
COUNT INT
);
INSERT INTO public.test2m (timeslice, system_id, group_id, agent_name, svc_name, COUNT)
SELECT
now() as timeslice,
LEFT(md5(random()::text),3) as system_id,
LEFT(md5(random()::text),5) as group_id,
md5(random()::text) as agent_name,
md5(random()::text) as svc_name,
1 as count
FROM generate_series(1, 1000000) as gs;
select * from public.test2m limit 100;
--alter table public.test2m add primary key(id);
analyze public.test2m;
explain analyze
SELECT system_id,group_id,max(agent_name),COUNT(*) FROM public.test2m
where id between 1 and 500000
GROUP BY system_id , group_id
order by system_id , group_id
LIMIT 5000
검색 조건 id 단일 조건이기때문에 id 만 인덱스를 생성한경우 16만건을 모두 읽고, group by 적용후 5000건을 잘라냅니다.
Limit (cost=45404.69..46196.17 rows=5000 width=50) (actual time=1081.514..1143.438 rows=5000 loops=1)
-> Finalize GroupAggregate (cost=45404.69..105299.76 rows=378377 width=50) (actual time=1081.513..1143.232 rows=5000 loops=1)
Group Key: system_id, group_id
-> Gather Merge (cost=45404.69..97410.53 rows=410546 width=50) (actual time=1081.509..1141.291 rows=5001 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=44404.67..49023.31 rows=205273 width=50) (actual time=965.212..968.167 rows=2122 loops=3)
Group Key: system_id, group_id
-> Sort (cost=44404.67..44917.85 rows=205273 width=43) (actual time=965.201..967.283 rows=2123 loops=3)
Sort Key: system_id, group_id
Sort Method: external merge Disk: 9696kB
Worker 0: Sort Method: external merge Disk: 8136kB
Worker 1: Sort Method: external merge Disk: 8136kB
-> Parallel Index Scan using test2m_pkey on test2m (cost=0.42..19974.72 rows=205273 width=43) (actual time=0.294..54.606 rows=166667 loops=3)
Index Cond: ((id >= 1) AND (id <= 500000))
Planning Time: 1.383 ms
Execution Time: 1165.056 ms
집계레벨 1번째 컬럼인 system_id + id 인덱스 추가후 동일 쿼리실행
create index ix_test2m_system_id_id on public.test2m(system_id,id);
explain analyze
SELECT system_id,group_id,max(agent_name),COUNT(*) FROM public.test2m
where id between 1 and 500000
GROUP BY system_id , group_id
order by system_id , group_id
LIMIT 5000
아래와 같이 5000건정도만 읽고 멈춤(excute 시간도 감소됩니다.)
Limit (cost=28.65..1768.01 rows=5000 width=50) (actual time=0.478..22.543 rows=5000 loops=1)
-> GroupAggregate (cost=28.65..131400.21 rows=377644 width=50) (actual time=0.478..22.321 rows=5000 loops=1)
Group Key: system_id, group_id
-> Incremental Sort (cost=28.65..122627.53 rows=499624 width=43) (actual time=0.473..20.306 rows=5001 loops=1)
Sort Key: system_id, group_id
Presorted Key: system_id
Full-sort Groups: 42 Sort Method: quicksort Average Memory: 30kB Peak Memory: 30kB
Pre-sorted Groups: 42 Sort Method: quicksort Average Memory: 35kB Peak Memory: 35kB
-> Index Scan using ix_test2m_system_id_id on test2m (cost=0.42..87516.65 rows=499624 width=43) (actual time=0.059..5.022 rows=5072 loops=1)
Index Cond: ((id >= 1) AND (id <= 500000))
Planning Time: 1.945 ms
Execution Time: 22.671 ms
아래쿼리는 system_id 로 먼저 group by 후에 각 system_id 별로 개별 loop join 하여 5000번 호출 합니다.
explain analyze
select * from (
select system_id from public.test2m group by system_id
) as A
INNER JOIN LATERAL(
SELECT group_id, COUNT(*) AS CNT ,max(agent_name) as agent_name FROM public.test2m AS B
WHERE A.system_id = B.system_id and id between 1 and 500000
group by system_id ,group_id
) as c ON TRUE
limit 5000
성능은 직전 쿼리와 비교하여 비슷한 수행 시간을 보여줍니다.
내부적으로 거이 비슷한 데이터 탐색방식으로 데이터를 처리할것으로 예상 됩니다.
이런형태의 데이터 조회방식을 부분 범위 처리 라고들 합니다.
Limit (cost=468.18..20131.36 rows=5000 width=50) (actual time=0.371..18.333 rows=5000 loops=1)
-> Nested Loop (cost=468.18..1966613.78 rows=499956 width=50) (actual time=0.370..18.131 rows=5000 loops=1)
-> Group (cost=0.42..28480.42 rows=4098 width=4) (actual time=0.009..1.366 rows=42 loops=1)
Group Key: test2m.system_id
-> Index Only Scan using ix_test2m_system_id_id on test2m (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.008..0.804 rows=10020 loops=1)
Heap Fetches: 0
-> GroupAggregate (cost=467.75..470.50 rows=122 width=50) (actual time=0.341..0.391 rows=119 loops=42)
Group Key: b.system_id, b.group_id
-> Sort (cost=467.75..468.06 rows=122 width=43) (actual time=0.339..0.343 rows=119 loops=42)
Sort Key: b.group_id
Sort Method: quicksort Memory: 32kB
-> Bitmap Heap Scan on test2m b (cost=5.98..463.52 rows=122 width=43) (actual time=0.033..0.120 rows=121 loops=42)
Recheck Cond: (((test2m.system_id)::text = (system_id)::text) AND (id >= 1) AND (id <= 500000))
Heap Blocks: exact=5027
-> Bitmap Index Scan on ix_test2m_system_id_id (cost=0.00..5.95 rows=122 width=0) (actual time=0.022..0.022 rows=121 loops=42)
Index Cond: (((system_id)::text = (test2m.system_id)::text) AND (id >= 1) AND (id <= 500000))
Planning Time: 0.224 ms
Execution Time: 18.498 ms
'PostgreSQL' 카테고리의 다른 글
DB-Engines Ranking (PostgreSQL) (0) | 2023.01.26 |
---|---|
PostgreSQL composite types in SQL queries (0) | 2022.12.29 |
Postgresql 변수 / 바인드 / 매개변수 처리 (0) | 2022.12.28 |
[postgreSQL] Function 구조 (0) | 2022.12.27 |
PostgreSQL 비 결정적 변수에 대한 실행 계획 (1) | 2022.12.26 |