2024년 2월 22일

오라클 퍼포먼스 튜닝

by 병아리 엔지니어 2024. 2. 22. 17:38

책읽기, 일기쓰기

알고리즘 공부는 죽을 때까지 ♡

1교시

어제 배운 내용을 복습해보자.

★ 조인 조건 푸시다운 pushdown

- 조인조건절을 뷰 쿼리 블록 안으로 넣어서, 조인 수행시 드라이빙 테이블에서 읽은 조인 컬럼 값을

inner 쪽 뷰 쿼리블록 내에서 참조할 수 있도록 하는 기능

select /*+ gather_plan_statistics */ d.department_id, d.department_name, e.avg_sal
from departments d, (select /*+ no_merge */ department_id, avg(salary) avg_sal
from employees
group by department_id) e
where d.department_id = e.department_id
and d.location_id = 1800;

(코드 이따위로 짜지말자... 고 선생님이 말씀해 주심)

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

6행 sort aggregate :

select /*+ gather_plan_statistics */ d.department_id, d.department_name, e.avg_sal
from departments d, (select /*+ no_merge */ department_id, avg(salary) avg_sal
from employees
group by department_id) e --- group by 대신 넣기 (push)
where d.department_id = e.department_id --- 조인 조건 술어를 ↑
and d.location_id = 1800;

select /*+ gather_plan_statistics */ d.department_id, d.department_name, e.avg_sal
from departments d, (select /*+ no_merge no_push_pred */ department_id, avg(salary) avg_sal
from employees
group by department_id) e
where d.department_id = e.department_id
and d.location_id = 1800;

--- no_push_pred: 조인 조건을 인라인뷰 안에 pushdown 하지말라는 뜻

인라인 뷰는 인라인 뷰대로, 메인쿼리절은 메인쿼리절대로 수행

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

1800위치에 있는 정보를 가지고 빌드테이블인 해시테이블을 만들고

probe 테이블은 (6, 7, 8) 2와 5는 생각하지 말자.

where d.department_id = e.department_id 그리고 빌드 쪽 deptid 와 probe 쪽 deptid 가 조인되어야 하는데

build

해시 키	dept_id
20	20
50	50

↑ 해시 키 값을 찾으러 감 - 10, 20, 30, 40, 50 을 하나씩 던져서 리턴되는 해시 값이 있는지 보기

probe

10
20
30
40
50

있으면 따로 keep, 없으면 버림 (비효율적)

이럴 때는 probe 에서 build 로 올라가야 할 값들(20, 50)만 따로 모아서 전달하는 것이 낫다.

20

50

(그게 바로 7행 join filter use / 2행 join filter create /

그리고 8행 bloom filter 는 알고리즘 이름)

★ 블룸 필터 Bloom Filter

- 해시 조인시 probe(후행) 집합에서 조인에 참여하는 건수를 줄임으로써 조인 시간을 단축시키는 알고리즘

2교시

★ 조건절 푸시다운 pushdown

- group by 절에 포함된 복합 뷰 merging 에 실패했을 경우에

쿼리블록 밖에 있는 조건절을 쿼리블록 안쪽으로 넣음으로써 group by 해야 할 데이터 양을 줄일 수 있다.

select /*+ gather_plan_statistics */ *
from (select department_id, sum(salary) sum_sal from employees group by department_id)
where department_id = 20;

(인라인 뷰를 잘못 씀 - 쓸데없이 다른 부서까지 써서 집계값 구함, 불필요한 I/O 발생

그래서 오라클은 트랜스포밍을 해서 바꿔버린다.)

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

사원 테이블의 모든 부서 집계값을 다 구했는데

where 절에는 20 번만 들어있다.

select /*+ gather_plan_statistics */ *
from (select department_id, sum(salary) sum_sal from employees group by department_id)
where department_id = 20;

department_id 가 인라인 뷰 안으로 들어가고

sort는 하지 않음

그럼 다음의 쿼리문은 어떤 문제가 있을까?

★ 조건절 풀업 pull up

- 조건절을 쿼리블록 안으로 밀어넣을 뿐만 아니라, 안쪽에 있는 조건들을 바깥쪽으로 끄집어내기도 하는 것

select /*+ gather_plan_statistics */ a.department_id, a.sum_sal, b.max_sal, b.min_sal
from (select department_id, sum(salary) sum_sal from employees where department_id = 20 group by department_id) a,
(select department_id, max(salary) max_sal, min(salary) min_sal from employees group by department_id) b
where a.department_id = b.department_id;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

문제점:

인라인 뷰 a 와 인라인 뷰 b 를 조인하는데

어차피 20번 부서 집계값들만 구하는데도

b 안에는 전체 부서 집계값이 다 들어 있다.

a 안에 있는 조건절을 바깥으로 빼고

걔를 인라인 뷰 안에 푸시다운

(조건절 풀업: 다른 인라인 뷰 안에 들어있는 조건절을 밖으로 뺐다가 다른 인라인 뷰 안에 넣는 것)

※ 잠깐 복습

rollup:

cube: 롤업 기능을 포함하고, 조합 가능한 집계값을 모두 구함

grouping sets:

select /*+ gather_plan_statistics */ a.department_id, a.sum_sal, b.max_sal, b.min_sal
from (select department_id, sum(salary) sum_sal from employees where department_id = 20 group by department_id) a,
(select department_id, max(salary) max_sal, min(salary) min_sal from employees where department_id =20 group by department_id) b
where a.department_id = b.department_id;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG' or department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

or 때문에 대용량 테이블을 풀스캔할 수밖에 없다.

IT_PROG 따로, 부서번호 20 번 따로 한번 뽑아보자.

두 컬럼에는 각각 인덱스가 걸려 있기 때문에 따로 볼 때는 인덱스 스캔이 돌아가게 된다.

1. IT_PROG 따로

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG';

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics */ *
from employees
where department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

풀 스캔보다는 인덱스 스캔을 이용하는 것이 훨씬 나은데

OR 조건을 쓰면 인덱스를 쓸 수 없게 되어버린다.

이럴 때는 OR 말고 UNION ALL을 쓰는 것이 좋다.

1. UNION 만 썼을 때 (소트 돌아감)

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
UNION
select *
from employees
where department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

2. UNION ALL 을 썼을 때
select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
UNION ALL
select *
from employees
where department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

하지만 UNION ALL을 쓰면 중복되는 데이터가 있을 수 있다.

3교시

그러면 union all을 쓰더라도 중복이 없게 만들고 싶다면?

IT_PROG 에서 20번 부서 아이디에 해당하는 데이터가 빠져야 하고

IT_PROG 인 사원들 중에서 부서아이디가 null 인 데이터

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
AND (department_id <> 20 or department_id is null);

and 와 or 를 같이 쓰면 우선순위는 and 가 높으므로 and 가 먼저 돌아간다.
(그러니 and 와 or 를 같이 쓸 때는 주의하자!!!!!!!!!!!)
위에서는 괄호를 묶어줌

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
and LNNVL(department_id = 20);

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

위에서 수행했던 Predicate Information (identified by operation id): 부분의

"DEPARTMENT_ID"<>20 OR "DEPARTMENT_ID" IS NULL 가

LNNVL(department_id = 20); 여기서 돌아가고 있는 것

(LNNVL 함수는 현장에서는 대부분 모른다. 하지만 이걸 사용하면 훨씬 더 간결하게 코드를 작성할 수 있다)

lnnvl 함수로 원래의 쿼리문 개선하기

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
and LNNVL(department_id = 20)
UNION ALL
select *
from employees
where department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

만약 LNNVL 함수를 쓰지 않았을 경우에는 아래와 같이 쿼리문을 작성해야 한다:

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
and (department_id <> 20 or department_id is null)
UNION ALL
select *
from employees
where department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG' OR department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

직업과 부서번호 둘 다에 인덱스가 걸려 있었는데도 OR 를 쓰면

인덱스를 활용하지 못하고 풀 테이블 스캔이 발생했었다.

★ or 에 대한 확장 (or - expansion)

- 사용자의 쿼리를 직접 바꾸지 않아도 옵티마이저가 or 조건을 full table scan 처리가 아닌

index range scan 을 수행하면서 union all 형태로 변경처리해 주는 기능

- 힌트: use_concat (or-expansion 을 유도) / no_expand (or-expansion 을 유도하지 않겠다)

select /*+ gather_plan_statistics use_concat */ * --- use_concat 이라는 힌트를 쓰면
from employees
where job_id = 'IT_PROG' OR department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

conatenation = union all

select /*+ gather_plan_statistics no_expand */ * --- no_expand 힌트: 풀 테이블 스캔 유도
from employees
where job_id = 'IT_PROG' OR department_id = 20;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

인덱스 스캔이 무조건 좋은 것만은 아니다. (데이터 분포도에 따라 다르다)

테이블의 전체 데이터가 100건일 때,

10번 부서 데이터는 2건 / 20번 부서 데이터는 1건 / 30번 부서 데이터는 5건 / 40번 부서 데이터는 70건이라고 하자.

찾고자 하는 데이터는 10번 부서 데이터인데

풀 테이블 스캔이 발생하면: 전체 블록을 모두 읽어서 2건 데이터만 찾으므로 매우 비효율적이다.

이때는 인덱스 스캔을 이용하는 것이 훨씬 낫다.

하지만 40번 부서 데이터를 찾는데 인덱스로 찾는다고 하면 어떨까?

인덱스

부서번호	rowid
40	#1
40	#2
40	#3
40	#4
...	...

↓

테이블

. . . #1	. . . #2	. . .
. . . #4	. . .	. . . #3

인덱스에서 부서번호가 40번인 행으로 액세스 (I/O 수 1)

부서번호 40번 행의 첫번째 rowid #1 을 가지고 테이블로 스캔하러 가고 (I/O 수 2)

또 인덱스의 그다음 행으로 액세스하고 (I/O 수 3)

rowid #2 를 가지고 테이블로 스캔하러 가고 (I/O 수 4)

...

이런 식이면 부서번호 40번인 데이터가 70건이므로

버퍼 피닝이 돌아가지 않는 경우 I/O가 최대 70번까지 발생할 수 있다.

이럴 때는 테이블 풀 스캔이 더 나을 수 있다.

(그래서 현장에서는 no_expand 를 쓰는 것)

4교시

부서별로 그룹핑해서 건수 세기

select department_id, count(*) from hr.employees group by department_id order by 2 desc;

값의 분포도가 매우 들쭉날쭉하다.

직업별로 그룹핑해서 건수 세기

select job_id, count(*) from hr.employees group by job_id order by 2 desc;

얘도 분포도가 별로 좋지 않다.

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG' or department_id = 50;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics use_concat */ *
from employees
where job_id = 'IT_PROG' or department_id = 50;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

힌트로 use_concat 을 썼지만 분포도가 좋지 않아서 오라클이 풀 스캔을 유도했다.

select /*+ gather_plan_statistics */ *
from employees
where job_id = 'IT_PROG'
and LNNVL(department_id = 50)
UNION ALL
select /*+ index(e emp_department_ix */ *
from employees e
where department_id = 50;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics */ *
from employees
where department_id = 10;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics */ *
from employees
where department_id = 50;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select /*+ gather_plan_statistics index(e emp_department_ix) */ *
from employees
where department_id = 50;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

클러스터링 팩터가 좋지 않으면 버퍼 피닝을 찍을 수가 없다.

인덱스의 부서번호 첫번째 행 데이터가 있는 블록과 두번째 행이 있는 블록이 서로 다르고

또 두번째 행이 들어있는 블록과 세번째 행이 들어있는 블록이 서로 다르고... 할 경우에는

최악의 경우 부서번호의 행 수만큼 I/O 가 발생할 수도 있다. (버퍼 피닝을 걸 수가 없기 때문)

집합연산자는 실행계획을 분리하고자 할 때에도 쓴다.

5교시

문제

자신의 부서 평균 급여보다 더 많이 받는 사원들의 employee_id, salary, department_name 을 출력해 주세요.

정답

select /*+ gather_plan_statistics */ e.employee_id, e.salary, d.department_name
from employees e, departments d
where e.department_id = d.department_id
and e.salary > (select avg(salary) from employees where department_id = d.department_id);

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

2행의 window buffer : 오라클의 트랜스포메이션 기능 때문에 문장이 저절로 튜닝됨

위 쿼리문의 문제점:

employees가 큰 테이블인데 2번 액세스하고 있다.

그런데 실행계획을 보면 employees 테이블에 한 번만 액세스함

그리고 2행에 window buffer 라고 되어있는 부분이 있는데 이건 분석함수를 썼다는 뜻

그리고 1행의 view 는 인라인 뷰를 분석함수를 이용해서 해결해버린 것

이런 식의 악성프로그램이 들어오면 오라클이 알아서 바꿔준다.

히든 파라미터 값을 설정해보자.

alter session set "_remove_aggr_subquery" = false;

기본값은 true 인데 false 로 바꾸고

HR@ora11g> alter session set "_remove_aggr_subquery" = false;
Session altered.

아래 쿼리문 다시 수행해보기

select /*+ gather_plan_statistics */ e.employee_id, e.salary, d.department_name
from employees e, departments d
where e.department_id = d.department_id
and e.salary > (select avg(salary) from employees where department_id = d.department_id);

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

I/O가 나빠진 것이 보인다.

자기가 속한 부서의 평균월급보다 많이 받는 사원을 구하고 싶을 때는

부서별로 평균값을 가지고 있는 집합만 있으면 되는데

그걸 7행에서 hash group by 로 만든 것

select /*+ gather_plan_statistics */ e2.employee_id, e2.last_name, d.department_name
from (select department_id, avg(salary) avg_sal
from employees
group by department_id) e1, employees e2, departments d
where e1.department_id = d.department_id
and e2.department_id = d.department_id
and e2.salary > e1.avg_sal;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

실행계획 똑같이 나온다.

사원 테이블이 2번 액세스하는 문제를 해결할 수 있는 방법:

히든 파라미터 값을 다시 true 로 바꾸기

HR@ora11g> alter session set "_remove_aggr_subquery" = true;
Session altered.

select /*+ gather_plan_statistics */ e.employee_id, e.salary, d.department_name, case when e.salary > avg(salary) over (partition by e.department_id) then e.rowid end vw_col_4
from employees e, departments d
where e.department_id = d.department_id;

자기가 속한 부서의 평균급여보다 많이 받는 사원들만 rowid 가 찍히고 나머지 사원들은 null 로 나온다.

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

여기서 rowid 가 찍힌 사원들만 뽑아내려면: 위 쿼리를 인라인 뷰에 다시 넣어주면 된다.

select /*+ gather_plan_statistics */ employee_id, salary, department_name
from (
select e.employee_id, e.salary, d.department_name, case when e.salary > avg(salary) over (partition by e.department_id) then e.rowid end vw_col_4
from employees e, departments d
where e.department_id = d.department_id)
where vw_col_4 is not null;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

위 실행계획과 결과 똑같이 나온다.

그래서

★ _remove_agg_subquery 파라미터는

- 상호 관련 서브쿼리절 (correlate subquery) 을 사용할 때, 분석함수를 사용하여 서브쿼리를 제거한다. (10g 버전부터)

6교시

★ 옵티마이저 optimizer

- 사용자가 요청한 SQL문을 가장 효율적이고 빠르게 수행할 수 있도록

최저 비용의 처리 경로를 선택해주는 엔진

1. 룰 베이스 옵티마이저 Rule Based Optimizer, RBO

- 10g 버전부터 지원 중단

- 미리 정해진 규칙에 의한 순위에 따라 실행계획을 결정 (순위 결정 방식 - 참고로 1순위는 rowid scan)

- 데이터에 대한 통계 내지는 실제 SQL 문을 수행할 경우에 소요될 비용을 고려하지 않는다.

예: 인덱스가 있으면 무조건 사용한다. (테이블의 크기나 인덱스 사용 시에 효율이 어떨지 등은 안중에도 없다)

- 조건절 컬럼에 인덱스가 있으면 무조건 인덱스를 사용한다. (분포도 참고하지 않는다)

- order by 절에 사용된 컬럼에 인덱스가 걸려 있으면 무조건 인덱스 스캔한다.

- 부등호 조건의 인덱스보다는 between and 조건의 인덱스가 우선한다.

예: select *

from emp

where department_id >= 100

and salary between 1000 and 10000;

부등호를 사용한 department_id 컬럼 인덱스보다 between and 연산자를 쓴 salary 컬럼의 인덱스가 우선한다.

salary 컬럼 인덱스를 쓰면 범위가 1000부터 10000까지라서 너무 넓은데도 무조건 저 컬럼의 인덱스가 우선함...

2. 코스트 베이스 옵티마이저 Cost Based Optimizer, CBO

- 오라클 7 버전부터 지원

- 비용(cost)을 기반으로 SQL 수행 최적화

- 실제 SQL을 수행할 때 소요될 비용을 예측하고, 그 값을 기준으로 실행계획을 설정한다.

- Object 에 대한 통계정보가 꼭 필요하다.

테이블 통계 (select * from dba_tables, sys.tab$)

컬럼 통계 (select * from dba_tab_columns, sys.col$)

인덱스 통계 (select * from dba_indexes, sys.ind$)

- System 에 대한 통계정보도 필요하다.

(현재 CPU 속도, 디스크 I/O 발생량, ...)

★ 시스템 통계

- 9i 버전부터, 하드웨어 및 애플리케이션 특성에 맞는 시스템 통계를 수집하고 활용

- I/O, CPU 같은 하드웨어 특성 고려

- CPU 속도, 평균적인 싱글 블록 I/O 속도, 멀티 블록 I/O 속도

- 평균적인 멀티 블록 I/O 시의 개수도 고려해야 한다. (I/O call 이 1번 발생할 때 몇 개의 블록이나 읽어들이는지)

- I/O 서브시스템의 최대 처리량 (throughput)

- 병렬 slave 의 평균 처리량 (throughput)

... 등등을 고려한다.

select * from sys.aux_stats$;

통계수집을 하지 않았을 경우에 아래의 정보를 사용한다.

noworkload 시스템 통계 (10g) : 명시적으로 시스템 통계 수집을 하지 않더라도 CPU 비용 모델을 사용할 수 있도록 하기 위해서 오라클이 내부적으로 시스템 통계를 설정한 것

PNAME PVAL1

----------------------------------------------------

CPUSPEEDNW 2734.11764705882
IOSEEKTIM 10
IOTFRSPEED 4096

CPUSPEEDNW: CPU 속도 (찾는 속도, 백만분의 1초)

IOSEEKTIM: 데이터를 읽으려고 디스크 헤드를 옮기는 데 걸리는 시간, 보통 5 - 15 밀리세컨드(ms)

IOTFRSPEED: OS 프로세스 I/O서브 시스템으로부터 데이터를 읽는 속도 (byte/ms)

workload 시스템 통계 (9i) : 실제 애플리켄이션에서 발생하는 부하를 측정한 값

SREADTIM: 싱글 블록 I/O 의 평균 속도 (ms, 천분의 1초)
MREADTIM: 멀티 블록 I/O 의 평균 속도 (ms, 천분의 1초)
CPUSPEED: 단일 CPU가 초당 수행할 수 있는 오퍼레이션 수 (백만분의 1초)
MBRC: 멀티 블록 I/O 발생시에 평균적으로 읽은 블록의 수
MAXTHR: I/O 서브시스템의 초당 최대 처리량 (byte / 초)
SLAVETHR: 병렬 slave 의 평균적인 초당 처리량 (byte / 초)

# NOWORKLOAD 시스템 통계 수집

SYS@ora11g> execute dbms_stats.gather_system_stats(gathering_mode => 'NOWORKLOAD')
PL/SQL procedure successfully completed.

select * from sys.aux_stats$;

시간대가 지금 시간으로 바뀌어 있는 것을 확인할 수 있다.

# WORKLOAD 시스템 통계 수집: 기간을 꼭 명시해야 한다. (interval=>1은 1분간 수집하겠다는 뜻, 분 단위)

execute dbms_stats.gather_system_stats(gathering_mode => 'interval', interval=>1)

PL/SQL procedure successfully completed.

얘는 바쁠 때 돌리는 것이 좋다. (낮 시간에, SQL 문이 많이 들어올 때)

select * from sys.aux_stats$;

# 시간 지정해서 통계 수집

execute dbms_stats.gather_system_stats(gathering_mode => 'interval', interval=>120)

execute dbms_stats.gather_system_stats(gathering_mode => 'start')

execute dbms_stats.gather_system_stats(gathering_mode => 'stop')

--- interval : 분단위 설정

※ 테스트 데이터베이스 (개발 DB) 시스템 통계 정보를 운영 시스템 통계 정보로 반영

begin
dbms_stats.set_system_stats('SREADTIM', 1.2);
dbms_stats.set_system_stats('MREADTIM', 1.3);
dbms_stats.set_system_stats('MBRC', 16);
dbms_stats.set_system_stats('CPUSPEED', 700);
dbms_stats.set_system_stats('MAXTHR', 40580544);
dbms_stats.set_system_stats('SLAVETHR', 32224);
end;
/

★ 옵티마이저 모드 optimizer_mod 5가지

SYS@ora11g> show parameter optimizer_mode

NAME TYPE VALUE
-----------------------------------------------------------------------------
optimizer_mode string ALL_ROWS

1. choose: 통계 정보가 있으면 all_rows, 없으면 rule (9i 버전까지의 기본값)

2. rule: RBO 사용 중이라는 뜻 (통계정보와 상관없다)

3. all_rows: 전체 처리율의 최적화 (10g 기본값) 전체 분포도가 고르다고 생각하고 돌아간다.

시간은 조금 걸리더라도 정확한 실행계획을 짠다.

4. first_rows: 최초 응답 속도를 최적화 (빨리 찾기)

5. first_rows_n (1, 10, 100, 1000): 1건/10건/100건/1000건을 찾는 최적화 방법 찾기 - 처음 결과가 나올 때까지의 시간을 줄이기 위해 최적화

(n 에는 1, 10, 100, 1000 이외의 수는 들어갈 수 없다 / n 이 100이 넘어가면 속도가 all_rows 와 다를 바가 없어진다.)

first_rows 와 first_rows_n 은 OLTP 성 환경에 적합하고

all_rows는 DSS(의사 결정 시스템) 와 batch 성 업무가 있는 환경에 적합하다.

하지만 옵티마이저를 전체 시스템 레벨로 바꾸는 것 (아래처럼) 은 위험부담이 있다.

~~옵티마이저 모드 시스템 레벨로 바꾸기~~ (하지마!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!)

alter system set optimizer_mode = first_rows;

함부로 바꿨다가 잘 돌아가던 SQL 문이 갑자기 이상하게 돌아가는 상황을 맞이하게 될 수도 있다.

옵티마이저 모드 세션 레벨로 바꾸기

alter session set optimizer_mode = first_rows;

하지만 보통 옵티마이저 모드 변경은 힌트를 가지고 유도한다.

select /*+ gather_plan_statistics all_rows */ *
from hr.employees
where department_id = 50;

select /*+ gather_plan_statistics all_rows(10) */ *
from hr.employees
where department_id = 50;

수행해보기 (I/O는 같은데 구체적인 실행 계획은 서로 다르다)

select /*+ gather_plan_statistics all_rows */ *
from hr.employees
where department_id = 50;

all_rows 힌트를 쓰면: 속도를 생각하지 않고 정확한 계획으로 수행해달라는 뜻

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

multi block I/O 발생, 래치가 덜 발생함

select /*+ gather_plan_statistics all_rows(10) */ *
from hr.employees
where department_id = 50;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

10건을 최대한 빨리 찾아주는 실행계획을 만들어서 수행

인덱스 범위 스캔이 일어남, 래치를 잡는 시간이 길다

옵티마이저 모드에 따라 실행계획이 다르다는 것을 알 수 있다.

SQL 문

↓

PARSER: 쿼리 트랜스포머 1. Query Transformer > 2. Estimator > 3. Plan Generator

1. Query Transformer: SQL을 최적화하기 쉬운 형태로 변환

2. Estimator : selectivity / cardinality / cost

- selectuvity : 전체 대상 행 중에 특정한 조건에 의해 선택될 것으로 예상되는 row의 비율, 1/NUM_DISTINCT

- cardinality : 특정 access 단계를 거치고 나서 출력될 것으로 예상되는 결과 건수

총 row의 수 X 선택도 (selectivity) = num_rows / num_distinct

- cost (특정 명령문을 실행하는 데 필요한, 표준화된 I/O 에 대한 옵티마이저의 최적 예측비용 /

_optimizer_cost_model = { io | cpu | choose })

- I/O 비용 (8i)

- CPU 비용 (10g)

- choose: 시스템 통계가 있으면 CPU 비용, 없으면 I/O 비용 (9i)

3. Plan Generator: 후보군이 될 만한 실행계획들을 생성하고, 그 중에서 수행비용이 가장 저렴한 계획을 리턴해준다.

7교시

HR@ora11g> drop table emp purge;
Table dropped.

HR@ora11g> create table emp as select * from employees;
Table created.

HR@ora11g> select t.num_rows, c.column_name, c.num_nulls, c.num_distinct, 1/c.num_distinct selectivity, num_rows/c.num_distinct cardinality
from user_tables t, user_tab_columns c
where t.table_name = 'EMP'
and c.table_name = t.table_name;

통계정보 수집

HR@ora11g> exec dbms_stats.gather_table_stats('HR','EMP',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
PL/SQL procedure successfully completed.

METHOD_OPT=>'FOR ALL COLUMNS SIZE 1' 는 히스토그램을 생성하지 않겠다는 뜻

HR@ora11g> select t.num_rows, c.column_name, c.num_nulls, c.num_distinct, 1/c.num_distinct selectivity, num_rows/c.num_distinct cardinality
from user_tables t, user_tab_columns c
where t.table_name = 'EMP'
and c.table_name = t.table_name;

HR@ora11g> select job_id, count(*) from emp group by job_id;

예상 실행계획 보기

HR@ora11g> explain plan for select * from emp where job_id = 'IT_PROG';
Explained.

예측된 row 의 수 보기 = 6 (이 값은 selectivity 와 cardinality 로부터 나옴)

HR@ora11g> select * from table(dbms_xplan.display);

selectivity = 1/num_distinct = 1/19 = .052631579

cardinality = num_rows * selectivity = 107 * .052631579 = 5.631578953 = 반올림하면 6

예상 실행계획 보기

HR@ora11g> explain plan for select * from emp where job_id = 'SA_REP';
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

HR@ora11g> select t.num_rows, c.column_name, c.num_nulls, c.num_distinct, 1/c.num_distinct selectivity, num_rows/c.num_distinct cardinality
from user_tables t, user_tab_columns c
where t.table_name = 'EMP'
and c.table_name = t.table_name;

HR@ora11g> select job_id, count(*) from emp group by job_id;

JOB_ID 처럼 값의 분포도가 들쭉날쭉한 컬럼들은

히스토그램을 꼭 만들어야만 정확한 selectivity와 cardinality 를 확인할 수 있다.

통계정보 수집

HR@ora11g> exec dbms_stats.gather_table_stats('HR','EMP',METHOD_OPT=>'FOR ALL COLUMNS SIZE 1')
PL/SQL procedure successfully completed.

METHOD_OPT=>'FOR ALL COLUMNS SIZE 1' 는 히스토그램을 생성하지 않겠다는 뜻

이번에는 히스토그램을 만들어보자.

HR@ora11g> exec dbms_stats.gather_table_stats('HR','EMP',METHOD_OPT=>'FOR COLUMNS SIZE 20 JOB_ID')
PL/SQL procedure successfully completed.

HR@ora11g> select column_name, num_distinct, num_buckets, histogram
from user_tab_col_statistics
where table_name = 'EMP';

FREQUENCY: 각각의 값 별로 빈도수를 저장하는 도수분포 히스토그램

HR@ora11g> explain plan for select * from emp where job_id = 'IT_PROG';
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

HR@ora11g> explain plan for select * from emp where job_id = 'SA_REP';
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

값의 분포도가 좋지 않은 컬럼에 변수처리가 되어 있으면 어떤 일이 일어나는지 한번 보자.

(스포일러: 히스토그램을 무시한다)

HR@ora11g> explain plan for select * from emp where job_id = :B1;
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

히스토그램이 있는 컬럼에 바인드변수를 설정하는 실행계획은 좋지 않다.

값의 분포도에 맞지 않는 실행계획을 수행하는 경우가 발생할 수도 있다.

HR@ora11g> explain plan for select * from emp where job_id > :B1;
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

위에서는

# selectivity = 1/num_distinct

# cardinality = num_rows * selectivity

라고 했는데

# selectivity = 0.05 (5%) --- 오라클이 도저히 selectivity 계산할 수 없을 경우에는 5%로 계산한다.

# cardinality = num_rows * selectivity

HR@ora11g> explain plan for select * from emp where salary > :B1;
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

HR@ora11g> explain plan for select * from emp where salary between :B1 and :B2;
Explained.

HR@ora11g> select * from table(dbms_xplan.display);

# selectivity = 0.025

# cardinality = num_rows * selectivity

'오라클 퍼포먼스 튜닝' 카테고리의 다른 글

2024년 2월 28일 (0)	2024.02.28
2024년 2월 27일 (1)	2024.02.27
2024년 2월 15일 (1)	2024.02.15
2024년 2월 14일 5교시 + 6교시 (0)	2024.02.14
2024년 2월 14일 4교시 (0)	2024.02.14