[Python] Tip - 최적화하기 전에 프로파일

언어/파이썬 & 장고 2017. 1. 15. 20:00

파이썬의 동적 특징은 런타임 성능에서 놀랄 만한 동작을 보여줍니다. 느릴 것이라고 예상한 연산이 실제로는 엄청나게 빠릅니다(문자열 처리, 제너레이터). 빠를 것이라고 예상한 언어늬 특성은 실제로 매우 느립니다(속성 접근, 함수 호출). 파이썬 프로그램을 느리게 만드는 요인이 불분명할 수도 있습니다.

가장 좋은 방법은 최적화하기 전에 직관을 무시하고 직접 프로그램의 성능을 측정해보는 것입니다. 파이썬은 프로그램의 어느 부분이 실행 시간을 소비하는지 파악할 수 있도록 내장 프로파일러(profiler)를 제공합니다. 프로파일러를 이용하면 문제의 가장 큰 원인에 최적화 노력을 최대한 집중할 수 있고, 속도에 영향을 주지 않는 부분은 무시할 수 있습니다.

예를 들어 프로그램의 알고리즘이 느린 이유를 알고 싶다고 가정합니다. 다음은 삽입 정렬로 데이터 리스트를 정렬하는 함수입니다.

def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result


# 삽입 정렬의 핵심메커니즘은 각 데이터의 삽입 지점을 찾는 함수임.
# 아래는 극히 비효율적인 insert_value 함수로 입력 배열을 순차적으로 스캔
def insert_value(array, value):
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)


# insertion_sort와 insert_value를 프로파일하려고 난수로 구성된 데이터 집ㅈ합을 생성하고, 프로파일러에 넘길 test 함수를 정의

from random import randint

max_size = 10 ** 4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)

파이썬은 두 가지 내장 프로파일러를 제공하는데, 하나는 순수 파이썬(profile)이며 다른 하나는 C 확장 모듈(cProfile)입니다. cProfile은 프로파일링 동안에 프로그램의 성능에 미치는 영향을 최소화할 수 있어서 더 좋습니다. 순수 파이썬 프로파일러는 결과를 왜곡할 수도 있을 만큼 부하가 큽니다.

파이썬 프로그램을 프로파일할 때 측정 대상이 코드 자체이지 외부 시스템이 아니라는 점을 명확하게 해야 합니다. 네트워크나 디스크의 리소스에 접근하는 함수를 주의해야합니다. 이런 함수는 하부 시스템이 느리기 때문에 프로그램의 실행 시간에 큰 영향을 줄 수 있습니다.
프로그램에서 이처럼 느린 리소스의 지연을 막으려고 캐시를 사용한다면, 프로파일링을 시작하기 전에 캐시가 적절히 동작하도록 준비해야 합니다.

cProfile 모듈의 Profile 객체를 생성하고 runcall 메서드로 테스트 함수를 실행해보겠습니다.

from cProfile import Profile
from pstats import Stats


def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result


# 삽입 정렬의 핵심메커니즘은 각 데이터의 삽입 지점을 찾는 함수임.
# 아래는 극히 비효율적인 insert_value 함수로 입력 배열을 순차적으로 스캔
def insert_value(array, value):
    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)


# insertion_sort와 insert_value를 프로파일하려고 난수로 구성된 데이터 집ㅈ합을 생성하고, 프로파일러에 넘길 test 함수를 정의

from random import randint

max_size = 10 ** 4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)


profiler = Profile()
profiler.runcall(test)

# 테스트 함수의 실행이 끝나면 내장 모듈 pstats와 Stats 클래스로 함수의 성능 통계를 뽑을 수 있음.
# Stats 객체의 다양한 메서드를 이용하면 프로파일 정보를 선택하고 정렬하는 방법을 조절해서 관심 있는 정보만 볼 수 있음

stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()

# 결과는 함수로 구성된 정보의 테이블. 데이터 샘플은 runcall 메서드가 실행되는 동안 프로파일러가 활성화되어 있을 때만 얻어옴

# 결과
#          20003 function calls in 2.074 seconds
# 
#    Ordered by: cumulative time
# 
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    2.074    2.074 my_module.py:28(<lambda>)
#         1    0.003    0.003    2.074    2.074 my_module.py:5(insertion_sort)
#     10000    2.056    0.000    2.071    0.000 my_module.py:14(insert_value)
#      9992    0.015    0.000    0.015    0.000 {method 'insert' of 'list' objects}
#         8    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

프로파일러 통계 컬럼의 의미를 간략하게 알아보겠습니다.

ncalls: 프로파일링 주기 동안 함수 호출 횟수
tottime: 함수가 실행되는 동안 소비한 초 단위의 시간으로, 다른 함수 호출을 실행하는 데 걸린 시간은 배제
tottime percall: 함수를 호출하는 데 걸린 평균 시간이며 초 단위. 다른 함수의 호출 시간은 배제. tottime을 ncalls로 나눈 값
cumtime: 함수를 실행하는 데 걸린 초 단위 누적 시간이며, 다른 함수를 호출하는 데 걸린 시간도 포함
cutime percall: 함수를 호출할 때마다 걸린 시간에 대한 초 단위 편균 시간이며, 다른 함수를 호출하는 데 걸린 시간도 포함. cumtime을 ncalls로 나눈 값

앞의 프로파일러 통계 테이블을 보면 테스트에서 CPU를 가장 많이 사용하는 부분은 insert_value 함수에서 소비한 누적 시간입니다. 이번에는 내장 모듈 bisect를 사용하도록 insert_value 함수를 재정의합니다.

from cProfile import Profile
from pstats import Stats
from bisect import bisect_left


def insertion_sort(data):
    result = []
    for value in data:
        insert_value(result, value)
    return result


def insert_value(array, value):
    i = bisect_left(array, value) # 수정
    array.insert(i, value)


from random import randint

max_size = 10 ** 4
data = [randint(0, max_size) for _ in range(max_size)]
test = lambda: insertion_sort(data)


profiler = Profile()
profiler.runcall(test)

# 테스트 함수의 실행이 끝나면 내장 모듈 pstats와 Stats 클래스로 함수의 성능 통계를 뽑을 수 있음.
# Stats 객체의 다양한 메서드를 이용하면 프로파일 정보를 선택하고 정렬하는 방법을 조절해서 관심 있는 정보만 볼 수 있음

stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()

# 결과
#    30003 function calls in 0.026 seconds
# 
#    Ordered by: cumulative time
# 
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    0.026    0.026 my_module.py:22(<lambda>)
#         1    0.002    0.002    0.026    0.026 my_module.py:6(insertion_sort)
#     10000    0.004    0.000    0.023    0.000 my_module.py:13(insert_value)
#     10000    0.013    0.000    0.013    0.000 {method 'insert' of 'list' objects}
#     10000    0.006    0.000    0.006    0.000 {built-in method bisect_left}
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

프로파일러를 실행하여 새 프로파일러 통계를 생성했습ㅂ니다. 새로운 함수는 더 빨라졌고 누적 시간은 이전의 insert_value 함수에 비해 거의 100배 이상 줄었습니다. 때로는 전체 프로그램을 프로파일할 때 공통 유틸리티 함수에서 대부분의 실행 시간을 소비할 수도 있습니다. 프로파일러의 기본 출력은 유틸리티 함수가 프로그램의 다른 부분에서 얼마나 많이 호출되는지 보여주지 않기 때문에 이해하기 어렵습니다.

예를 들어 다음 my_utility 함수는 프로그램에 있는 다른 두 함수에서 반복적으로 호출됩니다.

from cProfile import Profile
from pstats import Stats


def my_utility(a, b):
    pass


def first_func():
    for _ in range(1000):
        my_utility(4, 5)


def second_func():
    for _ in range(10):
        my_utility(1, 3)


def my_program():
    for _ in range(20):
        first_func()
        second_func()


test = lambda: my_program()

profiler = Profile()
profiler.runcall(test)


stats = Stats(profiler)
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats()

# 결과
#    20243 function calls in 0.005 seconds
# 
#    Ordered by: cumulative time
# 
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.000    0.000    0.005    0.005 upload_cron.py:25(<lambda>)
#         1    0.000    0.000    0.005    0.005 upload_cron.py:19(my_program)
#        20    0.004    0.000    0.005    0.000 upload_cron.py:9(first_func)
#     20200    0.001    0.000    0.001    0.000 upload_cron.py:5(my_utility)
#        20    0.000    0.000    0.000    0.000 upload_cron.py:14(second_func)
#         1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

my_utility 함수가 대부분의 실행 시간을 소비하는 원인이라는 게 명확하지만, 이 함수가 이렇게 많이 호출되는 이유는 명확하게 알기 어렵습니다. 프로그램 코드에서 찾는다면 my_utility가 여러 번 호출되고 있다는 사실을 알겠지만 여전히 이해하기힘듭니다.

파이썬 프로파일러는 이 문제를 처리하려고 각 함수의 프로파일링 정보에 기여하는 호출자를 찾는 방법을 제공합니다.

stats.print_callers()

이 프로파일러 통계 테이블은 호출된 함수를 왼쪽에 보여주며, 누가 이런 호출을 하는지를 오른쪽에 보여줍니다. 다음 통계 테이블은 my_utility가 first_func에서 가장 많이 사용되었음을 명확하게 보여줍니다.

 Ordered by: cumulative time
Function                                          was called by...
                                                      ncalls  tottime  cumtime
test.py:25(<lambda>)                       <- 
test.py:19(my_program)                     <-       1    0.000    0.005  test.py:25(<lambda>)
test.py:9(first_func)                      <-      20    0.004    0.005  test.py:19(my_program)
test.py:5(my_utility)                      <-   20000    0.001    0.001  test.py:9(first_func)
                                                         200    0.000    0.000  test.py:14(second_func)
test.py:14(second_func)                    <-      20    0.000    0.000  test.py:19(my_program)
{method 'disable' of '_lsprof.Profiler' objects}  <-

요약

성능 저하를 일으키는 원인이 때로는 불분명하므로 파이썬 프로그램을 최적화하기 전에 프로파일을 해야 함
cProfile이 더 정확한 프로파일링 정보를 제공하므로 profile 모듈 대신에 cProfile을 사용
Profile 객체의 runcall 메서드는 함수 호출 트리를 프로파일하는 데 필요한 모든 기능을 제공
Stats 객체는 프로그램 성능을 이해하는 데 필요한 프로파일링 정보를 선택하고 출력하는 기능을 제공

ABOUT ME

불곰 불곰

요약

티스토리툴바

ABOUT ME

요약

관련글 관련글 더보기

티스토리툴바