글을 작성하게 된 계기

로컬 환경에서 자바의 ArrayList와 LinkedList 성능 비교를 하며, 결과를 정리하기 위해 글을 작성하게 되었습니다.

  
$ system_profiler SPHardwareDataType

Hardware:

    Hardware Overview:
      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,1
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.3 GHz
      Number of Processors: 1
      Total Number of Cores: 8
      L2 Cache (per Core): 256 KB
      L3 Cache: 16 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1968.100.17.0.0 (iBridge: 20.16.4252.0.0,0)
      OS Loader Version: 577~129
      Serial Number (system): ${SERIAL_NUMBER}
      Hardware UUID: ${HARDWARE_UUID}
      Provisioning UDID: ${PROVISIONING_UUID}
      Activation Lock Status: Enabled
      

결과는 각 컴퓨터의 사양 및 환경에 따라 달라질 수 있습니다.

1. 조회

인덱스가 어느 위치에 있는지에 따라 성능이 차이가 날 수 있지만, 특정 인덱스의 데이터를 조회하는 경우 ArrayList는 LinkedList보다 빠른 성능을 보입니다.

  
public class Main {

    private static final int LIMIT = 100_000_000;
    private static final List<Integer> arrayList = new ArrayList<>();
    private static final List<Integer> linkedList = new LinkedList<>();

    public static void main(String[] args) throws Exception {
        initData(arrayList, linkedList);
        print(
            ">>>>> Access Data",
            accessData(arrayList, 800_000),
            accessData(linkedList, 800_000)
        );
    }

    private static void print(
        String title,
        double arrayListMSeconds,
        double linkedListMSeconds
    ) {
        System.out.println(title);
        System.out.println("ArrayList:  " + arrayListMSeconds + "m/s");
        System.out.println("LinkedList: " + linkedListMSeconds + "m/s");
        System.out.println();
        arrayList.clear();
        linkedList.clear();
    }

    public static double accessData(
        List<Integer> list,
        int index
    ) {
        long start = System.nanoTime();
        list.get(index);
        long end = System.nanoTime();
        return (end - start) / 1_000_000.0;
    }
}

ArrayList는 내부적으로 배열 을 사용해 데이터를 저장하므로 인덱스를 통한 무작위 접근이 가능하며, 이로 인해 데이터 조회 속도가 빠릅니다. ArrayList의 탐색은 O(1)의 시간 복잡도를 가지는데, 이는 ArrayList가 연속적 메모리 구조를 사용하기 때문입니다. 이 연속적인 구조는 컴퓨터의 캐시 메모리와 잘 호환되어 데이터 조회 성능을 높여줍니다.

  
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    @java.io.Serial
    private static final long serialVersionUID = 8683452581122892189L;
    private static final int DEFAULT_CAPACITY = 10;
    private static final Object[] EMPTY_ELEMENTDATA = {};
    private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
    transient Object[] elementData;

    ......

}

CPU는 캐시를 라인 단위로 읽어오며, 연속적인 값을 읽어올 수록 성능이 더 좋아집니다.

반면 LinkedList는 연결 리스트 구조를 사용하며, 특정 인덱스의 데이터에 접근하기 위해서는 시작 노드부터 차례대로 탐색해야 합니다. 따라서 데이터 조회에서 최악의 경우 O(n)의 시간 복잡도를 가집니다.

  
public class LinkedList<E>
    extends AbstractSequentialList<E>
    implements List<E>, Deque<E>, Cloneable, java.io.Serializable
{
      
      ......

      Node<E> node(int index) {
        // assert isElementIndex(index);

        if (index < (size >> 1)) {
            Node<E> x = first;
            for (int i = 0; i < index; i++)
                x = x.next;
            return x;
        } else {
            Node<E> x = last;
            for (int i = size - 1; i > index; i--)
                x = x.prev;
            return x;
        }
    }

    ......

}

LinkedList의 노드들은 메모리 내의 불연속적 위치에 존재할 수 있기 때문에 조회 과정에서 여러 메모리 주소를 점프할 수 있으며, 이로 인해 캐시 메모리의 효율성이 감소할 수 있습니다. 따라서 특정 인덱스의 데이터를 조회하는 경우 ArrayList는 LinkedList보다 빠른 성능을 보여줍니다.

물론 이는 인덱스가 어느 위치에 있는지에 따라 성능이 차이가 날 수 있기 때문에 이를 고려해야 합니다.

2. 순차 추가

순차적으로 데이터를 추가하는 상황에서는 일반적으로 ArrayList가 더 좋은 성능을 보입니다. 이는 구현 방식과 메모리 사용 방식에서 차이를 가집니다.

  
public class Main {

    private static final int LIMIT = 100_000_000;
    private static final List<Integer> arrayList = new ArrayList<>(100_000_000);
    private static final List<Integer> linkedList = new LinkedList<>();

    public static void main(String[] args) throws Exception {
        print(
            ">>>>> Sequential Addition",
            addSequentially(arrayList),
            addSequentially(linkedList));
    }

    private static void print(
        String title,
        double arrayListMSeconds,
        double linkedListMSeconds
    ) {
        System.out.println(title);
        System.out.println("ArrayList: " + arrayListMSeconds + "m/s");
        System.out.println("LinkedList: " + linkedListMSeconds + "m/s");
        System.out.println();
        arrayList.clear();
        linkedList.clear();
    }
}

ArrayList는 내부적으로 동적 배열 을 사용해 데이터를 저장합니다. 배열은 필요에 따라 크기를 조정할 수 있으나, 기본적으로 모든 요소는 연속된 메모리 공간에 위치합니다. 이 연속된 메모리 구조는 데이터 추가에 있어서 매우 효율적인데, 만약 용량을 충분히 설정해 두면 데이터의 빠른 추가가 가능해집니다.

  
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable {


    ......
    public void add(int index, E element) {
        rangeCheckForAdd(index);
        modCount++;
        final int s;
        Object[] elementData;
        
        if ((s = size) == (elementData = this.elementData).length) {
            elementData = grow();
        }

        System.arraycopy(
          elementData,
          index,
          elementData,
          index + 1,
          s - index
        );
        elementData[index] = element;
        size = s + 1;
    }

    ......

}

만약 기존 용량을 초과해서 데이터를 저장해야 할 경우, 일반적으로 현재 크기의 1.5배로 배열의 크기를 자동으로 늘립니다. 배열 크기가 계속해서 증가할 때 성능이 감소할 수 있습니다.

  
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    ......

    private Object[] grow(int minCapacity) {
        int oldCapacity = elementData.length;
        if (oldCapacity > 0 || elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {

            // 1.5배 확장
            int newCapacity = ArraysSupport.newLength(
                    oldCapacity,
                    minCapacity - oldCapacity,
                    oldCapacity >> 1
            );
            return elementData = Arrays.copyOf(elementData, newCapacity);
        } else {
            return elementData = new Object[Math.max(DEFAULT_CAPACITY, minCapacity)];
        }
    }

    ......

}

ArrayList에는 로드 팩터(load factor) 가 직접 적용되지 않지만 이에 대해 간략히 살펴보겠습니다. 로드팩터는 데이터 구조의 용량 대비 현재 데이터가 차지하는 비율을 나타내는 척도 입니다. 이는 데이터 구조가 언제 내부적으로 메모리 사이즈를 확장해야 할지 결정하는 기준으로 작용합니다. 일반적인 기본값은 0.75f로, 75%가 차게 되면 사이즈 확장을 하는 것을 의미합니다.

A load factor is a critical statistic of a hash table, and is defined as follows.

이는 해시 내부에 존재하는데, 생성자를 보면 loadFactor를 함께 받는 것을 볼 수 있습니다.

  
public class HashMap<K,V> extends AbstractMap<K,V>
    implements Map<K,V>, Cloneable, Serializable {

    ......

    final float loadFactor;

    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

    ......

}

반면 ArrayList는 로드 팩터라는 개념이 직접 존재하지는 않는다고 했는데, 내부에는 capacity만 존재합니다. 즉 일정 임계치를 넘어가면 내부 배열이 1.5배 증가하지만, 여기에 로드 팩터가 적용되는 것은 아닙니다. 굳이 따지면 로드 팩터가 1로 설정됐다고 보면 될 것 같습니다.

  
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    ......

    private Object[] grow(int minCapacity) {
        int oldCapacity = elementData.length;
        if (oldCapacity > 0 || elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {

            // 1.5배 확장
            int newCapacity = ArraysSupport.newLength(
                    oldCapacity,
                    minCapacity - oldCapacity,
                    oldCapacity >> 1
            );
            return elementData = Arrays.copyOf(elementData, newCapacity);
        } else {
            return elementData = new Object[Math.max(DEFAULT_CAPACITY, minCapacity)];
        }
    }

    ......

}

임계점(Threshold) 은 로드팩터와 함께 사용되는 개념으로, 로드팩터를 사용하여 계산된 값입니다. 이는 자료구조가 내부적으로 사이즈를 확장해야 하는 시점을 나타내는데, 계산 방식은 임계점 = 현재 용량 * 로드팩터 입니다. 데이터가 임계점에 도달하면 사이즈가 확장됩니다.

Threshold for rehashing is calculated by multiplying capacity and load factor.

여담으로 StringBuilder는 내부적으로 문자열을 저장하기 위한 char 배열을 사용합니다. 그러나 StringBuilder는 ArrayList와 마찬가지로 로드팩터와 임계점의 개념을 사용하지는 않으며, 대신 내부 배열이 찼을 때 해당 배열의 크기를 늘리는 로직이 있습니다.

  
abstract class AbstractStringBuilder implements Appendable, CharSequence {

      ......

      private void ensureCapacityInternal(int minimumCapacity) {
        int oldCapacity = value.length >> coder;
        if (minimumCapacity - oldCapacity > 0) {
            value = Arrays.copyOf(value,
                    newCapacity(minimumCapacity) << coder);
        }
    }

    ......

}

3. 중간 삽입

데이터를 중간에 삽입하는 경우 일반적으로 LinkedList가 성능이 더 좋습니다. 단, 중간 삽입이 아닌 마지막 위치에 삽입할 경우 ArrayList가 LinkedList보다 빠를 수 있습니다.

  
public class Main {

    private static final int LIMIT = 100_000_000;

    // 배열 복사가 일어나지 않는 경우 결과가 달라질 수 있습니다.
    private static final List<Integer> arrayList = new ArrayList<>();
    private static final List<Integer> linkedList = new LinkedList<>();

    public static void main(String[] args) throws Exception {
        initData(arrayList, linkedList);
        print(
            ">>>>> In The Middle Addition",
            addInTheMiddle(arrayList),
            addInTheMiddle(linkedList)
        );
    }

    private static void print(
        String title,
        double arrayListMSeconds,
        double linkedListMSeconds
    ) {
        System.out.println(title);
        System.out.println("ArrayList:  " + arrayListMSeconds + "m/s");
        System.out.println("LinkedList: " + linkedListMSeconds + "m/s");
        System.out.println();
        arrayList.clear();
        linkedList.clear();
    }

    public static double addInTheMiddle(List<Integer> list) {
        long start = System.nanoTime();
        list.add(500_000, 1);
        long end = System.nanoTime();
        return (end - start) / 1_000_000.0;
    }
}

LinkedList에서 중간에 원소를 추가할 때 해당 노드의 연결만 변경하면 되므로 비교적 간단하지만, ArrayList에서 중간에 원소를 추가할 때는 그 위치 이후의 모든 원소를 뒤로 이동시키기 때문에 상대적으로 비용이 더 발생하기 때문입니다.

  
public class ArrayList<E> extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable
{
    ......

    private Object[] grow(int minCapacity) {
        int oldCapacity = elementData.length;
        if (oldCapacity > 0 || elementData != DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {

            // 1.5배 확장
            int newCapacity = ArraysSupport.newLength(
                    oldCapacity,
                    minCapacity - oldCapacity,
                    oldCapacity >> 1
            );
            return elementData = Arrays.copyOf(elementData, newCapacity);
        } else {
            return elementData = new Object[Math.max(DEFAULT_CAPACITY, minCapacity)];
        }
    }

    ......

}

데이터를 중간에 삽입하기 위해 탐색하는 비용도 발생하기 때문에 데이터가 어느 위치에 삽입되는지도 고려해야 합니다.

4. 순차 삭제

이번 테스트는 데이터를 10만 건으로 축소해 보겠습니다. 데이터를 순차삭제하는 과정에서 ArrayList는 최악의 경우 배열 이동으로 인해 O(n^2) 의 시간 복잡도를 가질 수 있기 때문입니다. 이렇게 해서 앞에서부터 삭제하는 경우, 뒤에서부터 삭제하는 두 가지 케이스를 살펴보겠습니다. 먼저 앞에서부터 삭제하는 경우입니다.

  
public class Main {

    private static final int LIMIT = 100_000;
    private static final List<Integer> arrayList = new ArrayList<>();
    private static final List<Integer> linkedList = new LinkedList<>();

    public static void main(String[] args) throws Exception {
        initData(arrayList, linkedList);
        print(
            ">>>>> Remove Sequentially From Start",
            removeSequentiallyFromStart(arrayList),
            removeSequentiallyFromStart(linkedList)
        );
    }

    private static void print(
        String title,
        double arrayListMSeconds,
        double linkedListMSeconds
    ) {
        System.out.println(title);
        System.out.println("ArrayList:  " + arrayListMSeconds + "m/s");
        System.out.println("LinkedList: " + linkedListMSeconds + "m/s");
        System.out.println();
        arrayList.clear();
        linkedList.clear();
    }

    public static double removeSequentiallyFromStart(List<Integer> list) {
        long start = System.nanoTime();
        for (int index = 0; index < list.size(); index++) {
            list.remove(index);
        }
        long end = System.nanoTime();
        return (end - start) / 1_000_000.0;
    }
}

데이터가 10만 건이어서 그런지 ArrayList가 더 빠른 결과를 보입니다. 이론상 ArrayList가 더 느릴 것 같았는데 조금 의외의 결과였습니다.

ArrayList에서 원소를 앞에서부터 순차적으로 삭제할 경우, 삭제된 원소 이후의 모든 원소들을 앞으로 한 칸씩 이동시킵니다. 이로 인해 매번 삭제할 때마다 O(n)의 시간이 걸리며, 따라서 O(n^2)의 시간 복잡도를 가지게 됩니다. 반면 LinkedList의 원소를 앞에서부터 순차적으로 원소를 삭제할 경우 탐색 O(1), 실제 삭제 작업 O(1)의 복잡도를 가지며, 전체 작업은 O(n)의 시간 복잡도를 가지게 됩니다.

끝에서부터 삭제할 경우 원소를 이동시킬 필요가 없기에 ArrayList가 빠른 것을 볼 수 있습니다.

  
public class Main {

    // 데이터 1억 건으로 복구
    private static final int LIMIT = 100_000_000;
    private static final List<Integer> arrayList = new ArrayList<>();
    private static final List<Integer> linkedList = new LinkedList<>();

    public static void main(String[] args) throws Exception {
        initData(arrayList, linkedList);
        print(
            ">>>>> Remove Sequentially From End",
            removeSequentiallyFromStart(arrayList),
            removeSequentiallyFromStart(linkedList)
        );
    }

    private static void print(
        String title,
        double arrayListMSeconds,
        double linkedListMSeconds
    ) {
        System.out.println(title);
        System.out.println("ArrayList:  " + arrayListMSeconds + "m/s");
        System.out.println("LinkedList: " + linkedListMSeconds + "m/s");
        System.out.println();
        arrayList.clear();
        linkedList.clear();
    }

    public static double removeSequentiallyFromEnd(List<Integer> list) {
        long start = System.nanoTime();
        for (int index = list.size() - 1; index >= 0; index--) {
            list.remove(index);
        }
        long end = System.nanoTime();
        return (end - start) / 1_000_000.0;
    }
}

5. 중간 삭제

이론적으로 모두 O(n) 의 시간 복잡도를 가지지만, 일반적으로 LinkedList가 ArrayList보다 빠릅니다. LinkedList가 원소의 참조만 변경하면 되므로 ArrayList의 데이터 이동 작업에 비해 더 빠르기 때문입니다. 하지만 삭제하려는 노드가 어디에 위치하였는지에 따라 다르므로 이를 고려해야 합니다.

  
public class Main {

    private static final int LIMIT = 100_000_000;
    private static final List<Integer> arrayList = new ArrayList<>(100_000_000);
    private static final List<Integer> linkedList = new LinkedList<>();

    public static void main(String[] args) throws Exception {
        initData(arrayList, linkedList);
        print(
            ">>>>> Remove In The Middle",
            removeInTheMiddle(arrayList),
            removeInTheMiddle(linkedList)
        );
    }

    private static void print(
        String title,
        double arrayListMSeconds,
        double linkedListMSeconds
    ) {
        System.out.println(title);
        System.out.println("ArrayList:  " + arrayListMSeconds + "m/s");
        System.out.println("LinkedList: " + linkedListMSeconds + "m/s");
        System.out.println();
        arrayList.clear();
        linkedList.clear();
    }

    public static double removeInTheMiddle(List<Integer> list) {
        long start = System.nanoTime();
        list.remove(50_000_000);
        long end = System.nanoTime();
        return (end - start) / 1_000_000.0;
    }
}

6. 정리

ArrayList와 LinkedList가 가지는 특성으로 인해 특정 상황, 환경에서 성능상 차이가 존재합니다. 다만 이는 현재 상황에 따라 변할 수 있기 때문에 절대적인 지표가 아닙니다.

ArrayList와 LinkedList의 성능 비교

글을 작성하게 된 계기

1. 조회

2. 순차 추가

3. 중간 삽입

4. 순차 삭제

5. 중간 삭제

6. 정리

Further Reading

serialVersionUID의 변경 조건은 어떻게 될까?

Exception

Lombok의 동작 원리