线程池的 10 个坑 - 南巷清风

转载：https://mp.weixin.qq.com/s/BhfjVuMwXkFFQhgvw2KypA

日常开发中，为了更好管理线程资源，减少创建线程和销毁线程的资源损耗，我们会使用线程池来执行一些异步任务。但是线程池使用不当，就可能会引发生产事故。

线程池默认使用无界队列，任务过多导致OOM

JDK 为开发者提供了线程池的实现类，我们基于 Executors 组件，就可以快速创建一个线程池。日常工作中，一些小伙伴为了开发效率，反手就用Executors 新建个线程池。写出类似以下的代码：

public class NewFixedTest {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(10);
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            executor.execute(() -> {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    //do nothing
                }
            });
        }
    }
}

使用 newFixedThreadPool 创建的线程池，是会有坑的，它默认是无界的阻塞队列，如果任务过多，会导致 OOM 问题。运行一下以上代码，出现了 OOM 。

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
 at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:416)
 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
 at com.example.dto.NewFixedTest.main(NewFixedTest.java:14)

这是因为 newFixedThreadPool 使用了无界的阻塞队列的 LinkedBlockingQueue ，如果线程获取一个任务后，任务的执行时间比较长(比如，上面 demo 代码设置了10秒)，会导致队列的任务越积越多，导致机器内存使用不停飙升，最终出现 OOM 。

看下 newFixedThreadPool 的相关源码，是可以看到一个无界的阻塞队列的，如下：

//阻塞队列是LinkedBlockingQueue，并且是使用的是无参构造函数
public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}
    
//无参构造函数，默认最大容量是Integer.MAX_VALUE，相当于无界的阻塞队列的了
public LinkedBlockingQueue() {
    this(Integer.MAX_VALUE);
}

因此，工作中，建议大家自定义线程池，并使用指定长度的阻塞队列。

线程池创建线程过多，导致OOM

既然 Executors 组件创建出的线程池 newFixedThreadPool ，使用的是无界队列，可能会导致 OOM 。那么，Executors 组件还可以创建别的线程池，如 newCachedThreadPool ，我们用它也不行吗？

我们可以看下 newCachedThreadPool 的构造函数：

public static ExecutorService newCachedThreadPool() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                                      60L, TimeUnit.SECONDS,
                                      new SynchronousQueue<Runnable>());
}

它的最大线程数是 Integer.MAX_VALUE 。大家应该意识到使用它，可能会引发什么问题了吧。没错，如果创建了大量的线程也有可能引发 OOM ！

所以我们使用线程池的时候，还要当心线程创建过多，导致 OOM 问题。大家尽量不要使用newCachedThreadPool ，并且如果自定义线程池时，要注意一下最大线程数。

共享线程池，次要逻辑拖垮主要逻辑

要避免所有的业务逻辑共享一个线程池。比如你用线程池A来做登录异步通知，又用线程池A来做对账。如下图：

如果对账任务 checkBillService 响应时间过慢，会占据大量的线程池资源，可能直接导致没有足够的线程资源去执行 loginNotifyService 的任务，最后影响登录。就这样，因为一个次要服务，影响到重要的登录接口，显然这是绝对不允许的。因此，我们不能将所有的业务一锅炖，都共享一个线程池，因为这样做，风险太高了，犹如所有鸡蛋放到一个篮子里。应当做线程池隔离。

线程池拒绝策略的坑，使用不当导致阻塞

线程池主要有四种拒绝策略，如下：

AbortPolicy: 丢弃任务并抛出 RejectedExecutionException 异常。(默认拒绝策略)
DiscardPolicy：丢弃任务，但是不抛出异常。
DiscardOldestPolicy：丢弃队列最前面的任务，然后重新尝试执行任务。
CallerRunsPolicy：由调用方线程处理该任务。

如果线程池拒绝策略设置不合理，就容易有坑。我们把拒绝策略设置为 DiscardPolicy 或DiscardOldestPolicy 并且在被拒绝的任务，Future 对象调用 get() 方法,那么调用线程会一直被阻塞。

public class DiscardThreadPoolTest {

    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // 一个核心线程，队列最大为1，最大线程数也是1.拒绝策略是DiscardPolicy
        ThreadPoolExecutor executorService = new ThreadPoolExecutor(1, 1, 1L, TimeUnit.MINUTES,
                new ArrayBlockingQueue<>(1), new ThreadPoolExecutor.DiscardPolicy());

        Future f1 = executorService.submit(()-> {
            System.out.println("提交任务1");
            try {
                Thread.sleep(3000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        });

        Future f2 = executorService.submit(()->{
            System.out.println("提交任务2");
        });

        Future f3 = executorService.submit(()->{
            System.out.println("提交任务3");
        });

        System.out.println("任务1完成 " + f1.get());// 等待任务1执行完毕
        System.out.println("任务2完成" + f2.get());// 等待任务2执行完毕
        System.out.println("任务3完成" + f3.get());// 等待任务3执行完毕

        executorService.shutdown();// 关闭线程池，阻塞直到所有任务执行完毕

    }
}

运行结果：一直在运行中。。。

这是因为 DiscardPolicy 拒绝策略，是什么都没做，源码如下：

public static class DiscardPolicy implements RejectedExecutionHandler {
    /**
      * Creates a {@code DiscardPolicy}.
      */
    public DiscardPolicy() { }

    /**
      * Does nothing, which has the effect of discarding task r.
      */
    public void rejectedExecution(Runnable r, ThreadPoolExecutor e) {
    }
}

我们再来看看线程池 submit 的方法：

public Future<?> submit(Runnable task) {
    if (task == null) throw new NullPointerException();
    //把Runnable任务包装为Future对象
    RunnableFuture<Void> ftask = newTaskFor(task, null);
    //执行任务
    execute(ftask);
    //返回Future对象
    return ftask;
}
    
public FutureTask(Runnable runnable, V result) {
  this.callable = Executors.callable(runnable, result);
  this.state = NEW;  //Future的初始化状态是New
}

我们再来看看 Future 的 get() 方法：

  //状态大于COMPLETING，才会返回，要不然都会阻塞等待
  public V get() throws InterruptedException, ExecutionException {
        int s = state;
        if (s <= COMPLETING)
            s = awaitDone(false, 0L);
        return report(s);
    }
    
    FutureTask的状态枚举
    private static final int NEW          = 0;
    private static final int COMPLETING   = 1;
    private static final int NORMAL       = 2;
    private static final int EXCEPTIONAL  = 3;
    private static final int CANCELLED    = 4;
    private static final int INTERRUPTING = 5;
    private static final int INTERRUPTED  = 6;

阻塞的真相水落石出啦，FutureTask 的状态大于 COMPLETING 才会返回，要不然都会一直阻塞等待。又因为拒绝策略啥没做，没有修改 FutureTask 的状态，因此 FutureTask 的状态一直是 NEW，所以它不会返回，会一直等待。

这个问题，可以使用别的拒绝策略，比如 CallerRunsPolicy ，它让主线程去执行拒绝的任务，会更新 FutureTask 状态。如果确实想用 DiscardPolicy ，则需要重写 DiscardPolicy 的拒绝策略。

温馨提示，日常开发中，使用 Future.get() 时，尽量使用带超时时间的，因为它是阻塞的。

future.get(1, TimeUnit.SECONDS);

使用别的拒绝策略，就万无一失了吗？

不是的，如果使用 CallerRunsPolicy 拒绝策略，它表示拒绝的任务给调用方线程用，如果这是主线程，那会不会可能也导致主线程阻塞呢？总结起来，大家日常开发的时候，多一份心眼吧，多一点思考吧。

Spring内部线程池的坑

工作中，个别开发者，为了快速开发，喜欢直接用 Spring 的 @Async ，来执行异步任务。

@Async
public void testAsync() throws InterruptedException {
    System.out.println("处理异步任务");
    TimeUnit.SECONDS.sleep(new Random().nextInt(100));
}

Spring 内部线程池，其实是 SimpleAsyncTaskExecutor ，这玩意有点坑，它不会复用线程的，它的设计初衷就是执行大量的短时间的任务。有兴趣的小伙伴，可以去看看它的源码：

SimpleAsyncTaskExecutor.java

也就是说来了一个请求，就会新建一个线程！大家使用 Spring 的 @Async 时，要避开这个坑，自己再定义一个线程池。正例如下：

@Bean(name = "threadPoolTaskExecutor")
public Executor threadPoolTaskExecutor() {
    ThreadPoolTaskExecutor executor=new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(5);
    executor.setMaxPoolSize(10);
    executor.setThreadNamePrefix("tianluo-%d");
    // 其他参数设置
    return new ThreadPoolTaskExecutor();
}

使用线程池时，没有自定义命名

使用线程池时，如果没有给线程池一个有意义的名称，将不好排查回溯问题。这不算一个坑吧，只能说给以后排查埋坑。我还是单独把它放出来算一个点，因为个人觉得这个还是比较重要的。反例如下：

public class ThreadTest {

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executorOne = new ThreadPoolExecutor(5, 5, 1, 
                TimeUnit.MINUTES, new ArrayBlockingQueue<Runnable>(20));
        executorOne.execute(()->{
            System.out.println("Hello，Java");
            throw new NullPointerException();
        });
    }
}

运行结果

Hello，Java
Exception in thread "pool-1-thread-1" java.lang.NullPointerException
 at com.example.dto.ThreadTest.lambda$main$0(ThreadTest.java:17)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)

可以发现，默认打印的线程池名字是 pool-1-thread-1 ，如果排查问题起来，并不友好。因此建议大家给自己线程池自定义个容易识别的名字。其实用 CustomizableThreadFactory 即可，正例如下：

public class ThreadTest {

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executorOne = new ThreadPoolExecutor(5, 5, 1,
                TimeUnit.MINUTES, new ArrayBlockingQueue<Runnable>(20),new CustomizableThreadFactory("Tianluo-Thread-pool"));
        executorOne.execute(()->{
            System.out.println("Hello，Java");
            throw new NullPointerException();
        });
    }
}

线程池参数设置不合理

线程池最容易出坑的地方，就是线程参数设置不合理。比如核心线程设置多少合理，最大线程池设置多少合理等等。当然，这块不是乱设置的，需要结合具体业务。

比如线程池如何调优，如何确认最佳线程数？

最佳线程数目 = （（线程等待时间+线程CPU时间）/ 线程CPU时间）* CPU数目

服务器CPU核数为8核，一个任务线程cpu耗时为20ms，线程等待（网络IO、磁盘IO）耗时80ms，那最佳线程数目：( 80 + 20 )/20 * 8 = 40。也就是设置 40个线程数最佳。

线程池异常处理的坑

public class ThreadTest {

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executorOne = new ThreadPoolExecutor(5, 5, 1,
                TimeUnit.MINUTES, new ArrayBlockingQueue<Runnable>(20),new CustomizableThreadFactory("Tianluo-Thread-pool"));
        for (int i = 0; i < 5; i++) {
            executorOne.submit(()->{
                System.out.println("current thread name" + Thread.currentThread().getName());
                Object object = null;
                System.out.print("result## " + object.toString());
            });
        }

    }
}

按道理，运行这块代码应该抛空指针异常才是的，对吧。但是，运行结果却是这样的;

current thread nameTianluo-Thread-pool1
current thread nameTianluo-Thread-pool2
current thread nameTianluo-Thread-pool3
current thread nameTianluo-Thread-pool4
current thread nameTianluo-Thread-pool5

这是因为使用 submit 提交任务，不会把异常直接这样抛出来。大家有兴趣的话，可以去看看源码。可以改为 execute 方法执行，当然最好就是 try...catch 捕获，如下：

public class ThreadTest {

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor executorOne = new ThreadPoolExecutor(5, 5, 1,
                TimeUnit.MINUTES, new ArrayBlockingQueue<Runnable>(20),new CustomizableThreadFactory("Tianluo-Thread-pool"));
        for (int i = 0; i < 5; i++) {
            executorOne.submit(()->{
                System.out.println("current thread name" + Thread.currentThread().getName());
                try {
                    Object object = null;
                    System.out.print("result## " + object.toString());
                }catch (Exception e){
                    System.out.println("异常了"+e);
                }
            });
        }

    }
}

其实，我们还可以为工作者线程设置 UncaughtExceptionHandler ，在 uncaughtException 方法中处理异常。大家知道这个坑就好啦。

线程池使用完毕后，忘记关闭

如果线程池使用完，忘记关闭的话，有可能会导致内存泄露问题。所以，大家使用完线程池后，记得关闭一下。同时，线程池最好也设计成单例模式，给它一个好的命名，以方便排查问题。

public class ThreadTest {

    public static void main(String[] args) throws Exception {

        ThreadPoolExecutor executorOne = new ThreadPoolExecutor(5, 5, 1,
                TimeUnit.MINUTES, new ArrayBlockingQueue<Runnable>(20), new CustomizableThreadFactory("Tianluo-Thread-pool"));
        executorOne.execute(() -> {
            System.out.println("Hello，Java");
        });

        //关闭线程池
        executorOne.shutdown();
    }
}

ThreadLocal与线程池搭配，线程复用，导致信息错乱。

使用 ThreadLocal 缓存信息，如果配合线程池一起，有可能出现信息错乱的情况。先看下一下例子：

private static final ThreadLocal<Integer> currentUser = ThreadLocal.withInitial(() -> null);

@GetMapping("wrong")
public Map wrong(@RequestParam("userId") Integer userId) {
    //设置用户信息之前先查询一次ThreadLocal中的用户信息
    String before  = Thread.currentThread().getName() + ":" + currentUser.get();
    //设置用户信息到ThreadLocal
    currentUser.set(userId);
    //设置用户信息之后再查询一次ThreadLocal中的用户信息
    String after  = Thread.currentThread().getName() + ":" + currentUser.get();
    //汇总输出两次查询结果
    Map result = new HashMap();
    result.put("before", before);
    result.put("after", after);
    return result;
}

按理说，每次获取的 before 应该都是null ，但是呢，程序运行在 Tomcat 中，执行程序的线程是Tomcat 的工作线程，而 Tomcat 的工作线程是基于线程池的。

线程池会重用固定的几个线程，一旦线程重用，那么很可能首次从 ThreadLocal 获取的值是之前其他用户的请求遗留的值。这时，ThreadLocal 中的用户信息就是其他用户的信息。

把tomcat的工作线程设置为 1 ：

server.tomcat.max-threads=1

用户 1 ，请求过来，会有以下结果，符合预期：

用户 2 请求过来，会有以下结果，「不符合预期」：

因此，使用类似 ThreadLocal 工具来存放一些数据时，需要特别注意在代码运行完后，显式地去清空设置的数据，正例如下：

@GetMapping("right")
public Map right(@RequestParam("userId") Integer userId) {
    String before  = Thread.currentThread().getName() + ":" + currentUser.get();
    currentUser.set(userId);
    try {
        String after = Thread.currentThread().getName() + ":" + currentUser.get();
        Map result = new HashMap();
        result.put("before", before);
        result.put("after", after);
        return result;
    } finally {
        //在finally代码块中删除ThreadLocal中的数据，确保数据不串
        currentUser.remove();
    }
}