谈谈dubbo集群容错-蒲公英云

从继承图可以看出有以下几种策略:

Failover Cluster(失败转移) 当调用提供者的服务器发生错误时，再试下一个服务器。用于读操作。重试有延时。设置重试次数

具体代码实现

public Result doInvoke(Invocation invocation, final List<Invoker<T>> invokers, LoadBalance loadbalance)  {
        List<Invoker<T>> invoked = new ArrayList<Invoker<T>>(copyinvokers.size()); 
        Set<String> providers = new HashSet<String>(len);
        for (int i = 0; i < len; i++) {//5
            if (i > 0) {
                checkWhetherDestroyed();
                copyinvokers = list(invocation);
                checkInvokers(copyinvokers, invocation);
            }
            Invoker<T> invoker = select(loadbalance, invocation, copyinvokers, invoked);//1
            invoked.add(invoker);//2
            RpcContext.getContext().setInvokers((List) invoked);
            try {
                Result result = invoker.invoke(invocation);//3
                if (le != null && logger.isWarnEnabled()) {logger.warn
                return result;
            } catch (RpcException e) {//4
                if (e.isBiz()) { throw e;
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            } finally
                providers.add(invoker.getUrl().getAddress());
            }
        }//for-end 
        throw new RpcException(le.getCode()//6
    }
复制代码

实现思想：利用for循环，成功直接return。这样就可以多次调用。

每一次调用先找一个服务器，加入到已调用结合。开始调用。拿到结果。如果出错，catch住，分析异常类型。当超出规定次数，直接抛异常。

Failfast Cluster (快速失败) 只发起一次调用，失败立即报错。一般对应于非幂等性的写操作。比如新增。

public Result doInvoke(Invocation invocation, List> invokers, LoadBalance loadbalance) throws RpcException {

        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        try {
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            if (e instanceof RpcException && ((RpcException) e).isBiz()) { // biz exception.
                throw (RpcException) e;
            }
            throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0,
        }
    }

复制代码

实现思想：直接调用。处理异常。简单

Failback Cluster (失败自动恢复) 失败自动恢复，后台记录失败请求，定时重发。比如消息通知

protected Result doInvoke(Invocation invocation, List> invokers, LoadBalance loadbalance) throws RpcException {

        try {
            checkInvokers(invokers, invocation);
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            logger.error(
            addFailed(invocation, this);
            return new RpcResult(); // ignore
        }
    }
private void addFailed(Invocation invocation, AbstractClusterInvoker<?> router) {
        if (retryFuture == null) {
            synchronized (this) {
                if (retryFuture == null) {
                    retryFuture = scheduledExecutorService.scheduleWithFixedDelay(new Runnable() {
                        @Override
                        public void run() {
                            // collect retry statistics
                            try {
                                retryFailed();
                            } catch (Throwable t) { // Defensive fault tolerance
                                logger.error("Unexpected error occur at collect statistic", t);
                            }
                        }
                    }, RETRY_FAILED_PERIOD, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
                }
            }
        }
        failed.put(invocation, router);
    }

复制代码

实现思想：先直接调用。失败后，放到map直接保存。然后用定时线程池去重试map里的信息。

FailSafe Cluster (失败安全) 失败了，直接忽略。比如写入日志啥的

public Result doInvoke(Invocation invocation, List> invokers, LoadBalance loadbalance) throws RpcException {

        try {
            checkInvokers(invokers, invocation);
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable e) {
            logger.error("Failsafe ignore exception: " + e.getMessage(), e);
            return new RpcResult(); // ignore
        }
    }

复制代码

Forking Cluster (并行调用) 同时调用多个服务器，有一个成功，就返回。用于实时性较高的情况

public Result doInvoke(final Invocation invocation, List> invokers, LoadBalance loadbalance) throws RpcException {

        try {
            checkInvokers(invokers, invocation);
            final List<Invoker<T>> selected;
            final int forks = getUrl().getParameter(Constants.FORKS_KEY, Constants.DEFAULT_FORKS);
            final int timeout = getUrl().getParameter(Constants.TIMEOUT_KEY, Constants.DEFAULT_TIMEOUT);
            if (forks <= 0 || forks >= invokers.size()) {
                selected = invokers;
            } else {
                selected = new ArrayList<>();
                for (int i = 0; i < forks; i++) {
                    // TODO. Add some comment here, refer chinese version for more details.
                    Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                    if (!selected.contains(invoker)) {
                        //Avoid add the same invoker several times.
                        selected.add(invoker);
                    }
                }
            }
            RpcContext.getContext().setInvokers((List) selected);
            final AtomicInteger count = new AtomicInteger();
            final BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
            for (final Invoker<T> invoker : selected) {
                executor.execute(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            Result result = invoker.invoke(invocation);
                            ref.offer(result);
                        } catch (Throwable e) {
                            int value = count.incrementAndGet();
                            if (value >= selected.size()) {
                                ref.offer(e);
                            }
                        }
                    }
                });
            }
            try {
                Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
                if (ret instanceof Throwable) {
                    Throwable e = (Throwable) ret;
                    throw new RpcException(e instanceof RpcException ? ((RpcException) e).getCode() : 0, "Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e.getCause() != null ? e.getCause() : e);
                }
                return (Result) ret;
            } catch (InterruptedException e) {
                throw new RpcException("Failed to forking invoke provider " + selected + ", but no luck to perform the invocation. Last error is: " + e.getMessage(), e);
            }
        } finally {
            // clear attachments which is binding to current thread.
            RpcContext.getContext().clearAttachments();
        }
    }

复制代码

实现思想：利用list保存多个服务器。用线程池同时提交调用。将结果放到LinkedBlockingQueue。然后从队列里取结果。

Broadcast Cluster (广播调用) 广播调用所有提供者，琢一调用。任意一台报错，则报错。用于通知服务提供者更新资源啥的。

public Result doInvoke(final Invocation invocation, List> invokers, LoadBalance loadbalance) throws RpcException {

          checkInvokers(invokers, invocation);
          RpcContext.getContext().setInvokers((List) invokers);
          RpcException exception = null;
          Result result = null;
          for (Invoker<T> invoker : invokers) {
              try {
                  result = invoker.invoke(invocation);
              } catch (RpcException e) {
                  exception = e;
                  logger.warn(e.getMessage(), e);
              } catch (Throwable e) {
                  exception = new RpcException(e.getMessage(), e);
                  logger.warn(e.getMessage(), e);
              }
          }
          if (exception != null) {
              throw exception;
          }
          return result;
      }