SkyWalking in Practice: Monitoring skywalking-agent Error Logs
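The source code discussed here is from skywalking-agent version 8.9.0.
Background
Because the skywalking-agent client is now deployed as a sidecar, every agent update rolls out to all projects at once. Many of the business services carry heavy traffic and complex logic, so we cannot fully test every business flow each time the client is updated. We worried that a client update could break the test environment, or even production. If we catch a problem promptly we can still remediate it; if we cannot detect it ourselves and the business teams have to come to us, the severity escalates. So we asked: is there a mechanism that lets us proactively detect problems in skywalking-agent? Would it work to raise an alert whenever skywalking-agent logs an error?
Implementation journey
Our first idea was to record a Prometheus counter directly in the error method of AbstractLogger. Sample code: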
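@Override
public void error(String message, Throwable throwable) {
    if (this.isErrorEnable()) {
        // Metrics.counter(...) only looks up or creates the counter; increment() records the event.
        Metrics.counter("skywalking-agent_error_log").increment();
        this.logger(LogLevel.ERROR, message, throwable);
    }
}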
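This implementation ran fine in local testing, but in the container environment the application failed to start. We read the error as Prometheus being registered before the Tomcat container had finished initializing. Note that the log itself reports a missing method, io.micrometer.prometheus.PrometheusConfig.requireValid(), which usually points to incompatible micrometer versions on the classpath.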
12/09 14:28:19 {"instant":{"epochSecond":1670567299,"nanoOfSecond":713000000},"thread":"main","level":"ERROR","loggerName":"org.springframework.boot.diagnostics.LoggingFailureAnalysisReporter","message":"\n\n***************************\nAPPLICATION FAILED TO START\n***************************\n\nDescription:\n\nAn attempt was made to call a method that does not exist. The attempt was made from the following location:\n\n io.micrometer.prometheus.PrometheusMeterRegistry.<init>(PrometheusMeterRegistry.java:67)\n\nThe following method did not exist:\n\n io.micrometer.prometheus.PrometheusConfig.requireValid()V\n\nThe method's class, io.micrometer.prometheus.PrometheusConfig, is available from the following locations:\n\n jar:file:/www/root-spring-boot.jar!/BOOT-INF/lib/micrometer-registry-prometheus-1.6.3.jar!/io/micrometer/prometheus/PrometheusConfig.class\n\nThe class hierarchy was loaded from the following locations:\n\n io.micrometer.prometheus.PrometheusConfig: jar:file:/www/root-spring-boot.jar!/BOOT-INF/lib/micrometer-registry-prometheus-1.6.3.jar!/\n\n\nAction:\n\nCorrect the classpath of your application so that it contains a single, compatible version of io.micrometer.prometheus.PrometheusConfig\n","endOfBatch":false,"loggerFqcn":"org.apache.commons.logging.LogAdapter$Log4jLog","skyWalkingDynamicField":{"traceId":"N/A"},"threadId":1,"threadPriority":5,"requestId":"${ctx:requestId}","traceId":"${ctx:traceId}"}
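The failure analysis in this log names a method that is missing from the class that was actually loaded. One way to confirm this kind of mismatch is to probe the loaded class reflectively. The helper below is a minimal sketch using only the JDK; `MethodProbe` and `hasMethod` are hypothetical names, and the class and method names passed in `main` are simply the ones quoted in the log.

```java
import java.lang.reflect.Method;

final class MethodProbe {

    /**
     * Returns true if the class can be loaded and exposes a method with the given name,
     * either as a public (possibly inherited) method or as a method declared on the class itself.
     * Hypothetical diagnostic helper, not part of SkyWalking or micrometer.
     */
    static boolean hasMethod(String className, String methodName) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> type = Class.forName(className, false, MethodProbe.class.getClassLoader());
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            for (Method m : type.getDeclaredMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is absent or cannot be linked on this classpath
            return false;
        }
    }

    public static void main(String[] args) {
        // Names taken from the startup log; the result depends on the jars on the classpath
        System.out.println(hasMethod("io.micrometer.prometheus.PrometheusConfig", "requireValid"));
    }
}
```

Running it with the same classpath as the failing service shows whether the class visible there really lacks the method.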
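Our next thought was to register with Prometheus only after Tomcat had been instantiated, but that is not sound either, because some projects may not use Tomcat as their container.
In the end we decided not to touch Prometheus inside AbstractLogger's error method at all. Instead, an AtomicLong counts the error logs; we intercept the point where the Prometheus registry is registered and expose the error-log count as a metric there.
The code is as follows.
First, the rough shape of AbstractLogger: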
public abstract class AbstractLogger implements ILog {
    // Number of error-level log calls; read later when the gauge is registered.
    public static AtomicLong incr = new AtomicLong(0);

    @Override
    public void error(Throwable throwable, String message, Object... objects) {
        if (this.isErrorEnable()) {
            incr.incrementAndGet();
            this.logger(LogLevel.ERROR, replaceParam(message, objects), throwable);
        }
    }
}
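The counting half of this approach can be tried in isolation. The sketch below uses only the JDK; `ErrorCounter`, `recordError` and `asGauge` are hypothetical names standing in for the AbstractLogger change, and the `LongSupplier` plays the role of whatever metrics library reads the value later.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class ErrorCounter {

    // Shared, thread-safe count of error-level log calls (mirrors AbstractLogger.incr)
    static final AtomicLong ERRORS = new AtomicLong();

    /** Called from the logger's error path; no metrics library is touched here. */
    static void recordError() {
        ERRORS.incrementAndGet();
    }

    /** A live view of the counter that can be handed to a registry once one exists. */
    static LongSupplier asGauge() {
        return ERRORS::get;
    }

    public static void main(String[] args) throws InterruptedException {
        // Several threads log errors concurrently before any registry is available
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    recordError();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // The gauge view is created late and still sees every earlier increment
        System.out.println("errors recorded: " + asGauge().getAsLong());
    }
}
```

Because the logger only increments a JDK counter, it has no dependency on when, or whether, a metrics registry is initialized.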
The plugin definition:
public class MetricsInstrumentation extends ClassStaticMethodsEnhancePluginDefine {
    /**
     * Enhance class.
     */
    private static final String ENHANCE_CLASS = "io.micrometer.core.instrument.Metrics";
    /**
     * The intercept class for the static "addRegistry" method in the class "io.micrometer.core.instrument.Metrics"
     */
    private static final String INTERCEPT_CLASS = "org.apache.skywalking.apm.plugin.metrics.v1.MetricsAddRegistryInterceptor";

    @Override
    public ConstructorInterceptPoint[] getConstructorsInterceptPoints() {
        return null;
    }

    @Override
    public StaticMethodsInterceptPoint[] getStaticMethodsInterceptPoints() {
        return new StaticMethodsInterceptPoint[]{
            new StaticMethodsInterceptPoint() {
                @Override
                public ElementMatcher<MethodDescription> getMethodsMatcher() {
                    return named("addRegistry");
                }

                @Override
                public String getMethodsInterceptor() {
                    return INTERCEPT_CLASS;
                }

                @Override
                public boolean isOverrideArgs() {
                    return false;
                }
            }
        };
    }

    @Override
    protected ClassMatch enhanceClass() {
        return NameMatch.byName(ENHANCE_CLASS);
    }
}
The interceptor code:
public class MetricsAddRegistryInterceptor implements StaticMethodsAroundInterceptor {
    private static ILog LOGGER = LogManager.getLogger(MetricsAddRegistryInterceptor.class);
    // 0 until the error-count gauge has been registered, then 1.
    private int init = 0;

    @Override
    public void beforeMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes, MethodInterceptResult result) {
    }

    @Override
    public Object afterMethod(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes, Object ret) {
        return null;
    }

    @Override
    public void handleMethodException(Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes, Throwable t) {
    }

    @Override
    public void onAfterMethod(Long startTime, Class clazz, Method method, Object[] allArguments, Class<?>[] parameterTypes, Object ret) {
        if (init != 0) {
            return;
        }
        // The registry being added is the first argument of addRegistry.
        MeterRegistry registry = (MeterRegistry) allArguments[0];
        try {
            // collectorMap is read reflectively from the registry's concrete class;
            // a registry type without that field ends up in the catch block below.
            Field field = registry.getClass().getDeclaredField("collectorMap");
            field.setAccessible(true);
            Object o = field.get(registry);
            if (null != o) {
                ConcurrentHashMap map = (ConcurrentHashMap) o;
                if (map.size() > 0) {
                    // Expose the agent's error-log count as a gauge.
                    Metrics.gauge(MetricsConfig.SERVICE_ERROR_CODE, AbstractLogger.incr);
                    init = 1;
                }
            }
        } catch (Exception e) {
            LOGGER.error(e, "MetricsAddRegistryInterceptor error");
        }
    }
}
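The `init` flag in the interceptor is a plain int, so if the same interceptor instance is invoked from two threads at the same moment, both could pass the check and register the gauge. One possible hardening, sketched below with JDK classes only, is an `AtomicBoolean` guard. `OnceGuard` and `runOnceWhen` are hypothetical names, not part of SkyWalking or micrometer, and the `Runnable` stands in for the `Metrics.gauge` call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class OnceGuard {

    private final AtomicBoolean done = new AtomicBoolean(false);

    /**
     * Runs the action at most once across all threads, and only on a call where ready is true.
     * Returns true if this particular call ran the action.
     * Note: the flag is set before the action runs, so a throwing action is not retried.
     */
    boolean runOnceWhen(boolean ready, Runnable action) {
        if (!ready) {
            // Precondition not met yet (for example, the registry has no collectors); try again later
            return false;
        }
        if (!done.compareAndSet(false, true)) {
            // Another call already claimed the one-time slot
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        OnceGuard guard = new OnceGuard();
        Runnable register = () -> System.out.println("gauge registered");
        guard.runOnceWhen(false, register); // skipped: not ready
        guard.runOnceWhen(true, register);  // runs the action
        guard.runOnceWhen(true, register);  // skipped: already done
    }
}
```

The compare-and-set keeps the "not ready yet, retry on the next call" behaviour of the original flag while removing the window in which two callers both see it unset.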
Final dashboard result
Combined with an alerting system, this lets us raise an alert promptly when errors occur.
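How the alert is wired depends on the alerting system in use, which is not described here. As an illustration only, the condition can be thought of as "the error count grew between two scrapes". In the sketch below, `ErrorAlertRule`, `onSample` and the threshold are hypothetical and not tied to any particular alerting product.

```java
final class ErrorAlertRule {

    private final long threshold;
    private long lastSeen = -1; // -1 means no sample has been received yet

    ErrorAlertRule(long threshold) {
        this.threshold = threshold;
    }

    /**
     * Feeds one scraped value of the error-count metric.
     * Returns true when the increase since the previous sample exceeds the threshold.
     */
    boolean onSample(long value) {
        boolean fire = false;
        if (lastSeen >= 0) {
            long delta = value - lastSeen;
            if (delta < 0) {
                // A smaller value means the process restarted and the count began again from zero
                delta = value;
            }
            fire = delta > threshold;
        }
        lastSeen = value;
        return fire;
    }

    public static void main(String[] args) {
        ErrorAlertRule rule = new ErrorAlertRule(0);
        long[] scrapes = {0, 0, 3, 3};
        for (long scrape : scrapes) {
            System.out.println("value=" + scrape + " alert=" + rule.onSample(scrape));
        }
    }
}
```

Alerting on the increase rather than the absolute value avoids a permanent alert once the agent has logged a single error.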
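With this in place, when skywalking-agent runs into a problem we learn about it early and can roll back, resolving the issue before the business teams notice it.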