jetty servlet的编码字符集选择

记一次中文指标乱码问题,问题也很简单,如下图所示:

Untitled

从metricbeat开始找原因,发现其实只要是UTF-8的编码格式就都可以解析,最终发现是webServer返回的数据非UTF-8格式,修改方案也很简单。将servlet中的content-type里面的text/plain修改成text/plain; charset=utf-8就可以了,如下面代码所示:

1
2
3
4
5
6
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException {
response.setContentType("text/plain");
response.setStatus(HttpServletResponse.SC_OK);
response.getWriter().write("<h1>哈哈</h1>");
}

我们可以轻易使用一个demo来复现这个问题,在maven中添加如下依赖

1
2
3
4
5
6
7
8
9
10
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<version>9.4.35.v20201120</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
<version>9.4.35.v20201120</version>
</dependency>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
package com.shoothzj.jetty;

import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.servlet.ServletHolder;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class SimpleJettyServer {

public static void main(String[] args) throws Exception {
Server server = new Server(8080);

ServletContextHandler context = new ServletContextHandler(ServletContextHandler.SESSIONS);
context.setContextPath("/");
server.setHandler(context);

context.addServlet(new ServletHolder(new HelloDefaultServlet()), "/hello-default");
context.addServlet(new ServletHolder(new HelloUTF8Servlet()), "/hello-utf8");

server.start();
server.join();
}

public static class HelloDefaultServlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException {
response.setContentType("text/plain");
response.setStatus(HttpServletResponse.SC_OK);
response.getWriter().write("<h1>哈哈</h1>");
}
}

public static class HelloUTF8Servlet extends HttpServlet {
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response)
throws IOException {
response.setContentType("text/plain; charset=UTF-8");
response.setStatus(HttpServletResponse.SC_OK);
response.getWriter().write("<h1>哈哈</h1>");
}
}
}

通过curl命令来复现这个问题

1
2
3
4
curl localhost:8080/hello-default
<h1>??</h1>%
curl localhost:8080/hello-utf8
<h1>哈哈</h1>%

那么servlet里面的数据如何编码,我们可以dive一下,首先servlet里面有一个函数叫**response.setCharacterEncoding();**这个函数可以指定编码格式。其次,servlet还会通过上面的setContentType函数来做一定的推断,比如content-type中携带了charset,就使用content-type中的charset。还有些特定的content-type,比如text/json,在没有设置的情况下,servlet容器会假设它使用utf-8编码。在推断不出来,也没有手动设置的情况下,jetty默认的编码是iso-8859-1,这就解释了乱码的问题。