[toc]
1. Spring Data Elasticsearch
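The Java client that Elasticsearch ships has several inconveniences:
- Many operations require hand-concatenated JSON strings, which is tedious and error-prone in Java
- You must serialize objects to JSON yourself before storing them
- You must also deserialize query results back into objects yourself
For these reasons we skip the native Elasticsearch client API here and instead learn the toolkit Spring provides: Spring Data Elasticsearch.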
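1.1 Introduction
Spring Data Elasticsearch is a submodule of the Spring Data project.
See the Spring Data website: https://spring.io/projects/spring-data
Spring Data's mission is to provide a familiar and consistent, Spring-based programming model for data access while still retaining the special traits of the underlying data store.
It makes it easy to use data access technologies, relational and non-relational databases, map-reduce frameworks, and cloud-based data services. It is an umbrella project containing many subprojects that are specific to a given database, and those projects are developed together with many of the companies and developers behind these technologies.
In other words, Spring Data gives every kind of data access a uniform programming interface, whether the store is a relational database (such as MySQL), a non-relational database (such as Redis), or an index store like Elasticsearch. This simplifies application code and improves development efficiency.
It is made up of many modules, each targeting a different data store.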
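The Spring Data Elasticsearch page: https://projects.spring.io/spring-data-elasticsearch/
Features:
- Supports Spring's @Configuration-based Java configuration as well as XML configuration
- Provides **ElasticsearchTemplate**, a convenient helper class for working with ES, including automatic mapping between documents and POJOs
- Feature-rich object mapping built on Spring's conversion service
- Annotation-based mapping metadata, extensible to support other metadata formats
- Automatic implementation of persistence-layer interfaces, so basic operations need no hand-written code (similar to MyBatis, where the implementation is derived from the interface); custom queries are also supported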
1.2 Creating the demo project
We create a new demo project with Maven to learn Elasticsearch.
The pom.xml dependencies:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.leyou.demo</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>elasticsearch</name>
    <description>Demo project for Spring Boot</description>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.6.RELEASE</version>
        <relativePath/>
    </parent>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <java.version>1.8</java.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <!-- Lombok is needed by the @Data annotation on the Item entity; the version is managed by the parent -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
```
Configuration in application.yml:
```yaml
spring:
  data:
    elasticsearch:
      cluster-name: elasticsearch
      cluster-nodes: 106.15.72.229:9300
```
1.3 Entity class and annotations
First, prepare the entity class:
```java
public class Item {
    private Long id;

    private String title;

    private String category;

    private String brand;

    private Double price;

    private String images;
}
```
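Mapping
Spring Data declares field mapping attributes through annotations. There are three of them:
@Document
Applied to the class; marks the entity class as a document. It usually takes four attributes:
- indexName: name of the index
- type: the type within the index
- shards: number of shards, default 5
- replicas: number of replicas, default 1
@Id
Applied to a field; marks it as the document id (primary key)
@Field
Applied to a field; marks it as a document field and specifies its mapping attributes:
- type: field type, a value of the FieldType enum
- index: whether the field is indexed; boolean, default true
- store: whether the field is stored separately; boolean, default false
- analyzer: name of the analyzer, for example ik_max_word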
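```java
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

@Data
@NoArgsConstructor   // needed so documents can be deserialized into Item
@AllArgsConstructor  // used by the tests, which call new Item(id, title, ...)
@Document(indexName = "item", type = "docs", shards = 1, replicas = 0)
public class Item {

    @Id
    private Long id;

    // Full-text field, analyzed with the IK analyzer
    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String title;

    // Keyword fields are indexed as a whole, without analysis
    @Field(type = FieldType.Keyword)
    private String category;

    @Field(type = FieldType.Keyword)
    private String brand;

    @Field(type = FieldType.Double)
    private Double price;

    // Image URL: kept in the document but not searchable
    @Field(type = FieldType.Keyword, index = false)
    private String images;
}
```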
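To make the idea of annotation-driven mapping more concrete, the following plain-Java sketch reads a custom annotation through reflection and builds a properties map from it. It is only an illustration of the general technique: `SketchField`, `SketchItem`, and `MappingSketch` are hypothetical names invented here, and this is not how Spring Data Elasticsearch is implemented internally.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a mapping annotation; NOT Spring Data's @Field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchField {
    String type();
    boolean index() default true;
    String analyzer() default "";
}

// Hypothetical entity used only by this sketch.
class SketchItem {
    Long id; // not annotated, so the sketch skips it

    @SketchField(type = "text", analyzer = "ik_max_word")
    String title;

    @SketchField(type = "keyword")
    String brand;

    @SketchField(type = "keyword", index = false)
    String images;
}

class MappingSketch {
    // Builds a field-name -> attributes map from the annotations on the class's fields.
    static Map<String, Map<String, Object>> buildProperties(Class<?> entity) {
        Map<String, Map<String, Object>> properties = new LinkedHashMap<>();
        for (Field field : entity.getDeclaredFields()) {
            SketchField meta = field.getAnnotation(SketchField.class);
            if (meta == null) {
                continue; // unannotated fields contribute nothing
            }
            Map<String, Object> attrs = new LinkedHashMap<>();
            attrs.put("type", meta.type());
            if (!meta.index()) {
                attrs.put("index", false); // only emit non-default values
            }
            if (!meta.analyzer().isEmpty()) {
                attrs.put("analyzer", meta.analyzer());
            }
            properties.put(field.getName(), attrs);
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(buildProperties(SketchItem.class));
    }
}
```

The point of the design is that the entity class carries all mapping metadata, so a generic routine can derive the mapping without any per-entity code.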
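1.4 Index operations with the Template
Configure the startup class JustwebElasticSearchApplication:
```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

/**
 * @Date 2020/3/31 15:21
 * @Version 10.21
 * @Author DuanChaojie
 */
@SpringBootApplication
public class JustwebElasticSearchApplication {
    public static void main(String[] args) {
        SpringApplication.run(JustwebElasticSearchApplication.class, args);
    }
}
```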
1.4.1 Creating the index and mapping
Creating the index
- ElasticsearchTemplate provides an API for creating indices
- The index can be generated automatically from the class information, or you can specify the indexName and Settings by hand
Mapping
- The mapping API is putMapping
- The mapping can be generated from the class's bytecode information (its annotations), or written by hand
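Implementation:
```java
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.test.context.junit4.SpringRunner;

@RunWith(SpringRunner.class)
@SpringBootTest(classes = JustwebElasticSearchApplication.class)
public class IndexTest {

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Test
    public void testCreate() {
        // Create the index from the @Document annotation on Item
        elasticsearchTemplate.createIndex(Item.class);
        // Create the mapping from the @Id and @Field annotations on Item
        elasticsearchTemplate.putMapping(Item.class);
    }
}
```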
Result:
```
GET /item

{
  "item": {
    "aliases": {},
    "mappings": {
      "docs": {
        "properties": {
          "brand": {
            "type": "keyword"
          },
          "category": {
            "type": "keyword"
          },
          "images": {
            "type": "keyword",
            "index": false
          },
          "price": {
            "type": "double"
          },
          "title": {
            "type": "text",
            "analyzer": "ik_max_word"
          }
        }
      }
    },
    "settings": {
      "index": {
        "refresh_interval": "1s",
        "number_of_shards": "1",
        "provided_name": "item",
        "creation_date": "1525405022589",
        "store": {
          "type": "fs"
        },
        "number_of_replicas": "0",
        "uuid": "4sE9SAw3Sqq1aAPz5F6OEg",
        "version": {
          "created": "6020499"
        }
      }
    }
  }
}
```
1.4.2 Deleting the index
- ElasticsearchTemplate also provides an API for deleting indices
- The index can be deleted by entity class or by index name
```java
@Test
public void deleteIndex() {
    // Deletes the index declared by the @Document annotation on Item
    elasticsearchTemplate.deleteIndex(Item.class);
}
```
1.4 Document operations
The strength of Spring Data is that you do not have to write any DAO code: CRUD operations are derived automatically from method names or class information. You only define an interface that extends one of the sub-interfaces of Repository, and it gains all the basic CRUD functionality.
All we need to do is define the interface and extend it:
```java
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

// Item is the entity type, Long is the type of its id
public interface ItemRepository extends ElasticsearchRepository<Item, Long> {
}
```
Within the Repository inheritance hierarchy there is an ElasticsearchRepository interface, which is the one we extend here.
1.4.1 Adding a document
```java
@Autowired
private ItemRepository itemRepository;

@Test
public void index() {
    Item item = new Item(1L, "小米手机7", "手机", "小米", 3499.00, "http://image.leyou.com/13123.jpg");
    // save() indexes the document under its id
    itemRepository.save(item);
}
```
```
# Query it from the console
GET /item/_search

# Result
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "item",
        "_type": "docs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": 1,
          "title": "小米手机7",
          "category": "手机",
          "brand": "小米",
          "price": 3499,
          "images": "http://image.leyou.com/13123.jpg"
        }
      }
    ]
  }
}
```
1.4.2 Adding documents in bulk
Code:
```java
@Test
public void indexList() {
    List<Item> list = new ArrayList<>();
    list.add(new Item(2L, "坚果手机R1", "手机", "锤子", 3699.00, "http://image.leyou.com/123.jpg"));
    list.add(new Item(3L, "华为META10", "手机", "华为", 4499.00, "http://image.leyou.com/3.jpg"));
    // saveAll() takes a collection and indexes every document in it
    itemRepository.saveAll(list);
}
```
```
# Query again from the console
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "item",
        "_type": "docs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "id": 2,
          "title": "坚果手机R1",
          "category": "手机",
          "brand": "锤子",
          "price": 3699,
          "images": "http://image.leyou.com/123.jpg"
        }
      },
      {
        "_index": "item",
        "_type": "docs",
        "_id": "3",
        "_score": 1,
        "_source": {
          "id": 3,
          "title": "华为META10",
          "category": "手机",
          "brand": "华为",
          "price": 4499,
          "images": "http://image.leyou.com/3.jpg"
        }
      },
      {
        "_index": "item",
        "_type": "docs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "id": 1,
          "title": "小米手机7",
          "category": "手机",
          "brand": "小米",
          "price": 3499,
          "images": "http://image.leyou.com/13123.jpg"
        }
      }
    ]
  }
}
```
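1.4.3 Updating a document
Updating and adding use the same method, and the id decides which one happens: saving a document whose id already exists overwrites the existing document. This is similar to sending a PUT request from the console.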
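The "same method for insert and update" behaviour can be pictured with a small in-memory model. This is a sketch under the assumption that an index behaves like a map keyed by document id; `UpsertSketch` and its methods are hypothetical names, and no Elasticsearch or Spring API is involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class UpsertSketch {
    // In-memory stand-in for an index: document id -> document title.
    private final Map<Long, String> index = new LinkedHashMap<>();

    // Same id overwrites the stored document, a new id inserts one.
    // Returns true when an existing document was replaced.
    boolean save(long id, String title) {
        return index.put(id, title) != null;
    }

    String findById(long id) {
        return index.get(id);
    }

    int count() {
        return index.size();
    }

    public static void main(String[] args) {
        UpsertSketch repo = new UpsertSketch();
        repo.save(1L, "Phone A");            // insert: id 1 did not exist
        repo.save(1L, "Phone A (updated)");  // update: id 1 is overwritten
        System.out.println(repo.count() + " document(s), id 1 = " + repo.findById(1L));
    }
}
```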
1.4.4 Basic queries
```java
@Test
public void testQuery() {
    // Look up a single document by id
    Optional<Item> optional = this.itemRepository.findById(1L);
    optional.ifPresent(System.out::println);
}

@Test
public void testFind() {
    // Fetch all documents, sorted by price in descending order
    Iterable<Item> items = this.itemRepository.findAll(Sort.by(Sort.Direction.DESC, "price"));
    items.forEach(System.out::println);
}
```
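1.4.5 Custom methods
Another powerful feature of Spring Data is implementing methods automatically from their names.
For example, if a method is named findByTitle, Spring Data knows you want to query by title and implements it for you, with no implementation class required.
The method name must, of course, follow certain conventions: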
| Keyword | Sample | Elasticsearch Query String |
| --- | --- | --- |
| And | `findByNameAndPrice` | `{"bool" : {"must" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}}` |
| Or | `findByNameOrPrice` | `{"bool" : {"should" : [ {"field" : {"name" : "?"}}, {"field" : {"price" : "?"}} ]}}` |
| Is | `findByName` | `{"bool" : {"must" : {"field" : {"name" : "?"}}}}` |
| Not | `findByNameNot` | `{"bool" : {"must_not" : {"field" : {"name" : "?"}}}}` |
| Between | `findByPriceBetween` | `{"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : ?,"include_lower" : true,"include_upper" : true}}}}}` |
| LessThanEqual | `findByPriceLessThanEqual` | `{"bool" : {"must" : {"range" : {"price" : {"from" : null,"to" : ?,"include_lower" : true,"include_upper" : true}}}}}` |
| GreaterThanEqual | `findByPriceGreaterThanEqual` | `{"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : null,"include_lower" : true,"include_upper" : true}}}}}` |
| Before | `findByPriceBefore` | `{"bool" : {"must" : {"range" : {"price" : {"from" : null,"to" : ?,"include_lower" : true,"include_upper" : true}}}}}` |
| After | `findByPriceAfter` | `{"bool" : {"must" : {"range" : {"price" : {"from" : ?,"to" : null,"include_lower" : true,"include_upper" : true}}}}}` |
| Like | `findByNameLike` | `{"bool" : {"must" : {"field" : {"name" : {"query" : "?*","analyze_wildcard" : true}}}}}` |
| StartingWith | `findByNameStartingWith` | `{"bool" : {"must" : {"field" : {"name" : {"query" : "?*","analyze_wildcard" : true}}}}}` |
| EndingWith | `findByNameEndingWith` | `{"bool" : {"must" : {"field" : {"name" : {"query" : "*?","analyze_wildcard" : true}}}}}` |
| Contains/Containing | `findByNameContaining` | `{"bool" : {"must" : {"field" : {"name" : {"query" : "**?**","analyze_wildcard" : true}}}}}` |
| In | `findByNameIn(Collection<String> names)` | `{"bool" : {"must" : {"bool" : {"should" : [ {"field" : {"name" : "?"}}, {"field" : {"name" : "?"}} ]}}}}` |
| NotIn | `findByNameNotIn(Collection<String> names)` | `{"bool" : {"must_not" : {"bool" : {"should" : {"field" : {"name" : "?"}}}}}}` |
| Near | `findByStoreNear` | Not Supported Yet ! |
| True | `findByAvailableTrue` | `{"bool" : {"must" : {"field" : {"available" : true}}}}` |
| False | `findByAvailableFalse` | `{"bool" : {"must" : {"field" : {"available" : false}}}}` |
| OrderBy | `findByAvailableTrueOrderByNameDesc` | `{"sort" : [{ "name" : {"order" : "desc"} }],"bool" : {"must" : {"field" : {"available" : true}}}}` |
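How can an interface method work without an implementation class? One way to picture it is a runtime proxy that parses the method name and runs the matching filter. The sketch below does this over an in-memory list with `java.lang.reflect.Proxy`. It is a simplified illustration only: `Phone`, `PhoneQueries`, and `DerivedQuerySketch` are hypothetical names, only plain equality and the Between keyword are handled, and Spring Data's real query derivation is considerably more elaborate.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical product type for this sketch (not the Item entity).
class Phone {
    final String brand;
    final double price;

    Phone(String brand, double price) {
        this.brand = brand;
        this.price = price;
    }
}

// Only the interface is written by hand; no implementing class exists.
interface PhoneQueries {
    List<Phone> findByPriceBetween(double low, double high);
    List<Phone> findByBrand(String brand);
}

class DerivedQuerySketch {
    // Creates a PhoneQueries implementation at runtime by interpreting method names.
    static PhoneQueries create(List<Phone> data) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (!name.startsWith("findBy")) {
                throw new UnsupportedOperationException(name);
            }
            // "findByPriceBetween" -> property "price", keyword "Between"
            String rest = name.substring("findBy".length());
            boolean between = rest.endsWith("Between");
            String property = between
                    ? rest.substring(0, rest.length() - "Between".length())
                    : rest;
            property = Character.toLowerCase(property.charAt(0)) + property.substring(1);

            Field field = Phone.class.getDeclaredField(property);
            field.setAccessible(true);

            List<Phone> result = new ArrayList<>();
            for (Phone phone : data) {
                Object value = field.get(phone);
                boolean match;
                if (between) {
                    // Inclusive range, like include_lower/include_upper in the table
                    double v = ((Number) value).doubleValue();
                    match = v >= ((Number) args[0]).doubleValue()
                            && v <= ((Number) args[1]).doubleValue();
                } else {
                    match = value.equals(args[0]);
                }
                if (match) {
                    result.add(phone);
                }
            }
            return result;
        };
        return (PhoneQueries) Proxy.newProxyInstance(
                PhoneQueries.class.getClassLoader(),
                new Class<?>[]{PhoneQueries.class},
                handler);
    }

    public static void main(String[] args) {
        List<Phone> data = Arrays.asList(
                new Phone("Xiaomi", 3299.0),
                new Phone("Huawei", 4499.0),
                new Phone("Huawei", 2799.0));
        PhoneQueries queries = create(data);
        System.out.println(queries.findByPriceBetween(2000.0, 3500.0).size());
        System.out.println(queries.findByBrand("Huawei").size());
    }
}
```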
To query by price range, define a method like this:
```java
public interface ItemRepository extends ElasticsearchRepository<Item, Long> {

    // Derived query: items whose price lies between price1 and price2
    List<Item> findByPriceBetween(double price1, double price2);
}
```
Add test data:
```java
@Test
public void indexList() {
    List<Item> list = new ArrayList<>();
    list.add(new Item(1L, "小米手机7", "手机", "小米", 3299.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(2L, "坚果手机R1", "手机", "锤子", 3699.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(3L, "华为META10", "手机", "华为", 4499.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(4L, "小米Mix2S", "手机", "小米", 4299.00, "http://image.leyou.com/13123.jpg"));
    list.add(new Item(5L, "荣耀V10", "手机", "华为", 2799.00, "http://image.leyou.com/13123.jpg"));
    // saveAll() takes a collection and indexes every document in it
    itemRepository.saveAll(list);
}
```
No implementation class is needed; we can run it directly:
```java
@Test
public void queryByPriceBetween() {
    List<Item> items = this.itemRepository.findByPriceBetween(2000.00, 3500.00);
    items.forEach(System.out::println);
}
```
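Basic queries and derived methods are already powerful, but they fall short for complex queries (fuzzy, wildcard, term queries, and so on). For those we have to use native queries.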
1.5 Advanced queries
1.5.1 Basic query
```java
@Test
public void testMatchQuery() {
    // Build a match query on the title field
    MatchQueryBuilder queryBuilder = QueryBuilders.matchQuery("title", "小米");
    // Run the query through the repository
    Iterable<Item> items = this.itemRepository.search(queryBuilder);
    items.forEach(System.out::println);
}
```
- The repository's search method takes a QueryBuilder argument; Elasticsearch provides the **QueryBuilders** class for building one
- QueryBuilders offers many static factory methods that create different kinds of query objects, such as term, fuzzy, and wildcard QueryBuilder objects
- This way of querying covers many query types but is not flexible enough: filtering and aggregations are hard to express with it
1.5.2 Custom queries
Start with the most basic match query:
```java
@Test
public void testNativeQuery() {
    // Build the query
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Add a basic match query on the title field
    queryBuilder.withQuery(QueryBuilders.matchQuery("title", "小米"));
    // Run the search and get a page of results
    Page<Item> items = this.itemRepository.search(queryBuilder.build());
    // Total number of hits
    System.out.println(items.getTotalElements());
    // Total number of pages
    System.out.println(items.getTotalPages());
    items.forEach(System.out::println);
}
```
NativeSearchQueryBuilder: a query builder provided by Spring that helps construct the JSON request body
Page<Item>: queries are paged by default, so the result is a page object with these properties:
- totalElements: total number of hits
- totalPages: total number of pages
- Iterable: Page itself implements Iterable, so you can iterate over it directly to get the current page's data
- other properties, such as size (the page size) and number (the current page index)
1.5.4 Paged queries
- NativeSearchQueryBuilder makes paging easy to implement, through queryBuilder.withPageable(PageRequest.of(page, size))
- Note that page numbers start at 0
```java
@Test
public void testPageQuery() {
    // Build the query
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Add a term query on the category field
    queryBuilder.withQuery(QueryBuilders.termQuery("category", "手机"));

    // Paging parameters: page index (zero-based) and page size
    int page = 0;
    int size = 3;
    queryBuilder.withPageable(PageRequest.of(page, size));

    // Run the search and get a page of results
    Page<Item> items = this.itemRepository.search(queryBuilder.build());
    // Total number of hits
    System.out.println(items.getTotalElements());
    // Total number of pages
    System.out.println(items.getTotalPages());
    // Page size
    System.out.println(items.getSize());
    // Current page index
    System.out.println(items.getNumber());
    items.forEach(System.out::println);
}
```
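The arithmetic behind zero-based paging is easy to get wrong by one, so here is a plain-Java sketch of it over an in-memory list. `PageSketch`, `slice`, and `totalPages` are hypothetical helper names for illustration and are not part of Spring Data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PageSketch {
    // Returns the elements of a zero-based page; empty when the page is past the end.
    static <T> List<T> slice(List<T> all, int page, int size) {
        int from = Math.min(page * size, all.size());
        int to = Math.min(from + size, all.size());
        return new ArrayList<>(all.subList(from, to));
    }

    // Total pages is the element count divided by the page size, rounded up.
    static int totalPages(int totalElements, int size) {
        return (totalElements + size - 1) / size;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(slice(ids, 0, 3));  // first page: [1, 2, 3]
        System.out.println(slice(ids, 1, 3));  // second page: [4, 5]
        System.out.println(totalPages(5, 3));  // 2
    }
}
```

With five documents and a page size of 3, page 0 holds three documents and page 1 holds the remaining two, which matches what getTotalPages() and getNumber() report for the test data used in this article.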
1.5.5 Sorting
```java
@Test
public void testSort() {
    // Build the query
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Add a term query on the category field
    queryBuilder.withQuery(QueryBuilders.termQuery("category", "手机"));

    // Sort by price in descending order
    queryBuilder.withSort(SortBuilders.fieldSort("price").order(SortOrder.DESC));

    // Run the search and get a page of results
    Page<Item> items = this.itemRepository.search(queryBuilder.build());
    // Total number of hits
    System.out.println(items.getTotalElements());
    items.forEach(System.out::println);
}
```
1.6 Aggregations
1.6.1 Aggregating into buckets
- A bucket is a group; here we group by the brand field
```java
@Test
public void testAgg() {
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Do not fetch any source fields; only the aggregation result matters here
    queryBuilder.withSourceFilter(new FetchSourceFilter(new String[]{""}, null));
    // 1. Add a terms aggregation named "brands" on the brand field
    queryBuilder.addAggregation(
            AggregationBuilders.terms("brands").field("brand"));
    // 2. Run the query; the result must be cast to AggregatedPage
    AggregatedPage<Item> aggPage = (AggregatedPage<Item>) this.itemRepository.search(queryBuilder.build());
    // 3. Read the aggregation by name. brand is a keyword (string) field
    //    aggregated with terms, so the result is cast to StringTerms
    StringTerms agg = (StringTerms) aggPage.getAggregation("brands");
    // 4. Walk through the buckets
    List<StringTerms.Bucket> buckets = agg.getBuckets();
    for (StringTerms.Bucket bucket : buckets) {
        // The bucket key is the brand name
        System.out.println(bucket.getKeyAsString());
        // Number of documents in the bucket
        System.out.println(bucket.getDocCount());
    }
}
```
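Since a bucket is just a group, a terms aggregation can be pictured as "group by value, then count". The plain-Java sketch below does the same thing on an in-memory list with the Streams API. It is an analogy only: `BucketSketch` and `countByBrand` are hypothetical names, and the brand values are sample data chosen to mirror the test items.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class BucketSketch {
    // Groups brand values into buckets and counts the documents in each bucket.
    static Map<String, Long> countByBrand(List<String> brands) {
        return brands.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),     // bucket key: the brand itself
                        TreeMap::new,            // sorted map, for stable output
                        Collectors.counting())); // bucket value: document count
    }

    public static void main(String[] args) {
        List<String> brands = Arrays.asList("Xiaomi", "Smartisan", "Huawei", "Xiaomi", "Huawei");
        countByBrand(brands).forEach(
                (brand, docCount) -> System.out.println(brand + ": " + docCount));
    }
}
```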
Key APIs:
AggregationBuilders: the factory class for aggregations. Every aggregation is built through its static methods.
AggregatedPage: the result type of an aggregation query. It is a sub-interface of Page<T>.
AggregatedPage extends Page with aggregation-related functionality. It is essentially a wrapper around the aggregation results, and is easiest to understand by comparing it with the JSON structure of an aggregation response.
The returned results are all Aggregation objects, with different subclasses depending on the field type, for example StringTerms for the keyword field brand used above.
Each level of the JSON response maps onto one of these Java classes: the named aggregation maps to StringTerms, and each entry of its buckets array maps to a StringTerms.Bucket holding a key and a document count.
1.6.2 Nested aggregation: computing the average
Code:
```java
@Test
public void testSubAgg() {
    NativeSearchQueryBuilder queryBuilder = new NativeSearchQueryBuilder();
    // Do not fetch any source fields; only the aggregation result matters here
    queryBuilder.withSourceFilter(new FetchSourceFilter(new String[]{""}, null));
    // 1. Add a terms aggregation named "brands" on the brand field,
    //    with a nested avg aggregation on price inside each bucket
    queryBuilder.addAggregation(
            AggregationBuilders.terms("brands").field("brand")
                    .subAggregation(AggregationBuilders.avg("priceAvg").field("price"))
    );
    // 2. Run the query; the result must be cast to AggregatedPage
    AggregatedPage<Item> aggPage = (AggregatedPage<Item>) this.itemRepository.search(queryBuilder.build());
    // 3. Read the terms aggregation by name
    StringTerms agg = (StringTerms) aggPage.getAggregation("brands");
    // 4. Walk through the buckets
    List<StringTerms.Bucket> buckets = agg.getBuckets();
    for (StringTerms.Bucket bucket : buckets) {
        // Bucket key (brand name) and document count
        System.out.println(bucket.getKeyAsString() + ": " + bucket.getDocCount() + " items");

        // Read the nested avg aggregation from the bucket
        InternalAvg avg = (InternalAvg) bucket.getAggregations().asMap().get("priceAvg");
        System.out.println("Average price: " + avg.getValue());
    }
}
```
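A nested average inside a terms aggregation corresponds to "group by brand, then average the price within each group". The sketch below shows that computation in plain Java on in-memory data. `PricedPhone`, `SubAggSketch`, and `averagePriceByBrand` are hypothetical names; the prices mirror the test data from the custom-method section but the brand names are written in English here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical value type for this sketch (not the Item entity).
class PricedPhone {
    final String brand;
    final double price;

    PricedPhone(String brand, double price) {
        this.brand = brand;
        this.price = price;
    }
}

class SubAggSketch {
    // Outer grouping plays the role of the terms aggregation,
    // the downstream collector plays the role of the nested avg aggregation.
    static Map<String, Double> averagePriceByBrand(List<PricedPhone> phones) {
        return phones.stream()
                .collect(Collectors.groupingBy(
                        (PricedPhone p) -> p.brand,
                        TreeMap::new,
                        Collectors.averagingDouble((PricedPhone p) -> p.price)));
    }

    public static void main(String[] args) {
        List<PricedPhone> phones = Arrays.asList(
                new PricedPhone("Xiaomi", 3299.0),
                new PricedPhone("Smartisan", 3699.0),
                new PricedPhone("Huawei", 4499.0),
                new PricedPhone("Xiaomi", 4299.0),
                new PricedPhone("Huawei", 2799.0));
        averagePriceByBrand(phones).forEach(
                (brand, avg) -> System.out.println(brand + ", average price: " + avg));
    }
}
```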