Searching Ceph RGW Metadata with Elasticsearch

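After using object storage for a while, you may find that the more files you store, the harder it is to locate any one of them. Starting with the Kraken release, Ceph provides a sync module that synchronizes object metadata to Elasticsearch, so users and administrators can use query syntax to quickly find the metadata of the files they are looking for.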

Elasticsearch

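Elasticsearch is a distributed full-text search engine built on Lucene and released as open source software under the Apache License. It is now widely used for data mining and, together with Logstash (a data collection and log parsing engine) and Kibana (a data visualization platform), forms the ELK stack.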

Environment

Parameter               Value
Operating System        Ubuntu 16.04 LTS
Ceph Cluster IP         192.168.1.226
RGW1                    192.168.1.226:8001
RGW2                    192.168.1.226:8002
Realm Name              test-realm
Zonegroup Name          test-zonegroup
Zone1                   test-zone
Zone2                   test-zone-2
Elasticsearch Version   5.6+, < 6.0
Elasticsearch Endpoint  192.168.1.226:9200

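Architecture

The setup needs at least two zones. One zone serves users, who operate on Ceph Object Storage through Radosgw using the S3 protocol. The other zone only receives synchronized data and stores the metadata in Elasticsearch. This example uses a single Ceph cluster with two RGW (Radosgw) instances on different ports, one per zone, with both zones in the same zonegroup and realm.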

Installing Elasticsearch

Update the package index

$ apt-get update

Install Java

$ apt-get install openjdk-8-jdk -y

Verify the Java installation

$ java -version

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

Download and install the Elasticsearch package

$ curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
$ dpkg -i elasticsearch-5.6.16.deb

Start the Elasticsearch service

$ systemctl start elasticsearch

Edit the Elasticsearch configuration: uncomment the network.host line and set it to 0.0.0.0 so that the service accepts connections on every IP address of the host.

$ vim /etc/elasticsearch/elasticsearch.yml

network.host: 0.0.0.0
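
If you would rather script this edit than open an editor, it can be sketched as below. This is a minimal sketch, assuming the packaged file still carries a commented `#network.host: ...` line or none at all; the function name `set_network_host` is hypothetical and not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): set network.host in an
# elasticsearch.yml-style file. Any existing network.host line, commented
# or not, is dropped and a single active line is appended at the end.
set_network_host() {
  snh_file="$1"
  snh_host="$2"
  snh_tmp="$(mktemp)" || return 1
  # Keep every line except existing network.host settings.
  # grep exits 1 when nothing is left to print, which is not an error here.
  grep -Ev '^[[:space:]]*#?[[:space:]]*network\.host[[:space:]]*:' "$snh_file" > "$snh_tmp" \
    || [ $? -eq 1 ] || return 1
  # YAML requires the colon: "network.host: <value>".
  printf 'network.host: %s\n' "$snh_host" >> "$snh_tmp"
  # Write back through the original path so owner and mode are preserved.
  cat "$snh_tmp" > "$snh_file" && rm -f "$snh_tmp"
}
```

For the environment in this post the call would be `set_network_host /etc/elasticsearch/elasticsearch.yml 0.0.0.0`, run as a user allowed to write that file.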

Restart the Elasticsearch service

$ systemctl restart elasticsearch

Confirm that the service is responding

$ curl 192.168.1.226:9200
{
  "name" : "QNbZkhV",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "MU9qYysiSLSDdpi-TXnWIw",
  "version" : {
    "number" : "5.6.16",
    "build_hash" : "3a740d1",
    "build_date" : "2019-03-13T15:33:36.565Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

Note: remember to replace the IP address with your own.
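
Because this setup targets Elasticsearch 5.6 or later but below 6.0, it can help to check the reported version before going further. The following is a rough sketch; the function names `es_version` and `es_version_supported` are hypothetical, and the parsing assumes the banner format shown above.

```shell
#!/bin/sh
# Hypothetical helpers: work on the JSON banner returned by GET /.

# Read the banner on stdin and print the "number" field, e.g. 5.6.16.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only for versions in the range used in this post:
# 5.6 or later, below 6.0.
es_version_supported() {
  case "$1" in
    5.[6-9]|5.[6-9].*|5.[1-9][0-9]|5.[1-9][0-9].*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A possible use is `es_version_supported "$(curl -s 192.168.1.226:9200 | es_version)" || echo "unsupported Elasticsearch version"`.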

Create test-realm

$ radosgw-admin realm create --rgw-realm=test-realm --default
{
    "id": "96cf396e-796b-47e5-9ae2-2d59fb9e43df",
    "name": "test-realm",
    "current_period": "e1376cfd-1270-4e0f-bead-56a9cfbb1f8a",
    "epoch": 1
}

Create test-zonegroup

$ radosgw-admin zonegroup create --rgw-realm=test-realm --rgw-zonegroup=test-zonegroup --endpoints=http://192.168.1.226:8001 --master --default
{
    "id": "ff420afa-3caa-4cdb-8e2e-595f8c0dc150",
    "name": "test-zonegroup",
    "api_name": "test-zonegroup",
    "is_master": "true",
    "endpoints": [
        "http://192.168.1.226:8001"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "",
    "zones": [],
    ...
}

Create the first zone (named test-zone in the commands below)

$ radosgw-admin zone create --rgw-realm=test-realm --rgw-zonegroup=test-zonegroup --rgw-zone=test-zone --endpoints=http://192.168.1.226:8001 --access-key=test --secret=test --master --default
{
    ...
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "test-zone.rgw.buckets.index",
                ...
            }
        }
    ],
    ...
}

Create the test user

$ radosgw-admin user create --uid=test --display-name="test" --access-key=test --secret=test --system
{
    "user_id": "test",
    "display_name": "test",
    "email": "",
    ...
    "keys": [
        {
            "user": "test",
            "access_key": "test",
            "secret_key": "test"
        }
    ],
    ...
}

Update and commit the period of the current realm

$ radosgw-admin period update --commit
{
    "id": "0d0783ec-c5c7-4e31-a1e5-ee9a81f9b08a",
    "epoch": 1,
    ...
    "period_map": {
        "id": "0d0783ec-c5c7-4e31-a1e5-ee9a81f9b08a",
        "zonegroups": [
            {
                "id": "ff420afa-3caa-4cdb-8e2e-595f8c0dc150",
                "name": "test-zonegroup",
                ...
            }
        ],
        ...
    },
    ...
}

Edit the Radosgw configuration

$ vim /etc/ceph/radosgw.conf

[client.radosgw.gateway]
mon_host = 192.168.1.226:6789
keyring = /etc/ceph/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.test-zone-1.log
rgw enable usage log = true
rgw_frontends = civetweb port=8001
rgw_zone = test-zone

[client.radosgw.gateway2]
mon_host = 192.168.1.226:6789
keyring = /etc/ceph/ceph.client.radosgw.keyring
log file = /var/log/ceph/client.radosgw.test-zone-2.log
rgw enable usage log = true
rgw_frontends = civetweb port=8002
rgw_zone = test-zone-2

Note: a key for client.radosgw.gateway2 must be added to /etc/ceph/ceph.client.radosgw.keyring.
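
One way to add that key is sketched below. The post does not show how the first gateway's key was created, so the capabilities here (`osd 'allow rwx'`, `mon 'allow rwx'`) are an assumption, and the wrapper `add_gateway_key` with its `DRY_RUN` switch is a hypothetical helper. Only `ceph-authtool` and `ceph auth add` are actual Ceph commands.

```shell
#!/bin/sh
# Hypothetical helper: add a key for another gateway to an existing keyring
# and register it with the cluster. Set DRY_RUN=1 to print the commands
# instead of running them.
add_gateway_key() {
  agk_name="$1"      # e.g. client.radosgw.gateway2
  agk_keyring="$2"   # e.g. /etc/ceph/ceph.client.radosgw.keyring
  agk_run=""
  if [ -n "${DRY_RUN:-}" ]; then
    agk_run="echo"
  fi
  # Generate a new secret for the entity inside the existing keyring.
  $agk_run ceph-authtool "$agk_keyring" -n "$agk_name" --gen-key || return 1
  # Grant capabilities (assumed to match those of the first gateway).
  $agk_run ceph-authtool -n "$agk_name" --cap osd 'allow rwx' --cap mon 'allow rwx' "$agk_keyring" || return 1
  # Register the entity and its key with the Ceph cluster.
  $agk_run ceph auth add "$agk_name" -i "$agk_keyring"
}
```

Running `DRY_RUN=1 add_gateway_key client.radosgw.gateway2 /etc/ceph/ceph.client.radosgw.keyring` first prints the three commands for review before anything is changed.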

Start client.radosgw.gateway

$ radosgw -n client.radosgw.gateway -c /etc/ceph/radosgw.conf

Create test-zone-2

$ radosgw-admin zone create --rgw-realm=test-realm --rgw-zonegroup=test-zonegroup --rgw-zone=test-zone-2 --access-key=test --secret=test --endpoints=http://192.168.1.226:8002
{
    "id": "d0a13b78-251b-4314-b026-722dfbe79ff1",
    "name": "test-zone-2",
    "domain_root": "test-zone-2.rgw.meta:root",
    "control_pool": "test-zone-2.rgw.control",
    "gc_pool": "test-zone-2.rgw.log:gc",
    "lc_pool": "test-zone-2.rgw.log:lc",
    "log_pool": "test-zone-2.rgw.log",
    ...
}

Modify test-zone-2: set tier-type to elasticsearch and point tier-config at the Elasticsearch endpoint.

$ radosgw-admin zone modify --rgw-realm=test-realm --rgw-zonegroup=test-zonegroup --rgw-zone=test-zone-2 --tier-type=elasticsearch --tier-config=endpoint=http://192.168.1.226:9200
{
    "id": "d0a13b78-251b-4314-b026-722dfbe79ff1",
    "name": "test-zone-2",
    "domain_root": "test-zone-2.rgw.meta:root",
    "control_pool": "test-zone-2.rgw.control",
    "gc_pool": "test-zone-2.rgw.log:gc",
    "lc_pool": "test-zone-2.rgw.log:lc",
    "log_pool": "test-zone-2.rgw.log",
    "intent_log_pool": "test-zone-2.rgw.log:intent",
    "usage_log_pool": "test-zone-2.rgw.log:usage",
    ...
}

Update and commit the period of the current realm again

$ radosgw-admin period update --commit
{
    "id": "0d0783ec-c5c7-4e31-a1e5-ee9a81f9b08a",
    "epoch": 2,
    ...
    "period_map": {
        "id": "0d0783ec-c5c7-4e31-a1e5-ee9a81f9b08a",
        "zonegroups": [
            {
                "id": "ff420afa-3caa-4cdb-8e2e-595f8c0dc150",
                "name": "test-zonegroup",
                ...
            }
        ],
        ...
    },
    ...
}

Start client.radosgw.gateway2

$ radosgw -n client.radosgw.gateway2 -c /etc/ceph/radosgw.conf

Open 192.168.1.226:8002 in a browser to check the Radosgw service.

Note: it is expected that RGW2 cannot serve requests here, because a zone declared with the elasticsearch tier type does not get a buckets.data pool.

Results

Create a bucket and upload a file; you can test with the S3 API or S3 Browser. Checking the pool status shows that no buckets.data pool was created for RGW2.

$ rados df
POOL_NAME                      USED …
.rgw.root                      6.05KiB
test-zone-2.rgw.buckets.index  0B
test-zone-2.rgw.control        0B
test-zone-2.rgw.log            5.07KiB
test-zone-2.rgw.meta           0B
test-zone.rgw.buckets.data     3B
test-zone.rgw.buckets.index    0B
test-zone.rgw.control          0B
test-zone.rgw.log              0B
test-zone.rgw.meta             663B

total_objects 766
total_used 4.22GiB
total_avail 35.8GiB
total_space 40.0GiB

As the listing shows, test-zone-2 has no buckets.data pool because its tier type is elasticsearch. In other words, users cannot access Ceph Object Storage through RGW2.
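
The same check can be scripted instead of read by eye. The sketch below assumes pool names follow the `<zone>.rgw.buckets.data` pattern seen in the listing; the function name `zone_has_data_pool` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read pool names on stdin (the first column of
# `rados df`, or the output of `rados lspools`) and succeed if the given
# zone has a buckets.data pool.
zone_has_data_pool() {
  awk -v pool="$1.rgw.buckets.data" \
    '$1 == pool { found = 1 } END { exit found ? 0 : 1 }'
}
```

For example, `rados df | zone_has_data_pool test-zone-2 || echo "no data pool for test-zone-2"` would report the situation described above.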

Query the metadata with Elasticsearch query syntax

$ curl 'http://192.168.1.226:9200/_search?q=name:testTest1'
{"took":10,"timed_out":false,"_shards":{"total":26,"successful":26,"skipped":0,"failed":0},"hits":{"total":1,"max_score":0.6931472,"hits":[{"_index":"rgw-test-realm-313b4db1","_type":"object","_id":"cf4bd9d8-eb23-4773-bd29-b5d640040f36.14828.2:testTest1:null","_score":0.6931472,"_source":{"bucket":"bucket1","name":"testTest1","instance":"null","versioned_epoch":0,"owner":{"id":"test","display_name":"test"},"permissions":["test"],"meta":{"size":550474,"mtime":"2019-03-20T09:07:47.272Z","etag":"bf4fef418d8f1bb996200529e60bed00","tail_tag":"cf4bd9d8-eb23-4773-bd29-b5d640040f36.14828.62968","x-amz-content-sha256":"a4bb4653d3480dd8d5d58f76bca04dcc08e607bdb72b616b44feb980b3a387c6","x-amz-date":"20190320T090745Z"}}}]}}

Note: replace the file name with the name of a file you uploaded.
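
For repeated lookups, the query and a quick summary of the hits can be wrapped in two small functions. This is only a sketch: `es_search_by_name` and `es_hit_paths` are hypothetical names, and the text-based extraction assumes compact JSON in which `"bucket"` is immediately followed by `"name"`, as in the response above. A JSON-aware tool such as jq would be more robust.

```shell
#!/bin/sh
# Hypothetical helpers for the metadata query.

# Query Elasticsearch for objects whose name matches $2; $1 is the base URL.
# curl -G with --data-urlencode escapes characters that are unsafe in a URL.
es_search_by_name() {
  curl -s -G "$1/_search" --data-urlencode "q=name:$2"
}

# Read a search response on stdin and print "bucket/name" for every hit.
# Assumes compact JSON with "bucket" directly followed by "name".
es_hit_paths() {
  grep -o '"bucket":"[^"]*","name":"[^"]*"' |
    sed 's/"bucket":"\([^"]*\)","name":"\([^"]*\)"/\1\/\2/'
}
```

With the environment used here, `es_search_by_name http://192.168.1.226:9200 testTest1 | es_hit_paths` would print one `bucket/name` line per matching object.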
