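Infrastructure Service Capabilities and Responsibilities (Excerpt)

InfraService

Service Module Collaboration

Internal Division of Responsibilities

iacService [IaC task module]

  • Validate the IaC manifest
    • Reject unreasonable IaC tasks
      • Quota too large
  • Schedule IaC tasks
    • Create IaC tasks
    • Track IaC tasks
  • Verify IaC rollout [later phase]
    • Periodically reconcile historical IaC manifests against the assets actually deployed
    • Verify the rollout of the current IaC task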
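
The "quota too large" check in the list above could look roughly like the following sketch. The IacTask fields and the MAX_NODES and MAX_VOLUME_GB limits are hypothetical; the source does not define the task shape or the thresholds.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits; real values would presumably come from templateService.
MAX_NODES = 20
MAX_VOLUME_GB = 1000


@dataclass
class IacTask:
    """Illustrative shape of one IaC task; the field names are assumptions."""
    name: str
    node_count: int = 0
    volume_gb: int = 0


def check_task(task: IacTask) -> List[str]:
    """Return a list of problems; an empty list means the task passes."""
    problems = []
    if task.node_count > MAX_NODES:
        problems.append(f"{task.name}: node_count {task.node_count} exceeds {MAX_NODES}")
    if task.volume_gb > MAX_VOLUME_GB:
        problems.append(f"{task.name}: volume_gb {task.volume_gb} exceeds {MAX_VOLUME_GB}")
    return problems
```

Returning every problem rather than stopping at the first lets the caller report all oversized quotas in one pass before the task is scheduled.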

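templateService [configuration templates / spec standards]

  • Manage templates
  • Manage standards and specs

allDeployService [business & data rollout service]

  • Manage the full initial SQL rollout
  • Manage the full initial Pod rollout
    • Rollout with or without an orchestrated order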
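
As a sketch of rolling out "with or without an orchestrated order", the function below returns workloads unchanged when no dependencies are given and in dependency order otherwise. The workload names and the depends_on mapping are illustrative assumptions, not part of the source design.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def rollout_order(workloads, depends_on=None):
    """Return the workloads in a valid rollout order.

    Without dependencies the given order is kept (unordered rollout).
    With dependencies, every workload comes after the ones it depends on.
    """
    if not depends_on:
        return list(workloads)
    # Map each workload to the set of workloads that must be rolled out first.
    graph = {name: set(depends_on.get(name, ())) for name in workloads}
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())
```

For example, with depends_on = {"web": {"api"}, "api": {"db"}} the database is rolled out first and the web workload last.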

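k8sService [K8s direct-query module]

  • Cluster query [later phase]
  • Node query [later phase]
  • Node pool query [later phase]

rdsService [RDS direct-query module]

  • Instance query [in progress]
  • Database query [in progress]
  • DB parameter configuration query [later phase]

ecsService [compute direct-query module]

  • Instance query [in progress]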


Infrastructure Use-Case Flows

Overall Flow




Infra Data Model

Parameter Configuration

Flavor Data Sample

{
  "id": "ai1.2xlarge.4",
  "name": "ai1.2xlarge.4",
  "vcpus": "8",
  "ram": 32768,
  "disk": "0",
  "swap": "",
  "attachableQuantity": {
    "free_scsi": 20,
    "free_blk": 12,
    "free_disk": 20,
    "free_nic": 4
  },
  "OS-FLV-EXT-DATA:ephemeral": 0,
  "rxtx_factor": 1,
  "OS-FLV-DISABLED:disabled": false,
  "rxtx_quota": null,
  "rxtx_cap": null,
  "os-flavor-access:is_public": true
}

RDS Configuration Sample

{
  "instance": {
    "name": "rds-instance-rep2",
    "datastore": {
      "type": "MySQL",
      "version": "5.6"
    },
    "flavor_ref": "rds.mysql.s1.large",
    "volume": {
      "type": "ULTRAHIGH",
      "size": 100
    },
    "disk_encryption_id": "2gfdsh-844a-4023-a776-fc5c5fb71fb4",
    "region": "aaa",
    "availability_zone": "bbb",
    "vpc_id": "490a4a08-ef4b-44c5-94be-3051ef9e4fce",
    "subnet_id": "0e2eda62-1d42-4d64-a9d1-4e9aa9cd994f",
    "security_group_id": "2a1f7fc8-3307-42a7-aa6f-42c8b9b8f8c5",
    "port": "8635",
    "backup_strategy": {
      "start_time": "08:15-09:15",
      "keep_days": 12
    },
    "charge_info": {
      "charge_mode": "postPaid"
    },
    "password": "Test@12345678",
    "configuration_id": "452408-ef4b-44c5-94be-305145fg",
    "enterprise_project_id": "fdsa-3rds",
    "time_zone": "UTC+04:00"
  },
  "job_id": "dff1d289-4d03-4942-8b9f-463ea07c000d"
}

K8s Node Pool Sample

{
  "kind": "NodePool",
  "apiVersion": "v3",
  "metadata": {
    "name": "lc-it-nodepool-79796",
    "uid": "99addaa2-69eb-11ea-a592-0255ac1001bb"
  },
  "spec": {
    "type": "vm",
    "nodeTemplate": {
      "flavor": "s6.large.2",
      "az": "******",
      "os": "EulerOS 2.5",
      "login": {
        "sshKey": "KeyPair-001"
      },
      "rootVolume": { // system disk
        "volumetype": "SAS",
        "size": 40
      },
      "dataVolumes": [ // data disks
        {
          "volumetype": "SAS",
          "size": 100,
          "extendParam": {
            "useType": "docker"
          }
        }
      ],
      "publicIP": { // public IP
        "eip": {
          "bandwidth": {}
        }
      },
      "nodeNicSpec": {
        "primaryNic": {
          "subnetId": "7e767d10-7548-4df5-ad72-aeac1d08bd8a"
        }
      },
      "billingMode": 0, // 0: pay-per-use; 1: yearly/monthly; 2: deprecated (auto-pay yearly/monthly)
      "extendParam": {
        "maxPods": 110
      },
      "k8sTags": {
        "cce.cloud.com/cce-nodepool": "lc-it-nodepool-79796"
      }
    },
    "autoscaling": {
      "enable": 1,
      "minNodeCount": 1,
      "maxNodeCount": 20,
      "scaleDownCooldownTime": 111, // node retention time in minutes; nodes added by a scale-up are not scaled down within this window
      "priority": 1 // node pool weight; a higher weight gets higher priority during scale-up
    },
    "initialNodeCount": 1, // initial number of nodes
    "nodeManagement": {}
  }
}

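Infra Manifest Use Cases

Spec Templates

  • SecurityGroup: security group
  • Net: network
  • Flavor: instance flavor
  • Volume: data disk
  • InfraExt: extended infrastructure attributes

Infra Manifest

  • Infra
  • K8sTemplate
  • RdsTemplate

Infra Manifest Data Structure

Resource Assessment

Resource-assessment requests do not need to change the parameter template: the selected "flavor" is filled in as a parameter and overrides the value in the original template.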
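
A minimal sketch of that override, assuming the template is held as a nested dictionary shaped like the samples earlier in this document. The apply_flavor helper and its path parameter are illustrative names, not an existing API.

```python
from copy import deepcopy


def apply_flavor(template, flavor_id, path=("spec", "nodeTemplate", "flavor")):
    """Return a copy of the template in which only the flavor is replaced.

    `path` is the key path to the flavor field; the default matches the
    node pool sample, while the RDS sample would use ("instance", "flavor_ref").
    """
    result = deepcopy(template)  # leave the stored template untouched
    node = result
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = flavor_id
    return result
```

Copying before modifying keeps the stored template reusable for the next request.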

Extended Parameters

Extended-parameter requests require a custom Infra manifest.

RDS side

K8s side

Additional Structures




Terraform

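Infra Manifest to Terraform Script Conversion

The data entered in the user form is organized into the Infra data structure.
The Infra data is then converted into a Terraform HCL script.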
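
The conversion step could be sketched as follows. This is not the project's actual converter: the function names are hypothetical, and it only handles primitive values and single nested blocks. Terraform references such as var.vpc_id and repeated blocks would need extra handling.

```python
import json


def render_block(body, indent=1):
    """Render a dict as HCL lines: nested dicts become blocks, the rest arguments."""
    pad = "  " * indent
    lines = []
    for key, value in body.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key} {{")
            lines.extend(render_block(value, indent + 1))
            lines.append(f"{pad}}}")
        else:
            # json.dumps yields HCL-compatible literals for strings, numbers,
            # booleans, and lists of those.
            lines.append(f"{pad}{key} = {json.dumps(value)}")
    return lines


def render_resource(resource_type, name, body):
    """Render one `resource` block as an HCL string."""
    lines = [f'resource "{resource_type}" "{name}" {{']
    lines.extend(render_block(body))
    lines.append("}")
    return "\n".join(lines)
```

Calling render_resource("huaweicloud_rds_instance", "instance", {"name": "demo", "volume": {"type": "ULTRAHIGH", "size": 100}}) yields a resource block with a name argument and a nested volume block.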

Configuration Samples

RDS

Huawei Cloud

variable "vpc_id" {}
variable "subnet_id" {}
variable "secgroup_id" {}
variable "availability_zone" {}

# Create the instance
resource "huaweicloud_rds_instance" "instance" {
  name              = "terraform_test_rds_instance"
  # NOTE: this is a PostgreSQL flavor while db.type below is MySQL;
  # choose a flavor that matches the engine before applying.
  flavor            = "rds.pg.n1.large.2"
  vpc_id            = var.vpc_id              # reserved
  subnet_id         = var.subnet_id           # reserved
  security_group_id = var.secgroup_id         # reserved
  availability_zone = [var.availability_zone] # reserved

  db {
    type     = "MySQL"
    version  = "8.0"
    password = "test"
  }

  volume {
    type = "ULTRAHIGH"
    size = 100
  }

  backup_strategy {
    start_time = "08:00-09:00"
    keep_days  = 1
  }
}

# Create the database
resource "huaweicloud_rds_mysql_database" "test" {
  instance_id   = huaweicloud_rds_instance.instance.id
  name          = "test"
  character_set = "utf8"
  description   = "test database"
}

# Create the account
resource "huaweicloud_rds_mysql_account" "test" {
  instance_id = huaweicloud_rds_instance.instance.id
  name        = "test"
  password    = "Test@12345678"
}


# Grant privileges
resource "huaweicloud_rds_mysql_database_privilege" "test" {
  instance_id = huaweicloud_rds_instance.instance.id
  db_name     = huaweicloud_rds_mysql_database.test.name

  users {
    name     = huaweicloud_rds_mysql_account.test.name
    readonly = false
  }
}

K8s

Huawei Cloud

variable "cluster_id" {}
variable "key_pair" {}
variable "availability_zone" {}

resource "huaweicloud_cce_node_pool" "node_pool" {
  cluster_id               = var.cluster_id
  name                     = "testpool"
  os                       = "EulerOS 2.5"
  initial_node_count       = 2
  flavor_id                = "s3.large.4"
  availability_zone        = var.availability_zone
  key_pair                 = var.key_pair
  scall_enable             = true # argument name as spelled by the provider
  min_node_count           = 1
  max_node_count           = 10
  scale_down_cooldown_time = 100
  priority                 = 1
  type                     = "vm"

  root_volume {
    size       = 40
    volumetype = "SAS"
  }
  data_volumes {
    size       = 100
    volumetype = "SAS"
  }
}

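Isolation

  • Managing all environments from a single Terraform configuration breaks the isolation between them

    • Workspace isolation (not intuitive)
    • File-layout isolation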
      infra
        huaweicloud
          component
            rds
            ecs
            k8s
        azure
          component
            rds
            ecs
            k8s
        aliyun
          component
            rds
            ecs
            k8s
      prod/tenant1
        service
          - var.tf
          - main.tf
          - outputs.tf
        database
          - var.tf
          - main.tf
          - outputs.tf
      test/tenant2
        service
          - var.tf
          - main.tf
          - outputs.tf
        database
          - var.tf
          - main.tf
          - outputs.tf
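  • Problems caused by too much isolation

    • terraform apply has to be run once in every folder (Terragrunt can automate this with its apply-all command; newer Terragrunt releases spell it run-all apply)
    • Referencing resources across folders becomes much harder; Terraform's terraform_remote_state data source addresses this

The terraform_remote_state data source reads the state file written by another Terraform configuration directly.
For example, if the database isolation granularity is the Database rather than the RDS instance, a single RDS instance is enough.
Each Service can then look up the database it needs through remote_state.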

data "terraform_remote_state" "db" {
  backend = "s3"

  config = {
    bucket = "bucket_name"
    key    = "prod/data-stores/mysql/terraform.tfstate"
    region = "us-east-2"
  }
}

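State Management

Terraform scripts can be hosted in Git, but the state file, which records the current state of the cluster, should not be, for the following reasons:

  • Manual errors
    It is easy to forget to pull the latest state file from Git before running Terraform, or to push it back afterwards.
    The system can then unexpectedly revert to an earlier state or repeat a previous deployment.
  • Locking
    Git has no locking mechanism to stop several users from running terraform apply against the same state file at the same time.
  • Secrets
    Terraform state files are plain text and may contain sensitive data about the resources.

State Storage

Storing the Terraform state file remotely in Amazon S3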

provider "aws" {
  region = "us-east-2"
}

resource "aws_s3_bucket" "terraform_state" {
  bucket = "terraform-state"

  # Prevent terraform destroy from deleting the S3 bucket
  lifecycle {
    prevent_destroy = true
  }

  # Enable versioning on the bucket
  # (AWS provider v4+ deprecates the inline versioning and encryption
  # blocks in favor of separate resources)
  versioning {
    enabled = true
  }

  # Server-side encryption
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}

State Locking

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  # The S3 backend expects the key to be named exactly "LockID"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}

Configuring the backend

terraform {
  backend "s3" {
    bucket = "terraform-state"
    key    = "/xxx/xxx/terraform.tfstate"
    region = "xxx"

    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

The backend configuration can also be kept in a separate file:

> backend.hcl

bucket = "xxx"
region = "xxx"
dynamodb_table = "xxx"
encrypt = true

terraform init -backend-config=backend.hcl




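Reusable Infrastructure

Default Quota: Infrastructure Reuse

Default quota: an HCL configuration that does not set any additional attributes is treated as using the default quota.

The infra component directory holds the base cloud components, one set per provider: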

provider "alicloud" {
  access_key = "your-access-key"
  secret_key = "your-secret-key"
  region     = "cn-hangzhou"
}

provider "huaweicloud" {
  access_key = "your-access-key"
  secret_key = "your-secret-key"
  region     = "cn-north-1"
  project_id = "your-project-id"
}


provider "azurerm" {
  features {}
  client_id       = "your-client-id"
  client_secret   = "your-client-secret"
  subscription_id = "your-subscription-id"
  tenant_id       = "your-tenant-id"
}

When the default configuration we provide is used, the tenant imports the modules under infra -> huawei | alicloud | azure:

- infra
  - alicloud
    - component
      - k8s
        - main.tf
      - rds
        - main.tf

- tenant1
  - service
    - main.tf
  - databases
    - main.tf


module "webservice" {
  source = "../../infra/alicloud/component/k8s/..."
}

module "databases" {
  source = "../../infra/alicloud/component/rds/..."
}

When a custom configuration is needed, the tenant defines everything itself instead of importing a module:

- tenant1
  - service
  - databases
    - main.tf

resource xxx {
  xxx
  ....
}

Default Quota: Tunable Parameters

- infra
  - huaweicloud
    - component
      - k8s
        - main.tf

- tenant1
  - service
    - var.tf
    - main.tf
  - databases


> var.tf
variable "cluster_node_count" {
  description = "Number of nodes in the node pool"
  type        = number
}

> main.tf

# A module call, not a resource: it imports the shared component
module "webservice" {
  source             = "../../infra/huaweicloud/component/k8s/..."
  cluster_node_count = var.cluster_node_count
}

> infra/huaweicloud/component/k8s/main.tf

# The module must also declare variable "cluster_node_count"
resource "huaweicloud_cce_node_pool" "test" {
  ...
  initial_node_count = var.cluster_node_count
  scall_enable       = false
  min_node_count     = 0
  max_node_count     = 0
  ...
}




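Deployment

Deploy Package

Sealos advantages

  • Self-hosted clusters get a built-in private image registry through image-cri-shim
  • LVScare provides lightweight IPVS load balancing across multiple masters
  • The image bundles the kubectl and sealos commands, so the whole deployment is packaged into a single image
  • All business images are downloaded into the registry for offline use

Manually installing a self-hosted private cloud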

sealos run labring/kubernetes:v1.25.0 labring/helm:v3.8.2 labring/calico:v3.24.1 \
--masters 192.168.64.2,192.168.64.22,192.168.64.20 \
--nodes 192.168.64.21,192.168.64.19 -p [your-ssh-passwd]

Manually rolling out the Deploy package

sealos run digiwin.com/xxx/deploy:x.x.x

Deploy package structure

.
├── charts
│   └── nginx
│       ├── Chart.lock
│       ├── charts
│       ├── Chart.yaml
│       ├── README.md
│       ├── templates
│       ├── values.schema.json
│       └── values.yaml
├── images
│   └── shim
│       └── nginxImages
├── init.sh
├── Kubefile
├── manifests
│   └── nginx
│       ├── deployment.yaml
│       ├── ingress.yaml
│       └── service.yaml
├── opt
│   └── helm
└── registry

Kubefile contents

FROM scratch
ENV version v1.1.0
COPY manifests ./manifests
COPY registry ./registry
ENTRYPOINT ["kubectl apply -f manifests/tigera-operator.yaml"]
CMD ["kubectl apply -f manifests/custom-resources.yaml"]
# Alternatively, initialization can be delegated to a Python script that performs
# the full SQL rollout and runs kubectl apply -f on the cluster's YAML files;
# use it in place of the CMD above.
CMD ["manifests/start.py"]
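
The Python initialization script mentioned in the Kubefile comment could be sketched as below. The source does not show start.py, so everything here is an assumption: the build_commands and main helpers are hypothetical, the SQL rollout step is left out because its connection details are not given, and manifests are simply applied in filename order.

```python
#!/usr/bin/env python3
import subprocess
from pathlib import Path


def build_commands(paths):
    """Return one `kubectl apply` command per YAML file, in sorted order."""
    yaml_files = sorted(str(p) for p in paths if str(p).endswith((".yaml", ".yml")))
    return [["kubectl", "apply", "-f", path] for path in yaml_files]


def main(manifest_dir="manifests"):
    """Apply every manifest found under manifest_dir (non-recursive)."""
    # The full SQL rollout would run here, before the cluster manifests.
    for cmd in build_commands(Path(manifest_dir).iterdir()):
        # check=True aborts the rollout on the first failing manifest.
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the ordering testable without a cluster; the script's entry point would call main().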

Building / Setting Up / Applying the Deploy Package




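Open Questions and Materials

Open Questions

  • Which account model does the customer open up: sub-account, partner, or billing requested through the master account? This determines the granularity of the gross-margin calculation
  • Usage limits: maximum number of nodes per cluster per account
  • Pricing: yearly subscription / monthly subscription / pay-per-use
  • Infrastructure asset inventory: identify resources not linked to any customer Infra manifest, and resources from Infra manifests that have not been rolled out
  • Workflows
    • RDS purchase
    • RDS logical database configuration
    • K8s NodePool/AgentPool configuration
    • K8s scaling of the instance count

Materials to Prepare

  • Create sub-accounts
    • Huawei Cloud
      • IAM, Region / project ID / project name
        • Sub-account [programmatic access]
        • CCE FullAccess
        • RDS ManageAccess
      • User Name, Access Key Id, Secret Access Key
    • Microsoft Azure (swqqh@qq.com)
      • Microsoft Entra Privileged Identity Management
      • Credentials: subscription_id, tenant_id, client_id, client_secret
    • Alibaba Cloud
      • RAM sub-account, Region, user group / permissions
        • AliyunCSFullAccess
        • AliyunRDSFullAccess
        • Enable OpenAPI access
      • AccessKey SECRET_KEY
  • Terraform test/development clusters: testing incurs charges
    • Huawei Cloud
    • Microsoft Azure
    • Alibaba Cloud
  • Teardown: intermediate routing resources must be reclaimed as well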