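AWS CloudWatch shows performance metrics for instances and other services, but it cannot collect metrics from inside the instance's operating system, such as memory and disk usage. To see those, you can use a third-party solution (Datadog, New Relic) or install the CloudWatch Agent that AWS provides. I only needed memory metrics, so I chose the CloudWatch Agent, which was the easiest to install.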
Metrics the CloudWatch Agent can collect
Detail level | Metrics included |
---|---|
Basic | Mem: mem_used_percent; Disk: disk_used_percent. Disk metrics such as disk_used_percent have a Partition dimension, so the number of custom metrics generated depends on the number of partitions attached to the instance. |
Standard | CPU: cpu_usage_idle, cpu_usage_iowait, cpu_usage_user, cpu_usage_system; Disk: disk_used_percent, disk_inodes_free; Diskio: diskio_io_time; Mem: mem_used_percent; Swap: swap_used_percent |
Advanced | CPU: cpu_usage_idle, cpu_usage_iowait, cpu_usage_user, cpu_usage_system; Disk: disk_used_percent, disk_inodes_free; Diskio: diskio_io_time, diskio_write_bytes, diskio_read_bytes, diskio_writes, diskio_reads; Mem: mem_used_percent; Netstat: netstat_tcp_established, netstat_tcp_time_wait; Swap: swap_used_percent |
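Collection scope
- Memory
- Disk
- Logs
Installation
Installation breaks down into four steps.
- Apply an IAM role to the EC2 instance
- Install the agent on the instance
- Configure the agent with the wizard
- Start the agent
Instance setup
Create the IAM role
- Open IAM → Roles → Create role → select EC2 → select the CloudWatchAgentServerPolicy policy → set a name
Apply the IAM role to the EC2 instance
- Open EC2 → select the instance → Actions → Security → Modify IAM role → assign the IAM role you created and save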
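The two console steps above can also be scripted with the AWS CLI. The following is a minimal sketch, not something I ran as part of this setup: the function name `attach_cwagent_role`, the role name, and the instance ID in the usage line are placeholders made up for illustration, and it assumes the AWS CLI is installed with credentials that are allowed to manage IAM. Unlike the console, the CLI does not create an instance profile automatically, so the sketch creates one with the same name as the role.

```shell
# Hypothetical helper: create a role for the CloudWatch Agent and attach it to an instance.
# Usage: attach_cwagent_role <instance-id> <role-name>
attach_cwagent_role() {
  instance_id="$1"
  role_name="$2"
  # Trust policy that lets the EC2 service assume the role.
  trust='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

  aws iam create-role --role-name "$role_name" \
      --assume-role-policy-document "$trust" &&
  aws iam attach-role-policy --role-name "$role_name" \
      --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy &&
  aws iam create-instance-profile --instance-profile-name "$role_name" &&
  aws iam add-role-to-instance-profile --instance-profile-name "$role_name" \
      --role-name "$role_name" &&
  aws ec2 associate-iam-instance-profile --instance-id "$instance_id" \
      --iam-instance-profile "Name=$role_name"
}

# Example (placeholder values):
#   attach_cwagent_role i-0123456789abcdef0 CWAgentRole
```

A freshly created instance profile can take a few seconds to become usable, so the final `associate-iam-instance-profile` call may need to be retried.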
Install the agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
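The download URL above is for amd64 Ubuntu instances. On an arm64 (Graviton) instance the package path differs, so a small helper can pick the right one from `uname -m`. This is a sketch under assumptions: `cwagent_deb_url` is a name invented for this example, and it assumes the arm64 package follows the same URL pattern as the amd64 one.

```shell
# Hypothetical helper: print the Ubuntu .deb URL for a given machine type.
# Usage: cwagent_deb_url [machine]   (defaults to the output of `uname -m`)
cwagent_deb_url() {
  machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/$arch/latest/amazon-cloudwatch-agent.deb"
}

# Example:
#   wget "$(cwagent_deb_url)" && sudo dpkg -i -E ./amazon-cloudwatch-agent.deb
```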
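Run the wizard to configure the agent
I configured it to monitor only disk and memory, not logs. I accepted the defaults for everything else, which also turns on StatsD and CollectD.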
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard
On which OS are you planning to use the agent?
1. linux
2. windows
3. darwin
default choice: [1]: 1
Trying to fetch the default region based on ec2 metadata...
Are you using EC2 or On-Premises hosts?
1. EC2
2. On-Premises
default choice: [1]: 1
Which user are you planning to run the agent?
1. root
2. cwagent
3. others
default choice: [1]: 1
Do you want to turn on StatsD daemon?
1. yes
2. no
default choice: [1]: 1
Which port do you want StatsD daemon to listen to?
default choice: [8125]
What is the collect interval for StatsD daemon?
1. 10s
2. 30s
3. 60s
default choice: [1]: 1
What is the aggregation interval for metrics collected by StatsD daemon?
1. Do not aggregate
2. 10s
3. 30s
4. 60s
default choice: [4]: 4
Do you want to monitor metrics from CollectD? WARNING: CollectD must be installed or the Agent will fail to start
1. yes
2. no
default choice: [1]: 1
Do you want to monitor any host metrics? e.g. CPU, memory, etc.
1. yes
2. no
default choice: [1]: 1
Do you want to monitor cpu metrics per core?
1. yes
2. no
default choice: [1]: 1
Do you want to add ec2 dimensions (ImageId, InstanceId, InstanceType, AutoScalingGroupName) into all of your metrics if the info is available?
1. yes
2. no
default choice: [1]: 1
Do you want to aggregate ec2 dimensions (InstanceId)?
1. yes
2. no
default choice: [1]: 1
Would you like to collect your metrics at high resolution (sub-minute resolution)? This enables sub-minute resolution for all metrics, but you can customize for specific metrics in the output json file.
1. 1s
2. 10s
3. 30s
4. 60s
default choice: [4]: 4
Which default metrics config do you want?
1. Basic
2. Standard
3. Advanced
4. None
default choice: [1]: 1
Current config as follows:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      }
    }
  }
}
Are you satisfied with the above config? Note: it can be manually customized after the wizard completes to add additional items.
1. yes
2. no
default choice: [1]: 1
Do you have any existing CloudWatch Log Agent (http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html) configuration file to import for migration?
1. yes
2. no
default choice: [2]: 2
Do you want to monitor any log files?
1. yes
2. no
default choice: [1]: 2
Saved config file to /opt/aws/amazon-cloudwatch-agent/bin/config.json successfully.
Current config as follows:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      }
    }
  }
}
Please check the above content of the config.
The config file is also located at /opt/aws/amazon-cloudwatch-agent/bin/config.json.
Edit it manually if needed.
Do you want to store the config in the SSM parameter store?
1. yes
2. no
default choice: [1]: 2
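Edit config.json
The setup above is sufficient, but with the default "resources": ["*"] the agent reports disk_used_percent for every mounted filesystem, so the figure you look at may not match the actual usage of the instance's root volume. To avoid that, edit /opt/aws/amazon-cloudwatch-agent/bin/config.json as follows so that only the mount points you care about (here / and /tmp) are listed.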
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "disk": {
        "measurement": [
          "used_percent"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/",
          "/tmp"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      }
    }
  }
}
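To judge whether the disk_used_percent value reported for a mount point looks right, it helps to compare it with what `df` shows on the instance itself. The sketch below assumes a POSIX-compatible `df`; `local_disk_used_percent` is a name made up for this example, and the agent's rounding may differ slightly from `df`'s.

```shell
# Hypothetical helper: print the used percentage (without the % sign) of the
# filesystem that holds <path>, as reported by df.
# Usage: local_disk_used_percent [path]   (defaults to /)
local_disk_used_percent() {
  path="${1:-/}"
  # -P forces the one-line-per-filesystem POSIX format; column 5 is the capacity, e.g. 42%.
  df -P "$path" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example:
#   local_disk_used_percent /
#   local_disk_used_percent /tmp
```

If /tmp is not a separate mount point, `df` reports the filesystem that contains it, so both calls return the same number.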
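Start the agent
CollectD was enabled in the wizard but is not installed on the instance, and the agent will not start unless /usr/share/collectd/types.db exists, so create an empty one first.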
sudo mkdir -p /usr/share/collectd/
sudo touch /usr/share/collectd/types.db
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a start
Check that the agent is running
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
more /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
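The status command prints a short JSON document, which makes it easy to check from a script. A minimal sketch, assuming the output contains a field like `"status": "running"` when the agent is up; `cwagent_is_running` is a hypothetical helper name, and it only inspects whatever text is piped into it.

```shell
# Hypothetical helper: succeed if the JSON on stdin reports "status": "running".
# The leading quote in the pattern keeps it from matching other keys
# that merely end in "status", such as "configstatus".
cwagent_is_running() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Example:
#   sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status \
#     | cwagent_is_running && echo "agent is running"
```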
Check the metrics
- Open CloudWatch → All metrics → CWAgent
- Under InstanceId, you can see mem_used_percent and disk_used_percent.
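The same check can be done from a terminal with the AWS CLI. The following is a sketch under assumptions: `list_cwagent_metrics` is a made-up helper name, the instance ID in the usage line is a placeholder, and the caller needs credentials that allow `cloudwatch:ListMetrics`.

```shell
# Hypothetical helper: list the names of the CWAgent metrics recorded for one instance.
# Usage: list_cwagent_metrics <instance-id>
list_cwagent_metrics() {
  instance_id="$1"
  aws cloudwatch list-metrics \
      --namespace CWAgent \
      --dimensions "Name=InstanceId,Value=$instance_id" \
      --query 'Metrics[].MetricName' \
      --output text
}

# Example (placeholder instance ID):
#   list_cwagent_metrics i-0123456789abcdef0
```

Newly published metrics can take several minutes to show up in the listing, so an empty result right after starting the agent does not necessarily mean something is wrong.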