dt-obs-hosts

Installs: 225
Rank: #9502

Install

npx skills add https://github.com/dynatrace/dynatrace-for-ai --skill dt-obs-hosts

# Infrastructure Hosts Skill

Monitor and manage host and process infrastructure including CPU, memory, disk, network, and technology inventory.

## What This Skill Does

- Discover and inventory hosts across cloud and on-premise environments
- Monitor host resource utilization (CPU, memory, disk, network)
- Track process resource consumption and lifecycle
- Analyze container and Kubernetes infrastructure
- Discover services via listening ports
- Manage technology stack versions and compliance
- Attribute infrastructure costs by cost center and product
- Validate data quality and metadata completeness
- Plan capacity and detect resource saturation
- Correlate infrastructure health across layers

## When to Use This Skill

Use this skill when the user needs to:

- Inventory: "Show me all Linux hosts in AWS us-east-1"
- Monitor: "What hosts have high CPU usage?"
- Troubleshoot: "Which processes are consuming the most memory?"
- Discover: "What databases are running in production?"
- Plan: "Track Kubernetes version distribution for upgrade planning"
- Cost: "Calculate infrastructure costs by cost center"
- Security: "Find all processes listening on port 22"
- Compliance: "Identify hosts running EOL Java versions"
- Quality: "Check data completeness for AWS hosts"
- Optimize: "Find rightsizing candidates based on utilization"

## Core Concepts

### Entities

- `HOST` - Physical or virtual machines (cloud or on-premise)
- `PROCESS` - Running processes and process groups
- `CONTAINER` - Kubernetes containers
- `NETWORK_INTERFACE` - Host network interfaces
- `DISK` - Host disk volumes

### Metrics Categories

- Host Metrics - `dt.host.cpu.*`, `dt.host.memory.*`, `dt.host.disk.*`, `dt.host.net.*`
- Process Metrics - `dt.process.cpu.*`, `dt.process.memory.*`, `dt.process.io.*`, `dt.process.network.*`
- Inventory - OS type, cloud provider, technology stack, versions
- Cost - `dt.cost.costcenter`, `dt.cost.product`
- Quality - Metadata completeness, version compliance

### Alert Thresholds

- CPU/Memory/Disk: 80% warning, 90% critical
- Network: 70% high, >85% saturated
- Disk Latency: 20ms bottleneck
- Network Errors: Drop rate >1%, error rate >0.1%
- Swap: 30% warning, >50% critical

## Key Workflows

### 1. Host Discovery and Classification

Discover hosts, classify by OS/cloud, inventory resources.

```dql
smartscapeNodes "HOST"
| fieldsAdd os.type, cloud.provider, host.logical.cpu.cores, host.physical.memory
| summarize host_count = count(), by: {os.type, cloud.provider}
| sort host_count desc
```

OS Types: `LINUX`, `WINDOWS`, `AIX`, `SOLARIS`, `ZOS`

→ For cloud-specific attributes, see references/inventory-discovery.md

### 2. Resource Utilization Monitoring

Monitor CPU, memory, disk, network across hosts.

```dql
timeseries {
    cpu = avg(dt.host.cpu.usage),
    memory = avg(dt.host.memory.usage),
    disk = avg(dt.host.disk.used.percent)
  }, by: {dt.smartscape.host}
| fieldsAdd host_name = getNodeName(dt.smartscape.host)
| filter arrayAvg(cpu) > 80 or arrayAvg(memory) > 80
| sort arrayAvg(cpu) desc
```

High utilization threshold: 80% warning, 90% critical

→ For detailed CPU analysis, see references/host-metrics.md
→ For memory breakdown, see references/host-metrics.md

### 3. Process Resource Analysis

Identify top resource consumers at process level.

```dql
timeseries {
    cpu = avg(dt.process.cpu.usage),
    memory = avg(dt.process.memory.usage)
  }, by: {dt.smartscape.process}
| fieldsAdd process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(cpu) > 50
| sort arrayAvg(cpu) desc
| limit 20
```

→ For process I/O analysis, see references/process-monitoring.md
→ For process network metrics, see references/process-monitoring.md

### 4. Technology Stack Inventory

Discover and track software technologies and versions.
```dql
smartscapeNodes "PROCESS"
| fieldsAdd process.software_technologies
| expand tech = process.software_technologies
| fieldsAdd tech_type = tech[type], tech_version = tech[version]
| summarize process_count = count(), by: {tech_type, tech_version}
| sort process_count desc
```

Common Technologies: Java, Node.js, Python, .NET, databases, web servers, messaging systems

→ For version compliance checks, see references/inventory-discovery.md

### 5. Service Discovery via Ports

Map listening ports to services for security and inventory.

```dql
smartscapeNodes "PROCESS"
| fieldsAdd process.listen_ports, dt.process_group.detected_name
| filter isNotNull(process.listen_ports) and arraySize(process.listen_ports) > 0
| expand port = process.listen_ports
| summarize process_count = count(), by: {port, dt.process_group.detected_name}
| sort toLong(port) asc
| limit 50
```

Well-known ports: 80 (HTTP), 443 (HTTPS), 22 (SSH), 3306 (MySQL), 5432 (PostgreSQL)

→ For comprehensive port mapping, see references/inventory-discovery.md

### 6. Container and Kubernetes Monitoring

Track container distribution and K8s workload types.

```dql
smartscapeNodes "CONTAINER"
| fieldsAdd k8s.cluster.name, k8s.namespace.name, k8s.workload.kind
| summarize container_count = count(), by: {k8s.cluster.name, k8s.workload.kind}
| sort k8s.cluster.name, container_count desc
```

Workload Types: `deployment`, `daemonset`, `statefulset`, `job`, `cronjob`

Note: Container image names/versions are NOT available in smartscape.

→ For K8s version tracking, see references/container-monitoring.md
→ For container lifecycle, see references/container-monitoring.md

### 7. Cost Attribution and Chargeback

Calculate infrastructure costs by cost center.
```dql
smartscapeNodes "HOST"
| fieldsAdd dt.cost.costcenter, host.logical.cpu.cores, host.physical.memory
| filter isNotNull(dt.cost.costcenter)
| fieldsAdd memory_gb = toDouble(host.physical.memory) / 1024 / 1024 / 1024
| summarize host_count = count(),
    total_cores = sum(toLong(host.logical.cpu.cores)),
    total_memory_gb = sum(memory_gb),
    by: {dt.cost.costcenter}
| sort total_cores desc
```

→ For product-level cost tracking, see references/inventory-discovery.md

### 8. Infrastructure Health Correlation

Correlate host and process metrics for cross-layer analysis.

```dql
timeseries {
    host_cpu = avg(dt.host.cpu.usage),
    host_memory = avg(dt.host.memory.usage),
    process_cpu = avg(dt.process.cpu.usage)
  }, by: {dt.smartscape.host, dt.smartscape.process}
| fieldsAdd host_name = getNodeName(dt.smartscape.host),
    process_name = getNodeName(dt.smartscape.process)
| filter arrayAvg(host_cpu) > 70
| sort arrayAvg(host_cpu) desc
```

Health scoring: Critical if any resource >90%, warning if >80%

→ For multi-resource saturation detection, see references/host-metrics.md

## Common Query Patterns

### Pattern 1: Smartscape Discovery

Use `smartscapeNodes` to discover and classify entities.

```dql
smartscapeNodes "HOST"
| fieldsAdd ...
| filter ...
| summarize ...
```

### Pattern 2: Timeseries Performance

Use `timeseries` to analyze metrics over time.

```dql
timeseries metric = avg(dt.host.*), by: {dt.smartscape.host}
| fieldsAdd ...
| filter ...
```

### Pattern 3: Cross-Layer Correlation

Correlate host and process metrics.

```dql
timeseries {
    host_cpu = avg(dt.host.cpu.usage),
    process_cpu = avg(dt.process.cpu.usage)
  }, by: {dt.smartscape.host, dt.smartscape.process}
```

### Pattern 4: Entity Enrichment with Lookup

Enrich data with entity attributes. After `lookup`, reference the looked-up fields with the `lookup.` prefix.
```dql
timeseries cpu = avg(dt.host.cpu.usage), by: {dt.smartscape.host}
| lookup [
    smartscapeNodes "HOST"
    | fields id, cpuCores, memoryTotal
  ], sourceField:dt.smartscape.host, lookupField:id
| fieldsAdd cores = lookup.cpuCores, mem_gb = lookup.memoryTotal / 1024 / 1024 / 1024
```

## Tags and Metadata

### Important Notes

- The generic `tags` field is NOT populated in smartscape queries
- Use specific tag fields: `tags:azure[]`, `tags:environment`
- Use custom metadata: `host.custom.metadata[]`

### Available Tags

- Azure Tags: `tags:azure[dt_owner_team]`, `tags:azure[dt_cloudcost_capability]`
- Environment: `tags:environment`
- Custom Metadata: `host.custom.metadata[OperatorVersion]`, `host.custom.metadata[Cluster]`
- Cost: `dt.cost.costcenter`, `dt.cost.product`

→ For complete tag reference, see references/inventory-discovery.md

## Cloud-Specific Attributes

### AWS

- `cloud.provider == "aws"`
- `aws.region`, `aws.availability_zone`, `aws.account.id`
- `aws.resource.id`, `aws.resource.name`
- `aws.state` (running, stopped, terminated)

### Azure

- `cloud.provider == "azure"`
- `azure.location`, `azure.subscription`, `azure.resource.group`
- `azure.status`, `azure.provisioning_state`
- `azure.resource.sku.name` (VM size)

### Kubernetes

- `k8s.cluster.name`, `k8s.cluster.uid`
- `k8s.namespace.name`, `k8s.node.name`, `k8s.pod.name`
- `k8s.workload.name`, `k8s.workload.kind`

→ For multi-cloud analysis, see references/inventory-discovery.md

## Best Practices

### Alerting

- Use percentiles (p95, p99) for latency metrics
- Use `max()` for resource limits
- Use `avg()` for utilization trends
- Set multi-level thresholds (warning at 80%, critical at 90%)

### Time Windows

- Real-time: 5-15 minute windows
- Trends: 24 hours to 7 days
- Capacity planning: 30-90 days

### Query Optimization

- Use filters early in the pipeline
- Limit results with `| limit N`
- Use specific entity types in `smartscapeNodes`
- Aggregate before enrichment (lookup)

### Data Quality

- Validate metadata completeness (target >90%)
- Check for duplicate host names
- Ensure cost tag coverage
- Monitor data freshness (`lifetime.end`)

## Limitations and Notes

### Smartscape Limitations

- Container image names/versions are NOT available in smartscape
- The generic `tags` field is NOT populated (use specific tag namespaces)
- Process metadata varies by process type

### Platform-Specific

- `dt.host.cpu.iowait` is available on Linux only
- AIX has specific CPU metrics (entitlement, physc)
- Inode metrics are available on Linux only

### Best Practices

- Use `getNodeName()` to get human-readable names
- Convert bytes to GB for readability: `/ 1024 / 1024 / 1024`
- Round aggregated values: `round(value, decimals: 1)`
- Use `isNotNull()` checks before array operations

## When to Load References

This skill uses progressive disclosure. Start here for 80% of use cases. Load reference files for detailed specifications when needed.

Load `host-metrics.md` when:

- Analyzing CPU component breakdown (user, system, iowait, steal)
- Investigating memory pressure and swap usage
- Troubleshooting disk I/O latency
- Diagnosing network packet drops or errors

Load `process-monitoring.md` when:

- Analyzing process-level I/O patterns
- Investigating TCP connection quality
- Detecting resource exhaustion (file descriptors, threads)
- Tracking GC suspension time

Load `container-monitoring.md` when:

- Analyzing container lifecycle and churn
- Tracking Kubernetes version distribution
- Managing OneAgent operator versions
- Planning K8s cluster upgrades

Load `inventory-discovery.md` when:

- Performing security audits via port discovery
- Implementing cost attribution and chargeback
- Validating data quality and metadata completeness
- Managing multi-cloud infrastructure

## References

- `host-metrics.md` - Detailed host CPU, memory, disk, and network monitoring
- `process-monitoring.md` - Process-level CPU, memory, I/O, and network analysis
- `container-monitoring.md` - Container inventory, Kubernetes versions, and operator management
- `inventory-discovery.md` - Host/process discovery, technology inventory, cost attribution, and data quality
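
The multi-level thresholds (warning at 80%, critical at 90%) and the health scoring rule in this skill ("critical if any resource >90%, warning if >80%") can be applied to query results once they have been exported. The following is a minimal Python sketch of that post-processing, not part of the skill itself: the function names `health_status` and `bytes_to_gb` and the input shape (a mapping of resource name to average utilization percent) are hypothetical and chosen only for illustration.

```python
from typing import Mapping

# Threshold values taken from the skill text; the constant names are illustrative.
WARNING_PCT = 80.0
CRITICAL_PCT = 90.0


def health_status(utilization: Mapping[str, float]) -> str:
    """Classify a host from its per-resource utilization percentages.

    Critical if any resource exceeds 90%, warning if any exceeds 80%,
    otherwise ok. An empty mapping is treated as ok.
    """
    worst = max(utilization.values(), default=0.0)
    if worst > CRITICAL_PCT:
        return "critical"
    if worst > WARNING_PCT:
        return "warning"
    return "ok"


def bytes_to_gb(n_bytes: float) -> float:
    """Convert bytes to GB by dividing by 1024 three times, rounded to 1 decimal."""
    return round(n_bytes / 1024 / 1024 / 1024, 1)
```

For example, `health_status({"cpu": 85.0, "memory": 60.0})` returns `"warning"`, and `bytes_to_gb(16 * 1024**3)` returns `16.0`.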
