## Installation

```shell
npx skills add https://github.com/dynatrace/dynatrace-for-ai --skill dt-obs-logs
```
# Log Analysis Skill

Query, filter, and analyze Dynatrace log data using DQL for troubleshooting and monitoring.
## What This Skill Covers

- Fetching and filtering logs by severity, content, and entity
- Searching log messages using pattern matching
- Calculating error rates and statistics
- Analyzing log patterns and trends
- Grouping and aggregating log data by dimensions
## When to Use This Skill

Use this skill when users want to:

- Find specific log entries (e.g., "show me error logs from the last hour")
- Filter logs by severity, process group, or content
- Search logs for specific keywords or phrases
- Calculate error rates or log statistics
- Identify common error messages or patterns
- Analyze log trends over time
- Troubleshoot issues using log data
## Key Concepts

### Log Data Model

| Field | Description |
|---|---|
| `timestamp` | When the log entry was created |
| `content` | The log message text |
| `status` | Log level (`ERROR`, `FATAL`, `WARN`, `INFO`, etc.) |
| `dt.process_group.id` | Associated process group entity |
| `dt.process_group.detected_name` | Resolves process group IDs to human-readable names |
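
A quick way to inspect these fields on live data is a minimal fetch; the 15-minute window is illustrative:

```dql
fetch logs, from:now() - 15m
| fields timestamp, status, content, dt.process_group.detected_name
| limit 10
```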
Query Patterns
fetch logs
Primary command for log data access
Time ranges
Use
from:now() -
for time windows
Filtering
Apply severity, content, and entity filters
Aggregation
Group and summarize log data
Pattern Detection
Use
matchesPhrase()
and
contains()
for content search
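
Taken together, these patterns compose into a single pipeline. A minimal sketch (the "timeout" keyword is illustrative):

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR" and contains(content, "timeout")
| summarize error_count = count(), by: {dt.process_group.detected_name}
| sort error_count desc
```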
## Common Operations

- Severity filtering (single or multiple levels)
- Content search (simple substring and full-text)
- Entity-based filtering (process groups)
- Time-series analysis (bucketing, sorting)
- Error rate calculation
- Pattern analysis (exceptions, timeouts, etc.)
## Core Workflows

### 1. Log Searching

Find specific log entries by time, severity, and content.

Typical steps:

1. Define the time range
2. Filter by severity (optional)
3. Search content for keywords
4. Select relevant fields
5. Sort and limit results

Example:

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| fields timestamp, content, process_group = dt.process_group.detected_name
| sort timestamp desc
| limit 100
```
### 2. Log Filtering

Narrow down logs using multiple criteria (severity, entity, content).

Typical steps:

1. Fetch logs with a time range
2. Apply severity filters
3. Filter by entity (process group)
4. Apply content filters
5. Format and sort output

Example:

```dql
fetch logs, from:now() - 2h
| filter in(status, {"ERROR", "FATAL", "WARN"})
| fieldsAdd process_group = dt.process_group.detected_name
| summarize log_count = count(), by: {process_group, status}
| sort log_count desc
```
### 3. Pattern Analysis

Identify patterns, trends, and anomalies in log data.

Typical steps:

1. Fetch logs with a time range
2. Add pattern detection fields
3. Aggregate by entity or time
4. Calculate statistics and ratios
5. Sort by frequency or rate

Example:

```dql
fetch logs, from:now() - 2h
| filter status == "ERROR"
| fieldsAdd
    has_exception = if(matchesPhrase(content, "exception"), true, else: false),
    has_timeout = if(matchesPhrase(content, "timeout"), true, else: false)
| summarize
    error_count = count(),
    exception_count = countIf(has_exception == true),
    timeout_count = countIf(has_timeout == true),
    by: {dt.process_group.detected_name}
```
## Key Functions

### Filtering

- `filter status == "ERROR"` - Filter by status level
- `in(status, {"ERROR", "FATAL", "WARN"})` - Multi-status filter
- `contains(content, "keyword")` - Simple substring search
- `matchesPhrase(content, "exact phrase")` - Full-text phrase search

### Entity Operations

- `dt.process_group.detected_name` - Get the human-readable process group name
- `filter process_group == "service-name"` - Filter by a specific entity (after aliasing the field with `fieldsAdd`)

### Aggregation

- `count()` - Count all log entries
- `countIf(condition)` - Conditional count
- `by:` - Group by entity or time bucket
- `bin(timestamp, 5m)` - Time bucketing for trends

### Field Operations

- `fields timestamp, content, status` - Select specific fields
- `fieldsAdd name = expression` - Add computed fields
- `if(condition, true_value, else: false_value)` - Conditional logic
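
As a quick illustration of the field operations above in one pipeline (the "timeout" keyword is illustrative):

```dql
fetch logs, from:now() - 30m
| filter status == "ERROR"
| fieldsAdd is_timeout = if(contains(content, "timeout"), true, else: false)
| fields timestamp, status, is_timeout, content
| sort timestamp desc
| limit 20
```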
## Common Patterns

### Content Search

Simple substring search:

```dql
fetch logs, from:now() - 1h
| filter contains(content, "database")
| fields timestamp, content, status
```

Full-text phrase search:

```dql
fetch logs, from:now() - 1h
| filter matchesPhrase(content, "connection timeout")
| fields timestamp, content, process_group = dt.process_group.detected_name
```
### Error Rate Calculation

Calculate error rates over time:

```dql
fetch logs, from:now() - 2h
| summarize
    total_logs = count(),
    error_logs = countIf(status == "ERROR"),
    by: {time_bucket = bin(timestamp, 5m)}
| fieldsAdd error_rate = (error_logs * 100.0) / total_logs
| sort time_bucket asc
```
### Top Error Messages

Find the most common error messages:

```dql
fetch logs, from:now() - 24h
| filter status == "ERROR"
| summarize error_count = count(), by: {content}
| sort error_count desc
| limit 20
```
### Process Group-Specific Logs

Filter logs by process group:

```dql
fetch logs, from:now() - 1h
| fieldsAdd process_group = dt.process_group.detected_name
| filter process_group == "payment-service"
| filter status == "ERROR"
| fields timestamp, content, status
| sort timestamp desc
```
### Structured / JSON Log Parsing

Many applications emit JSON-formatted log lines. Use `parse` to extract fields instead of dumping raw content:

```dql
fetch logs, from:now() - 1h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd level = log["level"], message = log["msg"], error = log["error"]
| fields timestamp, level, message, error
| sort timestamp desc
| limit 50
```

Aggregate by a parsed field:

```dql
fetch logs, from:now() - 4h
| filter status == "ERROR"
| parse content, "JSON:log"
| fieldsAdd message = log["msg"]
| summarize error_count = count(), by: {message}
| sort error_count desc
| limit 20
```

Notes:

- `parse content, "JSON:log"` creates a record field `log`; access nested values with `log["key"]`
- Filter logs with `contains()` before `parse` to reduce parsing overhead
- Works with any JSON-structured field, not just `content`
## Best Practices

- **Always specify time ranges** - Use `from:now() - <duration>` to limit data
- **Apply filters early** - Filter by severity and entity before aggregation
- **Use appropriate search methods** - `contains()` for simple substrings, `matchesPhrase()` for exact phrases
- **Limit results** - Add `| limit 100` to prevent overwhelming output
- **Sort meaningfully** - Sort by timestamp for recent logs, by count for top errors
- **Name entities** - Use `dt.process_group.detected_name` or `getNodeName()` for human-readable output
- **Use time buckets for trends** - `bin(timestamp, 5m)` for time-series analysis
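
Applied together, these practices yield queries like the following sketch (the 30-minute window and severity set are illustrative):

```dql
fetch logs, from:now() - 30m                 // bounded time range
| filter in(status, {"ERROR", "FATAL"})      // filter early, before aggregation
| fieldsAdd process_group = dt.process_group.detected_name
| summarize error_count = count(),
    by: {process_group, time_bucket = bin(timestamp, 5m)}
| sort error_count desc
| limit 100                                  // cap output
```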
## Integration Points

- **Entity model** - Uses `dt.process_group.id` for service correlation
- **Time series** - Supports temporal analysis with `bin()` and time ranges
- **Content search** - Full-text search capabilities via `matchesPhrase()`
- **Aggregation** - Statistical analysis using `summarize` and conditional functions
## Limitations & Notes

- Log availability depends on OneAgent configuration and log ingestion
- Full-text search (`matchesPhrase`) may have performance implications on large datasets
- Entity name resolution requires proper OneAgent monitoring
- Keep time ranges reasonable (avoid unbounded queries)