OSQuery - From Zero to Hero
Osquery is a powerful host instrumentation framework originally developed by Facebook that enables you to treat your operating system as a relational database. It allows you to query system information using SQL, making it a versatile tool for system monitoring, security analysis, and infrastructure management.
Table of Contents
- Introduction to Osquery
- Understanding Osquery’s Architecture
- Installation Guide
- Basic Usage and First Queries
- Osquery Interactive Mode (osqueryi)
- Osquery Daemon Mode (osqueryd)
- Tables and Schema
- Writing Effective Queries
- Configuration Files
- Query Packs
- Scheduled Queries
- File Integrity Monitoring
- Event-Based Tables
- Osquery Extensions
- Distributed Queries
- Security Monitoring Use Cases
- Performance Optimization
- Logging and Data Collection
- Integration with Other Tools
- Advanced Deployment at Scale
- Troubleshooting
- Resources and Further Learning
Introduction to Osquery
One of the most powerful features of Osquery is its cross-platform compatibility - it works on macOS, Linux, and Windows, allowing you to use the same queries across different environments. This eliminates the need for custom scripts for each platform, providing consistent data collection regardless of the operating system.
Key Benefits of Osquery
- Cross-Platform Support: Works consistently across macOS, Linux, and Windows systems
- SQL Interface: Uses familiar SQL syntax, making it accessible for those with database experience
- Extensive Data Collection: Gathers rich system data including running processes, user logins, network connections, browser plugins, loaded kernel modules, hardware events, file hashes, and more
- Open Source: Benefits from community contributions and transparency
- Flexibility: Can be used for security monitoring, compliance checks, IT operations, and troubleshooting
Understanding Osquery’s Architecture
Osquery has two main components that serve different purposes:
- Osqueryi (Interactive Mode): A command-line interface for executing ad-hoc SQL queries directly on the system
- Osqueryd (Daemon Mode): A background service that runs scheduled queries and logs the results for analysis
Under the hood, Osquery uses SQLite as its query engine, leveraging SQLite’s query parsing, optimization, and execution functionality. However, it doesn’t use SQLite for data storage. Instead, it generates most data on-the-fly through “Virtual Tables” at query execution time. For persistent storage needs, Osquery uses an embedded RocksDB database.
Installation Guide
Let’s go through installation steps for different platforms:
Ubuntu/Debian Installation
For Ubuntu or Debian-based systems, you can use the official repository:
1
2
3
4
5
6
7
8
9
10
# Set up the repository key
export OSQUERY_KEY=1484120AC4E9F8A1A577AEEE97A80C63C9D8B80B
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $OSQUERY_KEY
# Add the repository
sudo add-apt-repository 'deb [arch=amd64] https://pkg.osquery.io/deb deb main'
# Update package lists and install osquery
sudo apt update
sudo apt install osquery
macOS Installation
1
2
3
4
5
6
# Using Homebrew
brew update
brew install osquery
# Alternatively, download the package from the official website
# https://osquery.io/downloads
Windows Installation
Download the installer from the official website (https://osquery.io/downloads) and follow the installation instructions provided.
Basic Usage and First Queries
After installation, you can immediately start using Osquery’s interactive shell to run queries:
1
2
# Launch the interactive shell
osqueryi
This will open the interactive shell with a “Using a virtual database” message. From here, you can start querying your system using SQL syntax.
Osquery Interactive Mode (osqueryi)
The interactive mode allows you to run ad-hoc queries against your system. Here are some essential commands:
.help
- Display help information.tables
- List all available tables.schema [table]
- Show the schema (fields) for a specific table or all tables.all [query]
- Run a query and display the results
For system-level queries that require elevated privileges, you may need to run osqueryi with sudo.
Example Queries in Interactive Mode
Here are some basic examples to get you started:
- List all users on the system:
1
SELECT uid, username, directory FROM users LIMIT 5;
- Get system hardware information:
1
SELECT cpu_brand, hardware_vendor, hardware_model FROM system_info;
- Check disk space:
1
SELECT path, type, round((blocks_available * blocks_size * 10e-10), 2) AS gigs_free FROM mounts WHERE path='/';
You can also run queries directly from the command line without entering the interactive shell:
1
2
3
4
5
6
7
8
# Run a query directly
osqueryi "SELECT * FROM users;"
# Pipe a query into osqueryi
echo "SELECT * FROM users;" | osqueryi
# Get JSON output
osqueryi --json "SELECT * FROM users;"
Osquery Daemon Mode (osqueryd)
The daemon mode (osqueryd) runs in the background to execute scheduled queries and log the results. It’s ideal for continuous monitoring of your system.
To start the osquery daemon:
1
sudo osqueryctl start
The osqueryctl tool helps you manage various aspects of your Osquery installation, including starting, stopping, and configuring the Osquery daemon.
Tables and Schema
Osquery organizes system data into tables, where each table stores information about a specific aspect of your system. For example, the “processes” table contains information about all running processes.
To see all available tables:
1
.tables
To examine the schema of a specific table:
1
.schema processes
Some of the most commonly used tables include:
processes
- Running processesusers
- System userslistening_ports
- Network ports being listened onsystem_info
- General system informationmounts
- Mounted filesystemsarp_cache
- ARP cache entriesinstalled_packages
- Installed software packages
Writing Effective Queries
Let’s look at some practical SQL queries that demonstrate Osquery’s power:
- Find processes that are listening on open ports:
1 2 3 4
SELECT DISTINCT processes.name, listening_ports.port, processes.pid FROM listening_ports JOIN processes USING (pid) WHERE listening_ports.address = '0.0.0.0';
- Find macOS LaunchDaemons that launch executables and keep them running:
1 2 3 4
SELECT name, program || program_arguments AS executable FROM launchd WHERE (run_at_load = 1 AND keep_alive = 1) AND (program != '' OR program_arguments != '');
- Check for ARP anomalies:
1 2 3 4
SELECT address, mac, COUNT(mac) AS mac_count FROM arp_cache GROUP BY mac HAVING count(mac) > 1;
When writing queries, remember to:
- Use JOINs to combine data from multiple tables
- Apply WHERE clauses to filter results
- Use GROUP BY for aggregation
- Keep queries focused and optimized
Configuration Files
Osquery’s configuration is managed through JSON files. The default configuration file location varies by platform:
- Linux:
/etc/osquery/osquery.conf
- macOS:
/var/osquery/osquery.conf
or/usr/local/etc/osquery/osquery.conf
- Windows:
C:\ProgramData\osquery\osquery.conf
You can override this path using the --config_path
flag:
1
osqueryd --config_path=/path/to/osquery.conf
Configuration Structure
A basic configuration file has the following structure:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
{
"options": {
"config_plugin": "filesystem",
"logger_plugin": "filesystem",
"schedule_splay_percent": 10
},
"schedule": {
"system_info": {
"query": "SELECT hostname, cpu_brand, physical_memory FROM system_info;",
"interval": 3600
}
},
"packs": {
"osquery-monitoring": "/usr/share/osquery/packs/osquery-monitoring.conf",
"incident-response": "/usr/share/osquery/packs/incident-response.conf"
}
}
The main sections include:
options
- Global settings for osqueryschedule
- Scheduled queries that run at specified intervalspacks
- Query packs (predefined groups of queries)
Query Packs
Query packs are collections of queries organized by category. They help you define and manage your monitoring schedule. Osquery comes with several built-in packs for common monitoring tasks:
incident-response
- Queries for security incident investigationit-compliance
- Compliance-related queriesosquery-monitoring
- Monitoring osquery itselfvulnerability-management
- Identifying potential security issueshardware-monitoring
- Hardware-related information
Example Pack Configuration
A pack can be defined directly in the main configuration or in a separate file:
1
2
3
4
5
6
7
8
9
10
11
12
{
"queries": {
"network_interfaces": {
"query": "SELECT interface, address, mask, broadcast FROM interface_addresses WHERE NOT interface='lo';",
"interval": 3600,
"description": "Retrieves the interfaces and their associated configuration"
}
},
"discovery": [
"SELECT pid FROM processes WHERE name = 'mysqld';"
]
}
The discovery
section contains queries that determine whether the pack should run. If all discovery queries return at least one row, the pack’s queries will be added to the schedule.
Scheduled Queries
Scheduled queries are defined in the configuration file and run at specified intervals. This is useful for continuous monitoring:
1
2
3
4
5
6
7
8
9
10
11
12
{
"schedule": {
"processes": {
"query": "SELECT name, pid FROM processes;",
"interval": 60
},
"users_logged_in": {
"query": "SELECT user, tty, host, time FROM logged_in_users;",
"interval": 300
}
}
}
Each query has a unique name, the SQL query to execute, and an interval in seconds determining how frequently it runs.
File Integrity Monitoring
File Integrity Monitoring (FIM) allows you to track changes to critical files on your system. Osquery can monitor file modifications, additions, and deletions:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
{
"options": {
"enable_file_events": "true"
},
"file_paths": {
"etc": [
"/etc/%%"
],
"bin": [
"/bin/%%",
"/sbin/%%"
]
}
}
You can then use the file_events
table to query for changes:
1
SELECT target_path, time, action, category FROM file_events;
Event-Based Tables
Event-based tables in Osquery allow you to monitor real-time events rather than just point-in-time information. These tables include:
file_events
- File system changesprocess_events
- Process creation/terminationsocket_events
- Network socket activitiesuser_events
- User authentication events
To use these tables, you need to enable the relevant event subscribers in your configuration:
1
2
3
4
5
6
7
{
"options": {
"disable_events": "false",
"enable_file_events": "true",
"enable_process_events": "true"
}
}
Osquery Extensions
Osquery’s functionality can be extended through extensions, which add new tables or functions. Extensions can be written in Python, Go, or C++.
To load an extension:
1
osqueryi --extension=/path/to/extension.ext
In the configuration file:
1
2
3
4
5
{
"options": {
"extensions_autoload": "/etc/osquery/extensions.load"
}
}
Where extensions.load
contains paths to extensions.
Distributed Queries
Distributed queries allow you to run ad-hoc queries across multiple endpoints simultaneously. This is useful for fleet-wide investigations or data collection.
To facilitate distributed queries at scale, you’ll need a fleet management solution. Some popular open-source options include:
- Fleet - A Golang-powered fleet manager originally developed by Kolide
- Zentral - A framework for capturing events from various sources
- Doorman - One of the initial osquery fleet managers
Security Monitoring Use Cases
Osquery excels at security monitoring tasks. Here are some common use cases:
- Detect unauthorized processes:
1
SELECT name, path, pid FROM processes WHERE name NOT IN (SELECT name FROM allowed_processes);
- Monitor for suspicious network connections:
1 2 3 4
SELECT p.name, p.path, lp.port, lp.address FROM processes p JOIN listening_ports lp ON p.pid = lp.pid WHERE lp.port IN (4444, 5555, 6666);
- Identify unauthorized users:
1 2
SELECT username, uid, gid FROM users WHERE username NOT IN (SELECT username FROM authorized_users);
- Check for compliance with security policies:
1
SELECT * FROM mounts WHERE device LIKE '/dev/%' AND flags NOT LIKE '%noexec%';
- File integrity monitoring for critical system files:
1 2
SELECT target_path, time, action FROM file_events WHERE target_path LIKE '/etc/%' OR target_path LIKE '/bin/%';
Performance Optimization
To optimize Osquery’s performance, consider these best practices:
- Use specific queries rather than wildcard selects
- Limit the result set when possible
- Optimize join operations
- Adjust query intervals based on importance
- Use discovery queries to run packs only when needed
Osquery was designed with performance in mind. The team at Facebook designed it to be simple, performant, and reliable, so it won’t significantly impact the systems it’s monitoring.
Logging and Data Collection
Osquery can output results in different formats for integration with other tools:
1
2
3
4
5
# Output in CSV format
osqueryi --csv "SELECT * FROM processes;" > processes.csv
# Output in JSON format
osqueryi --json "SELECT * FROM processes;" > processes.json
For the daemon mode, logs are typically stored in:
- Linux:
/var/log/osquery/
- macOS:
/var/log/osquery/
- Windows:
C:\ProgramData\osquery\logs\
Integration with Other Tools
Osquery data can be integrated with various tools for enhanced monitoring and analysis:
- SIEM Solutions - Send osquery logs to Splunk, ELK Stack, or other SIEM platforms
- Monitoring Tools - Integrate with Prometheus for metrics collection
- Configuration Management - Deploy and manage osquery using Chef, Ansible, or Puppet
- Threat Intelligence Platforms - Correlate osquery data with threat feeds
- Visualization Tools - Create dashboards using Grafana or Kibana
Advanced Deployment at Scale
Deploying osquery at scale requires consideration of:
- Deployment Automation - Use configuration management tools like Chef, Ansible, or Puppet
- Fleet Management - Implement a solution for managing configurations and distributed queries
- Data Transport - Consider how data will be securely transmitted from endpoints
- Data Aggregation - Set up systems to collect and process data from multiple endpoints
- Storage - Plan for storing potentially large volumes of data
- Analysis - Implement solutions for analyzing and acting on the collected data
Troubleshooting
Common issues and their solutions:
- High CPU Usage
- Optimize query intervals
- Reduce the number of scheduled queries
- Check for inefficient queries
- Memory Leaks
- Update to the latest version
- Check for problematic extensions
- Configuration Issues
- Validate JSON syntax
- Check file permissions
- Ensure paths are correct
- Query Failures
- Verify table existence
- Check for syntax errors
- Ensure appropriate permissions
Resources and Further Learning
To continue your osquery journey:
- Official Documentation
- Community Resources
- Osquery Slack Channel
- Osquery Forums
- Books and Tutorials
- “Mastering Osquery” by Daniel Sullivan
- Online tutorials from security blogs
- Query Packs and Examples
- Osquery Packs Repository
- Community-shared query collections
By mastering osquery, you gain a powerful tool for system visibility, security monitoring, and IT operations management across your entire infrastructure.