OSQuery - From Zero to Hero

Osquery is a powerful host instrumentation framework originally developed by Facebook that enables you to treat your operating system as a relational database. It allows you to query system information using SQL, making it a versatile tool for system monitoring, security analysis, and infrastructure management.

Posted Apr 9, 2025

By Jared

11 min read

Introduction to Osquery
Understanding Osquery’s Architecture
Installation Guide
Basic Usage and First Queries
Osquery Interactive Mode (osqueryi)
Osquery Daemon Mode (osqueryd)
Tables and Schema
Writing Effective Queries
Configuration Files
Query Packs
Scheduled Queries
File Integrity Monitoring
Event-Based Tables
Osquery Extensions
Distributed Queries
Security Monitoring Use Cases
Performance Optimization
Logging and Data Collection
Integration with Other Tools
Advanced Deployment at Scale
Troubleshooting
Resources and Further Learning

Introduction to Osquery

One of the most powerful features of Osquery is its cross-platform compatibility - it works on macOS, Linux, and Windows, allowing you to use the same queries across different environments. This eliminates the need for custom scripts for each platform, providing consistent data collection regardless of the operating system.

Key Benefits of Osquery

Cross-Platform Support: Works consistently across macOS, Linux, and Windows systems
SQL Interface: Uses familiar SQL syntax, making it accessible for those with database experience
Extensive Data Collection: Gathers rich system data including running processes, user logins, network connections, browser plugins, loaded kernel modules, hardware events, file hashes, and more
Open Source: Benefits from community contributions and transparency
Flexibility: Can be used for security monitoring, compliance checks, IT operations, and troubleshooting

Understanding Osquery’s Architecture

Osquery has two main components that serve different purposes:

Osqueryi (Interactive Mode): A command-line interface for executing ad-hoc SQL queries directly on the system
Osqueryd (Daemon Mode): A background service that runs scheduled queries and logs the results for analysis

Under the hood, Osquery uses SQLite as its query engine, leveraging SQLite’s query parsing, optimization, and execution functionality. However, it doesn’t use SQLite for data storage. Instead, it generates most data on-the-fly through “Virtual Tables” at query execution time. For persistent storage needs, Osquery uses an embedded RocksDB database.

Installation Guide

Let’s go through installation steps for different platforms:

Ubuntu/Debian Installation

For Ubuntu or Debian-based systems, you can use the official repository:

  
# Set up the repository key
export OSQUERY_KEY=1484120AC4E9F8A1A577AEEE97A80C63C9D8B80B
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $OSQUERY_KEY

# Add the repository
sudo add-apt-repository 'deb [arch=amd64] https://pkg.osquery.io/deb deb main'

# Update package lists and install osquery
sudo apt update
sudo apt install osquery

macOS Installation

  
# Using Homebrew
brew update
brew install osquery

# Alternatively, download the package from the official website
# https://osquery.io/downloads

Windows Installation

Download the installer from the official website (https://osquery.io/downloads) and follow the installation instructions provided.

Basic Usage and First Queries

After installation, you can immediately start using Osquery’s interactive shell to run queries:

# Launch the interactive shell
osqueryi

This will open the interactive shell with a “Using a virtual database” message. From here, you can start querying your system using SQL syntax.

Osquery Interactive Mode (osqueryi)

The interactive mode allows you to run ad-hoc queries against your system. Here are some essential commands:

.help - Display help information
.tables - List all available tables
.schema [table] - Show the schema (fields) for a specific table or all tables
.all [query] - Run a query and display the results

For system-level queries that require elevated privileges, you may need to run osqueryi with sudo.

Example Queries in Interactive Mode

Here are some basic examples to get you started:

List all users on the system:

  
SELECT uid, username, directory FROM users LIMIT 5;

Get system hardware information:

  
SELECT cpu_brand, hardware_vendor, hardware_model FROM system_info;

Check disk space:

  
SELECT path, type, round((blocks_available * blocks_size * 10e-10), 2) AS gigs_free FROM mounts WHERE path='/';

You can also run queries directly from the command line without entering the interactive shell:

  
# Run a query directly
osqueryi "SELECT * FROM users;"

# Pipe a query into osqueryi
echo "SELECT * FROM users;" | osqueryi

# Get JSON output
osqueryi --json "SELECT * FROM users;"

Osquery Daemon Mode (osqueryd)

The daemon mode (osqueryd) runs in the background to execute scheduled queries and log the results. It’s ideal for continuous monitoring of your system.

To start the osquery daemon:

sudo osqueryctl start

The osqueryctl tool helps you manage various aspects of your Osquery installation, including starting, stopping, and configuring the Osquery daemon.

Tables and Schema

Osquery organizes system data into tables, where each table stores information about a specific aspect of your system. For example, the “processes” table contains information about all running processes.

To see all available tables:

.tables

To examine the schema of a specific table:

  
.schema processes

Some of the most commonly used tables include:

processes - Running processes
users - System users
listening_ports - Network ports being listened on
system_info - General system information
mounts - Mounted filesystems
arp_cache - ARP cache entries
installed_packages - Installed software packages

Writing Effective Queries

Let’s look at some practical SQL queries that demonstrate Osquery’s power:

Find processes that are listening on open ports:

  
SELECT DISTINCT processes.name, listening_ports.port, processes.pid 
FROM listening_ports 
JOIN processes USING (pid) 
WHERE listening_ports.address = '0.0.0.0';

Find macOS LaunchDaemons that launch executables and keep them running:

  
SELECT name, program || program_arguments AS executable 
FROM launchd 
WHERE (run_at_load = 1 AND keep_alive = 1) 
AND (program != '' OR program_arguments != '');

Check for ARP anomalies:

  
SELECT address, mac, COUNT(mac) AS mac_count 
FROM arp_cache 
GROUP BY mac 
HAVING count(mac) > 1;

When writing queries, remember to:

Use JOINs to combine data from multiple tables
Apply WHERE clauses to filter results
Use GROUP BY for aggregation
Keep queries focused and optimized

Configuration Files

Osquery’s configuration is managed through JSON files. The default configuration file location varies by platform:

Linux: /etc/osquery/osquery.conf
macOS: /var/osquery/osquery.conf or /usr/local/etc/osquery/osquery.conf
Windows: C:\ProgramData\osquery\osquery.conf

You can override this path using the --config_path flag:

osqueryd --config_path=/path/to/osquery.conf

Configuration Structure

A basic configuration file has the following structure:

  
{
  "options": {
    "config_plugin": "filesystem",
    "logger_plugin": "filesystem",
    "schedule_splay_percent": 10
  },
  "schedule": {
    "system_info": {
      "query": "SELECT hostname, cpu_brand, physical_memory FROM system_info;",
      "interval": 3600
    }
  },
  "packs": {
    "osquery-monitoring": "/usr/share/osquery/packs/osquery-monitoring.conf",
    "incident-response": "/usr/share/osquery/packs/incident-response.conf"
  }
}

The main sections include:

options - Global settings for osquery
schedule - Scheduled queries that run at specified intervals
packs - Query packs (predefined groups of queries)

Query Packs

Query packs are collections of queries organized by category. They help you define and manage your monitoring schedule. Osquery comes with several built-in packs for common monitoring tasks:

incident-response - Queries for security incident investigation
it-compliance - Compliance-related queries
osquery-monitoring - Monitoring osquery itself
vulnerability-management - Identifying potential security issues
hardware-monitoring - Hardware-related information

Example Pack Configuration

A pack can be defined directly in the main configuration or in a separate file:

  
{
  "queries": {
    "network_interfaces": {
      "query": "SELECT interface, address, mask, broadcast FROM interface_addresses WHERE NOT interface='lo';",
      "interval": 3600,
      "description": "Retrieves the interfaces and their associated configuration"
    }
  },
  "discovery": [
    "SELECT pid FROM processes WHERE name = 'mysqld';"
  ]
}

The discovery section contains queries that determine whether the pack should run. If all discovery queries return at least one row, the pack’s queries will be added to the schedule.

Scheduled Queries

Scheduled queries are defined in the configuration file and run at specified intervals. This is useful for continuous monitoring:

  
{
  "schedule": {
    "processes": {
      "query": "SELECT name, pid FROM processes;",
      "interval": 60
    },
    "users_logged_in": {
      "query": "SELECT user, tty, host, time FROM logged_in_users;",
      "interval": 300
    }
  }
}

Each query has a unique name, the SQL query to execute, and an interval in seconds determining how frequently it runs.

File Integrity Monitoring

File Integrity Monitoring (FIM) allows you to track changes to critical files on your system. Osquery can monitor file modifications, additions, and deletions:

  
{
  "options": {
    "enable_file_events": "true"
  },
  "file_paths": {
    "etc": [
      "/etc/%%"
    ],
    "bin": [
      "/bin/%%",
      "/sbin/%%"
    ]
  }
}

You can then use the file_events table to query for changes:

  
SELECT target_path, time, action, category FROM file_events;

Event-Based Tables

Event-based tables in Osquery allow you to monitor real-time events rather than just point-in-time information. These tables include:

file_events - File system changes
process_events - Process creation/termination
socket_events - Network socket activities
user_events - User authentication events

To use these tables, you need to enable the relevant event subscribers in your configuration:

  
{
  "options": {
    "disable_events": "false",
    "enable_file_events": "true",
    "enable_process_events": "true"
  }
}

Osquery Extensions

Osquery’s functionality can be extended through extensions, which add new tables or functions. Extensions can be written in Python, Go, or C++.

To load an extension:

osqueryi --extension=/path/to/extension.ext

In the configuration file:

  
{
  "options": {
    "extensions_autoload": "/etc/osquery/extensions.load"
  }
}

Where extensions.load contains paths to extensions.

Distributed Queries

Distributed queries allow you to run ad-hoc queries across multiple endpoints simultaneously. This is useful for fleet-wide investigations or data collection.

To facilitate distributed queries at scale, you’ll need a fleet management solution. Some popular open-source options include:

Fleet - A Golang-powered fleet manager originally developed by Kolide
Zentral - A framework for capturing events from various sources
Doorman - One of the initial osquery fleet managers

Security Monitoring Use Cases

Osquery excels at security monitoring tasks. Here are some common use cases:

Detect unauthorized processes:

  
SELECT name, path, pid FROM processes WHERE name NOT IN (SELECT name FROM allowed_processes);

Monitor for suspicious network connections:

  
SELECT p.name, p.path, lp.port, lp.address 
FROM processes p 
JOIN listening_ports lp ON p.pid = lp.pid 
WHERE lp.port IN (4444, 5555, 6666);

Identify unauthorized users:

  
SELECT username, uid, gid FROM users 
WHERE username NOT IN (SELECT username FROM authorized_users);

Check for compliance with security policies:

  
SELECT * FROM mounts WHERE device LIKE '/dev/%' AND flags NOT LIKE '%noexec%';

File integrity monitoring for critical system files:

  
SELECT target_path, time, action FROM file_events 
WHERE target_path LIKE '/etc/%' OR target_path LIKE '/bin/%';

Performance Optimization

To optimize Osquery’s performance, consider these best practices:

Use specific queries rather than wildcard selects
Limit the result set when possible
Optimize join operations
Adjust query intervals based on importance
Use discovery queries to run packs only when needed

Osquery was designed with performance in mind. The team at Facebook designed it to be simple, performant, and reliable, so it won’t significantly impact the systems it’s monitoring.

Logging and Data Collection

Osquery can output results in different formats for integration with other tools:

  
# Output in CSV format
osqueryi --csv "SELECT * FROM processes;" > processes.csv

# Output in JSON format
osqueryi --json "SELECT * FROM processes;" > processes.json

For the daemon mode, logs are typically stored in:

Linux: /var/log/osquery/
macOS: /var/log/osquery/
Windows: C:\ProgramData\osquery\logs\

Integration with Other Tools

Osquery data can be integrated with various tools for enhanced monitoring and analysis:

SIEM Solutions - Send osquery logs to Splunk, ELK Stack, or other SIEM platforms
Monitoring Tools - Integrate with Prometheus for metrics collection
Configuration Management - Deploy and manage osquery using Chef, Ansible, or Puppet
Threat Intelligence Platforms - Correlate osquery data with threat feeds
Visualization Tools - Create dashboards using Grafana or Kibana

Advanced Deployment at Scale

Deploying osquery at scale requires consideration of:

Deployment Automation - Use configuration management tools like Chef, Ansible, or Puppet
Fleet Management - Implement a solution for managing configurations and distributed queries
Data Transport - Consider how data will be securely transmitted from endpoints
Data Aggregation - Set up systems to collect and process data from multiple endpoints
Storage - Plan for storing potentially large volumes of data
Analysis - Implement solutions for analyzing and acting on the collected data

Troubleshooting

Common issues and their solutions:

High CPU Usage
- Optimize query intervals
- Reduce the number of scheduled queries
- Check for inefficient queries
Memory Leaks
- Update to the latest version
- Check for problematic extensions
Configuration Issues
- Validate JSON syntax
- Check file permissions
- Ensure paths are correct
Query Failures
- Verify table existence
- Check for syntax errors
- Ensure appropriate permissions

Resources and Further Learning

To continue your osquery journey:

Official Documentation
- Osquery Documentation
- Osquery GitHub Repository
Community Resources
- Osquery Slack Channel
- Osquery Forums
Books and Tutorials
- “Mastering Osquery” by Daniel Sullivan
- Online tutorials from security blogs
Query Packs and Examples
- Osquery Packs Repository
- Community-shared query collections

By mastering osquery, you gain a powerful tool for system visibility, security monitoring, and IT operations management across your entire infrastructure.

Development, Monitoring

database monitoring

This post is licensed under CC BY 4.0 by the author.

OSQuery - From Zero to Hero

Table of Contents

Introduction to Osquery

Key Benefits of Osquery

Understanding Osquery’s Architecture

Installation Guide

Ubuntu/Debian Installation

macOS Installation

Windows Installation

Basic Usage and First Queries

Osquery Interactive Mode (osqueryi)

Example Queries in Interactive Mode

Osquery Daemon Mode (osqueryd)

Tables and Schema

Writing Effective Queries

Configuration Files

Configuration Structure

Query Packs

Example Pack Configuration

Scheduled Queries

File Integrity Monitoring

Event-Based Tables

Osquery Extensions

Distributed Queries

Security Monitoring Use Cases

Performance Optimization

Logging and Data Collection

Integration with Other Tools

Advanced Deployment at Scale

Troubleshooting

Resources and Further Learning

Trending Tags