Post

OSQuery - From Zero to Hero

Osquery is a powerful host instrumentation framework originally developed by Facebook that enables you to treat your operating system as a relational database. It allows you to query system information using SQL, making it a versatile tool for system monitoring, security analysis, and infrastructure management.

OSQuery - From Zero to Hero

Table of Contents

  1. Introduction to Osquery
  2. Understanding Osquery’s Architecture
  3. Installation Guide
  4. Basic Usage and First Queries
  5. Osquery Interactive Mode (osqueryi)
  6. Osquery Daemon Mode (osqueryd)
  7. Tables and Schema
  8. Writing Effective Queries
  9. Configuration Files
  10. Query Packs
  11. Scheduled Queries
  12. File Integrity Monitoring
  13. Event-Based Tables
  14. Osquery Extensions
  15. Distributed Queries
  16. Security Monitoring Use Cases
  17. Performance Optimization
  18. Logging and Data Collection
  19. Integration with Other Tools
  20. Advanced Deployment at Scale
  21. Troubleshooting
  22. Resources and Further Learning

Introduction to Osquery

One of the most powerful features of Osquery is its cross-platform compatibility - it works on macOS, Linux, and Windows, allowing you to use the same queries across different environments. This eliminates the need for custom scripts for each platform, providing consistent data collection regardless of the operating system.

Key Benefits of Osquery

  1. Cross-Platform Support: Works consistently across macOS, Linux, and Windows systems
  2. SQL Interface: Uses familiar SQL syntax, making it accessible for those with database experience
  3. Extensive Data Collection: Gathers rich system data including running processes, user logins, network connections, browser plugins, loaded kernel modules, hardware events, file hashes, and more
  4. Open Source: Benefits from community contributions and transparency
  5. Flexibility: Can be used for security monitoring, compliance checks, IT operations, and troubleshooting

Understanding Osquery’s Architecture

Osquery has two main components that serve different purposes:

  1. Osqueryi (Interactive Mode): A command-line interface for executing ad-hoc SQL queries directly on the system
  2. Osqueryd (Daemon Mode): A background service that runs scheduled queries and logs the results for analysis

Under the hood, Osquery uses SQLite as its query engine, leveraging SQLite’s query parsing, optimization, and execution functionality. However, it doesn’t use SQLite for data storage. Instead, it generates most data on-the-fly through “Virtual Tables” at query execution time. For persistent storage needs, Osquery uses an embedded RocksDB database.

Installation Guide

Let’s go through installation steps for different platforms:

Ubuntu/Debian Installation

For Ubuntu or Debian-based systems, you can use the official repository:

1
2
3
4
5
6
7
8
9
10
# Set up the repository key
export OSQUERY_KEY=1484120AC4E9F8A1A577AEEE97A80C63C9D8B80B
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys $OSQUERY_KEY

# Add the repository
sudo add-apt-repository 'deb [arch=amd64] https://pkg.osquery.io/deb deb main'

# Update package lists and install osquery
sudo apt update
sudo apt install osquery

macOS Installation

1
2
3
4
5
6
# Using Homebrew
brew update
brew install osquery

# Alternatively, download the package from the official website
# https://osquery.io/downloads

Windows Installation

Download the installer from the official website (https://osquery.io/downloads) and follow the installation instructions provided.

Basic Usage and First Queries

After installation, you can immediately start using Osquery’s interactive shell to run queries:

1
2
# Launch the interactive shell
osqueryi

This will open the interactive shell with a “Using a virtual database” message. From here, you can start querying your system using SQL syntax.

Osquery Interactive Mode (osqueryi)

The interactive mode allows you to run ad-hoc queries against your system. Here are some essential commands:

  • .help - Display help information
  • .tables - List all available tables
  • .schema [table] - Show the schema (fields) for a specific table or all tables
  • .all [query] - Run a query and display the results

For system-level queries that require elevated privileges, you may need to run osqueryi with sudo.

Example Queries in Interactive Mode

Here are some basic examples to get you started:

  1. List all users on the system:
    1
    
    SELECT uid, username, directory FROM users LIMIT 5;
    
  2. Get system hardware information:
    1
    
    SELECT cpu_brand, hardware_vendor, hardware_model FROM system_info;
    
  3. Check disk space:
    1
    
    SELECT path, type, round((blocks_available * blocks_size * 10e-10), 2) AS gigs_free FROM mounts WHERE path='/';
    

You can also run queries directly from the command line without entering the interactive shell:

1
2
3
4
5
6
7
8
# Run a query directly
osqueryi "SELECT * FROM users;"

# Pipe a query into osqueryi
echo "SELECT * FROM users;" | osqueryi

# Get JSON output
osqueryi --json "SELECT * FROM users;"

Osquery Daemon Mode (osqueryd)

The daemon mode (osqueryd) runs in the background to execute scheduled queries and log the results. It’s ideal for continuous monitoring of your system.

To start the osquery daemon:

1
sudo osqueryctl start

The osqueryctl tool helps you manage various aspects of your Osquery installation, including starting, stopping, and configuring the Osquery daemon.

Tables and Schema

Osquery organizes system data into tables, where each table stores information about a specific aspect of your system. For example, the “processes” table contains information about all running processes.

To see all available tables:

1
.tables

To examine the schema of a specific table:

1
.schema processes

Some of the most commonly used tables include:

  • processes - Running processes
  • users - System users
  • listening_ports - Network ports being listened on
  • system_info - General system information
  • mounts - Mounted filesystems
  • arp_cache - ARP cache entries
  • installed_packages - Installed software packages

Writing Effective Queries

Let’s look at some practical SQL queries that demonstrate Osquery’s power:

  1. Find processes that are listening on open ports:
    1
    2
    3
    4
    
    SELECT DISTINCT processes.name, listening_ports.port, processes.pid 
    FROM listening_ports 
    JOIN processes USING (pid) 
    WHERE listening_ports.address = '0.0.0.0';
    
  2. Find macOS LaunchDaemons that launch executables and keep them running:
    1
    2
    3
    4
    
    SELECT name, program || program_arguments AS executable 
    FROM launchd 
    WHERE (run_at_load = 1 AND keep_alive = 1) 
    AND (program != '' OR program_arguments != '');
    
  3. Check for ARP anomalies:
    1
    2
    3
    4
    
    SELECT address, mac, COUNT(mac) AS mac_count 
    FROM arp_cache 
    GROUP BY mac 
    HAVING count(mac) > 1;
    

When writing queries, remember to:

  • Use JOINs to combine data from multiple tables
  • Apply WHERE clauses to filter results
  • Use GROUP BY for aggregation
  • Keep queries focused and optimized

Configuration Files

Osquery’s configuration is managed through JSON files. The default configuration file location varies by platform:

  • Linux: /etc/osquery/osquery.conf
  • macOS: /var/osquery/osquery.conf or /usr/local/etc/osquery/osquery.conf
  • Windows: C:\ProgramData\osquery\osquery.conf

You can override this path using the --config_path flag:

1
osqueryd --config_path=/path/to/osquery.conf

Configuration Structure

A basic configuration file has the following structure:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
{
  "options": {
    "config_plugin": "filesystem",
    "logger_plugin": "filesystem",
    "schedule_splay_percent": 10
  },
  "schedule": {
    "system_info": {
      "query": "SELECT hostname, cpu_brand, physical_memory FROM system_info;",
      "interval": 3600
    }
  },
  "packs": {
    "osquery-monitoring": "/usr/share/osquery/packs/osquery-monitoring.conf",
    "incident-response": "/usr/share/osquery/packs/incident-response.conf"
  }
}

The main sections include:

  • options - Global settings for osquery
  • schedule - Scheduled queries that run at specified intervals
  • packs - Query packs (predefined groups of queries)

Query Packs

Query packs are collections of queries organized by category. They help you define and manage your monitoring schedule. Osquery comes with several built-in packs for common monitoring tasks:

  • incident-response - Queries for security incident investigation
  • it-compliance - Compliance-related queries
  • osquery-monitoring - Monitoring osquery itself
  • vulnerability-management - Identifying potential security issues
  • hardware-monitoring - Hardware-related information

Example Pack Configuration

A pack can be defined directly in the main configuration or in a separate file:

1
2
3
4
5
6
7
8
9
10
11
12
{
  "queries": {
    "network_interfaces": {
      "query": "SELECT interface, address, mask, broadcast FROM interface_addresses WHERE NOT interface='lo';",
      "interval": 3600,
      "description": "Retrieves the interfaces and their associated configuration"
    }
  },
  "discovery": [
    "SELECT pid FROM processes WHERE name = 'mysqld';"
  ]
}

The discovery section contains queries that determine whether the pack should run. If all discovery queries return at least one row, the pack’s queries will be added to the schedule.

Scheduled Queries

Scheduled queries are defined in the configuration file and run at specified intervals. This is useful for continuous monitoring:

1
2
3
4
5
6
7
8
9
10
11
12
{
  "schedule": {
    "processes": {
      "query": "SELECT name, pid FROM processes;",
      "interval": 60
    },
    "users_logged_in": {
      "query": "SELECT user, tty, host, time FROM logged_in_users;",
      "interval": 300
    }
  }
}

Each query has a unique name, the SQL query to execute, and an interval in seconds determining how frequently it runs.

File Integrity Monitoring

File Integrity Monitoring (FIM) allows you to track changes to critical files on your system. Osquery can monitor file modifications, additions, and deletions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
{
  "options": {
    "enable_file_events": "true"
  },
  "file_paths": {
    "etc": [
      "/etc/%%"
    ],
    "bin": [
      "/bin/%%",
      "/sbin/%%"
    ]
  }
}

You can then use the file_events table to query for changes:

1
SELECT target_path, time, action, category FROM file_events;

Event-Based Tables

Event-based tables in Osquery allow you to monitor real-time events rather than just point-in-time information. These tables include:

  • file_events - File system changes
  • process_events - Process creation/termination
  • socket_events - Network socket activities
  • user_events - User authentication events

To use these tables, you need to enable the relevant event subscribers in your configuration:

1
2
3
4
5
6
7
{
  "options": {
    "disable_events": "false",
    "enable_file_events": "true",
    "enable_process_events": "true"
  }
}

Osquery Extensions

Osquery’s functionality can be extended through extensions, which add new tables or functions. Extensions can be written in Python, Go, or C++.

To load an extension:

1
osqueryi --extension=/path/to/extension.ext

In the configuration file:

1
2
3
4
5
{
  "options": {
    "extensions_autoload": "/etc/osquery/extensions.load"
  }
}

Where extensions.load contains paths to extensions.

Distributed Queries

Distributed queries allow you to run ad-hoc queries across multiple endpoints simultaneously. This is useful for fleet-wide investigations or data collection.

To facilitate distributed queries at scale, you’ll need a fleet management solution. Some popular open-source options include:

  • Fleet - A Golang-powered fleet manager originally developed by Kolide
  • Zentral - A framework for capturing events from various sources
  • Doorman - One of the initial osquery fleet managers

Security Monitoring Use Cases

Osquery excels at security monitoring tasks. Here are some common use cases:

  1. Detect unauthorized processes:
    1
    
    SELECT name, path, pid FROM processes WHERE name NOT IN (SELECT name FROM allowed_processes);
    
  2. Monitor for suspicious network connections:
    1
    2
    3
    4
    
    SELECT p.name, p.path, lp.port, lp.address 
    FROM processes p 
    JOIN listening_ports lp ON p.pid = lp.pid 
    WHERE lp.port IN (4444, 5555, 6666);
    
  3. Identify unauthorized users:
    1
    2
    
    SELECT username, uid, gid FROM users 
    WHERE username NOT IN (SELECT username FROM authorized_users);
    
  4. Check for compliance with security policies:
    1
    
    SELECT * FROM mounts WHERE device LIKE '/dev/%' AND flags NOT LIKE '%noexec%';
    
  5. File integrity monitoring for critical system files:
    1
    2
    
    SELECT target_path, time, action FROM file_events 
    WHERE target_path LIKE '/etc/%' OR target_path LIKE '/bin/%';
    

Performance Optimization

To optimize Osquery’s performance, consider these best practices:

  1. Use specific queries rather than wildcard selects
  2. Limit the result set when possible
  3. Optimize join operations
  4. Adjust query intervals based on importance
  5. Use discovery queries to run packs only when needed

Osquery was designed with performance in mind. The team at Facebook designed it to be simple, performant, and reliable, so it won’t significantly impact the systems it’s monitoring.

Logging and Data Collection

Osquery can output results in different formats for integration with other tools:

1
2
3
4
5
# Output in CSV format
osqueryi --csv "SELECT * FROM processes;" > processes.csv

# Output in JSON format
osqueryi --json "SELECT * FROM processes;" > processes.json

For the daemon mode, logs are typically stored in:

  • Linux: /var/log/osquery/
  • macOS: /var/log/osquery/
  • Windows: C:\ProgramData\osquery\logs\

Integration with Other Tools

Osquery data can be integrated with various tools for enhanced monitoring and analysis:

  1. SIEM Solutions - Send osquery logs to Splunk, ELK Stack, or other SIEM platforms
  2. Monitoring Tools - Integrate with Prometheus for metrics collection
  3. Configuration Management - Deploy and manage osquery using Chef, Ansible, or Puppet
  4. Threat Intelligence Platforms - Correlate osquery data with threat feeds
  5. Visualization Tools - Create dashboards using Grafana or Kibana

Advanced Deployment at Scale

Deploying osquery at scale requires consideration of:

  1. Deployment Automation - Use configuration management tools like Chef, Ansible, or Puppet
  2. Fleet Management - Implement a solution for managing configurations and distributed queries
  3. Data Transport - Consider how data will be securely transmitted from endpoints
  4. Data Aggregation - Set up systems to collect and process data from multiple endpoints
  5. Storage - Plan for storing potentially large volumes of data
  6. Analysis - Implement solutions for analyzing and acting on the collected data

Troubleshooting

Common issues and their solutions:

  1. High CPU Usage
    • Optimize query intervals
    • Reduce the number of scheduled queries
    • Check for inefficient queries
  2. Memory Leaks
    • Update to the latest version
    • Check for problematic extensions
  3. Configuration Issues
    • Validate JSON syntax
    • Check file permissions
    • Ensure paths are correct
  4. Query Failures
    • Verify table existence
    • Check for syntax errors
    • Ensure appropriate permissions

Resources and Further Learning

To continue your osquery journey:

  1. Official Documentation
  2. Community Resources
    • Osquery Slack Channel
    • Osquery Forums
  3. Books and Tutorials
    • “Mastering Osquery” by Daniel Sullivan
    • Online tutorials from security blogs
  4. Query Packs and Examples

By mastering osquery, you gain a powerful tool for system visibility, security monitoring, and IT operations management across your entire infrastructure.

This post is licensed under CC BY 4.0 by the author.