Sandbox
omegaUp uses Minijail, a sandboxing tool originally developed by Google for Chrome OS, to securely execute user-submitted code. The sandbox provides robust isolation to prevent malicious code from affecting the system.
Overview
The sandbox mediates every interaction between user code and the system. It provides:
- Process Isolation: Code runs in isolated namespaces
- Syscall Filtering: Dangerous system calls are blocked
- Resource Limits: Memory, CPU, and file access are restricted
- Network Blocking: No network access from submitted code
Architecture
flowchart TD
subgraph Runner
Code[User Code] --> Minijail
subgraph Minijail[Minijail Sandbox]
Namespaces[Namespace Isolation]
Seccomp[Seccomp-BPF Filter]
Limits[Resource Limits]
end
Minijail --> Kernel[Linux Kernel]
end
Kernel --> |Allowed| Execute[Execute Syscall]
Kernel --> |Blocked| Kill[Kill Process]
Security Layers
1. Linux Namespaces
Minijail uses namespaces to isolate the process:
| Namespace | Isolation Provided |
|---|---|
| PID | Process ID isolation - can't see other processes |
| NET | Network isolation - no network access |
| MNT | Mount isolation - restricted filesystem view |
| IPC | Inter-process communication isolation |
| USER | User ID mapping - runs as unprivileged user |
| UTS | Hostname isolation |
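On Linux, the namespaces a process belongs to are exposed as symbolic links under /proc/<pid>/ns, so one way to check the isolation in the table is to compare those identifiers between a sandboxed process and the host. The following is a minimal sketch of that comparison; the helper names `namespace_ids` and `isolated_namespaces` are illustrative and are not part of Minijail or omegaUp.

```python
import os

# Namespace names from the table mapped to their /proc/<pid>/ns entries.
NS_ENTRIES = {
    "PID": "pid",
    "NET": "net",
    "MNT": "mnt",
    "IPC": "ipc",
    "USER": "user",
    "UTS": "uts",
}


def namespace_ids(pid="self"):
    """Return {namespace: identifier} for one process, e.g. 'pid:[4026531836]'.

    Entries that cannot be read (no /proc, insufficient permissions)
    are skipped rather than raising.
    """
    ids = {}
    for name, entry in NS_ENTRIES.items():
        try:
            ids[name] = os.readlink(f"/proc/{pid}/ns/{entry}")
        except OSError:
            continue
    return ids


def isolated_namespaces(ids_a, ids_b):
    """List namespaces present in both mappings whose identifiers differ."""
    shared = ids_a.keys() & ids_b.keys()
    return sorted(name for name in shared if ids_a[name] != ids_b[name])
```

Calling `isolated_namespaces(namespace_ids(), namespace_ids(child_pid))` from the host, with `child_pid` being the sandboxed process, should list every namespace that was actually unshared for the child.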
2. Seccomp-BPF Filtering
System calls are filtered using seccomp-BPF:
// Allowed syscalls (essential for execution)
read, write, open, close, fstat, mmap, mprotect,
munmap, brk, exit_group, arch_prctl, access,
execve, getpid, getuid, getgid, geteuid, getegid
// Blocked syscalls (dangerous operations)
socket, connect, bind, listen, accept, // No networking
fork, clone, vfork, // No process creation
kill, tkill, tgkill, // No signal sending
ptrace, // No debugging
mount, umount, pivot_root, // No filesystem changes
3. Resource Limits
Limits are enforced with setrlimit:
| Resource | Typical Limit | Purpose |
|---|---|---|
| CPU Time | 1-60 seconds | Prevent infinite loops |
| Memory | 256 MB | Prevent memory exhaustion |
| File Size | 64 MB | Limit output |
| Open Files | 20 | Prevent file descriptor exhaustion |
| Processes | 1 | No forking |
| Stack Size | 8 MB | Prevent stack overflow |
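The limits in the table map onto `RLIMIT_*` resources. The sketch below shows how they could be expressed with Python's standard `resource` module. It is an illustration, not omegaUp's runner code: mapping "Memory" to `RLIMIT_AS` and giving the CPU limit a one-second gap between the soft and hard values are assumptions made for this example.

```python
import resource

MB = 1024 * 1024


def build_limits(cpu_seconds=1, memory_bytes=256 * MB):
    """Map RLIMIT_* constants to (soft, hard) pairs using the table's values."""
    return {
        # Reaching the soft limit delivers SIGXCPU; the hard limit one
        # second later acts as a SIGKILL backstop.
        resource.RLIMIT_CPU: (cpu_seconds, cpu_seconds + 1),
        # Assumption: memory is capped through the address-space limit.
        resource.RLIMIT_AS: (memory_bytes, memory_bytes),
        resource.RLIMIT_FSIZE: (64 * MB, 64 * MB),
        resource.RLIMIT_NOFILE: (20, 20),
        resource.RLIMIT_NPROC: (1, 1),
        resource.RLIMIT_STACK: (8 * MB, 8 * MB),
    }


def apply_limits(limits):
    """Apply every limit to the current process.

    Intended to run in the child between fork and exec, for example via
    subprocess.Popen(..., preexec_fn=lambda: apply_limits(limits)).
    """
    for res, pair in limits.items():
        resource.setrlimit(res, pair)
```

Lowering a hard limit cannot be undone by an unprivileged process, so `apply_limits` should only ever run in the child that is about to execute user code.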
Syscall Handling
The sandbox can handle system calls in three ways:
Allow
Harmless syscalls proceed normally:
read() → Allow (needed for input)
write() → Allow (needed for output)
mmap() → Allow (needed for memory allocation)
Block/Kill
Dangerous syscalls either fail with an error code or terminate the process:
socket() → EPERM (no networking)
fork() → EPERM (no process creation)
ptrace() → SIGKILL (no debugging)
Replace/Emulate
Some syscalls are replaced with safe alternatives:
getpid() → Returns fixed value
gettimeofday() → Returns controlled time
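The handling modes can be pictured as a lookup from syscall name to action. The following sketch models that decision with the examples listed in this section. The `Action` enum, the `RULES` table, and the default of killing anything unlisted are assumptions for illustration; the real decision is made by the seccomp filter, not by Python code.

```python
import errno
from enum import Enum


class Action(Enum):
    ALLOW = "allow"      # let the call through unchanged
    ERRNO = "errno"      # fail the call with an error code
    KILL = "kill"        # terminate the process
    EMULATE = "emulate"  # answer with a controlled value


# Illustrative table built from the examples in this section.
RULES = {
    "read": (Action.ALLOW, None),
    "write": (Action.ALLOW, None),
    "mmap": (Action.ALLOW, None),
    "socket": (Action.ERRNO, errno.EPERM),
    "fork": (Action.ERRNO, errno.EPERM),
    "ptrace": (Action.KILL, None),
    "getpid": (Action.EMULATE, None),
    "gettimeofday": (Action.EMULATE, None),
}


def decide(syscall):
    """Return (action, detail); unlisted syscalls are killed (default deny)."""
    return RULES.get(syscall, (Action.KILL, None))
```

A default-deny table means that a syscall nobody thought about is treated as dangerous, which is the safer failure mode for an online judge.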
Execution Flow
sequenceDiagram
participant Runner
participant Minijail
participant Kernel
participant Code
Runner->>Minijail: Start sandboxed process
Minijail->>Kernel: Create namespaces
Minijail->>Kernel: Apply seccomp filter
Minijail->>Kernel: Set resource limits
Minijail->>Code: Execute user code
loop Each syscall
Code->>Kernel: System call
Kernel->>Kernel: Evaluate seccomp filter
alt Allowed
Kernel->>Kernel: Execute
Kernel->>Code: Return result
else Blocked
Kernel->>Code: Return error/kill
end
end
Code->>Runner: Exit
Runner->>Runner: Collect results
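The last two steps of the diagram, waiting for the process to exit and collecting results, can be sketched with the standard `subprocess` module. This is a simplified illustration rather than the actual Runner: `RunResult`, `run_and_collect`, and `demo` are hypothetical names, and in a real setup `argv` would be the full `minijail0` command line instead of a plain interpreter call.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    returncode: Optional[int]  # None on wall-clock timeout; -N means signal N
    stdout: str
    wall_seconds: float
    timed_out: bool


def run_and_collect(argv, stdin_text="", wall_timeout=5.0):
    """Run argv to completion and collect what a runner would report."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=wall_timeout,
        )
        code, out, timed_out = proc.returncode, proc.stdout, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        code, out, timed_out = None, "", True
    return RunResult(code, out, time.monotonic() - start, timed_out)


def demo():
    """Run a trivial program in place of a sandboxed submission."""
    return run_and_collect([sys.executable, "-c", "print(6 * 7)"])
```

The wall-clock timeout is a backstop that is independent of the CPU limit: a process that sleeps forever consumes no CPU time and would never hit a CPU limit on its own.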
Filesystem Isolation
Visible Paths
User code can only see:
/usr/lib/ # Shared libraries (read-only)
/lib/ # System libraries (read-only)
/tmp/ # Temporary files (read-write, limited)
/dev/null # Null device
/dev/urandom # Random numbers (limited)
Hidden Paths
Protected from user code:
/home/ # User data
/etc/ # System configuration
/proc/ # Process information (mostly)
/sys/ # System information
/var/ # Variable data
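To make the two lists concrete, the sketch below models the sandbox's filesystem view as an allowlist check. It is only a model: the real enforcement comes from the mount namespace, which simply does not contain the hidden paths, and `is_visible` is a hypothetical helper written for this example.

```python
import posixpath

# Paths taken from the "Visible Paths" list.
VISIBLE_DIRS = ("/usr/lib", "/lib", "/tmp")
VISIBLE_FILES = ("/dev/null", "/dev/urandom")


def is_visible(path):
    """Return True if an absolute path falls inside the sandbox's view."""
    # Collapse "." and ".." so "/tmp/../etc/passwd" is judged as "/etc/passwd".
    norm = posixpath.normpath(path)
    if norm in VISIBLE_FILES:
        return True
    # Match the directory itself or anything below it, but not siblings
    # that merely share a prefix (for example "/libexec" versus "/lib").
    return any(norm == d or norm.startswith(d + "/") for d in VISIBLE_DIRS)
```

For example, `is_visible("/tmp/out.txt")` is true while `is_visible("/tmp/../etc/passwd")` is false, because the path is normalized before it is compared.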
Language-Specific Configurations
Different languages require different sandbox profiles:
C/C++
Syscalls: minimal set
Filesystem: standard libraries only
Memory: direct allocation
Java
Syscalls: extended for JVM
Filesystem: JRE paths added
Memory: JVM heap management
Python
Syscalls: interpreter requirements
Filesystem: Python stdlib paths
Memory: interpreter overhead
Interpreted Languages
Additional considerations:
- Interpreter binaries accessible
- Standard libraries in path
- Module import restrictions
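One way to organize per-language profiles is a shared baseline plus language-specific additions. The sketch below illustrates that structure only. The extra syscalls and the `/opt/...` paths are hypothetical placeholders chosen for the example and are not omegaUp's actual lists.

```python
from dataclasses import dataclass

# Shared baseline, taken from the allowed-syscall list earlier on this page.
BASE_SYSCALLS = frozenset({
    "read", "write", "open", "close", "fstat", "mmap", "mprotect",
    "munmap", "brk", "exit_group",
})
BASE_PATHS = ("/usr/lib", "/lib")


@dataclass(frozen=True)
class Profile:
    extra_syscalls: frozenset = frozenset()
    extra_paths: tuple = ()

    def syscalls(self):
        """Baseline syscalls plus this language's additions."""
        return BASE_SYSCALLS | self.extra_syscalls

    def paths(self):
        """Baseline read-only paths plus this language's additions."""
        return BASE_PATHS + self.extra_paths


# Hypothetical additions, for illustration only.
PROFILES = {
    "cpp": Profile(),
    "java": Profile(frozenset({"clone", "futex"}), ("/opt/jre",)),
    "python": Profile(frozenset({"getdents64", "lseek"}), ("/opt/python",)),
}
```

Keeping the baseline in one place means a tightening of the common policy applies to every language at once, while each profile only lists what it needs beyond it.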
Minijail Configuration
Command Line Options
# Flags are collected in a bash array so that each one can carry a comment.
args=(
  -c 0                          # No capabilities
  -n                            # No new privileges
  -v                            # Mount namespace
  -p                            # PID namespace
  -l                            # IPC namespace
  -e                            # Network namespace
  -r                            # Remount /proc read-only
  -t                            # Mount tmpfs at /tmp
  -b /usr/lib,/usr/lib,0        # Bind mount (read-only)
  -S /path/to/policy.bpf        # Seccomp policy
  -T static                     # Assume a statically linked ELF (no preload hooking)
)
minijail0 "${args[@]}" -- /path/to/program   # Program to run
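When the command is launched from a program rather than typed by hand, building it as an argument list avoids shell quoting problems. The sketch below assembles such a list from the flags documented in this section. `build_minijail_argv` is a hypothetical helper, and it deliberately leaves out `-T` so that the linkage type is not hard-coded.

```python
import shlex


def build_minijail_argv(program, policy_path, readonly_binds=("/usr/lib",)):
    """Assemble a minijail0 command line from the flags in this section."""
    argv = ["minijail0", "-c", "0", "-n", "-v", "-p", "-l", "-e", "-r", "-t"]
    for path in readonly_binds:
        # Format is src,dest,writeable: the trailing 0 keeps it read-only.
        argv += ["-b", f"{path},{path},0"]
    argv += ["-S", policy_path, "--", program]
    return argv


if __name__ == "__main__":
    argv = build_minijail_argv("/path/to/program", "/path/to/policy.bpf")
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run` without `shell=True`, so paths containing spaces or shell metacharacters are not reinterpreted.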
Seccomp Policy File
# policy.bpf - Example seccomp policy
read: 1
write: 1
open: 1
close: 1
fstat: 1
# No mapping may be both executable (0x4) and writable (0x2)
mmap: arg2 in ~0x4 || arg2 in ~0x2
munmap: 1
brk: 1
exit_group: 1
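The `in` operator in an argument filter is a bit-subset test: `argN in mask` holds when the argument has no bits set outside the mask. A single clause such as `arg2 in ~0x4` therefore rejects every mapping that requests PROT_EXEC, while the two-clause form `arg2 in ~0x4 || arg2 in ~0x2` only rejects mappings that are writable and executable at the same time. The sketch below models both in Python; the function names are illustrative, and the constants are the Linux values of the protection bits.

```python
# Linux mmap protection bits (values from <sys/mman.h>).
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4

U64 = 0xFFFFFFFFFFFFFFFF  # this model treats arguments as 64-bit values


def matches_in(value, mask):
    """Model of `argN in mask`: true when value has no bits outside mask."""
    return (value & ~mask & U64) == 0


def no_exec(prot):
    """Model of `mmap: arg2 in ~0x4` (no executable mappings at all)."""
    return matches_in(prot, ~PROT_EXEC & U64)


def w_xor_x(prot):
    """Model of `mmap: arg2 in ~0x4 || arg2 in ~0x2` (never W and X together)."""
    return matches_in(prot, ~PROT_EXEC & U64) or matches_in(prot, ~PROT_WRITE & U64)
```

With these definitions, `PROT_READ | PROT_EXEC` fails `no_exec` but passes `w_xor_x`, which matters because loading shared libraries typically requires read-and-execute mappings.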
Error Detection
Runtime Error Signals
| Signal | Meaning | Common Cause |
|---|---|---|
| SIGSEGV | Segmentation fault | Invalid memory access |
| SIGFPE | Floating point exception | Integer division by zero |
| SIGABRT | Abort | Assertion failure |
| SIGKILL | Killed | Syscall violation |
| SIGXCPU | CPU time exceeded | Infinite loop |
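When a child is started through Python's `subprocess` module, death by signal N is reported as the return code -N. The sketch below turns such a return code into a readable description using the meanings from the signal table. `describe_exit` is a hypothetical helper, not part of the omegaUp codebase.

```python
import signal

# Meanings follow the signal table in this section.
SIGNAL_MEANINGS = {
    signal.SIGSEGV: "Segmentation fault",
    signal.SIGFPE: "Floating point exception",
    signal.SIGABRT: "Abort",
    signal.SIGKILL: "Killed",
    signal.SIGXCPU: "CPU time exceeded",
}


def describe_exit(returncode):
    """Turn a subprocess-style return code into a short description."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    try:
        sig = signal.Signals(-returncode)
    except ValueError:
        # The number does not correspond to a signal known on this platform.
        return f"killed by unknown signal {-returncode}"
    meaning = SIGNAL_MEANINGS.get(sig, "no entry in the table")
    return f"killed by {sig.name} ({meaning})"
```

A signal name alone does not identify the cause: SIGKILL, for instance, can come from a syscall violation or from any other component that forcibly stops the process, so the description is a starting point for diagnosis rather than a verdict.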
Sandbox Violations
Violations are detected and reported as RTE with details:
Syscall violation: socket (blocked)
Memory limit exceeded: 269484032 > 268435456
Time limit exceeded: 1.023s > 1.000s
Comparison with Alternatives
| Feature | Minijail | Docker | chroot | ptrace |
|---|---|---|---|---|
| Overhead | Very Low | Low | Very Low | High |
| Isolation | Strong | Strong | Weak | Medium |
| Syscall Filter | Yes | Yes | No | Yes |
| Namespace Support | Yes | Yes | No | No |
| Resource Limits | Yes | Yes | No | No |
| Setup Complexity | Medium | Low | Low | High |
Security Considerations
Defense in Depth
The sandbox is one layer of multiple security measures:
- Sandbox (Minijail) - Process isolation
- Container (Docker) - Service isolation
- Network - Firewall rules
- Authentication - Access control
Known Limitations
- Cannot prevent all timing attacks
- Some information leakage through resource usage
- Requires Linux kernel features
Escape Prevention
Regular security audits check for:
- Syscall filter bypasses
- Namespace escape techniques
- Resource limit circumvention
Troubleshooting
Common Issues
Sandbox initialization failed:
# Check whether unprivileged user namespaces are enabled
# (this sysctl exists on Debian/Ubuntu kernels)
cat /proc/sys/kernel/unprivileged_userns_clone
# Should print 1
Syscall blocked unexpectedly:
# Run with -L so that blocked syscalls are reported
minijail0 -L -- /path/to/program
Memory limit issues:
# Check cgroup limits (cgroup v1 layout; cgroup v2 exposes memory.max instead)
cat /sys/fs/cgroup/memory/omegaup/memory.limit_in_bytes
Related Documentation
- Runner Internals - Code execution details
- Grader Internals - Submission processing
- Security - Overall security architecture
- Verdicts - Understanding execution results