Beyond CRuby: True Parallel Ruby with JRuby and TruffleRuby (Part 4)

Throughout this series, we’ve explored CRuby’s concurrency model - threads limited by the GVL, cooperative Fibers, and isolated Ractors. But what if you could have threads that truly run in parallel across multiple CPU cores?

Breaking Free from the GVL

While CRuby implements the Global VM Lock for thread safety, Ruby the language doesn’t mandate this. Alternative Ruby implementations can and do provide true parallel threading:

  • JRuby: Ruby on the Java Virtual Machine with real parallel threads
  • TruffleRuby: High-performance Ruby with parallel execution via GraalVM

Let’s explore how these implementations deliver the parallelism that CRuby can’t.

JRuby: Ruby on the JVM

JRuby runs Ruby code on the Java Virtual Machine, leveraging Java’s mature threading model. This means threads in JRuby are actual OS threads that can execute Ruby code simultaneously.

Setting Up JRuby

First, ensure you have Java installed (JRuby requires Java 8 or higher):

1
2
3
4
5
# Check Java version
java -version

# Install Java if needed (macOS example)
brew install openjdk@17

Then install JRuby:

1
2
3
4
5
6
7
8
9
10
11
# Using mise
mise install ruby@jruby-9.4.13.0
mise use ruby@jruby-9.4.13.0

# Using rbenv
rbenv install jruby-9.4.13.0
rbenv local jruby-9.4.13.0

# Or using RVM
rvm install jruby
rvm use jruby

True Parallel Execution

Remember our CPU-intensive Fibonacci example from Part 1? Let’s see how it performs with JRuby:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
require 'benchmark'

def fibonacci(n)
  return n if n <= 1
  fibonacci(n - 1) + fibonacci(n - 2)
end

# Single-threaded
time1 = Benchmark.realtime do
  4.times { fibonacci(35) }
end

# Multi-threaded
time2 = Benchmark.realtime do
  threads = 4.times.map do
    Thread.new { fibonacci(35) }
  end
  threads.each(&:join)
end

puts "Single-threaded: #{time1.round(2)}s"
puts "Multi-threaded: #{time2.round(2)}s"
puts "Speedup: #{(time1/time2).round(2)}x"

# JRuby Output:
# Single-threaded: 1.95s
# Multi-threaded: 0.51s
# Speedup: 3.78x

# CRuby Output (for comparison):
# Single-threaded: 2.93s
# Multi-threaded: 2.84s
# Speedup: 1.03x

JRuby achieves nearly 4x speedup on a 4-core machine - true parallel execution!

Thread Safety Becomes Critical

With real parallelism comes real danger. Race conditions that might be hidden in CRuby become visible in JRuby:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# This code is MORE dangerous in JRuby!
counter = 0

threads = 100.times.map do
  Thread.new do
    1000.times do
      counter += 1  # Multiple threads REALLY access this simultaneously
    end
  end
end

threads.each(&:join)
puts "Counter: #{counter}"

# CRuby: Often gets close to 100,000 (GVL provides some protection)
# JRuby: Counter: 68371 (or similar) - real race conditions!

Always use proper synchronization in JRuby:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
require 'thread'
counter = 0
mutex = Mutex.new

threads = 100.times.map do
  Thread.new do
    1000.times do
      mutex.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts "Counter: #{counter}"  # Always 100,000

JRuby-Specific Features

JRuby provides additional concurrency tools from the Java ecosystem. You can leverage Java’s battle-tested concurrent data structures and atomic operations directly from Ruby code, eliminating the need for manual mutex management in many cases:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# Using Java's concurrent data structures
require 'java'
java_import 'java.util.concurrent.ConcurrentHashMap'
java_import 'java.util.concurrent.atomic.AtomicInteger'

# Thread-safe hash without explicit locking
safe_hash = ConcurrentHashMap.new

threads = 10.times.map do |i|
  Thread.new do
    1000.times do |j|
      safe_hash.put("thread_#{i}_item_#{j}", j * i)
    end
  end
end

threads.each(&:join)
puts "Hash size: #{safe_hash.size}"  # Always 10,000

# Atomic operations
counter = AtomicInteger.new(0)

threads = 100.times.map do
  Thread.new do
    1000.times { counter.increment_and_get }
  end
end

threads.each(&:join)
puts "Atomic counter: #{counter.get}"  # Always 100,000

Leveraging Java Thread Pools

Java’s ExecutorService provides sophisticated thread pool management with built-in queuing, scheduling, and lifecycle control. This example shows how to efficiently process multiple tasks using a fixed-size thread pool instead of creating threads manually:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
require 'java'
java_import 'java.util.concurrent.Executors'

# Create a fixed thread pool
executor = Executors.new_fixed_thread_pool(4)

# Submit tasks
futures = 20.times.map do |i|
  executor.submit do
    result = fibonacci(30)
    puts "Task #{i} completed: #{result}"
    result
  end
end

# Get results
results = futures.map(&:get)
executor.shutdown

puts "All tasks completed. Sum: #{results.sum}"

TruffleRuby: High-Performance Polyglot Ruby

TruffleRuby, built on GraalVM, offers not just parallel threads but also advanced JIT compilation and polyglot capabilities.

Setting Up TruffleRuby

1
2
3
4
5
6
7
8
9
10
# Using mise
mise install ruby@truffleruby
mise use ruby@truffleruby

# Using rbenv
rbenv install truffleruby
rbenv local truffleruby

# Or download directly
# Visit: https://github.com/oracle/truffleruby/releases

Parallel Performance

TruffleRuby threads behave similarly to JRuby - true parallel execution. This example demonstrates TruffleRuby’s impressive performance on CPU-intensive matrix multiplication, where parallel threads can utilize all available CPU cores:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
require 'benchmark'

# Matrix multiplication - CPU intensive
def matrix_multiply(size)
  a = Array.new(size) { Array.new(size) { rand } }
  b = Array.new(size) { Array.new(size) { rand } }
  c = Array.new(size) { Array.new(size, 0) }

  size.times do |i|
    size.times do |j|
      size.times do |k|
        c[i][j] += a[i][k] * b[k][j]
      end
    end
  end
  c
end

# Parallel matrix operations
matrices = 8.times.map { 100 }

time1 = Benchmark.realtime do
  matrices.map { |size| matrix_multiply(size) }
end

time2 = Benchmark.realtime do
  threads = matrices.map do |size|
    Thread.new { matrix_multiply(size) }
  end
  threads.map(&:value)
end

puts "Sequential: #{time1.round(2)}s"
puts "Parallel: #{time2.round(2)}s"
puts "Speedup: #{(time1/time2).round(2)}x"

# TruffleRuby Output (8-core machine):
# Sequential: 4.82s
# Parallel: 0.78s
# Speedup: 6.18x

TruffleRuby’s Polyglot Features

One of TruffleRuby’s unique advantages is its ability to seamlessly interoperate with other GraalVM languages like JavaScript, Python, and Java. This enables you to leverage libraries from different ecosystems within a single Ruby application:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
# Access JavaScript from Ruby
js_array = Polyglot.eval('js', '[1, 2, 3, 4, 5]')
js_array.each { |n| puts n * 2 }

# Use Java classes
Polyglot.eval('java', 'java.util.concurrent.ConcurrentLinkedQueue').new

# Share objects between languages
ruby_proc = proc { |x| x * 2 }
Polyglot.export('double_func', ruby_proc)

# JavaScript can now use the Ruby proc
result = Polyglot.eval('js', 'Polyglot.import("double_func")(21)')
puts result  # 42

Practical Patterns for Parallel Ruby

CPU-Bound Work Distribution

When you have CPU-intensive work to distribute across multiple cores, this pattern creates an optimal number of threads based on available processors and divides the work evenly among them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Works efficiently in JRuby/TruffleRuby
def parallel_map(array, &block)
  # Determine optimal thread count
  thread_count = [array.size, Etc.nprocessors].min
  slice_size = (array.size / thread_count.to_f).ceil

  threads = array.each_slice(slice_size).map do |slice|
    Thread.new { slice.map(&block) }
  end

  threads.flat_map(&:value)
end

# Process data in parallel
numbers = (1..1000).to_a
results = parallel_map(numbers) { |n| n ** 2 }

Parallel File Processing

This pattern demonstrates a worker pool approach for processing multiple files in parallel. It uses thread-safe queues to coordinate work distribution and result collection, making it ideal for batch processing tasks:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
require 'thread'

def parallel_file_processor(files, workers: 4)
  queue = Queue.new
  results = Queue.new

  # Add all files to queue
  files.each { |f| queue << f }

  # Create worker threads
  threads = workers.times.map do
    Thread.new do
      while (file = queue.pop(true) rescue nil)
        result = process_file(file)
        results << { file: file, result: result }
      end
    end
  end

  threads.each(&:join)

  # Collect results
  output = {}
  results.size.times do
    r = results.pop
    output[r[:file]] = r[:result]
  end
  output
end

def process_file(file)
  # Simulate CPU-intensive processing
  content = File.read(file)
  content.split.map(&:upcase).uniq.sort
end

Choosing the Right Implementation

When to Use JRuby

  • Java integration needed - Access to Java libraries and frameworks
  • CPU-intensive workloads - Scientific computing, data processing
  • Existing Java infrastructure - Deploy Ruby in Java environments
  • Mature threading model - Proven, stable parallel execution

When to Use TruffleRuby

  • Maximum performance - Advanced JIT compilation
  • Polyglot applications - Mix Ruby with JavaScript, Python, Java
  • Research/experimentation - Cutting-edge VM technology
  • C extension compatibility - Better than JRuby for many gems

When to Stick with CRuby

  • Gem compatibility - Best support for the Ruby ecosystem
  • Deployment simplicity - Widely supported, well-understood
  • I/O-bound workloads - GVL released during I/O operations
  • Lower memory usage - Generally uses less memory than JVM-based implementations (check out how much time it takes to start an irb session)

Migration Considerations

Thread Safety Audit

Moving from CRuby to JRuby/TruffleRuby requires careful review of your code. Race conditions that might be masked by the GVL in CRuby will become real bugs in truly parallel implementations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# CRuby - might work due to GVL
class Counter
  attr_accessor :value

  def initialize
    @value = 0
  end

  def increment
    @value += 1  # NOT thread-safe in JRuby/TruffleRuby
  end
end

# JRuby/TruffleRuby - proper synchronization required
class SafeCounter
  def initialize
    @value = 0
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize { @value += 1 }
  end

  def value
    @mutex.synchronize { @value }
  end
end

Gem Compatibility

Not all gems work across all Ruby implementations, especially those with C extensions. Use platform-specific gems in your Gemfile to handle compatibility:

1
2
3
4
5
6
7
8
# In your Gemfile
platforms :jruby do
  gem 'jdbc-postgres'  # JRuby-specific database driver
end

platforms :mri do
  gem 'pg'  # CRuby-specific gem
end

Conclusion

JRuby and TruffleRuby prove that Ruby can deliver true parallel performance. While CRuby’s GVL simplifies thread safety, these alternative implementations show what’s possible when threads can genuinely run in parallel.

The choice of implementation depends on your needs:

  • CRuby for compatibility and simplicity
  • JRuby for Java integration and stable parallelism
  • TruffleRuby for maximum performance and polyglot capabilities

Understanding these options empowers you to choose the right tool for your concurrent Ruby applications. The GVL isn’t a limitation of Ruby - it’s an implementation choice that you can opt out of when true parallelism matters.

References

Prateek Choudhary
Prateek Choudhary
Technology Leader