Streams API

The Streams API (introduced in Java 8) lets you express data processing as a declarative pipeline rather than imperative loops. A stream is not a data structure: it does not store elements. Instead it carries values from a source through a chain of operations, computing results on demand. This shift from describing how to iterate to describing what to compute often makes code shorter and more readable, and it lets a pipeline be switched to parallel execution with a single call, subject to the caveats covered in the parallel streams section.

Anatomy of a Stream Pipeline

Every pipeline has three parts:

Source — a collection, array, generator, or I/O channel that produces elements.
Intermediate operations — lazy transformations (map, filter, sorted) that return a new stream.
Terminal operation — triggers execution and produces a result or side effect (collect, reduce, forEach).

import java.util.List;

List<String> names = List.of("Ada", "Linus", "Grace", "Alan");

long count = names.stream()        // source
    .filter(n -> n.length() > 3)   // intermediate
    .map(String::toUpperCase)      // intermediate
    .count();                      // terminal

System.out.println(count);

Output:

3

Note: A stream can be consumed only once. After a terminal operation the stream is considered consumed; invoking another operation on it throws IllegalStateException.
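
The single-use rule can be observed directly. The following is a minimal sketch; the class and method names (StreamReuseDemo, secondUseThrows) are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: shows that a stream cannot be traversed twice.
class StreamReuseDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseThrows() {
        Stream<String> s = List.of("Ada", "Linus").stream();
        s.count();                 // first terminal operation consumes the stream
        try {
            s.count();             // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseThrows());

        // To traverse the data again, build a fresh stream from the source.
        List<String> names = List.of("Ada", "Linus");
        System.out.println(names.stream().count());
    }
}
```

Because the source collection is untouched by the pipeline, calling stream() on it again is usually the simplest way to get a second traversal.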

map, filter, reduce

These three operations cover the majority of transformations.

map applies a function to each element, producing a 1:1 transformed stream.
filter keeps elements matching a predicate.
reduce folds the stream into a single value using an associative accumulator.

import java.util.List;

List<Integer> nums = List.of(1, 2, 3, 4, 5, 6);

int sumOfSquaresOfEvens = nums.stream()
    .filter(n -> n % 2 == 0)   // 2, 4, 6
    .map(n -> n * n)           // 4, 16, 36
    .reduce(0, Integer::sum);  // 56

System.out.println(sumOfSquaresOfEvens);

Output:

56

The two-argument reduce(identity, accumulator) returns a plain value; the single-argument form returns an Optional because an empty stream has no result.
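
To make the difference between the two forms concrete, here is a small sketch. The class and helper names are hypothetical; the reduce overloads themselves are the standard ones on Stream.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical demo class contrasting the two reduce overloads.
class ReduceFormsDemo {

    // With an identity, an empty stream simply yields the identity value.
    static int sumWithIdentity(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // Without an identity there may be no result, so the return type is Optional.
    static Optional<Integer> maxWithoutIdentity(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(sumWithIdentity(List.of()));                     // 0
        System.out.println(maxWithoutIdentity(List.of()).isPresent());      // false
        System.out.println(maxWithoutIdentity(List.of(3, 9, 4)).orElse(-1)); // 9
    }
}
```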

Collecting Results with Collectors

collect is the most versatile terminal operation. The Collectors factory provides ready-made reductions.

import java.util.*;
import java.util.stream.Collectors;

record Person(String name, String dept) {}

List<Person> people = List.of(
    new Person("Ada", "Eng"),
    new Person("Linus", "Eng"),
    new Person("Grace", "Sales")
);

// toList
List<String> names = people.stream()
    .map(Person::name)
    .collect(Collectors.toList());

// groupingBy
Map<String, List<Person>> byDept = people.stream()
    .collect(Collectors.groupingBy(Person::dept));

// joining
String roster = people.stream()
    .map(Person::name)
    .collect(Collectors.joining(", ", "[", "]"));

System.out.println(names);
System.out.println(new TreeSet<>(byDept.keySet())); // sorted copy: groupingBy's map has no guaranteed key order
System.out.println(roster);

Output:

[Ada, Linus, Grace]
[Eng, Sales]
[Ada, Linus, Grace]

Collector	Produces	Typical use
`toList()` / `toSet()`	`List` / `Set`	Materialize results
`toMap(k, v)`	`Map`	Index by key
`groupingBy(fn)`	`Map<K, List<V>>`	Bucket elements
`partitioningBy(pred)`	`Map<Boolean, List<V>>`	Split true/false
`joining(sep, pre, post)`	`String`	Concatenate text
`counting()` / `summingInt()`	`Long` / `Integer`	Aggregations

Tip: Use Collectors.groupingBy(classifier, downstream) to group and reduce in one pass, e.g. groupingBy(Person::dept, Collectors.counting()).
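
The downstream-collector idea can be sketched as follows, reusing the Person record from the example above. The class name, the PEOPLE constant, and the helper method names are hypothetical; the results are copied into a TreeMap only so that the printed key order is predictable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class for collectors that take a downstream collector.
class GroupingDemo {

    record Person(String name, String dept) {}

    static final List<Person> PEOPLE = List.of(
        new Person("Ada", "Eng"),
        new Person("Linus", "Eng"),
        new Person("Grace", "Sales")
    );

    // Group by department and count each group in a single pass.
    static Map<String, Long> headcountByDept(List<Person> people) {
        return people.stream()
            .collect(Collectors.groupingBy(Person::dept, Collectors.counting()));
    }

    // Split names into engineers (true) and everyone else (false).
    static Map<Boolean, List<String>> namesByIsEng(List<Person> people) {
        return people.stream()
            .collect(Collectors.partitioningBy(
                p -> p.dept().equals("Eng"),
                Collectors.mapping(Person::name, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(new TreeMap<>(headcountByDept(PEOPLE))); // {Eng=2, Sales=1}
        System.out.println(namesByIsEng(PEOPLE).get(true));         // [Ada, Linus]
    }
}
```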

Lazy Evaluation

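Intermediate operations do nothing until a terminal operation runs. The pipeline then processes elements one at a time, with each element passing through every stage before the next one is requested (often called vertical processing). This is what makes short-circuiting possible.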

import java.util.Optional;
import java.util.stream.Stream;

Optional<Integer> first = Stream.of(1, 2, 3, 4, 5)
    .peek(n -> System.out.println("peek " + n))
    .filter(n -> n % 2 == 0)
    .findFirst();

Output:

peek 1
peek 2

Only two elements are inspected because findFirst short-circuits once a match is found. Each element travels through the entire pipeline before the next one is requested, so elements 3, 4, and 5 are never touched.
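
Two further consequences of laziness are worth seeing in code. The sketch below uses hypothetical names and records what peek observes in a list purely for demonstration (a side effect that would not belong in production code). It shows that a pipeline with no terminal operation never runs, and that a stateful operation such as sorted has to see every upstream element before it can emit the first one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: observes which elements a pipeline actually touches.
class LazinessDemo {

    // No terminal operation is invoked, so peek never runs.
    static List<Integer> seenWithoutTerminal() {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
            .peek(seen::add)          // demo-only side effect
            .filter(n -> n > 1);      // pipeline is built but never executed
        return seen;
    }

    // sorted() must buffer all upstream elements before findFirst can return.
    static List<Integer> seenWithSortedBarrier() {
        List<Integer> seen = new ArrayList<>();
        Stream.of(3, 1, 2)
            .peek(seen::add)          // demo-only side effect
            .sorted()
            .findFirst();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(seenWithoutTerminal());   // []
        System.out.println(seenWithSortedBarrier()); // [3, 1, 2]
    }
}
```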

Parallel Streams and Their Caveats

parallelStream() (or .parallel()) splits work across the common ForkJoinPool. It can speed up CPU-bound work on large datasets, but it is not free.

import java.util.List;

List<Integer> nums = List.of(1, 2, 3, 4, 5, 6);

long total = nums.parallelStream()
    .filter(n -> n % 2 == 0)          // 2, 4, 6
    .mapToLong(Integer::longValue)
    .sum();                           // 12

Warning: Parallel streams require stateless, non-interfering, associative operations. Never mutate shared state from a lambda, and avoid parallel streams for small collections or I/O-bound tasks, where coordination overhead dominates.
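
One way to follow the warning above is to let a collector gather the results rather than appending to a shared list from inside a lambda. The sketch below is illustrative only; the class and method names are hypothetical, and the unsafe variant is shown as a comment rather than executed.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class: gathers parallel results without shared mutable state.
class ParallelCollectDemo {

    // Unsafe pattern to avoid (not executed here):
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.rangeClosed(1, upTo).parallel().forEach(out::add);
    // ArrayList is not thread-safe, so elements can be lost or an exception thrown.

    // Safe pattern: the collector builds partial results per thread and merges them.
    static List<Integer> evenSquares(int upTo) {
        return IntStream.rangeClosed(1, upTo)
            .parallel()
            .filter(n -> n % 2 == 0)
            .map(n -> n * n)
            .boxed()
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Encounter order is preserved for an ordered source, even in parallel.
        System.out.println(evenSquares(6)); // [4, 16, 36]
    }
}
```

For six elements the parallel version will almost certainly be slower than a sequential one; the example only demonstrates the pattern, not a speedup.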

Best Practices

Prefer streams for declarative transformations; keep loops for complex stateful logic.
Keep lambdas pure and side-effect-free.
Use primitive streams (IntStream, LongStream) to avoid boxing in numeric pipelines.
Reach for Collectors.toUnmodifiableList() when results should be immutable.
Measure before parallelizing — default to sequential streams.
Avoid storing streams in fields or passing one around for reuse, since a stream can be traversed only once; return collections instead.
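
As an illustration of the primitive-stream advice, the earlier sum-of-squares pipeline could be rewritten along these lines. This is a sketch with hypothetical class and method names; mapToInt converts to an IntStream so the later stages work on int values rather than boxed Integer objects.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class: numeric pipelines on primitive streams.
class PrimitiveStreamDemo {

    // Unbox once with mapToInt, then stay in IntStream for the arithmetic.
    static int sumOfSquaresOfEvens(List<Integer> nums) {
        return nums.stream()
            .mapToInt(Integer::intValue)
            .filter(n -> n % 2 == 0)
            .map(n -> n * n)
            .sum();
    }

    // A range source avoids creating a boxed collection at all.
    static int sumRange(int fromInclusive, int toInclusive) {
        return IntStream.rangeClosed(fromInclusive, toInclusive).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresOfEvens(List.of(1, 2, 3, 4, 5, 6))); // 56
        System.out.println(sumRange(1, 6));                                 // 21
    }
}
```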

Interview Questions

Q: What is the difference between intermediate and terminal operations? Intermediate operations are lazy and return a new stream (map, filter). Terminal operations trigger execution and produce a result or side effect (collect, count). A pipeline does nothing until a terminal operation is invoked.

Q: Why can a stream be consumed only once? A stream models a one-shot traversal of a source. Once a terminal operation runs, the stream is consumed; reusing it throws IllegalStateException. Create a new stream from the source instead.

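Q: When would a parallel stream hurt performance? For small datasets, I/O-bound work, or pipelines that contend on shared mutable state. Splitting and merging overhead plus contention can outweigh any gains. Non-associative accumulators are a correctness problem rather than a performance one: they can produce wrong results when run in parallel.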

Q: What is the difference between map and flatMap? map produces one output per input (1:1). flatMap maps each element to a stream and flattens the results into a single stream, useful for un-nesting collections of collections.
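
The map versus flatMap distinction can be sketched with a nested list. The class name, method names, and sample data below are hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting map (1:1) with flatMap (1:many, flattened).
class FlatMapDemo {

    // map: one output per input, here the size of each inner list.
    static List<Integer> teamSizes(List<List<String>> teams) {
        return teams.stream()
            .map(List::size)
            .collect(Collectors.toList());
    }

    // flatMap: each inner list becomes a stream, and the streams are concatenated.
    static List<String> allMembers(List<List<String>> teams) {
        return teams.stream()
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> teams = List.of(
            List.of("Ada", "Linus"),
            List.of("Grace")
        );
        System.out.println(teamSizes(teams));  // [2, 1]
        System.out.println(allMembers(teams)); // [Ada, Linus, Grace]
    }
}
```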