Streams API
The Streams API (introduced in Java 8) lets you express data processing as a declarative pipeline rather than imperative loops. A stream is not a data structure: it does not store elements. Instead it carries values from a source through a chain of operations, computing results on demand. This shift from "how" to "what" often makes code shorter and more readable, and it makes parallel execution easy to opt into, subject to the caveats in the parallel streams section.
Anatomy of a Stream Pipeline
Every pipeline has three parts:
- Source: a collection, array, generator, or I/O channel that produces elements.
- Intermediate operations: lazy transformations (map, filter, sorted) that return a new stream.
- Terminal operation: triggers execution and produces a result or side effect (collect, reduce, forEach).
import java.util.List;
List<String> names = List.of("Ada", "Linus", "Grace", "Alan");
long count = names.stream() // source
.filter(n -> n.length() > 3) // intermediate
.map(String::toUpperCase) // intermediate
.count(); // terminal
System.out.println(count);
Output:
3
Note: A stream can be consumed only once. After a terminal operation, the stream is closed; reusing it throws IllegalStateException.
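The single-use rule can be observed directly. The sketch below is illustrative only: the class name SingleUseStreamDemo and its helper methods are hypothetical, and the Supplier approach is one common way to get a fresh stream per pipeline, not the only one.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical demo class; names are illustrative, not part of the Streams API.
final class SingleUseStreamDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseFails() {
        Stream<String> names = Stream.of("Ada", "Linus", "Grace");
        names.count(); // first terminal operation consumes the stream
        try {
            names.count(); // second terminal operation on the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A Supplier hands out a fresh stream on every call,
    // so each pipeline gets its own one-shot traversal.
    static long countTwice(List<String> source) {
        Supplier<Stream<String>> fresh = source::stream;
        return fresh.get().count() + fresh.get().count();
    }

    public static void main(String[] args) {
        System.out.println(secondUseFails());                             // true
        System.out.println(countTwice(List.of("Ada", "Linus", "Grace"))); // 6
    }
}
```

Going back to the source collection, as countTwice does, keeps the data reusable while each stream stays single-use.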
map, filter, reduce
These three operations cover the majority of transformations.
- map applies a function to each element, producing a 1:1 transformed stream.
- filter keeps elements matching a predicate.
- reduce folds the stream into a single value using an associative accumulator.
import java.util.List;
List<Integer> nums = List.of(1, 2, 3, 4, 5, 6);
int sumOfSquaresOfEvens = nums.stream()
.filter(n -> n % 2 == 0) // 2, 4, 6
.map(n -> n * n) // 4, 16, 36
.reduce(0, Integer::sum); // 56
System.out.println(sumOfSquaresOfEvens);
Output:
56
The two-argument reduce(identity, accumulator) returns a plain value; the single-argument form returns an Optional because an empty stream has no result.
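To make the difference between the two forms concrete, here is a small sketch. The class ReduceFormsDemo and its methods are hypothetical names chosen for illustration; only reduce, Integer::sum, and Integer::max come from the JDK.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical demo class contrasting the two reduce overloads.
final class ReduceFormsDemo {

    // Two-argument form: the identity (0) is the result for an empty stream.
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // One-argument form: there is no identity, so the result is an Optional
    // that is empty when the stream has no elements.
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3))); // 6
        System.out.println(sum(List.of()));        // 0
        System.out.println(max(List.of(3, 9, 4))); // Optional[9]
        System.out.println(max(List.of()));        // Optional.empty
    }
}
```

The identity must be a neutral value for the accumulator (0 for addition); otherwise the two-argument form can produce wrong results, especially in parallel.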
Collecting Results with Collectors
collect is the most versatile terminal operation. The Collectors factory provides ready-made reductions.
import java.util.*;
import java.util.stream.Collectors;
record Person(String name, String dept) {}
List<Person> people = List.of(
new Person("Ada", "Eng"),
new Person("Linus", "Eng"),
new Person("Grace", "Sales")
);
// toList
List<String> names = people.stream()
.map(Person::name)
.collect(Collectors.toList());
// groupingBy
Map<String, List<Person>> byDept = people.stream()
.collect(Collectors.groupingBy(Person::dept));
// joining
String roster = people.stream()
.map(Person::name)
.collect(Collectors.joining(", ", "[", "]"));
System.out.println(names);
System.out.println(byDept.keySet());
System.out.println(roster);
Output:
[Ada, Linus, Grace]
[Eng, Sales]
[Ada, Linus, Grace]
| Collector | Produces | Typical use |
|---|---|---|
| toList() / toSet() | List / Set | Materialize results |
| toMap(k, v) | Map | Index by key |
| groupingBy(fn) | Map<K, List<V>> | Bucket elements |
| partitioningBy(pred) | Map<Boolean, List<V>> | Split true/false |
| joining(sep, pre, post) | String | Concatenate text |
| counting() / summingInt() | Long / Integer | Aggregations |
Tip: Use Collectors.groupingBy(classifier, downstream) to group and reduce in one pass, e.g. groupingBy(Person::dept, Collectors.counting()).
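The collectors that the earlier example does not exercise can be sketched as follows. This is a minimal illustration, assuming Java 16+ for records: the class DownstreamCollectorsDemo, its method names, and the TreeMap map factory (used here only to get a predictable key order when printing) are assumptions, and the Person record simply mirrors the one shown earlier.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical demo class; only the Collectors calls are JDK API.
final class DownstreamCollectorsDemo {

    // Same shape as the Person record used earlier in this section.
    record Person(String name, String dept) {}

    // groupingBy with a downstream collector: group and count in one pass.
    // TreeMap::new is an optional map factory that keeps keys sorted.
    static Map<String, Long> headcountByDept(List<Person> people) {
        return people.stream()
                .collect(Collectors.groupingBy(Person::dept, TreeMap::new, Collectors.counting()));
    }

    // partitioningBy: always yields exactly two buckets, keyed false and true.
    static Map<Boolean, List<String>> partitionByNameLength(List<Person> people, int minLength) {
        return people.stream()
                .map(Person::name)
                .collect(Collectors.partitioningBy(n -> n.length() >= minLength));
    }

    // toMap: index department by name; duplicate keys raise IllegalStateException.
    static Map<String, String> deptByName(List<Person> people) {
        return people.stream()
                .collect(Collectors.toMap(Person::name, Person::dept));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Ada", "Eng"),
                new Person("Linus", "Eng"),
                new Person("Grace", "Sales"));
        System.out.println(headcountByDept(people));          // {Eng=2, Sales=1}
        System.out.println(partitionByNameLength(people, 4)); // {false=[Ada], true=[Linus, Grace]}
        System.out.println(deptByName(people).get("Grace"));  // Sales
    }
}
```

When keys may repeat, the three-argument toMap(k, v, mergeFunction) overload lets you decide how to combine the colliding values instead of failing.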
Lazy Evaluation
Intermediate operations do nothing until a terminal operation runs. The pipeline then processes elements one at a time, vertically, enabling short-circuiting.
import java.util.Optional;
import java.util.stream.Stream;
Optional<Integer> first = Stream.of(1, 2, 3, 4, 5)
.peek(n -> System.out.println("peek " + n))
.filter(n -> n % 2 == 0)
.findFirst();
Output:
peek 1
peek 2
Only two elements are inspected because findFirst short-circuits once a match is found: each element travels through the whole pipeline before the next one is touched, and processing stops as soon as the result is known.
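Laziness can also be checked programmatically rather than by reading console output. In the sketch below, the class LazyEvaluationDemo and its methods are hypothetical, and the lambda records visited elements purely for demonstration; production lambdas should stay free of side effects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class that counts how many elements a pipeline touches.
final class LazyEvaluationDemo {

    // Builds a pipeline but never calls a terminal operation.
    // The filter lambda is never invoked, so nothing is recorded.
    static int visitsWithoutTerminalOp() {
        List<Integer> visited = new ArrayList<>();
        Stream<Integer> pending = Stream.of(1, 2, 3, 4, 5)
                .filter(n -> { visited.add(n); return n % 2 == 0; });
        return visited.size();
    }

    // Runs the same pipeline with findFirst and returns the elements visited.
    static List<Integer> visitsWithFindFirst() {
        List<Integer> visited = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5)
                .filter(n -> { visited.add(n); return n % 2 == 0; })
                .findFirst(); // short-circuits at the first even number
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(visitsWithoutTerminalOp()); // 0
        System.out.println(visitsWithFindFirst());     // [1, 2]
    }
}
```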
Parallel Streams and Their Caveats
parallelStream() (or .parallel()) splits work across the common ForkJoinPool. It can speed up CPU-bound work on large datasets, but it is not free.
// nums is the List<Integer> defined in the map, filter, reduce example
long total = nums.parallelStream()
.filter(n -> n % 2 == 0)
.mapToLong(Integer::longValue)
.sum();
Warning: Parallel streams require stateless, non-interfering, associative operations. Never mutate shared state from a lambda, and avoid parallel streams for small collections or I/O-bound tasks, where coordination overhead dominates.
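One way to stay within these rules is to let a collector or a built-in reduction own all the state. The following is a sketch under that assumption; the class ParallelSafetyDemo and its method names are hypothetical, and the unsafe variant appears only as a comment because its behavior is nondeterministic.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class showing parallel pipelines without shared mutable state.
final class ParallelSafetyDemo {

    // Safe: the collector gives each worker its own container and merges them,
    // so no lambda touches shared state. Encounter order is preserved.
    static List<Integer> squaresParallel(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .parallel()
                .map(n -> n * n)
                .boxed()
                .collect(Collectors.toList());
    }

    // Safe: addition is associative, so partial sums can be combined in any grouping.
    static long sumParallel(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .parallel()
                .asLongStream()
                .sum();
    }

    // Unsafe pattern, shown as a comment only:
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.rangeClosed(1, upTo).parallel().forEach(out::add);
    // ArrayList is not thread-safe, so elements may be lost or an exception thrown.

    public static void main(String[] args) {
        System.out.println(squaresParallel(5)); // [1, 4, 9, 16, 25]
        System.out.println(sumParallel(1000));  // 500500
    }
}
```

Whether either method is actually faster than its sequential counterpart depends on data size and hardware, so it should be measured rather than assumed.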
Best Practices
- Prefer streams for declarative transformations; keep loops for complex stateful logic.
- Keep lambdas pure and side-effect-free.
- Use primitive streams (IntStream, LongStream) to avoid boxing in numeric pipelines.
- Reach for Collectors.toUnmodifiableList() when results should be immutable.
- Measure before parallelizing; default to sequential streams.
- Avoid storing streams in fields or returning reusable streams; return collections instead.
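Two of these practices, primitive streams and unmodifiable results, are easy to show in a few lines. The class BestPracticesDemo and its methods are hypothetical names; Collectors.toUnmodifiableList() requires Java 10 or later.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class for primitive streams and unmodifiable results.
final class BestPracticesDemo {

    // IntStream operates on int values directly, so no Integer boxing occurs.
    static int sumOfSquares(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .map(n -> n * n)
                .sum();
    }

    // Returns a collection (not a stream) that callers cannot modify.
    static List<String> upperCased(List<String> names) {
        return names.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toUnmodifiableList());
    }

    // Returns true if the list refuses an attempted add.
    static boolean rejectsMutation(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3));                        // 14
        List<String> result = upperCased(List.of("Ada", "Grace"));
        System.out.println(result);                                 // [ADA, GRACE]
        System.out.println(rejectsMutation(result));                // true
    }
}
```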
Interview Questions
Q: What is the difference between intermediate and terminal operations?
Intermediate operations are lazy and return a new stream (map, filter). Terminal operations trigger execution and produce a result or side effect (collect, count). A pipeline does nothing until a terminal operation is invoked.
Q: Why can a stream be consumed only once?
A stream models a one-shot traversal of a source. Once a terminal operation runs the stream is closed; reusing it throws IllegalStateException. Create a new stream from the source instead.
Q: When would a parallel stream hurt performance?
For small datasets, I/O-bound work, or operations with non-associative accumulators or shared mutable state. Splitting and merging overhead plus contention can outweigh any gains.
Q: What is the difference between map and flatMap?
map produces one output per input (1:1). flatMap maps each element to a stream and flattens the results into a single stream, useful for un-nesting collections of collections.
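The contrast is easiest to see on a list of lists. In this sketch the class FlatMapDemo, its methods, and the "teams" data are hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class contrasting map and flatMap on nested lists.
final class FlatMapDemo {

    // map: exactly one output per input, here the size of each inner list.
    static List<Integer> teamSizes(List<List<String>> teams) {
        return teams.stream()
                .map(List::size)
                .collect(Collectors.toList());
    }

    // flatMap: each inner list becomes a stream, and the streams are
    // concatenated into one flat stream of members.
    static List<String> allMembers(List<List<String>> teams) {
        return teams.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> teams = List.of(
                List.of("Ada", "Linus"),
                List.of("Grace"));
        System.out.println(teamSizes(teams));  // [2, 1]
        System.out.println(allMembers(teams)); // [Ada, Linus, Grace]
    }
}
```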