Collections
This article describes the Java Collections Framework. Here you will learn what collections are and how they can make your job easier and programs better. You'll learn about the core elements — interfaces, implementations, and algorithms — that comprise the Java Collections Framework.
Introduction
A collection is simply an object that groups multiple elements into a single unit. Collections are used to store, retrieve, manipulate, and communicate aggregate data. Typically, they represent data items that form a natural group.
Collection implementations in earlier (pre-1.2) versions of the Java platform included Vector, Hashtable, and array. However, those earlier versions did not contain a collections framework.
What Is a Collections Framework?
A collections framework is a unified architecture for representing and manipulating collections. All collections frameworks contain the following:
- Interfaces: These are abstract data types that represent collections. Interfaces allow collections to be manipulated independently of the details of their representation. In object-oriented languages, interfaces generally form a hierarchy.
- Implementations: These are the concrete implementations of the collection interfaces. In essence, they are reusable data structures.
- Algorithms: These are the methods that perform useful computations, such as searching and sorting, on objects that implement collection interfaces. The algorithms are said to be polymorphic: that is, the same method can be used on many different implementations of the appropriate collection interface.
Interfaces
The core collection interfaces encapsulate different types of collections, which are shown in the figure below.

Figure 1The core collection interfaces.
Note that the hierarchy consists of two distinct trees — a Map is not a true Collection.
All the core collection interfaces are generic. For example, this is the declaration of the Collection interface.
public interface Collection
The
The modification operations in each interface are designated optional — a given implementation may elect not to support all operations. If an unsupported operation is invoked, a collection throws an UnsupportedOperationException. Implementations are responsible for documenting which of the optional operations they support. All of the Java platform's general-purpose implementations support all of the optional operations.
The following list describes the core collection interfaces:
- Collection —the Collection interface is the least common denominator that all collections implement and is used to pass collections around and to manipulate them when maximum generality is desired. The Java platform doesn't provide any direct implementations of this interface but provides implementations of more specific subinterfaces, such as Set and List.
- Set — a collection that cannot contain duplicate elements.
- List — an ordered collection. Lists can contain duplicate elements. The user of a List generally has precise control over where in the list each element is inserted and can access elements by their index.
- Queue — a collection used to hold multiple elements prior to processing. Queues typically, but do not necessarily, order elements in a FIFO (first-in, first-out) manner. Among the exceptions are priority queues, which order elements according to a supplied comparator or the elements' natural ordering. Whatever the ordering used, the head of the queue is the element that would be removed by a call to remove or poll. In a FIFO queue, all new elements are inserted at the tail of the queue. Other kinds of queues may use different placement rules. Every Queue implementation must specify its ordering properties.
- Map — an object that maps keys to values. A Map cannot contain duplicate keys; each key can map to at most one value.
- SortedSet — a Set that maintains its elements in ascending orders. Several additional operations are provided to take advantage of the ordering. Sorted sets are used for naturally ordered sets, such as word lists and membership rolls.
- SortedMap — a Map that maintains its mappings in ascending key order. Sorted maps are used for naturally ordered collections of key/value pairs, such as dictionaries and telephone directories.
The Collection Interface
A Collection represents a group of objects known as its elements. The Collection interface is used to pass around collections of objects where maximum generality is desired. For example, by convention all general-purpose collection implementations have a constructor that takes a Collection argument. This constructor, known as a conversion constructor, initializes the new collection to contain all of the elements in the specified collection, whatever the given collection's subinterface or implementation type. In other words, it allows you to convert the collection's type.
Suppose, for example, that you have a CollectionCollection. This idiom creates a new ArrayList, initially containing all the elements in c.
List
The following shows the Collection interface.
public interface Collection
// Basic operations
int size();
boolean isEmpty();
boolean contains(Object element);
boolean add(E element); //optional
boolean remove(Object element); //optional
Iterator
boolean containsAll(Collection c);
boolean addAll(Collectionextends E> c); //optional
boolean removeAll(Collection c); //optional
boolean retainAll(Collection c); //optional
void clear(); //optional
Object[] toArray();
}
The interface has methods to tell you how many elements are in the collection (size, isEmpty), to check whether a given object is in the collection (contains), to add and remove an element from the collection (add, remove), and to provide an iterator over the collection (iterator).
Traversing Collections
There are two ways to traverse collections:
for-each Construct
The for-each construct allows you to concisely traverse a collection or array using a for loop. The following code uses the for-each construct to print out each element of a collection on a separate line.
Iterators
An Iterator is an object that enables you to traverse through a collection and to remove elements from the collection selectively, if desired. You get an Iterator for a collection by calling its iterator method. The following is the Iterator interface.
public interface Iterator
boolean hasNext();
E next();
void remove(); //optional
}
The hasNext method returns true if the iteration has more elements, and the next method returns the next element in the iteration. The remove method removes the last element that was returned by next from the underlying Collection. The remove method may be called only once per call to next and throws an exception if this rule is violated.
Use Iterator instead of the for-each construct when you need to:
ü Remove the current element. The for-each construct hides the iterator, so you cannot call remove. Therefore, the for-each construct is not usable for filtering.
ü Iterate over multiple collections in parallel.
The following method shows you how to use an Iterator to filter an arbitrary Collection — that is, traverse the collection removing specific elements.
static void filter(Collection c) {
for (Iterator it = c.iterator(); it.hasNext(); )
if (!cond(it.next()))
it.remove();
}
Collection Interface Bulk Operations
Bulk operations perform an operation on an entire Collection. You could implement these shorthand operations using the basic operations, though in most cases such implementations would be less efficient. The following are the bulk operations:
ü containsAll — returns true if the target Collection contains all of the elements in the specified Collection.
ü addAll — adds all of the elements in the specified Collection to the target Collection.
ü removeAll — removes from the target Collection all of its elements that are also contained in the specified Collection.
ü retainAll — removes from the target Collection all its elements that are not also contained in the specified Collection. That is, it retains only those elements in the target Collection that are also contained in the specified Collection.
ü clear — removes all elements from the Collection.
The addAll, removeAll, and retainAll methods all return true if the target Collection was modified in the process of executing the operation.
As a simple example of the power of bulk operations, consider the following idiom to remove all instances of a specified element, e, from a Collection, c.
c.removeAll(Collections.singleton(e));
More specifically, suppose you want to remove all of the null elements from a Collection.
c.removeAll(Collections.singleton(null));
This idiom uses Collections.singleton, which is a static factory method that returns an immutable Set containing only the specified element.
Collection Interface Array Operations
The toArray methods are provided as a bridge between collections and older APIs that expect arrays on input. The array operations allow the contents of a Collection to be translated into an array. The simple form with no arguments creates a new array of Object. The more complex form allows the caller to provide an array or to choose the runtime type of the output array.
For example, suppose that c is a Collection. The following snippet dumps the contents of c into a newly allocated array of Object whose length is identical to the number of elements in c.
Object[] a = c.toArray();
Suppose that c is known to contain only strings (perhaps because c is of type Collectionc into a newly allocated array of String whose length is identical to the number of elements in c.
String[] a = c.toArray(new String[0]);
The Set Interface
A Set is a Collection that cannot contain duplicate elements. The Set interface contains only methods inherited from Collection and adds the restriction that duplicate elements are prohibited. Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set instances to be compared meaningfully even if their implementation types differ. Two Set instances are equal if they contain the same elements.
The following is the Set interface.
public interface Set
// Basic operations
int size();
boolean isEmpty();
boolean contains(Object element);
boolean add(E element); //optional
boolean remove(Object element); //optional
Iterator
// Bulk operations
boolean containsAll(Collection c);
boolean addAll(Collectionextends E> c); //optional
boolean removeAll(Collection c); //optional
boolean retainAll(Collection c); //optional
void clear(); //optional
// Array Operations
Object[] toArray();
}
The Java platform contains three general-purpose Set implementations: HashSet, TreeSet, and LinkedHashSet.
ü HashSet, which stores its elements in a hash table, is the best-performing implementation; however it makes no guarantees concerning the order of iteration.
ü TreeSet, which stores its elements in a red-black tree, orders its elements based on their values; it is substantially slower than HashSet.
ü LinkedHashSet, which is implemented as a hash table with a linked list running through it, orders its elements based on the order in which they were inserted into the set (insertion-order). LinkedHashSet spares its clients from the unspecified, generally chaotic ordering provided by HashSet at a cost that is only slightly higher.
Here's a simple but useful Set idiom. Suppose you have a Collection, c, and you want to create another Collection containing the same elements but with all duplicates eliminated. The following one-liner does the trick.
Collection
It works by creating a Set, initially containing all the elements in c. It uses the standard conversion constructor.
Here is a minor variant of this idiom that preserves the order of the original collection while removing duplicate element.
Collection
The following is a generic method that encapsulates the preceding idiom, returning a Set of the same generic type as the one passed.
public static
return new LinkedHashSet
}
Set Interface Basic Operations
The size operation returns the number of elements in the Set. The isEmpty method does exactly what you think it would. The add method adds the specified element to the Set if it's not already present and returns a boolean indicating whether the element was added. Similarly, the remove method removes the specified element from the Set if it's present and returns a boolean indicating whether the element was present. The iterator method returns an Iterator over the Set.
The following program takes the words in its argument list and prints out any duplicate words, the number of distinct words, and a list of the words with duplicates eliminated.
import java.util.*;
public class FindDups {
public static void main(String[] args) {
Set
for (String a : args)
if (!s.add(a))
System.out.println("Duplicate detected: " + a);
System.out.println(s.size() + " distinct words: " + s);
}
}
Now run the program.
java FindDups i came i saw i left
Duplicate detected: i
Duplicate detected: i
4 distinct words: [i, left, saw, came]
Note that the code always refers to the Collection by its interface type (Set) rather than by its implementation type (HashSet). This is a strongly recommended programming practice because it gives you the flexibility to change implementations merely by changing the constructor.
The implementation type of the Set in the preceding example is HashSet, which makes no guarantees as to the order of the elements in the Set. If you want the program to print the word list in alphabetical order, merely change the Set's implementation type from HashSet to TreeSet. Making this trivial one-line change causes the command line in the previous example to generate the following output.
java FindDups i came i saw i left
Duplicate detected: i
Duplicate detected: i
4 distinct words: [came, i, left, saw]
Set Interface Bulk Operations
Bulk operations are particularly well suited to Sets; when applied, they perform standard set-algebraic operations. Suppose s1 and s2 are sets. Here's what bulk operations do:
ü s1.containsAll(s2) — returns true if s2 is a subset of s1.
ü s1.addAll(s2) — transforms s1 into the union of s1 and s2.
ü s1.retainAll(s2) — transforms s1 into the intersection of s1 and s2.
ü s1.removeAll(s2) — transforms s1 into the set difference of s1 and s2.
To calculate the union, intersection, or set difference of two sets nondestructively (without modifying either set), the caller must copy one set before calling the appropriate bulk operation. The following are the resulting idioms.
Set
union.addAll(s2);
Set
intersection.retainAll(s2);
Set
difference.removeAll(s2);
The implementation type of the result Set in the preceding idioms is HashSet, which is, as already mentioned, the best all-around Set implementation in the Java platform. However, any general-purpose Set implementation could be substituted.
Let's revisit the FindDups program. Suppose you want to know which words in the argument list occur only once and which occur more than once, but you do not want any duplicates printed out repeatedly. This effect can be achieved by generating two sets — one containing every word in the argument list and the other containing only the duplicates. The words that occur only once are the set difference of these two sets, which we know how to compute. Here's how the resulting program looks.
import java.util.*;
public class FindDups2 {
public static void main(String[] args) {
Set
Set
for (String a : args)
if (!uniques.add(a))
dups.add(a);
// Destructive set-difference
uniques.removeAll(dups);
System.out.println("Unique words: " + uniques);
System.out.println("Duplicate words: " + dups);
}
}
When run with the same argument list used earlier (i came i saw i left), the program yields the following output.
Unique words: [left, saw, came]
Duplicate words: [i]
A less common set-algebraic operation is the symmetric set difference — the set of elements contained in either of two specified sets but not in both. The following code calculates the symmetric set difference of two sets nondestructively.
Set
symmetricDiff.addAll(s2);
Set
tmp.retainAll(s2));
symmetricDiff.removeAll(tmp);
Set Interface Array Operations
The array operations don't do anything special for Sets beyond what they do for any other Collection.