An n-gram is a contiguous sequence of n characters within a given piece of text.
For a given piece of text, we can compute a list of all of its n-grams. The size of the n-grams, n, must be passed as an argument.
Let’s make a list of n-grams from a string:
```java
import java.util.*;

class Ngrams {
    public static List<String> ngrams(int n, String str) {
        List<String> ngrams = new ArrayList<String>();
        for (int i = 0; i < str.length() - n + 1; i++) {
            // Add the substring of size n that starts at index i
            ngrams.add(str.substring(i, i + n));
            // In each iteration, the window moves one step forward
            // Hence, each n-gram is added to the list
        }
        return ngrams;
    }

    public static void main(String args[]) {
        String s = "abcdef";
        List<String> ngrams = ngrams(3, s);
        for (String ngram : ngrams) {
            System.out.println(ngram);
        }
    }
}
```
All you have to do is iterate through the string with a fixed window of size n. In each iteration, the new substring, or n-gram, is added to the ngrams list.
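The window arithmetic can be made explicit: a string of length L has L - n + 1 windows of size n, and none when n is larger than L. The following is a minimal sketch of that idea, not a replacement for the code above. The class name `NgramWindow`, the helper `count`, and the choice to reject non-positive n are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NgramWindow {
    // Number of windows of size n that fit in a string of the given length.
    // Assumption: a non-positive n is treated as a caller error.
    static int count(int n, int length) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return Math.max(0, length - n + 1);
    }

    static List<String> ngrams(int n, String str) {
        int total = count(n, str.length());
        List<String> result = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            // The window covers indices i (inclusive) to i + n (exclusive)
            result.add(str.substring(i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ngrams(3, "abcdef")); // [abc, bcd, cde, def]
        System.out.println(ngrams(6, "abcdef")); // [abcdef]
        System.out.println(ngrams(7, "abcdef")); // []
    }
}
```

When n exceeds the length of the string, the loop body never runs and the result is an empty list, which matches the behavior of the loop condition `i < str.length() - n + 1`.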
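The lazy approach is to create an iterator over the text that produces each n-gram only when it is requested, instead of building the whole list up front. In the example below, each n-gram is printed as soon as it is produced.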
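```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class NgramIterator implements Iterator<String> {
    private final String str;
    private final int n;
    private int pos = 0; // start index of the current window

    public NgramIterator(int n, String str) {
        this.n = n;
        this.str = str;
    }

    public boolean hasNext() {
        return pos < str.length() - n + 1;
    }

    public String next() {
        // The Iterator contract requires this exception once the input is exhausted
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return str.substring(pos, pos++ + n);
    }

    public static void main(String args[]) {
        String s = "abcdef";
        new NgramIterator(3, s).forEachRemaining(System.out::println);
    }
}
```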
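Once again, n is the size of the n-grams. pos is the start index of the current window: it is initialized to 0 so that the iterator begins at the start of the string, and it advances by one on every call to next().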
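One practical benefit of the lazy version is that a caller can stop early without computing the remaining n-grams. The sketch below wraps the same window logic in an `Iterable` so it can be used in a for-each loop. The class name `LazyNgrams` and the factory method `of` are hypothetical names chosen for this illustration, and the code assumes n is positive.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class LazyNgrams {
    // Hypothetical helper: each call to iterator() starts a fresh pass over str.
    // Assumption: n is positive.
    static Iterable<String> of(int n, String str) {
        return () -> new Iterator<String>() {
            private int pos = 0; // start index of the next window

            @Override
            public boolean hasNext() {
                return pos + n <= str.length();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String ngram = str.substring(pos, pos + n);
                pos++;
                return ngram;
            }
        };
    }

    public static void main(String[] args) {
        int taken = 0;
        for (String ngram : of(3, "abcdef")) {
            System.out.println(ngram);
            // Stop after two n-grams; the remaining windows are never built
            if (++taken == 2) {
                break;
            }
        }
    }
}
```

Because substrings are created only inside next(), breaking out of the loop after two iterations means the windows "cde" and "def" are never materialized.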