How Apache Groovy makes working with strings easy

A “string” in programming is zero or more characters that you can think of as (mostly) a single entity. It’s a concept that exists in most programming languages. In Java, and therefore in Groovy, strings are primarily created as instances of the class String. (If you haven’t installed Groovy yet, please read the intro to this series.)

From the Java perspective, because String is a class, it’s (obviously) not a primitive type, which might initially seem unfortunate. But when you examine the rich behavior defined for instances of String, you can see there is a great deal of advantage to thinking of a string as a full-on class.

For example, there are methods that find individual characters in the instance of String (indexOf(), charAt()), that return parts of the instance, possibly modified (substring(), replace()), that test the instance against a regular expression (matches()) or compare regions of two strings (regionMatches()), and so on.
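
As a minimal sketch of the methods just mentioned, the following Java program runs each of them on a sample string. The class name StringMethodsDemo, the helper describe() and the sample text are invented for this illustration.

```java
// Illustrative class name; the sample text is made up for this sketch.
class StringMethodsDemo {
    // Runs each method mentioned above on s and collects the results, one per line.
    static String describe(String s) {
        return String.join("\n",
            "indexOf: " + s.indexOf('o'),                 // position of the first 'o'
            "charAt: " + s.charAt(6),                     // character at position 6
            "substring: " + s.substring(0, 5),            // positions 0 through 4
            "replace: " + s.replace("world", "there"),    // returns a modified copy
            "matches: " + s.matches("h.*d"),              // regex must match the whole string
            "regionMatches: " + s.regionMatches(6, "world", 0, 5)); // compare 5 chars from position 6
    }

    public static void main(String[] args) {
        System.out.println(describe("hello world"));
    }
}
```

Note that matches() tests the whole string against the pattern, while regionMatches() does a plain character comparison with no regular expression involved.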

As an aside, there is equal merit in thinking of instances of Integer rather than int, Double rather than double, and so on from the same perspective. An int in Java is just an int — something that has an integer value that may be mutable. It has no other behavior. In contrast, Integer comes with all sorts of useful behavior: max() and min(), parseInt(), toString() and so forth. As mentioned earlier, Groovy promotes all declarations of primitive types to full objects.
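
To make the aside concrete, here is a small Java sketch that uses the Integer methods named above. The class name IntegerDemo and the helper largerOf() are hypothetical, chosen only for this example.

```java
// Illustrative class and method names; not part of the article's own code.
class IntegerDemo {
    // Parses two decimal strings and returns the larger value as text.
    static String largerOf(String a, String b) {
        Integer x = Integer.parseInt(a);   // parseInt returns an int, autoboxed to Integer here
        Integer y = Integer.parseInt(b);
        return Integer.toString(Integer.max(x, y));
    }

    public static void main(String[] args) {
        System.out.println(largerOf("42", "57"));   // prints 57
        System.out.println(Integer.min(42, 57));    // prints 42
    }
}
```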

In Java and Groovy, an important aspect of String objects is that they are immutable, meaning that once created, their state cannot be modified. So if you have declared String s = "abc", then there’s no way to convert that middle "b" into another character. The expression s.replace("b","x") does not change s; rather, it returns a copy of s with the "b" replaced by "x". (String is not the only immutable class in Java; the wrapper classes such as Integer and Double are immutable as well.)
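
The following short Java sketch shows this behavior with the same "abc" example. The class name ImmutableStringDemo and the helper replaceMiddle() are assumptions made for the illustration.

```java
// Illustrative names; the "abc" example comes from the paragraph above.
class ImmutableStringDemo {
    // Returns a copy of s with every "b" replaced by "x"; s itself is untouched.
    static String replaceMiddle(String s) {
        return s.replace("b", "x");
    }

    public static void main(String[] args) {
        String s = "abc";
        String t = replaceMiddle(s);
        System.out.println(s);   // prints abc: the original is unchanged
        System.out.println(t);   // prints axc: the modified copy
    }
}
```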

Programmers who need mutable strings should examine the class StringBuffer, which in addition to being mutable, is also thread-safe.
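
By contrast, a StringBuffer can be changed in place. The sketch below is one possible illustration; the class name StringBufferDemo and the helper replaceMiddle() are hypothetical.

```java
// Illustrative names; shows in-place modification, which String does not allow.
class StringBufferDemo {
    // Copies text into a StringBuffer, then modifies that buffer in place.
    static String replaceMiddle(String text) {
        StringBuffer sb = new StringBuffer(text);
        sb.setCharAt(1, 'x');   // overwrite the character at position 1
        sb.append("!");         // grow the same buffer
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceMiddle("abc"));   // prints axc!
    }
}
```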

This is a good time to explain the meaning of the Java keyword final. If a variable of primitive type (int, double, and so on) is declared final, its value cannot be changed. However, if the variable refers to an instance of a class, declaring it final means only that it cannot be reassigned to refer to a different instance. If the instance itself is mutable, its contents can still be changed. Here’s a small program that illustrates that:

 1  import java.lang.*;
 2  public class Groovy09a {
 3    public static void main(String[] args) {
 4      var t1 = new Test();
 5      t1.setA(42);
 6      final var t2 = t1;
 7      System.out.println("t1.getA() " + t1.getA() + " t2.getA() " + t2.getA());
 8      t2.setA(57);
 9      System.out.println("t1.getA() " + t1.getA() + " t2.getA() " + t2.getA());
10    }
11  }
12  class Test {
13    private int a = 0;
14    public int getA() { return this.a; }
15    public void setA(int a) { this.a = a; }
16  }

When you run this, you see:

$ java Groovy09a
t1.getA() 42 t2.getA() 42
t1.getA() 57 t2.getA() 57

So even though t2 is final, you can change the value of the field a of the instance of Test to which t2 refers.

However, if you insert a line like:

t2 = new Test();

between lines nine and 10, you’ll get the following compilation error:

$ javac Groovy09a.java
Groovy09a.java:11: error: cannot assign a value to final variable t2
t2 = new Test();
^
1 error

Let’s get back to the String class.

When I write scripts to process incoming data, one of the things I need to do frequently is extract a substring — or several — from each line of input.

In Java, this operation looks like:

line.substring(7,11)

This means the part of the line starting at position seven and ending just before position 11 (remember that strings, and therefore substrings, start at position zero, and the last position is the length of the string less one).

I can do the same in Groovy. But I can also use the range operator as a more concise way of extracting substrings from strings:

String line = "0123456789abcdefghijklmnopqrstuvwxyz"
println line.substring(7,11)
println line[7..11]
println line[7..<11]
println line[7..10]
println line[30..-1]
println line[-6..-1]

When you run this, you see:

$ groovy Groovy09b.groovy
789a
789ab
789a
789a
uvwxyz
uvwxyz

In Groovy, the notation a..b is a range that defines all the integers between a and b inclusively.

Therefore, you see that line.substring(7,11) is equivalent to line[7..<11] or line[7..10]. The notation 7..<11 means the range from seven up to just before 11. For completeness, the notation 6<..<11 (available since Groovy 4) means the range from just after six up to just before 11. It’s also useful to know that ranges of this sort can be applied to arrays and lists.
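
For readers coming from Java, the following sketch shows roughly what Groovy’s inclusive range subscript corresponds to in terms of substring(). The class RangeDemo and the helper slice() are hypothetical and are not how Groovy implements ranges; the helper handles only forward ranges, with negative positions counted from the end of the string.

```java
// Hypothetical helper that mimics line[from..to] for forward ranges only.
class RangeDemo {
    // Both bounds are inclusive; a negative bound counts back from the end.
    static String slice(String s, int from, int to) {
        int start = from < 0 ? s.length() + from : from;
        int end = to < 0 ? s.length() + to : to;
        return s.substring(start, end + 1);   // substring's end is exclusive, so add 1
    }

    public static void main(String[] args) {
        String line = "0123456789abcdefghijklmnopqrstuvwxyz";
        System.out.println(slice(line, 7, 11));    // like line[7..11]: 789ab
        System.out.println(slice(line, 7, 10));    // like line[7..10]: 789a
        System.out.println(slice(line, 30, -1));   // like line[30..-1]: uvwxyz
        System.out.println(slice(line, -6, -1));   // like line[-6..-1]: uvwxyz
    }
}
```

The amount of bookkeeping in slice() is a fair indication of what the range notation saves you in everyday scripts.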

Groovy augments the behavior of String objects with additional methods. There’s a particular group of methods whose names all start with “take” that I often find useful, especially for text that’s structured regularly but not in a fixed format. For instance, there’s the takeBetween() method that allows pulling out the substring between a begin and end marker:

String html = """
<html>
  <head>
    <title>Hello world</title>
  </head>
  <body>
    <h1>Hello world</h1>
      <p>Hello world</p>
  </body>
</html>
"""
println html.takeBetween("<html>","</html>")
println html.takeBetween("<p>","</p>")

When you run this, you see:

$ groovy Groovy09c.groovy
<head>
  <title>Hello world</title>
</head>
<body>
  <h1>Hello world</h1>
    <p>Hello world</p>
  </body>
Hello world

Here you have in turn extracted the text between <html> and </html> and between <p> and </p>.
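
To appreciate what takeBetween() saves you, here is a rough plain-Java sketch of the same idea built on indexOf() and substring(). The class TakeBetweenDemo and the helper between() are hypothetical, and this is not Groovy’s actual implementation; returning an empty string when a marker is missing is an assumption of this sketch.

```java
// Hypothetical helper; a sketch of the idea, not Groovy's own code.
class TakeBetweenDemo {
    // Returns the text between the first `from` and the next `to` after it,
    // or "" when either marker is missing (an assumption of this sketch).
    static String between(String s, String from, String to) {
        int start = s.indexOf(from);
        if (start < 0) {
            return "";
        }
        start += from.length();            // skip past the begin marker
        int end = s.indexOf(to, start);    // look for the end marker after it
        return end < 0 ? "" : s.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<body><p>Hello world</p></body>";
        System.out.println(between(html, "<p>", "</p>"));   // prints Hello world
    }
}
```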

It is worth mentioning that Groovy has always permitted multiline strings by using the """ and ''' sequences to start and end the multiline block.

Keep in mind another useful Groovy addition to Java strings, the GString, which you’ve seen in previous articles. The GString provides a mechanism to interpolate values into strings. For example, in Java you would write:

System.out.println("t1.getA() " + t1.getA() + " t2.getA() " + t2.getA());

Whereas in Groovy you use the GString to write:

System.out.println("t1.getA() ${t1.getA()} t2.getA() ${t2.getA()}");

Or, of course, you can shorten this further by dropping System.out. and the parentheses on the method call, and by using dot notation to access the getters:

println "t1.a ${t1.a} t2.a ${t2.a}"
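
For comparison, the closest everyday tool in plain Java is String.format(), which uses positional placeholders rather than embedded expressions. The sketch below assumes two int values standing in for t1.a and t2.a; the class FormatDemo and the helper report() are hypothetical.

```java
// Illustrative names; a1 and a2 stand in for the values of t1.a and t2.a.
class FormatDemo {
    // Builds the same message with %d placeholders filled in by position.
    static String report(int a1, int a2) {
        return String.format("t1.a %d t2.a %d", a1, a2);
    }

    public static void main(String[] args) {
        System.out.println(report(57, 57));   // prints t1.a 57 t2.a 57
    }
}
```

With a GString the expression sits where its value appears, so there is no separate argument list to keep in step with the placeholders.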

Conclusion

Strings are fundamental to programming; in some ways, programming can be seen as translating text into action and back into text. Java provides a powerful String class that facilitates complex string transformation programs. Groovy adds further sophistication to the String class and streamlines its use through greater syntactic support. As students of Groovy will soon realize, this streamlining is due both to specific syntactic support for string operations and to the synergy with other Groovy features, such as ranges, that programmers can use to their advantage.

