A “string” in programming is zero or more characters that you can think of as (mostly) a single entity. It’s a concept that exists in most programming languages. In Java, and therefore in Groovy, strings are primarily created as instances of the class String. (If you haven’t installed Groovy yet, please read the intro to this series.)
From the Java perspective, because String is a class, it’s (obviously) not a primitive type, which might initially seem unfortunate. But when you examine the rich behavior defined for instances of String, you can see there is a great deal of advantage to thinking of a string as a full-on class.
For example, there are methods that locate or retrieve individual characters in the instance of String (indexOf(), charAt()), that return parts of the instance, possibly modified (substring(), replace()), that test the instance against a regular expression or compare a region of it with another string (matches(), regionMatches()), and so on.
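To make that list concrete, here is a small Java sketch that calls each of those methods on one sample string. The class name StringMethodsDemo and the sample text are arbitrary choices for illustration; the comments describe what each call returns.

```java
// A quick tour of the java.lang.String methods named above.
class StringMethodsDemo {
    public static void main(String[] args) {
        String s = "Hello, Groovy world";
        // indexOf: position of the first 'G' (positions start at 0) -> 7
        System.out.println(s.indexOf('G'));
        // charAt: the character stored at position 8 -> r
        System.out.println(s.charAt(8));
        // substring: positions 7 through 12; the end index 13 is exclusive -> Groovy
        System.out.println(s.substring(7, 13));
        // replace: returns a modified copy; s itself is untouched
        System.out.println(s.replace("world", "fans"));
        // matches: does the WHOLE string match this regular expression? -> true
        System.out.println(s.matches(".*Gr[aeiou]+vy.*"));
        // regionMatches: compare 6 chars of s at offset 7 with "GROOVY", ignoring case -> true
        System.out.println(s.regionMatches(true, 7, "GROOVY", 0, 6));
    }
}
```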
As an aside, there is equal merit in thinking of instances of Integer rather than int, Double rather than double, and so on from the same perspective. An int in Java is just an int: a variable that holds an integer value, which can be reassigned unless it is declared final. It has no other behavior. In contrast, Integer comes with all sorts of useful behavior: max() and min(), parseInt(), toString() and so forth. As mentioned earlier in this series, Groovy promotes all declarations of primitive types to full objects.
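Here is a brief Java sketch of that extra behavior. max(), min(), and parseInt() are static methods on Integer, while toString() exists both as a static method and as an instance method on a boxed Integer; the values used are arbitrary.

```java
// Integer wraps an int value and adds behavior; a bare int has none.
class IntegerDemo {
    public static void main(String[] args) {
        int parsed = Integer.parseInt("42");           // text -> int
        System.out.println(Integer.max(parsed, 57));   // larger of the two: 57
        System.out.println(Integer.min(parsed, 57));   // smaller of the two: 42
        System.out.println(Integer.toString(255, 16)); // 255 rendered in base 16: ff
        Integer boxed = parsed;                        // autoboxing: int -> Integer
        // An Integer object has instance methods; parsed.toString() would not compile.
        System.out.println(boxed.toString().length()); // "42" has 2 characters
    }
}
```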
In Java and Groovy, an important aspect of String objects to recognize is that they are immutable: once created, their state cannot be modified. So if you have declared String s = "abc", there’s no way to convert that middle "b" into another character. The expression s.replace("b","x") does not change s; rather, it returns a new String, a copy of s with the "b" replaced by "x". (String is far from the only immutable class in Java, though; the wrapper classes such as Integer and Double are immutable too.)
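The following Java sketch demonstrates the point: replace() hands back a new String and leaves the original untouched. The class name is arbitrary.

```java
// String immutability: replace() never modifies the string it is called on.
class ImmutableStringDemo {
    public static void main(String[] args) {
        String s = "abc";
        String t = s.replace("b", "x"); // builds and returns a new String
        System.out.println(s);          // abc (unchanged)
        System.out.println(t);          // axc
        System.out.println(s == t);     // false: two distinct objects
    }
}
```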
Programmers who need mutable strings should examine the class StringBuffer, which, in addition to being mutable, is also thread-safe.
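As a small sketch of that mutability, the Java program below edits a single StringBuffer in place. The specific edits are arbitrary; the point is that every call changes the same object rather than producing a new one.

```java
// StringBuffer is mutable: each call below changes the same buffer object.
class StringBufferDemo {
    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("abc");
        sb.setCharAt(1, 'x');             // change the middle character in place: "axc"
        sb.append("def");                 // grow the same buffer: "axcdef"
        sb.insert(0, '>');                // insert at the front: ">axcdef"
        System.out.println(sb);           // >axcdef
        StringBuffer same = sb.reverse(); // reverse() mutates and returns this buffer
        System.out.println(same);         // fedcxa>
        System.out.println(same == sb);   // true: still one object
    }
}
```

When a buffer is used by only one thread, StringBuilder (available since Java 5) offers the same methods without StringBuffer's synchronization and is the usual choice.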
This is a good time to explain the meaning of the Java keyword final. If a variable of primitive type (int, double, and so on) is declared final, its value cannot be changed. If a final variable refers to an instance of a class, however, it cannot be reassigned to refer to a different instance. But if that instance is itself mutable, its contents can still be changed. Here’s a small program that illustrates this:
1 import java.lang.*;
2 public class Groovy09a {
3 public static void main(String[] args) {
4 var t1 = new Test();
5 t1.setA(42);
6 final var t2 = t1;
7 System.out.println("t1.getA() " + t1.getA() + " t2.getA() " + t2.getA());
8 t2.setA(57);
9 System.out.println("t1.getA() " + t1.getA() + " t2.getA() " + t2.getA());
10 }
11 }
12 class Test {
13 private int a = 0;
14 public int getA() { return this.a; }
15 public void setA(int a) { this.a = a; }
16 }
When you run this, you see:
$ java Groovy09a
t1.getA() 42 t2.getA() 42
t1.getA() 57 t2.getA() 57
So even though t2 is final, you can change the value of the field a of the instance of Test to which t2 refers.
However, if you insert the line
t2 = new Test();
between lines nine and 10, you’ll get the following compilation error:
$ javac Groovy09a.java
Groovy09a.java:11: error: cannot assign a value to final variable t2
t2 = new Test();
^
1 error
Let’s get back to the String class.
When I write scripts to process incoming data, one of the things I need to do frequently is extract a substring — or several — from each line of input.
In Java, this operation looks like:
line.substring(7,11)
This means the part of the line starting in position seven and ending just before position 11. (Remember that string positions start at zero, so the last position is the length of the string minus 1.)
I can do the same in Groovy. But I can also use the range operator as a more concise way of extracting substrings from strings:
String line = "0123456789abcdefghijklmnopqrstuvwxyz"
println line.substring(7,11)
println line[7..11]
println line[7..<11]
println line[7..10]
println line[30..-1]
println line[-6..-1]
When you run this, you see:
$ groovy Groovy09b.groovy
789a
789ab
789a
789a
uvwxyz
uvwxyz
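In Groovy, the notation a..b is a range that defines all the integers between a and b inclusive.
Therefore, you see that line.substring(7,11) is equivalent to line[7..<11] or line[7..10]. The notation 7..<11 means the range from seven up to just before 11. For completeness, the notation 6<..<11 (supported starting with Groovy 4) means the range from just after six up to just before 11. Negative indices count back from the end of the string, with -1 denoting the last character, which is why line[30..-1] and line[-6..-1] both yield the final six characters of this 36-character string. It’s also useful to know that ranges of this sort can be applied to arrays and lists.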
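To see how the Groovy range subscripts line up with Java’s substring(), here is a Java sketch. The inclusive() helper is hypothetical, written only for this illustration: it handles ascending ranges and normalizes negative indices as the examples above suggest, but it does not attempt Groovy’s other range forms, such as descending or left-exclusive ranges.

```java
// Hypothetical helper (not a JDK or Groovy API): mimics Groovy's inclusive
// line[from..to] for ascending ranges, where negative indices count from the end.
class RangeDemo {
    static String inclusive(String s, int from, int to) {
        int n = s.length();
        int start = from < 0 ? n + from : from; // -1 means the last character
        int end = to < 0 ? n + to : to;
        return s.substring(start, end + 1);     // substring's end index is exclusive
    }

    public static void main(String[] args) {
        String line = "0123456789abcdefghijklmnopqrstuvwxyz";
        System.out.println(line.substring(7, 11));   // like line[7..<11]: 789a
        System.out.println(inclusive(line, 7, 11));  // like line[7..11]:  789ab
        System.out.println(inclusive(line, 7, 10));  // like line[7..10]:  789a
        System.out.println(inclusive(line, 30, -1)); // like line[30..-1]: uvwxyz
        System.out.println(inclusive(line, -6, -1)); // like line[-6..-1]: uvwxyz
    }
}
```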
Groovy augments the behavior of String objects with additional methods. There’s a particular group of methods whose names all start with “take” that I often find useful, especially for text that’s structured regularly but not in a fixed format. For instance, there’s the takeBetween() method that allows pulling out the substring between a begin and end marker:
String html = """
<html>
<head>
<title>Hello world</title>
</head>
<body>
<h1>Hello world</h1>
<p>Hello world</p>
</body>
</html>
"""
println html.takeBetween("<html>","</html>")
println html.takeBetween("<p>","</p>")
When you run this, you see:
$ groovy Groovy09c.groovy
<head>
<title>Hello world</title>
</head>
<body>
<h1>Hello world</h1>
<p>Hello world</p>
</body>
Hello world
Here you have in turn extracted the text between <html> and </html> and between <p> and </p>.
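For readers coming from Java, the following sketch approximates what takeBetween() does using only indexOf() and substring(). The static takeBetween(self, from, to) helper is hypothetical, not a JDK method, and it assumes that a missing marker yields an empty string; the Groovy documentation is the authority on the real method’s exact behavior.

```java
// Hypothetical helper, not a JDK method: a rough Java analogue of Groovy's
// String.takeBetween(from, to). Assumption: a missing marker yields "".
class TakeBetweenDemo {
    static String takeBetween(String self, String from, String to) {
        int start = self.indexOf(from);
        if (start < 0) return "";            // opening marker not found
        start += from.length();              // skip past the opening marker
        int end = self.indexOf(to, start);   // closing marker must come after it
        if (end < 0) return "";              // closing marker not found
        return self.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>\n<p>Hello world</p>\n</body>\n</html>\n";
        System.out.println(takeBetween(html, "<p>", "</p>"));             // Hello world
        System.out.println(takeBetween(html, "<h1>", "</h1>").isEmpty()); // true: no <h1>
    }
}
```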
It is worth mentioning that Groovy has always permitted multiline strings by using the """ and ''' sequences to start and end the multiline block.
Keep in mind another useful Groovy addition to Java strings, the GString, which you’ve seen in previous articles. The GString provides a mechanism to interpolate values into strings. For example, in Java you would write:
System.out.println("t1.getA() " + t1.getA() + " t2.getA() " + t2.getA());
Whereas in Groovy you use the GString to write:
System.out.println("t1.getA() ${t1.getA()} t2.getA() ${t2.getA()}");
Or, of course, you can shorten this further by dropping System.out. and the parentheses on the method call, and by using dot notation to access the getters:
println "t1.a ${t1.a} t2.a ${t2.a}"
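Standard Java has no interpolation syntax comparable to the GString’s ${...}. A common alternative to the string concatenation shown above is String.format(), sketched here in Java. The nested Test class is a minimal stand-in for the one in Groovy09a, and describe() is a hypothetical helper introduced only for this example.

```java
import java.util.Locale;

// String.format as Java's usual substitute for GString-style interpolation.
class FormatDemo {
    // Minimal stand-in for the Test class from Groovy09a.
    static class Test {
        private int a;
        Test(int a) { this.a = a; }
        int getA() { return a; }
    }

    // Hypothetical helper: each %d is filled, in order, from the trailing arguments.
    static String describe(Test t1, Test t2) {
        // Locale.ROOT keeps the digit formatting independent of the machine's locale.
        return String.format(Locale.ROOT, "t1.getA() %d t2.getA() %d", t1.getA(), t2.getA());
    }

    public static void main(String[] args) {
        Test t = new Test(57);
        System.out.println(describe(t, t)); // t1.getA() 57 t2.getA() 57
    }
}
```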
Conclusion
Strings are fundamental to programming; in some ways, programming can be seen as translating text into action and back into text. Java provides a powerful String class that facilitates complex string transformation programs. Groovy adds further sophistication to the String class and streamlines its use through greater syntactic support. As students of Groovy will soon realize, this streamlining comes both from syntactic support aimed specifically at string operations and from the way strings combine with other cool features of Groovy, such as ranges, to give programmers capabilities they can use to their advantage.
Photo by Daniel Fazio on Unsplash