Chapter three: String handling


A string constant is a sequence of characters forming a single data value. e.g.:

"BASIC"

String variables are identified by a final £ or $ sign.

e.g.:

A$

The range of names for string variables is thus A$–Z$ and A0$–Z9$, except where a string array is being declared. Such an array might hold, say, four strings.

e.g.:

A$(0) = "TOM", A$(1) = "DICK", A$(2) = "HARRY" and A$(3) = "FRED"

Only single-letter variable names may be used for arrays, so the range for string arrays is A$(…)–Z$(…), and a statement such as:

10 DIM A3$(4)

is invalid.

Note: DIM A$(4,10) would declare an array of 5 × 11 = 55 strings, since each subscript runs from 0 up to the declared bound.

A string variable can be up to 511 characters in length; if it is of zero length it is known as the null string. Strings may contain any character except “←”, which is used to correct mistyped characters.

Use of quotes

Strings must be enclosed in quotes (“) except in DATA statements and in response to INPUT.

There is no connection between a numeric and a string variable with the same name, e.g. T and T$ refer to completely different variables, but T$ and T£ refer to the same variable.

The & operator

This operator is used to join together (concatenate) two strings. For example, if a surname is held in one string and a first name in another, it may sometimes be necessary to treat them as a single string. Thus if A$ = “MICHAEL” and B$ = “JONES”, the statement:

30 LET C$ = A$ & B$

would set C$ equal to “MICHAELJONES”, or, if a space is needed between the two names:

30 LET C$ = A$ & " " & B$

would put C$ equal to “MICHAEL JONES”.

The & operator can also be used to make long strings. If card input is used, the 80 column limit only permits strings in DATA statements to be about 70 characters long, but they can be extended by combining them after reading:

10 READ A$, B$
20 LET A$ = A$ & B$
30 PRINT A$

If, after joining the two strings, A$ has more characters than can be printed on one line, the string is truncated when it is printed.
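As a rough analogy for readers who know a modern language, the & operator behaves like ordinary string concatenation. The Python sketch below is not 2903 BASIC: the function name `ampersand` is hypothetical, and Python's `+` stands in for &.

```python
def ampersand(*parts: str) -> str:
    """Model of A$ & B$ & ...: join the strings left to right with nothing between them."""
    result = ""
    for part in parts:
        result = result + part
    return result


a, b = "MICHAEL", "JONES"
print(ampersand(a, b))        # MICHAELJONES
print(ampersand(a, " ", b))   # MICHAEL JONES
```

Note that, as with &, no separator is added automatically: a space has to be supplied as a string of its own.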

String functions

String functions enable complex string handling to be carried out very easily. They can give the length of a string and the position of a particular character within it, and they can isolate specific characters so that the contents of the string can be manipulated.

Functions giving numeric results

CHR(A$)

This gives the 2903 internal character code of the first character of A$:

10 LET A$ = "CAR"
20 LET B = CHR(A$)

will set B equal to 35, the code for “C”.

LEN(A$)

This gives the number of characters in A$:

10 LET A$ = "WEEKEND"
20 LET X = LEN(A$)

sets X equal to 7. Spaces are treated as part of the string, so “A CUP OF TEA” has a length of 12; the quotes are excluded. A string constant may be used instead of a variable, e.g. LET D = LEN(“WEEKEND”).

LIN(A$)

This gives the number of newline characters that appear in a string. If a string A$ is output as:

ONE
newline
newline
TWO
newline
newline
THREE

two pairs of newlines have appeared, so LIN(A$) has the value 4.
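LEN and LIN can be modelled in Python as below. This is an analogy only: `basic_len` and `basic_lin` are hypothetical names, and representing the newline character by Python's `"\n"` is an assumption about how the 2903 stores it.

```python
def basic_len(s: str) -> int:
    """LEN(A$): number of characters, spaces included; the quotes are not part of the value."""
    return len(s)


def basic_lin(s: str) -> int:
    """LIN(A$): number of newline characters in the string."""
    return s.count("\n")


print(basic_len("WEEKEND"))                 # 7
print(basic_len("A CUP OF TEA"))            # 12
print(basic_lin("ONE\n\nTWO\n\nTHREE"))     # 4
```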

OCC(A$,B$)

The OCC function will give the number of non-overlapping occurrences of B$ in A$, hence if A$ = “THE CAT SAT ON THE MAT” and B$ = “AT”, the function will show that there are 3 occurrences by assigning 3 to C in this statement:

10 LET C = OCC(A$,B$)

The parameters may be literals rather than variables, e.g.:

10 LET C = OCC("TRUMPET", "T")

Because only non-overlapping occurrences are counted, the number of occurrences of “ANA” in “BANANA” would be 1.
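The non-overlapping rule can be made concrete with a short Python model of OCC. The name `occ` is hypothetical. The sketch resumes each search just after the previous match, so overlapping matches such as the second “ANA” in “BANANA” are never counted. How OCC treats a null search string is not described in the text, so the sketch simply rejects it.

```python
def occ(a: str, b: str) -> int:
    """OCC(A$,B$): number of non-overlapping occurrences of b in a."""
    if not b:
        # Behaviour for a null search string is not described, so the sketch rejects it.
        raise ValueError("search string must not be empty")
    count, start = 0, 0
    while True:
        i = a.find(b, start)
        if i == -1:
            return count
        count += 1
        start = i + len(b)   # resume after this match, so matches never overlap


print(occ("THE CAT SAT ON THE MAT", "AT"))   # 3
print(occ("TRUMPET", "T"))                  # 2
print(occ("BANANA", "ANA"))                 # 1
```

Python's built-in `str.count` follows the same non-overlapping rule, so `"BANANA".count("ANA")` is also 1.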

POS(A$,B$)

The character position of the start of the first occurrence of B$ in A$. If B$ does not appear in A$, this number will be 0.

10 LET A$ = "PROTECTION"
20 LET B$ = "T"
30 LET C = POS(A$, B$)
40 LET D = POS(A$, "O")
50 LET C$ = "Z"
60 LET E = POS(A$,C$)

C = 4, D = 3, E = 0.

POS(A$,B$,N)

This gives the position of the start of the Nth occurrence of B$ in A$, and is set to zero if there is no Nth occurrence:

25 LET D$ = "ISS"
30 LET E$ = "MISSISSIPPI"
40 LET D = POS(E$,D$,2)
50 LET E = POS(E$,D$,3)

In this example, D = 5 since the second occurrence of ISS begins at character 5, and E = 0 since there is no third occurrence.
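The two forms of POS can be modelled together as below. This is a hypothetical Python sketch that returns 1-based positions and 0 for “not found”, as the text describes. The text does not say whether POS(A$,B$,N) counts overlapping occurrences. This sketch assumes it does not, which matches OCC.

```python
def pos(a: str, b: str, n: int = 1) -> int:
    """POS(A$,B$,N): 1-based start of the nth occurrence of b in a, or 0 if there is none."""
    if not b or n < 1:
        raise ValueError("b must be non-empty and n at least 1")
    start, i = 0, -1
    for _ in range(n):
        i = a.find(b, start)          # 0-based index, or -1 if absent
        if i == -1:
            return 0
        start = i + len(b)            # assumption: occurrences do not overlap
    return i + 1                      # convert to BASIC's 1-based character position


print(pos("PROTECTION", "T"), pos("PROTECTION", "O"), pos("PROTECTION", "Z"))   # 4 3 0
print(pos("MISSISSIPPI", "ISS", 2), pos("MISSISSIPPI", "ISS", 3))             # 5 0
```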

VAL(A$)

The numeric value of A$, which is a string containing numeric characters:

10 LET K = VAL(A$)

If A$ = “194” then K would be set to 194.

This function is very useful in processing Terminal Format Files (see Chapter six). Data is input to these files as single strings. The string-handling functions can isolate a numeric section of such a string, and VAL then gives its numeric value so that normal numeric processing can be performed.
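The pattern of isolating a numeric section and then applying VAL can be sketched in Python as follows. The record layout (a name, a comma, then a quantity) and the names `val` and `record` are hypothetical. Python's `float()` stands in for VAL. Unlike VAL, it raises an error on non-numeric text, and what 2903 BASIC does in that case is not covered here.

```python
def val(s: str) -> float:
    """VAL(A$): the numeric value of a string of numeric characters."""
    # float() ignores surrounding spaces; it raises ValueError for non-numeric text.
    return float(s)


record = "BOLTS,194"                   # hypothetical record: name, comma, quantity
comma = record.find(",")               # 0-based position of the separator
quantity = val(record[comma + 1:])     # isolate the numeric section, then convert it
print(quantity + 6)                    # 200.0: ordinary arithmetic now works
```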

Functions giving string results

CHR$(N)

The character whose 2903 internal character code is N. Thus:

CHR$(33) = "A"
CHR$(26) = "*"
CHR$(16) = " "

This function is only designed to decode one character. The statement:

10 LET A$ = CHR$(413544)

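will not give the three characters “ICL” (codes 41, 35 and 44). Instead it gives the single character whose code is the whole number modulo 64. Since 413544 = 6461 × 64 + 40, and 40 is the code for “H”, the result is “H”.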
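CHR and CHR$ are inverse table lookups, and the Python sketch below models them. The full code table is an assumption. The sketch uses the 64-character ICL 1900-series code, indexed by code number, which agrees with every value quoted in this chapter (space 16, * 26, A 33, C 35, H 40). The function names `chr_code` and `chr_str` are hypothetical.

```python
# Assumed 2903 internal code: the 64-character ICL 1900-series set, where the
# index of a character in this string is its code. Only the codes quoted in the
# text (16, 26, 33, 35, 40) are confirmed by the chapter itself.
CODE_TABLE = "0123456789:;<=>? !\"#£%&'()*+,-./@ABCDEFGHIJKLMNOPQRSTUVWXYZ[$]↑←"
assert len(CODE_TABLE) == 64


def chr_code(s: str) -> int:
    """CHR(A$): internal code of the first character of s."""
    return CODE_TABLE.index(s[0])


def chr_str(n: int) -> str:
    """CHR$(N): the character whose code is N, reduced modulo 64."""
    return CODE_TABLE[n % 64]


print(chr_code("CAR"))                           # 35
print(repr(chr_str(33) + chr_str(26) + chr_str(16)))
print(chr_str(413544))                           # H, since 413544 = 6461 * 64 + 40
```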

DAT$

This gives the date in the format dd/mm/yy.

TIM$

This gives the time in the format hh/mm/ss.

SEG$(A$,N,M)

The segment of A$ from the Nth character to the Mth character. If M is omitted the segment is from N to the end of A$.

e.g.:

10 LET A$ = "TRUNCATION"
20 LET B$ = SEG$(A$,5,7)

would set B$ = “CAT” and:

10 READ C$, D
20 LET D$ = SEG$(C$,D)
30 DATA DRAGONFLY, 7

would set D$ = “FLY”: because M is omitted, SEG$ takes the segment from character 7 to the end.
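The only subtlety in modelling SEG$ in Python is converting BASIC's inclusive, 1-based positions to a Python slice. The sketch below is hypothetical (the name `seg` is illustrative). It assumes N is at least 1, because the text does not describe out-of-range arguments.

```python
from typing import Optional


def seg(a: str, n: int, m: Optional[int] = None) -> str:
    """SEG$(A$,N,M): characters N to M inclusive, counting from 1; to the end if M is omitted."""
    if m is None:
        m = len(a)
    # Python slices are 0-based and exclude the end, so [n - 1:m] covers positions n..m.
    return a[n - 1:m]


print(seg("TRUNCATION", 5, 7))   # CAT
print(seg("DRAGONFLY", 7))       # FLY
```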

SEG$ is a useful function to use in conjunction with the numeric-result functions LEN and POS, described earlier. Together they can scan a string whose parts are separated by a known character, e.g. a space or a comma, so that specific parts of the string can be accessed. For example, if a string contains a first name and a surname with a space between them, it is possible to isolate and print the surname alone, or the whole name in surname-first order.

e.g.:

10 LET A$ = "JOHN EVANS"

The position of the first character of the surname will be the position of the space + 1:

20 LET B = POS(A$, " ") + 1

In this case B = 6.

The end of the surname is the end of the string itself, so LEN, which gives the length of the string, also gives the position of its final character:

30 LET C = LEN(A$)

Here C = 10.

We now know that the surname runs from character B to character C of A$, and the SEG$ function will isolate this section:

40 LET D$ = SEG$(A$, B, C)

D$ now holds the surname only, “EVANS”.
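The same steps can be followed in Python, which also produces the surname-first order mentioned earlier. The helpers `pos` and `seg` are hypothetical 1-based stand-ins for POS and SEG$. The sketch assumes the name contains exactly one space.

```python
def pos(a: str, b: str) -> int:
    """1-based position of the first b in a, or 0 if absent (like POS(A$,B$))."""
    return a.find(b) + 1


def seg(a: str, n: int, m: int) -> str:
    """Characters n..m inclusive, counting from 1 (like SEG$(A$,N,M))."""
    return a[n - 1:m]


a = "JOHN EVANS"
b = pos(a, " ") + 1          # 6: first character of the surname
c = len(a)                   # 10: last character of the string
surname = seg(a, b, c)       # EVANS
first = seg(a, 1, b - 2)     # JOHN: everything before the space
print(surname)
print(surname + " " + first)  # EVANS JOHN
```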

These functions can also be used if a single string holds several pieces of information all separated by, for instance, a comma:

20 DATA "MR.R JONES,20 PARK DRIVE, WIGAN, LANCS"
30 DATA "MRS.J. BUTLER,120 GREEN LANE, CHATHAM, KENT"
40 DATA "MRS.L MITCHELL,57 VAUGHAN ROAD, LEEK, STAFFS"
50 DATA "MR.R ELLIS,91 MELROSE AVENUE, GRAVESEND, KENT"

The number of householders who are men could be found by isolating the section up to the first full stop (MR or MRS). To find how many of the people live in Kent, LEN, POS and SEG$ can be used to isolate the section after the third comma.
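Both counts described above can be sketched in Python using the four DATA strings. The helpers `pos` (with an occurrence number, like POS(A$,B$,N)) and `seg` are hypothetical 1-based stand-ins. Python's `str.strip` removes the space that follows each comma. In BASIC, one could instead compare against “ KENT”.

```python
RECORDS = [
    "MR.R JONES,20 PARK DRIVE, WIGAN, LANCS",
    "MRS.J. BUTLER,120 GREEN LANE, CHATHAM, KENT",
    "MRS.L MITCHELL,57 VAUGHAN ROAD, LEEK, STAFFS",
    "MR.R ELLIS,91 MELROSE AVENUE, GRAVESEND, KENT",
]


def pos(a: str, b: str, n: int = 1) -> int:
    """1-based start of the nth non-overlapping occurrence of b in a, or 0."""
    start, i = 0, -1
    for _ in range(n):
        i = a.find(b, start)
        if i == -1:
            return 0
        start = i + len(b)
    return i + 1


def seg(a: str, n: int, m: int) -> str:
    """Characters n..m inclusive, counting from 1."""
    return a[n - 1:m]


men = kent = 0
for rec in RECORDS:
    title = seg(rec, 1, pos(rec, ".") - 1)               # section before the first full stop
    county = seg(rec, pos(rec, ",", 3) + 1, len(rec))    # section after the third comma
    if title == "MR":
        men += 1
    if county.strip() == "KENT":
        kent += 1

print(men, kent)   # 2 2
```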

SEG$ can also be used in conjunction with other functions. For example, to monitor how long a program has been running, the seconds part of the time can be extracted:

10 LET A = VAL(SEG$(TIM$,7))
15 PRINT A
20 END

This gives the numeric value of the last two characters of the time, i.e. the number of seconds: if the time was 11/45/20, A would hold the value 20. This could be done at various points in the program to monitor how long the processing took.
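Subtracting two readings of the seconds field alone is only valid when both fall within the same minute. Converting the whole hh/mm/ss string to seconds avoids that problem, though a run that crosses midnight would still need special handling. The Python sketch below illustrates this. `tim` and `dat` are hypothetical stand-ins for TIM$ and DAT$ built on the local clock, and `seconds_field` models VAL(SEG$(TIM$,7)).

```python
from datetime import datetime


def tim() -> str:
    """TIM$-style time string, hh/mm/ss, from the local clock."""
    return datetime.now().strftime("%H/%M/%S")


def dat() -> str:
    """DAT$-style date string, dd/mm/yy, from the local clock."""
    return datetime.now().strftime("%d/%m/%y")


def seconds_field(t: str) -> int:
    """VAL(SEG$(TIM$,7)): characters 7 onwards, i.e. the seconds, as a number."""
    return int(t[6:])


def seconds_of_day(t: str) -> int:
    """The whole hh/mm/ss time in seconds, so differences also work across minutes."""
    hh, mm, ss = (int(part) for part in t.split("/"))
    return hh * 3600 + mm * 60 + ss


print(seconds_field("11/45/20"))                                  # 20
print(seconds_of_day("11/46/10") - seconds_of_day("11/45/50"))    # 20 seconds elapsed
```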

SUB$(A$,N,M)

This function allows a subsection of a string to be isolated. It differs from SEG$ in that M is the number of characters starting from N: it does not end at the Mth character. If M is omitted, the Nth character of A$ is given:

10 LET A$ = "EXPORT"
20 LET B$ = SUB$(A$,3)

B$ is then equal to “P”.

10 LET A$ = "SUBSECTION"
20 LET B$ = SUB$(A$,4,4)

B$ is then equal to “SECT”.
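The difference between SUB$ (start and length) and SEG$ (start and end) is easiest to see side by side. The sketch below uses hypothetical Python functions `sub` and `seg`. Both count characters from 1.

```python
from typing import Optional


def sub(a: str, n: int, m: Optional[int] = None) -> str:
    """SUB$(A$,N,M): M characters starting at character N; only the Nth character if M is omitted."""
    if m is None:
        m = 1
    return a[n - 1:n - 1 + m]


def seg(a: str, n: int, m: int) -> str:
    """SEG$(A$,N,M) for comparison: characters N to M inclusive."""
    return a[n - 1:m]


print(sub("EXPORT", 3))          # P
print(sub("SUBSECTION", 4, 4))   # SECT: four characters from character 4
print(seg("SUBSECTION", 4, 7))   # SECT: characters 4 to 7
```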

STR$(X)

The number X expressed as a string. This is the opposite of VAL, in that it converts a number into a string.

It is useful in printing numbers with no surrounding spaces. Normally, PRINT outputs a positive number with a leading and a trailing space, and a negative number with a trailing space only (below, ▽ marks a space). Thus:

10 PRINT 99

gives ▽99▽ but

20 PRINT STR$(99)

gives 99

30 PRINT -99;-99;99

gives -99▽-99▽▽99▽

but

40 PRINT STR$(-99);STR$(-99);STR$(99)

gives -99-9999.
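The spacing rules above can be checked with a small Python model. The names `print_field` and `str_dollar` are hypothetical. The sketch handles whole numbers only and assumes zero is printed like a positive number, since the text covers neither case. Items separated by ; are modelled as being output one after another.

```python
def print_field(x: int) -> str:
    """How PRINT lays out a number, per the text: ' 99 ' if non-negative, '-99 ' if negative."""
    text = str(x)
    return text + " " if x < 0 else " " + text + " "


def str_dollar(x: int) -> str:
    """STR$(X): the number with no surrounding spaces."""
    return str(x)


values = (-99, -99, 99)
print(repr("".join(print_field(v) for v in values)))   # '-99 -99  99 '
print(repr("".join(str_dollar(v) for v in values)))    # '-99-9999'
```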

GAP$(N)

This function gives a string of N spaces.

10 LET A$ = "THIRD" & GAP$(3) & "FLOOR"
20 PRINT A$

gives “THIRD   FLOOR”

LIN$(N)

LIN$(N) gives N new lines. This can be used to space out strings for printing.

10 READ A$, B$, C$, D$
20 LET T$ = A$ & LIN$(1) & B$ & LIN$(1) & C$ & LIN$(1) & D$
30 PRINT T$
40 DATA "THE TIME HAS COME THE TEACHER SAID"
50 DATA "TO TALK OF MANY THINGS"
60 DATA "OF SEGS AND SUBS AND MIN AND MAX"
70 DATA "OF MATRICES AND STRINGS"
80 END

The assignment to T$ of LIN$ characters in line 20 causes the PRINT statement to output the verse one string per line, as laid out in the DATA statements. This avoids separate PRINT statements for each string read.

MUL$(A$,N)

This function gives a string of N repetitions of A$:

10 LET A$ = "JINGLE BELLS"
20 LET B$ = MUL$(A$,2)
30 PRINT B$
40 END

This outputs the string “JINGLE BELLSJINGLE BELLS”. MUL$ can also be combined with the & operator:

10 LET A$ = "THREE BLIND MICE"
20 LET B$ = "SEE HOW THEY RUN"
30 LET C$ = MUL$(A$,2) & MUL$(B$,2)
40 PRINT C$
50 END

will give

THREE BLIND MICETHREE BLIND MICESEE HOW THEY RUNSEE HOW THEY RUN
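GAP$, LIN$ and MUL$ all build a string by repetition. In Python, each maps naturally onto multiplying a string by an integer. The function names in this sketch are hypothetical, and using `"\n"` for the newline character is an assumption.

```python
def gap(n: int) -> str:
    """GAP$(N): a string of N spaces."""
    return " " * n


def lin(n: int) -> str:
    """LIN$(N): a string of N newline characters."""
    return "\n" * n


def mul(a: str, n: int) -> str:
    """MUL$(A$,N): N repetitions of A$, joined end to end."""
    return a * n


print("THIRD" + gap(3) + "FLOOR")                                # THIRD   FLOOR
print(mul("THREE BLIND MICE", 2) + mul("SEE HOW THEY RUN", 2))
print("THE TIME HAS COME" + lin(1) + "TO TALK OF MANY THINGS")   # two lines
```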

SGN$(X)

SGN$(X) gives:
  + if X is greater than zero
  - if X is less than zero
  a space if X is equal to zero.

This can be used to test whether X is positive or negative. It may also be useful in printing.

e.g.:

10 PRINT "X IS A "; SGN$(X); "VE NUMBER"

So that if X is not zero the following will be printed:

   X IS A +VE NUMBER
Or X IS A -VE NUMBER
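A direct Python model of SGN$ and the printing example follows. The name `sgn_dollar` is hypothetical.

```python
def sgn_dollar(x: float) -> str:
    """SGN$(X): '+' if X > 0, '-' if X < 0, a space if X = 0."""
    if x > 0:
        return "+"
    if x < 0:
        return "-"
    return " "


for x in (5, -3):
    print("X IS A " + sgn_dollar(x) + "VE NUMBER")   # X IS A +VE NUMBER, then X IS A -VE NUMBER
```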

DEL$(A$,B$,N)

This function deletes from A$ a single occurrence of B$, the Nth if N is specified, the first if it is not:

10 LET A$ = "ACCOUNTING"
20 LET B$ = "COUN"
30 LET C$ = DEL$(A$, B$)

C$ is then equal to “ACTING”.

10 LET X$ = "DESSERT"
20 LET Y$ = DEL$(X$,"S",2)

Y$ is then equal to “DESERT”

SDL$(A$,B$,N)

This is also a deletion function, but is used when more than one occurrence of a string is to be deleted. If N is omitted or equals zero, all occurrences of B$ in A$ are deleted. If N appears the first N occurrences are deleted.

e.g.:

10 LET A$ = "ESTEEMED"
20 LET B$ = "E"
30 LET C$ = SDL$(A$,B$,3)

will set C$ = “STMED”

10 LET X$ = "PEPPER"
20 LET Y$ = SDL$(X$,"P")

will set Y$ = “EER”
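DEL$ and SDL$ differ only in how many of the located occurrences they remove. The Python sketch below makes that explicit. The names `_starts`, `del_dollar` and `sdl_dollar` are hypothetical, and occurrences are assumed to be found left to right without overlap, as with OCC. The text does not say what DEL$ does when there is no Nth occurrence. This sketch assumes A$ is returned unchanged.

```python
def _starts(a: str, b: str) -> list:
    """0-based starts of the non-overlapping occurrences of b in a, left to right."""
    if not b:
        raise ValueError("the string to delete must not be empty")
    starts, i = [], a.find(b)
    while i != -1:
        starts.append(i)
        i = a.find(b, i + len(b))
    return starts


def del_dollar(a: str, b: str, n: int = 1) -> str:
    """DEL$(A$,B$,N): delete the Nth occurrence of b (the first if N is omitted)."""
    starts = _starts(a, b)
    if n < 1 or n > len(starts):
        return a                      # assumption: no Nth occurrence leaves A$ unchanged
    i = starts[n - 1]
    return a[:i] + a[i + len(b):]


def sdl_dollar(a: str, b: str, n: int = 0) -> str:
    """SDL$(A$,B$,N): delete the first N occurrences, or all of them if N is 0 or omitted."""
    starts = _starts(a, b)
    if n > 0:
        starts = starts[:n]
    pieces, prev = [], 0
    for i in starts:
        pieces.append(a[prev:i])      # keep the text up to this occurrence
        prev = i + len(b)             # then skip over the occurrence itself
    pieces.append(a[prev:])
    return "".join(pieces)


print(del_dollar("ACCOUNTING", "COUN"))     # ACTING
print(del_dollar("DESSERT", "S", 2))        # DESERT
print(sdl_dollar("ESTEEMED", "E", 3))       # STMED
print(sdl_dollar("PEPPER", "P"))            # EER
```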

REP$(A$,B$,C$,N)

Rather than simply deleting part of a string, it may be necessary to replace a section of it with different characters. This function replaces B$ in A$ with C$. It replaces only one occurrence of the string: the Nth if N is specified, the first if it is not.

e.g.:

10 LET A$ = "CONSONANT"
20 LET B$ = "ULT"
30 LET C$ = REP$(A$,"ON",B$,2)

will set C$ = “CONSULTANT”

SRP$(A$,B$,C$,N)

This performs a similar function to REP$. It replaces one character string with another, but in more than one place. If N is omitted or equals zero, all occurrences of B$ are replaced by C$ in A$; if N appears, the first N occurrences are replaced.

e.g.:

10 LET A$ = "I CAME, I SAW, I CONQUERED"
20 LET B$ = "I"
30 LET C$ = "HE"
40 LET D$ = SRP$(A$,B$,C$,2)

will set D$ = “HE CAME, HE SAW, I CONQUERED”. Combining the general replace function SRP$ with the single-replacement function REP$:

10 LET A$ = "THE BOYS WANT TO CLIMB THE TREES."
20 LET B$ = SRP$(A$,"S","")
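30 LET C$ = REP$(B$," ","S ",3)

will set C$ = “THE BOY WANTS TO CLIMB THE TREE.” The third space is replaced by "S " (S followed by a space) so that WANTS and TO remain separate words.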
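REP$ and SRP$ can be modelled in the same way as the deletion functions, splicing in the replacement instead of nothing. In this hypothetical Python sketch, `rep` and `srp` are illustrative names, and occurrences are assumed to be found left to right without overlap. The final lines repeat the combined example, with "S " (S followed by a space) as the replacement for the third space.

```python
def _starts(a: str, b: str) -> list:
    """0-based starts of the non-overlapping occurrences of b in a, left to right."""
    if not b:
        raise ValueError("the string to replace must not be empty")
    starts, i = [], a.find(b)
    while i != -1:
        starts.append(i)
        i = a.find(b, i + len(b))
    return starts


def rep(a: str, b: str, c: str, n: int = 1) -> str:
    """REP$(A$,B$,C$,N): replace the Nth occurrence of b (the first if N is omitted) by c."""
    starts = _starts(a, b)
    if n < 1 or n > len(starts):
        return a                      # assumption: no Nth occurrence leaves A$ unchanged
    i = starts[n - 1]
    return a[:i] + c + a[i + len(b):]


def srp(a: str, b: str, c: str, n: int = 0) -> str:
    """SRP$(A$,B$,C$,N): replace the first N occurrences by c, or all if N is 0 or omitted."""
    starts = _starts(a, b)
    if n > 0:
        starts = starts[:n]
    pieces, prev = [], 0
    for i in starts:
        pieces.append(a[prev:i])
        pieces.append(c)
        prev = i + len(b)
    pieces.append(a[prev:])
    return "".join(pieces)


print(rep("CONSONANT", "ON", "ULT", 2))                  # CONSULTANT
print(srp("I CAME, I SAW, I CONQUERED", "I", "HE", 2))   # HE CAME, HE SAW, I CONQUERED
b = srp("THE BOYS WANT TO CLIMB THE TREES.", "S", "")    # every S deleted
print(rep(b, " ", "S ", 3))                              # THE BOY WANTS TO CLIMB THE TREE.
```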

These string handling functions can be nested together to any depth, the limit being line length. For example:

10 LET A$ = "ABRACADABRA"
20 LET B$ = "CADAB"
30 LET C$ = SUB$(A$,OCC(B$,"A"),POS(A$,B$,1))

C$ is then equal to “BRACA”. OCC(B$,"A") is 2 and POS(A$,B$,1) is 5, so C$ is the five characters of A$ starting from character two. Using “&”, several string handling functions can be put together thus:

10 LET A$ = "STANDARD"
20 LET B$ = "AND"
30 LET C$ = TIM$&DAT$&SGN$(-4)&SEG$(A$,POS(A$,B$),LEN(A$))

C$ would then be equal to “16/19/3828/02/77-ANDARD” if the program were run at 16/19/38 on 28/02/77.
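Nesting works because each inner call simply produces a value for the outer one. The Python sketch below evaluates both examples with compact, hypothetical stand-ins for OCC, POS, SUB$, SEG$ and SGN$. It uses B$ = “CADAB”, for which OCC(B$,"A") is 2 and POS(A$,B$,1) is 5. The time and date are passed in as fixed strings rather than read from a clock, so the result is reproducible.

```python
def occ(a: str, b: str) -> int:
    return a.count(b)                       # str.count also counts non-overlapping matches


def pos(a: str, b: str, n: int = 1) -> int:
    start, i = 0, -1
    for _ in range(n):
        i = a.find(b, start)
        if i == -1:
            return 0
        start = i + len(b)
    return i + 1                            # 1-based, 0 if not found


def sub(a: str, n: int, m: int = 1) -> str:
    return a[n - 1:n - 1 + m]               # M characters from character N


def seg(a: str, n: int, m: int) -> str:
    return a[n - 1:m]                       # characters N to M inclusive


def sgn_dollar(x: float) -> str:
    return "+" if x > 0 else "-" if x < 0 else " "


a, b = "ABRACADABRA", "CADAB"
c = sub(a, occ(b, "A"), pos(a, b, 1))       # SUB$(A$,2,5)
print(c)                                    # BRACA


def line_30(a: str, b: str, tim: str, dat: str) -> str:
    """TIM$&DAT$&SGN$(-4)&SEG$(A$,POS(A$,B$),LEN(A$)) with the clock values supplied."""
    return tim + dat + sgn_dollar(-4) + seg(a, pos(a, b), len(a))


print(line_30("STANDARD", "AND", "16/19/38", "28/02/77"))   # 16/19/3828/02/77-ANDARD
```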

The CHANGE statement

A statement such as:

30 CHANGE W$ TO X

places numerical codes for the characters of W$ into consecutive elements of array X, starting at X(1). X must be one-dimensional, and if it is not large enough to hold the string an error will be given. The element X(0) will contain the number of characters in the string. The numeric statements of BASIC can then be used to work on the characters, and the string can be reassembled with the statement:

50 CHANGE X TO W$

Here the value in X(0) tells the system how many characters to form into a string. The following example shows a four-character string being reversed using CHANGE.

e.g.:

10 DIM X(4), Y(4)
20 READ W$
30 CHANGE W$ TO X
40 LET Y(0) = X(0)
50 FOR J = 1 TO 4
60 LET Y(5-J) = X(J)
70 NEXT J
80 CHANGE Y TO W$
90 DATA RATS
100 END

This sets the reversed W$ to “STAR”. Line 40 sets the first element of Y to the number of characters in the string. Line 60 sets the elements of Y equal to the reverse of the elements of X.
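The two directions of CHANGE can be modelled in Python with a list whose element 0 holds the length. This is a hypothetical sketch. The code table is the same assumption used for CHR and CHR$ (the ICL 1900-series 64-character set). `change_to_array` and `change_to_string` are illustrative names. Following the DIM rule in this chapter, a bound of 4 gives elements 0 to 4.

```python
# Assumed internal code table (ICL 1900-series 64-character set): index = code.
CODE_TABLE = "0123456789:;<=>? !\"#£%&'()*+,-./@ABCDEFGHIJKLMNOPQRSTUVWXYZ[$]↑←"


def change_to_array(w: str, bound: int) -> list:
    """CHANGE W$ TO X: X(0) = length, X(1), X(2), ... = character codes."""
    if len(w) > bound:
        raise ValueError("array too small for string")   # the text says an error is given
    x = [0] * (bound + 1)            # DIM X(bound) gives elements 0..bound
    x[0] = len(w)
    for j, ch in enumerate(w, start=1):
        x[j] = CODE_TABLE.index(ch)
    return x


def change_to_string(x: list) -> str:
    """CHANGE X TO W$: X(0) says how many codes to turn back into characters."""
    return "".join(CODE_TABLE[code] for code in x[1:x[0] + 1])


x = change_to_array("RATS", 4)
y = [0] * 5
y[0] = x[0]                          # line 40: copy the character count
for j in range(1, 5):
    y[5 - j] = x[j]                  # line 60: reverse the order of the codes
print(change_to_string(y))           # STAR
```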