Saturday, May 28, 2011

awk variables

awk has many built-in variables that can be used in programs.
We will look at several of them with examples:
acer@ubuntu:~$ cat > record1
John robinson
koren inc
phno:555555555


usha
yyy technologies
ph no:546456456


dddd
aaa technologies
ph no:44444444


acer@ubuntu:~$
Consider the file above. The requirement is to produce a single line for each record, containing the name and the phone number.
In this file, the records are separated by blank lines.

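To produce this output we need to:
1) change the input record separator (RS), which defaults to "\n", to the empty string "" so that blank lines separate the records.
2) change the input field separator (FS), which defaults to a single space " " (splitting on runs of spaces and tabs), to "\n" so that each line of a record becomes one field.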

acer@ubuntu:~$
acer@ubuntu:~$ cat separate.awk
BEGIN { FS="\n";RS="" }
{ print $1,$NF}
acer@ubuntu:~$ awk -f separate.awk record1
John robinson phno:555555555
usha ph no:546456456
dddd ph no:44444444

As the output shows, each record is converted into a single line. Here $NF refers to the last field in the record.
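
To see how awk splits the records with these settings, here is a minimal sketch. It recreates two of the sample records in a temporary file (the mktemp file is only an assumption for illustration, not part of the original session) and prints the field count NF next to the first and last field of each record.

```shell
# Recreate two of the sample records in a temporary file (path chosen by mktemp).
tmp=$(mktemp)
printf 'John robinson\nkoren inc\nphno:555555555\n\n\nusha\nyyy technologies\nph no:546456456\n' > "$tmp"

# RS = "" makes blank lines the record separator.
# FS = "\n" makes every line of a record one field.
out=$(awk 'BEGIN { FS = "\n"; RS = "" } { print NF ": " $1 " -> " $NF }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Each record should report 3 fields, since every record in the sample has three lines.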

In the same way, we can change OFS and ORS to alter how the output is displayed.

Here we change the output field separator (OFS) from its default, a single space, to "\t",
and the output record separator (ORS) from its default "\n" to "\n\n".

The following example illustrates the change:
acer@ubuntu:~$ cat separate.awk
BEGIN { FS="\n";RS="";OFS="\t";ORS="\n\n" }
{ print $1,$NF}
acer@ubuntu:~$
acer@ubuntu:~$ awk -f separate.awk record1
John robinson    phno:555555555

usha    ph no:546456456

dddd    ph no:44444444

acer@ubuntu:~$
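
One detail that is easy to miss: OFS is only inserted between the comma-separated arguments of print, or when awk rebuilds $0 after a field assignment. The sketch below uses a made-up input line "a b c" and "-" as OFS purely for illustration.

```shell
# With a comma, print joins its arguments with OFS.
with_comma=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { print $1, $2 }')

# Without a comma the strings are simply concatenated and OFS is not used.
no_comma=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { print $1 $2 }')

# Assigning to a field makes awk rebuild $0 using OFS.
rebuilt=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { $1 = $1; print }')

printf '%s\n' "$with_comma" "$no_comma" "$rebuilt"
```

This prints a-b, ab and a-b-c, one per line.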


The NR variable holds the number of the current record, so printing it numbers the records.
Example:
acer@ubuntu:~$ cat ggg
hello life
wonderful
great
marvellous
awesome
acer@ubuntu:~$ awk '{print NR,$1 }' ggg
1 hello
2 wonderful
3 great
4 marvellous
5 awesome
acer@ubuntu:~$


NR can also be used in a pattern to select a particular record:
acer@ubuntu:~$ cat ggg
hello life
wonderful
great
marvellous
awesome
acer@ubuntu:~$ awk 'NR==1{print $0 ": i am in first row" }' ggg
hello life: i am in first row
acer@ubuntu:~$

"$0" holds the complete current record, as the example below shows:
acer@ubuntu:~$ awk '{print $0 }' ggg
hello life
wonderful
great
marvellous
awesome
acer@ubuntu:~$
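
NR and $0 combine naturally. As a small sketch, the following recreates the ggg file in a temporary file (the mktemp path is an assumption for illustration) and uses NR in the pattern to select a range of records, printing each one in full with $0.

```shell
# Recreate the sample file ggg in a temporary file.
tmp=$(mktemp)
printf 'hello life\nwonderful\ngreat\nmarvellous\nawesome\n' > "$tmp"

# Select records 2 through 4 by comparing NR in the pattern,
# then print the record number followed by the whole record.
out=$(awk 'NR >= 2 && NR <= 4 { print NR ": " $0 }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```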

Hope this gives a better idea about awk variables.

Kindly comment on whether the information was useful.



The table below gives an overview of the awk variables covered here.

Variable    Represents
NR          record number of the current record
$0          the current record (as a single variable)
NF          number of fields in the current record
$1-$n       fields in the current record
FS          input field separator (default: SPACE or TAB)
OFS         output field separator (default: SPACE)
RS          input record separator (default: NEWLINE)
ORS         output record separator (default: NEWLINE)
FILENAME    name of the current input file
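
A quick way to get a feel for several of these variables at once is to print them for every record. This is only a sketch: the two-line sample file and its mktemp path are assumptions for illustration.

```shell
# A small two-line sample file (path chosen by mktemp).
tmp=$(mktemp)
printf 'hello life\nwonderful\n' > "$tmp"

# For each record print: file name, record number, field count, first field.
out=$(awk '{ print FILENAME, NR, NF, $1 }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

The first line has two fields and the second has one, so NF changes from 2 to 1 while NR counts up.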

finding the number of blank lines in a file using awk in shell



This program prints the number of blank lines in a file using awk in the shell.

acer@ubuntu:~$
acer@ubuntu:~$
acer@ubuntu:~$ cat > ddd

dsdsd

sdsdsd


dsds
acer@ubuntu:~$ awk 'BEGIN { x=0 }
> /^$/ { x=x+1 } # count the blank lines
> END { print " no of blank lines in the file  " x }' ddd
 no of blank lines in the file  4
acer@ubuntu:~$
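
An empty-line pattern only matches lines with no characters at all. If lines containing only spaces or tabs should also count as blank, one possible variation (a sketch, not the method used above) is to test NF == 0. The sample data below is made up for illustration.

```shell
# Sample file: line 2 is empty, line 4 holds only spaces.
tmp=$(mktemp)
printf 'one\n\ntwo\n   \nthree\n' > "$tmp"

# /^$/ matches only lines that are completely empty.
empty=$(awk '/^$/ { x++ } END { print x + 0 }' "$tmp")

# NF == 0 also matches lines that contain nothing but whitespace.
blank=$(awk 'NF == 0 { x++ } END { print x + 0 }' "$tmp")

printf 'empty=%s blank=%s\n' "$empty" "$blank"
rm -f "$tmp"
```

Here x + 0 forces a numeric 0 to be printed when no line matched.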




For a detailed explanation of BEGIN and END, see the AWK programming model post on this site.

Thanks.

Was it useful? Kindly leave a comment.


AWK programming model

awk is input driven, i.e. it executes its commands once for each line of the file supplied, or for each line piped to the awk command.

For example:

acer@ubuntu:~$ cat > fff
ewrrwer
ewrw
wer
werwe
ewrwer^Z
[1]+  Stopped                 cat > fff
acer@ubuntu:~$ awk '{ print "hello world" } ' fff
hello world
hello world
hello world
hello world
hello world
acer@ubuntu:~$

Here hello world is printed 5 times since fff has 5 lines.

There is an exception: the BEGIN and END commands. BEGIN executes without waiting for input from the file or the pipe, and END executes once after all the input has been processed.

The BEGIN command executes before any input from the file is processed.

For example:
acer@ubuntu:~$ cat ggg
hello life
wonderful
great
marvellous
awesome
acer@ubuntu:~$
acer@ubuntu:~$ awk 'BEGIN { print "hello from begin"}  
> /wonderful/ { print "hello during file process" }'  ggg    
hello from begin 
hello during file process
acer@ubuntu:~$

Processing in awk passes through three simple steps:

It executes the BEGIN command before any input is read.
                                     |
It processes the file or input once for each line (the main loop).
                                     |
It executes the END command after the main loop ends.

BEGIN and END are both optional.
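
The three steps can be sketched in a single program. The two input lines "a" and "b" are made up for illustration.

```shell
# Feed two lines to a program that has all three parts.
out=$(printf 'a\nb\n' | awk '
BEGIN { print "begin: before any input" }    # step 1: runs once, before reading
      { print "main: " $0 }                  # step 2: runs once per input line
END   { print "end: after " NR " records" }  # step 3: runs once, after all input
')
printf '%s\n' "$out"
```

The output shows the BEGIN line first, then one "main" line per input line, then the END line.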

A note about the END command: unlike BEGIN, it runs only after awk has reached the end of the input, so when no file or pipe is supplied awk waits for input from the terminal. It still runs when the input turns out to be empty.

acer@ubuntu:~$ awk 'END { print "hello from end command" }'
it waits for input from the terminal until end-of-file (Ctrl-D)

When you provide input, it executes:
acer@ubuntu:~$ echo "hello" | awk 'END { print "hello from end command" }'
hello from end command
acer@ubuntu:~$
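
A quick way to check how BEGIN and END relate to the input is to run each on its own. This is a small sketch that uses /dev/null as an empty input.

```shell
# A program with only BEGIN never reads input, so it returns immediately.
begin_only=$(awk 'BEGIN { print "begin ran" }')

# With an empty input, END runs as soon as end-of-file is reached; NR is 0.
end_empty=$(awk 'END { print "end ran, NR=" NR }' /dev/null)

printf '%s\n' "$begin_only" "$end_empty"
```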

In combination, these can be used to write many useful commands.

For example, we will now write an awk program in the shell to print the number of lines in a file:

acer@ubuntu:~$ awk 'BEGIN { x=0 }
> { x=x+1 }   # count each line
> END { print "no of lines in the file is " x }' ggg
no of lines in the file is 5
acer@ubuntu:~$
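
Since NR already counts the records, the same result can be had without a counter variable. The sketch below recreates the ggg file in a temporary file (the mktemp path is an assumption for illustration) and prints NR from the END command.

```shell
# Recreate the sample file ggg in a temporary file.
tmp=$(mktemp)
printf 'hello life\nwonderful\ngreat\nmarvellous\nawesome\n' > "$tmp"

# In END, NR holds the number of the last record read, i.e. the line count.
count=$(awk 'END { print NR }' "$tmp")
printf 'no of lines in the file is %s\n' "$count"
rm -f "$tmp"
```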


Hope this clarifies the AWK programming model.

Kindly comment.