SHELL - Lab 2

Grep:

Grep is an acronym that stands for Global Regular Expression Print.

Grep is a Linux / Unix command-line tool used to search for a string of characters in a specified file. The text search pattern is called a regular expression. When it finds a match, it prints the line with the result. The grep command is handy when searching through large log files.

Usage:

In its most basic form, the grep command consists of three parts: the command name grep, the pattern you are searching for, and the name of the file that grep searches through.

The simplest grep command syntax looks like this:

grep findit file
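As a minimal sketch of the basic syntax, the snippet below builds a throwaway file and searches it. The file contents and the variable name basic_match are hypothetical, chosen only for illustration.

```shell
# Create a small sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf 'first line\nplease findit here\nlast line\n' > "$workdir/file"

# Print every line of the file that contains the string "findit".
basic_match=$(grep findit "$workdir/file")
echo "$basic_match"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Only the second line of the sample file contains the pattern, so only that line is printed.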

Search multiple files:

To search multiple files with the grep command, list the filenames you want to search, separated by spaces. In our case, the grep command to match the word moon in the three files file1, file2, and file3 looks like this:

grep moon file1 file2 file3

To search all files in the current directory you can use * instead of the filename(s).
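The following sketch shows both forms side by side, assuming three small files with made-up contents. When more than one file is searched, grep prefixes each matching line with the name of the file it came from.

```shell
# Build three sample files in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'the moon is full\n' > file1
printf 'no match here\n' > file2
printf 'moon landing\nsunrise\n' > file3

# Search the three files explicitly.
multi_match=$(grep moon file1 file2 file3)
echo "$multi_match"

# The shell expands * to every non-hidden file in the current directory.
star_match=$(grep moon *)
echo "$star_match"

# Return to the previous directory and clean up.
cd - >/dev/null && rm -rf "$workdir"
```

Both commands should report the same two lines here, because the directory contains only the three files.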

Important flags for using grep:

| Flag | Example | Description |
| --- | --- | --- |
| -r | grep -r text fileName | Searches recursively, including subdirectories, when 'fileName' is a directory. |
| -n | grep -n text fileName | Prints the line number of each matching line. |
| -H | grep -H text fileName | Always includes the filename in the output. By default the filename is only included if several files are searched. |
| -c | grep -c text fileName | Counts the lines that contain the search term in each specified file. |
| -l | grep -l text fileName | Lists the files which include the search term. |
| -L | grep -L text fileName | Lists the files which do not include the search term. |
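To make the flags concrete, here is a small sketch that runs each of them against two made-up files. The directory layout, file names, and variable names are assumptions for illustration only.

```shell
# Build a sample directory tree (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir logs
printf 'ok\nerror: disk full\nok\nerror after error\n' > logs/app.log
printf 'all good\n' > notes.txt

line_numbers=$(grep -n error logs/app.log)              # matching lines with line numbers
match_count=$(grep -c error logs/app.log)               # number of matching lines
with_name=$(grep -H good notes.txt)                     # filename shown even for one file
files_with=$(grep -l error logs/app.log notes.txt)      # files that contain the term
files_without=$(grep -L error logs/app.log notes.txt)   # files that do not contain it
recursive=$(grep -r disk logs)                          # search the directory tree

printf '%s\n' "$line_numbers" "$match_count" "$with_name" \
  "$files_with" "$files_without" "$recursive"

# Return to the previous directory and clean up.
cd - >/dev/null && rm -rf "$workdir"
```

Note that -c reports 2 for logs/app.log even though the word appears three times, because it counts matching lines rather than individual occurrences.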

Regex:

Regular expressions are patterns built from special characters that help search data and match complex patterns. The term is often shortened to 'regexp' or 'regex'. They are used in many Linux programs like grep, bash, rename, sed, etc.

Basic Regular expressions:

Some of the commonly used commands that accept regular expressions are sed, vi and grep. Listed below are some of the basic regex symbols.

| Symbol | Description |
| --- | --- |
| . | Matches any single character (exactly one character) |
| ^ | Matches the start of a line |
| $ | Matches the end of a line |
| * | Matches the preceding character zero or more times |
| \ | Escapes a special character so that it is matched literally |
| () | Groups regular expressions |

Example:

Suppose we need to find the lines starting with F in 'file.txt', which contains the following text:

Hi
I'm
Feeling
Fine
today

We will use the following:

grep '^F' file.txt

Output:

Feeling
Fine
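The same sample text can be used to try a few of the other basic symbols. This is a sketch only; the extra patterns ('e$', 'F.n', 'Fe*l') and the variable names are illustrative additions, not part of the lab text.

```shell
# Recreate the sample file from the example in a temporary directory.
workdir=$(mktemp -d)
printf "Hi\nI'm\nFeeling\nFine\ntoday\n" > "$workdir/file.txt"

starts_with_F=$(grep '^F' "$workdir/file.txt")   # ^ anchors the match at the start of the line
ends_with_e=$(grep 'e$' "$workdir/file.txt")     # $ anchors the match at the end of the line
any_char=$(grep 'F.n' "$workdir/file.txt")       # . stands for exactly one character
zero_or_more=$(grep 'Fe*l' "$workdir/file.txt")  # * repeats the preceding character (here e)

printf '%s\n' "$starts_with_F" "$ends_with_e" "$any_char" "$zero_or_more"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Quoting the pattern in single quotes keeps the shell from interpreting characters such as $ and * before grep sees them.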

Interval Regular expressions:

These expressions tell us about the number of occurrences of a character in a string.

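| Expression | Description |
| --- | --- |
| {n} | Matches the preceding character appearing exactly 'n' times. |
| {n,} | Matches the preceding character appearing at least 'n' times. |
| {n,m} | Matches the preceding character appearing at least 'n' times but not more than 'm' times. |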

Example:

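Find all lines that contain the character 'p' at least 2 times consecutively in the file 'file.txt' shown below. Because grep matches anywhere within a line, a line with three or four consecutive p's also contains 'pp' and is therefore matched.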

apple
appple
ale
apppple
apaaple

We will use the following:

grep -E 'p{2}' file.txt

Output:

apple
appple
apppple
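A short sketch of how the interval forms differ once the surrounding characters are pinned down. The patterns 'ap{2}l' and 'ap{2,3}l' and the variable names are illustrative assumptions; they use the letters around the p's to stop grep from matching a shorter run inside a longer one.

```shell
# Recreate the sample file from the example in a temporary directory.
workdir=$(mktemp -d)
printf 'apple\nappple\nale\napppple\napaaple\n' > "$workdir/file.txt"

# Any line containing "pp" somewhere, so longer runs of p match as well.
contains_pp=$(grep -E 'p{2}' "$workdir/file.txt")

# "a", exactly two p's, then "l".
exactly_two=$(grep -E 'ap{2}l' "$workdir/file.txt")

# "a", two or three p's, then "l".
two_to_three=$(grep -E 'ap{2,3}l' "$workdir/file.txt")

printf '%s\n' "$contains_pp" "---" "$exactly_two" "---" "$two_to_three"

# Clean up the temporary directory.
rm -rf "$workdir"
```

With the anchoring letters in place, only apple satisfies the exact count, while apple and appple satisfy the range.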

Extended regular expressions:

These regular expressions contain combinations of more than one expression.

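| Expression | Description |
| --- | --- |
| \+ | Matches one or more occurrences of the previous character. |
| \? | Matches zero or one occurrence of the previous character. |

With grep -E, these are written as + and ? without the backslash.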

Example:

Suppose we want to find the lines where one or more 'a' characters come immediately before the character 't' in the file 'file.txt', which contains the following:

hate
bat
ant
dent
ate

We will use the following:

grep "a\+t" file.txt

Output:

hate
bat
ate
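The sketch below repeats the example with grep -E, where the operators are written without a backslash, and adds a zero-or-one pattern for comparison. The pattern 'an?t' and the variable names are illustrative assumptions.

```shell
# Recreate the sample file from the example in a temporary directory.
workdir=$(mktemp -d)
printf 'hate\nbat\nant\ndent\nate\n' > "$workdir/file.txt"

# One or more "a" immediately followed by "t".
one_or_more=$(grep -E 'a+t' "$workdir/file.txt")

# "a", then an optional "n", then "t".
zero_or_one=$(grep -E 'an?t' "$workdir/file.txt")

printf '%s\n' "$one_or_more" "---" "$zero_or_one"

# Clean up the temporary directory.
rm -rf "$workdir"
```

The second pattern additionally matches ant, because the optional n is allowed between a and t.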

Sed (Stream editor):

sed can be used at the command line, or within a shell script, to edit a file non-interactively. Its most useful feature is 'search-and-replace' of one string with another.

Replacing or substituting string : The sed command is mostly used to replace text in a file. The simple sed command below replaces the first occurrence of the word “unix” on each line with “linux” and prints the result to standard output; the file itself is not modified.

sed 's/unix/linux/' file.txt

Replacing the nth occurrence of a pattern in a line : Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a line. The below command replaces the second occurrence of the word “unix” with “linux” in a line.

sed 's/unix/linux/2' file.txt

Replacing all the occurrences of the pattern in a line : The substitute flag /g (global replacement) tells the sed command to replace all the occurrences of the string in the line.

sed 's/unix/linux/g' file.txt

Replacing from nth occurrence to all occurrences in a line : Use the combination of /1, /2 etc and /g to replace all the patterns from the nth occurrence of a pattern in a line. The following sed command replaces the third, fourth, fifth… “unix” word with “linux” word in a line.

sed 's/unix/linux/3g' file.txt
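The occurrence flags can be compared on a single input line. This is a sketch under the assumption that GNU sed is in use; combining a number with g (as in 3g) is a GNU behaviour and may not work with other sed implementations. The sample line and variable names are hypothetical.

```shell
# One input line with four occurrences of the word (hypothetical data).
line='unix unix unix unix'

first=$(printf '%s\n' "$line" | sed 's/unix/linux/')        # first occurrence only
second=$(printf '%s\n' "$line" | sed 's/unix/linux/2')      # second occurrence only
all=$(printf '%s\n' "$line" | sed 's/unix/linux/g')         # every occurrence
from_third=$(printf '%s\n' "$line" | sed 's/unix/linux/3g') # third occurrence onwards (GNU sed)

printf '%s\n' "$first" "$second" "$all" "$from_third"
```

Piping the line into sed rather than naming a file keeps the example free of temporary files; the substitution commands are the same ones used in the text.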

Replacing string on a specific line number : You can restrict the sed command to replace the string on a specific line number. The following does the operation only on the 3rd line.

sed '3 s/unix/linux/' file.txt

Replacing string on a range of lines : You can give the sed command a range of line numbers in which to replace a string. Here the sed command performs the replacement only on lines 1 to 3.

sed '1,3 s/unix/linux/' file.txt
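Line addressing can be checked with a four-line file, as in this sketch. The file contents and variable names are made up for illustration. It also confirms that sed writes to standard output and leaves the input file untouched.

```shell
# Build a four-line sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf 'unix one\nunix two\nunix three\nunix four\n' > "$workdir/file.txt"

# Substitute on line 3 only.
only_third=$(sed '3 s/unix/linux/' "$workdir/file.txt")

# Substitute on lines 1 through 3.
first_three=$(sed '1,3 s/unix/linux/' "$workdir/file.txt")

# The file on disk is unchanged, because sed printed the edited text instead.
unchanged=$(cat "$workdir/file.txt")

printf '%s\n' "$only_third" "---" "$first_three" "---" "$unchanged"

# Clean up the temporary directory.
rm -rf "$workdir"
```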

AWK:

AWK is suitable for pattern search and processing. An AWK script searches one or more files for lines that match the given patterns and, when a line matches, performs the specified actions.

Syntax:

awk options 'selection_criteria {action}' input-file > output-file

To demonstrate more about AWK usage, we are going to use the text file called file.txt.

Printing specific columns:

To print the 2nd and 3rd columns, execute the command below.

awk '{print $2 "\t" $3}' file.txt
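Since the contents of file.txt are not shown, the sketch below assumes a small whitespace-separated table with four columns (id, name, department, salary). The data and the variable name columns are hypothetical.

```shell
# Build a sample file.txt in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf '1 alice engineering 5000\n2 bob marketing 4200\n3 tom it 3900\n' > "$workdir/file.txt"

# Print the 2nd and 3rd whitespace-separated fields, joined by a tab.
columns=$(awk '{print $2 "\t" $3}' "$workdir/file.txt")
echo "$columns"

# Clean up the temporary directory.
rm -rf "$workdir"
```

AWK splits each line on runs of spaces or tabs by default, so $2 and $3 refer to the name and department fields of this assumed layout.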

Printing all lines that match a specific pattern:

If you want to print lines that match a certain pattern, the syntax is as shown:

awk '/variable_to_be_matched/ {print $0}' file.txt

For instance, to match all entries with the letter ‘o’, the syntax will be:

awk '/o/ {print $0}' file.txt
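A sketch of the pattern match, again assuming a hypothetical four-column file.txt since the real contents are not shown.

```shell
# Build a sample file.txt in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf '1 alice engineering 5000\n2 bob marketing 4200\n3 tom it 3900\n' > "$workdir/file.txt"

# Print every line that contains the letter "o" anywhere.
with_o=$(awk '/o/ {print $0}' "$workdir/file.txt")
echo "$with_o"

# Clean up the temporary directory.
rm -rf "$workdir"
```

The first line of the assumed data has no 'o' in any field, so only the second and third lines are printed.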

Printing columns that match a specific pattern:

When AWK locates a pattern match, it prints the whole record by default. You can change this by instructing it to display only certain fields.

For example:

awk '/a/ {print $3 "\t" $4}' file.txt

The above command prints the 3rd and 4th columns of every line that contains the letter ‘a’. The pattern is tested against the whole line, not only against the printed columns.
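The difference between matching the whole line and matching one field can be seen in the following sketch. The data is hypothetical, and the field-specific form $3 ~ /a/ is shown as an illustrative alternative that the lab text itself does not use.

```shell
# Build a sample file.txt in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf '1 alice engineering 5000\n2 bob marketing 4200\n3 tom it 3900\n' > "$workdir/file.txt"

# /a/ is tested against the whole line, so line 1 matches through "alice".
any_field=$(awk '/a/ {print $3 "\t" $4}' "$workdir/file.txt")

# $3 ~ /a/ restricts the test to the 3rd field only.
third_field=$(awk '$3 ~ /a/ {print $3 "\t" $4}' "$workdir/file.txt")

printf '%s\n' "$any_field" "---" "$third_field"

# Clean up the temporary directory.
rm -rf "$workdir"
```

In the assumed data, the first line contains 'a' only in its 2nd column, yet its 3rd and 4th columns are still printed by the first command.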
