Text Processing Commands

sort

File sort utility, often used as a filter in a pipe. This command sorts a text stream or file forwards or backwards, or according to various keys or character positions. Using the -m option, it merges presorted input files. The info page lists its many capabilities and options. See Example 11-10, Example 11-11, and Example A-8.
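A few representative invocations, as a quick sketch (the data files named here are hypothetical):

sort -n numbers.txt           # Numeric sort, so 10 sorts after 9, not after 1.
sort -r names.txt             # Reverse (descending) sort.
sort -t: -k3 -n /etc/passwd   # Numeric sort on field 3 (the UID), with ":" as field separator.
sort -m sorted1 sorted2       # Merge two files that are already sorted.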

tsort

Topological sort, reading in pairs of whitespace-separated strings and sorting according to input patterns. The original purpose of tsort was to sort a list of dependencies for an obsolete version of the ld linker in an "ancient" version of UNIX.

The results of a tsort will usually differ markedly from those of the standard sort command, above.
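A minimal sketch, feeding tsort a hand-written dependency list (each pair means the left-hand item must precede the right-hand one):

echo "kernel libc
libc app
libm app" | tsort
# Prints one valid topological ordering, e.g.:
#   kernel
#   libm
#   libc
#   app
# (The relative order of independent items may vary.)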

uniq

This filter removes duplicate lines from a sorted file. It is often seen in a pipe coupled with sort.

cat list-1 list-2 list-3 | sort | uniq > final.list
# Concatenates the list files,
# sorts them,
# removes duplicate lines,
# and finally writes the result to an output file.

The useful -c option prefixes each line of the input file with its number of occurrences.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.


bash$ uniq -c testfile
      1 This line occurs only once.
      2 This line occurs twice.
      3 This line occurs three times.


bash$ sort testfile | uniq -c | sort -nr
      3 This line occurs three times.
      2 This line occurs twice.
      1 This line occurs only once.

The sort INPUTFILE | uniq -c | sort -nr command string produces a frequency of occurrence listing on the INPUTFILE file (the -nr options to sort cause a reverse numerical sort). This template finds use in analysis of log files and dictionary lists, and wherever the lexical structure of a document needs to be examined.

Example 16-12. Word Frequency Analysis

#!/bin/bash
# wf.sh: Crude word frequency analysis on a text file.
# This is a more efficient version of the "wf2.sh" script.


# Check for input file on command-line.
ARGS=1
E_BADARGS=85
E_NOFILE=86

if [ $# -ne "$ARGS" ]  # Correct number of arguments passed to script?
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

if [ ! -f "$1" ]       # Check if file exists.
then
  echo "File \"$1\" does not exist."
  exit $E_NOFILE
fi



########################################################
# main ()
sed -e 's/\.//g'  -e 's/\,//g' -e 's/ /\
/g' "$1" | tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr
#                           =========================
#                            Frequency of occurrence

#  Filter out periods and commas, and
#+ change space between words to linefeed,
#+ then shift characters to lowercase, and
#+ finally prefix occurrence count and sort numerically.

#  Arun Giridhar suggests modifying the above to:
#  . . . | sort | uniq -c | sort +1 [-f] | sort +0 -nr
#  This adds a secondary sort key, so instances of
#+ equal occurrence are sorted alphabetically.
#  As he explains it:
#  "This is effectively a radix sort, first on the
#+ least significant column
#+ (word or string, optionally case-insensitive)
#+ and last on the most significant column (frequency)."
#
#  As Frank Wang explains, the above is equivalent to
#+       . . . | sort | uniq -c | sort +0 -nr
#+ and the following also works:
#+       . . . | sort | uniq -c | sort -k1nr -k
########################################################

exit 0

# Exercises:
# ---------
# 1) Add 'sed' commands to filter out other punctuation,
#+   such as semicolons.
# 2) Modify the script to also filter out multiple spaces and
#+   other whitespace.

bash$ cat testfile
This line occurs only once.
This line occurs twice.
This line occurs twice.
This line occurs three times.
This line occurs three times.
This line occurs three times.


bash$ ./wf.sh testfile
      6 this
      6 occurs
      6 line
      3 times
      3 three
      2 twice
      1 only
      1 once

expand, unexpand

The expand filter converts tabs to spaces. It is often used in a pipe.

The unexpand filter converts spaces to tabs. This reverses the effect of expand.
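For instance (a sketch; the filenames are hypothetical):

expand -t 4 tabbed.txt > spaced.txt    # Each tab becomes up to 4 spaces.
unexpand -a spaced.txt > tabbed.txt    # -a converts all runs of spaces, not just leading ones.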

cut

A tool for extracting fields from files. It is similar to the print $N command set in awk, but more limited. It may be simpler to use cut in a script than awk. Particularly important are the -d (delimiter) and -f (field specifier) options.

Using cut to obtain a listing of the mounted filesystems:

cut -d ' ' -f1,2 /etc/mtab

Using cut to list the OS and kernel version:

uname -a | cut -d" " -f1,3,11,12

Using cut to extract message headers from an e-mail folder:

bash$ grep '^Subject:' read-messages | cut -c10-80
Re: Linux suitable for mission-critical apps?
MAKE MILLIONS WORKING AT HOME!!!
Spam complaint
Re: Spam complaint

Using cut to parse a file:

# List all the users in /etc/passwd.

FILENAME=/etc/passwd

for user in $(cut -d: -f1 $FILENAME)
do
  echo $user
done

# Thanks, Oleg Philon for suggesting this.

cut -d ' ' -f2,3 filename is equivalent to awk -F'[ ]' '{ print $2, $3 }' filename

Note

It is even possible to specify a linefeed as a delimiter. The trick is to actually embed a linefeed (RETURN) in the command sequence.

bash$ cut -d'
' -f3,7,19 testfile
This is line 3 of testfile.
This is line 7 of testfile.
This is line 19 of testfile.

Cheers, Jaka Kranjc, for pointing this out.

Run across also Example 16-48.

paste

Tool for merging together different files into a single, multi-column file. In combination with cut, useful for creating system log files.

bash$ cat items
alphabet blocks
building blocks
cables


bash$ cat prices
$1.00/dozen
$2.50 ea.
$3.75


bash$ paste items prices
alphabet blocks $1.00/dozen
building blocks $2.50 ea.
cables  $3.75

join

Consider this a special-purpose cousin of paste. This powerful utility allows merging two files in a meaningful fashion, which essentially creates a simple version of a relational database.

The join command operates on exactly two files, but pastes together only those lines with a common tagged field (usually a numerical label), and writes the result to stdout. The files to be joined should be sorted according to the tagged field for the matchups to work properly.

File: 1.data

100 Shoes
200 Laces
300 Socks

File: 2.data

100 $40.00
200 $1.00
300 $2.00

bash$ join 1.data 2.data
File: 1.data 2.data

100 Shoes $40.00
200 Laces $1.00
300 Socks $2.00

Note

The tagged field appears only once in the output.

head

lists the beginning of a file to stdout. The default is 10 lines, but a different number can be specified. The command has a number of interesting options.

Example 16-13. Which files are scripts?

#!/bin/bash
# script-detector.sh: Detects scripts within a directory.

TESTCHARS=2    # Test first 2 characters.
SHABANG='#!'   # Scripts begin with a "sha-bang."

for file in *  # Traverse all the files in current directory.
do
  if [[ `head -c$TESTCHARS "$file"` = "$SHABANG" ]]
  #      head -c2                      #!
  #  The '-c' option to "head" outputs a specified
  #+ number of characters, rather than lines (the default).
  then
    echo "File \"$file\" is a script."
  else
    echo "File \"$file\" is *not* a script."
  fi
done

exit 0

#  Exercises:
#  ---------
#  1) Modify this script to take as an optional argument
#+    the directory to scan for scripts
#+    (rather than just the current working directory).
#
#  2) As it stands, this script gives "false positives" for
#+    Perl, awk, and other scripting language scripts.
#     Correct this.

Example 16-14. Generating 10-digit random numbers

#!/bin/bash
# rnd.sh: Outputs a 10-digit random number

# Script by Stephane Chazelas.

head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'


# =================================================================== #

# Analysis
# --------

# head:
# -c4 option takes first 4 bytes.

# od:
# -N4 option limits output to 4 bytes.
# -tu4 option selects unsigned decimal format for output.

# sed:
# -n option, in combination with "p" flag to the "s" command,
# outputs only matched lines.


# The author of this script explains the action of 'sed', as follows.

# head -c4 /dev/urandom | od -N4 -tu4 | sed -ne '1s/.* //p'
# ----------------------------------> |

# Assume output up to "sed" --------> |
# is 0000000 1198195154\n

#  sed begins reading characters: 0000000 1198195154\n.
#  Here it finds a newline character,
#+ so it is ready to process the first line (0000000 1198195154).
#  It looks at its <range><action>s. The first and only one is

#   range     action
#   1         s/.* //p

#  The line number is in the range, so it executes the action:
#+ tries to substitute the longest string ending with a space in the line
#  ("0000000 ") with nothing (//), and if it succeeds, prints the result
#  ("p" is a flag to the "s" command here, this is different
#+ from the "p" command).

#  sed is now ready to continue reading its input. (Note that before
#+ continuing, if -n option had not been passed, sed would have printed
#+ the line once again).

#  Now, sed reads the remainder of the characters, and finds the
#+ end of the file.
#  It is now ready to process its 2nd line (which is also numbered '$' as
#+ it's the last one).
#  It sees it is not matched by any <range>, so its job is done.

#  In a few words, this sed command means:
#  "On the first line only, remove any character up to the right-most space,
#+ then print it."

# A better way to do this would have been:
#           sed -e 's/.* //;q'

# Here, two <range><action>s (could have been written
#           sed -e 's/.* //' -e q):

#   range                    action
#   nothing (matches line)   s/.* //
#   nothing (matches line)   q (quit)

#  Here, sed only reads its first line of input.
#  It performs both actions, and prints the line (substituted) before
#+ quitting (because of the "q" action) since the "-n" option is not passed.

# =================================================================== #

# An even simpler alternative to the above one-line script would be:
#           head -c4 /dev/urandom | od -An -tu4

exit

See also Example 16-39.
tail

lists the (tail) end of a file to stdout. The default is 10 lines, but this can be changed with the -n option. Commonly used to keep track of changes to a system logfile, using the -f option, which outputs lines appended to the file.

Example 16-15. Using tail to monitor the system log

#!/bin/bash

filename=sys.log

cat /dev/null > $filename; echo "Creating / cleaning out file."
#  Creates the file if it does not already exist,
#+ and truncates it to zero length if it does.
#  : > filename   and   > filename also work.

tail /var/log/messages > $filename
# /var/log/messages must have world read permission for this to work.

echo "$filename contains tail end of system log."

exit 0

Tip

To list a specific line of a text file, pipe the output of head to tail -n 1. For example, head -n 8 database.txt | tail -n 1 lists the 8th line of the file database.txt.

To set a variable to a given block of a text file:

var=$(head -n $m $filename | tail -n $n)

# filename = name of file
# m = from beginning of file, number of lines to end of block
# n = number of lines to set variable to (trim from end of block)

Note

Newer implementations of tail deprecate the older tail -$LINES filename usage. The standard tail -n $LINES filename is correct.

See also Example 16-5, Example 16-39 and Example 32-6.

grep

A multi-purpose file search tool that uses Regular Expressions. It was originally a command/filter in the venerable ed line editor: g/re/p -- global - regular expression - print.

grep pattern [file...]

Search the target file(s) for occurrences of pattern, where pattern may be literal text or a Regular Expression.

bash$ grep '[rst]ystem.$' osinfo.txt
The GPL governs the distribution of the Linux operating system.

If no target file(s) specified, grep works as a filter on stdout, as in a pipe.

bash$ ps ax | grep clock
765 tty1     S      0:00 xclock
901 pts/1    S      0:00 grep clock

The -i option causes a case-insensitive search.

The -w option matches only whole words.

The -l option lists only the files in which matches were found, but not the matching lines.

The -r (recursive) option searches files in the current working directory and all subdirectories below it.

The -n option lists the matching lines, together with line numbers.

bash$ grep -n Linux osinfo.txt
2:This is a file containing information about Linux.
6:The GPL governs the distribution of the Linux operating system.

The -v (or --invert-match) option filters out matches.

grep pattern1 *.txt | grep -v pattern2

# Matches all lines in "*.txt" files containing "pattern1",
# but ***not*** "pattern2".

The -c (--count) option gives a numerical count of matches, rather than actually listing the matches.

grep -c txt *.sgml   # (number of occurrences of "txt" in "*.sgml" files)


#   grep -cz .
#            ^ dot
# means count (-c) zero-separated (-z) items matching "."
# that is, non-empty ones (containing at least 1 character).
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz .     # 3
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '$'   # 5
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -cz '^'   # 5
#
printf 'a b\nc  d\n\n\n\n\n\000\n\000e\000\000\nf' | grep -c '$'    # 9
# By default, newline chars (\n) separate items to match.

# Note that the -z option is GNU "grep" specific.


# Thanks, S.C.

The --color (or --colour) option marks the matching string in color (on the console or in an xterm window). Since grep prints out each entire line containing the matching pattern, this lets you see exactly what is being matched. See also the -o option, which shows only the matching portion of the line(s).

Example 16-16. Printing out the From lines in stored e-mail messages

#!/bin/bash
# from.sh

#  Emulates the useful 'from' utility in Solaris, BSD, etc.
#  Echoes the "From" header line in all messages
#+ in your e-mail directory.


MAILDIR=~/mail/*               #  No quoting of variable. Why?
# Maybe check if-exists $MAILDIR:   if [ -d $MAILDIR ] . . .
GREP_OPTS="-H -A 5 --color"    #  Show file, plus extra context lines
                               #+ and display "From" in color.
TARGETSTR="^From"              # "From" at beginning of line.

for file in $MAILDIR           #  No quoting of variable.
do
  grep $GREP_OPTS "$TARGETSTR" "$file"
  #    ^^^^^^^^^^              #  Again, do not quote this variable.
  echo
done

exit $?

#  You might wish to pipe the output of this script to 'more'
#+ or redirect it to a file . . .

When invoked with more than one target file given, grep specifies which file contains matches.

bash$ grep Linux osinfo.txt misc.txt
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.
misc.txt:The Linux operating system is steadily gaining in popularity.

Tip

To force grep to show the filename when searching only one target file, simply give /dev/null as the second file.

bash$ grep Linux osinfo.txt /dev/null
osinfo.txt:This is a file containing information about Linux.
osinfo.txt:The GPL governs the distribution of the Linux operating system.

If there is a successful match, grep returns an exit status of 0, which makes it useful in a condition test in a script, especially in combination with the -q option to suppress output.

SUCCESS=0                      # if grep lookup succeeds
word=Linux
filename=data.file

grep -q "$word" "$filename"    #  The "-q" option
                               #+ causes nothing to echo to stdout.
if [ $? -eq $SUCCESS ]
# if grep -q "$word" "$filename"   can replace lines 5 - 7.
then
  echo "$word found in $filename"
else
  echo "$word not found in $filename"
fi

Example 32-6 demonstrates how to use grep to search for a word pattern in a system logfile.

Example 16-17. Emulating grep in a script

#!/bin/bash
# grp.sh: Rudimentary reimplementation of grep.

E_BADARGS=85

if [ -z "$1" ]    # Check for argument to script.
then
  echo "Usage: `basename $0` pattern"
  exit $E_BADARGS
fi

echo

for file in *     # Traverse all files in $PWD.
do
  output=$(sed -n /"$1"/p $file)  # Command substitution.

  if [ ! -z "$output" ]           # What happens if "$output" is not quoted?
  then
    echo -n "$file: "
    echo "$output"
  fi              #  sed -ne "/$1/s|^|${file}: |p"  is equivalent to above.

  echo
done

echo

exit 0

# Exercises:
# ---------
# 1) Add newlines to output, if more than one match in any given file.
# 2) Add features.

How can grep search for two (or more) separate patterns? What if you want grep to display all lines in a file or files that contain both "pattern1" and "pattern2"?

One method is to pipe the result of grep pattern1 to grep pattern2.

For example, given the following file:

# Filename: tstfile

This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.
Here is some text.

Now, let's search this file for lines containing both "file" and "text" . . .

bash$ grep file tstfile
# Filename: tstfile
This is a sample file.
This is an ordinary text file.
This file does not contain any unusual text.
This file is not unusual.


bash$ grep file tstfile | grep text
This is an ordinary text file.
This file does not contain any unusual text.

Now, for an interesting recreational use of grep . . .

Example 16-18. Crossword puzzle solver

#!/bin/bash
# cw-solver.sh
# This is actually a wrapper around a one-liner (line 46).

#  Crossword puzzle and anagramming word game solver.
#  You know *some* of the letters in the word you're looking for,
#+ so you need a list of all valid words
#+ with the known letters in given positions.
#  For example: w...i....n
#               1???5????10
# w in position 1, 3 unknowns, i in the 5th, 4 unknowns, n at the end.
# (See comments at end of script.)


E_NOPATT=71
DICT=/usr/share/dict/word.lst
#                    ^^^^^^^^   Looks for word list here.
#  ASCII word list, one word per line.
#  If you happen to need an appropriate list,
#+ download the author's "yawl" word list package.
#  http://ibiblio.org/pub/Linux/libs/yawl-0.3.2.tar.gz
#  or
#  http://bash.deta.in/yawl-0.3.2.tar.gz


if [ -z "$1" ]   #  If no word pattern specified
then             #+ as a command-line argument . . .
  echo           #+ . . . then . . .
  echo "Usage:"  #+ Usage message.
  echo
  echo ""$0" \"pattern,\""
  echo "where \"pattern\" is in the form"
  echo "xxx..x.x..."
  echo
  echo "The x's represent known letters,"
  echo "and the periods are unknown letters (blanks)."
  echo "Letters and periods can be in any position."
  echo "For example, try:   sh cw-solver.sh w...i....n"
  echo
  exit $E_NOPATT
fi

echo
# ===============================================
# This is where all the work gets done.
grep ^"$1"$ "$DICT"   # Yes, only one line!
#    |    |
# ^ is start-of-word regex anchor.
# $ is end-of-word regex anchor.

#  From _Stupid Grep Tricks_, vol. 1,
#+ a book the ABS Guide author may yet get around
#+ to writing . . . one of these days . . .
# ===============================================
echo


exit $?  # Script terminates here.
#  If there are too many words generated,
#+ redirect the output to a file.

$ sh cw-solver.sh w...i....n

wellington
workingman
workingmen

egrep -- extended grep -- is the same as grep -E. This uses a somewhat different, extended set of Regular Expressions, which can make the search a bit more flexible. It also allows the boolean | (or) operator.

bash $ egrep 'matches|Matches' file.txt
Line 1 matches.
Line 3 Matches.
Line 4 contains matches, but also Matches

fgrep -- fast grep -- is the same as grep -F. It does a literal string search (no Regular Expressions), which usually speeds things up a bit.

Note

On some Linux distros, egrep and fgrep are symbolic links to, or aliases for grep, but invoked with the -E and -F options, respectively.

Example 16-19. Looking up definitions in Webster's 1913 Dictionary

#!/bin/bash
# dict-lookup.sh

#  This script looks up definitions in the 1913 Webster's Dictionary.
#  This Public Domain dictionary is available for download
#+ from various sites, including
#+ Project Gutenberg (http://www.gutenberg.org/etext/247).
#
#  Convert it from DOS to UNIX format (with only LF at end of line)
#+ before using it with this script.
#  Store the file in plain, uncompressed ASCII text.
#  Set DEFAULT_DICTFILE variable below to path/filename.


E_BADARGS=85
MAXCONTEXTLINES=50                        # Maximum number of lines to show.
DEFAULT_DICTFILE="/usr/share/dict/webster1913-dict.txt"
                                          # Default dictionary file pathname.
                                          # Change this as necessary.
#  Note:
#  ----
#  This particular edition of the 1913 Webster's
#+ begins each entry with an uppercase letter
#+ (lowercase for the remaining characters).
#  Only the *very first line* of an entry begins this way,
#+ and that's why the search algorithm below works.



if [[ -z $(echo "$1" | sed -n '/^[A-Z]/p') ]]
#  Must at least specify word to look up, and
#+ it must start with an uppercase letter.
then
  echo "Usage: `basename $0` Word-to-define [dictionary-file]"
  echo
  echo "Note: Word to look up must start with capital letter,"
  echo "with the rest of the word in lowercase."
  echo "--------------------------------------------"
  echo "Examples: Abandon, Dictionary, Marking, etc."
  exit $E_BADARGS
fi


if [ -z "$2" ]                            #  May specify different dictionary
                                          #+ as an argument to this script.
then
  dictfile=$DEFAULT_DICTFILE
else
  dictfile="$2"
fi

# ---------------------------------------------------------
Definition=$(fgrep -A $MAXCONTEXTLINES "$1 \\" "$dictfile")
#                  Definitions in form "Word \..."
#
#  And, yes, "fgrep" is fast enough
#+ to search even a very large text file.


# Now, snip out just the definition block.

echo "$Definition" |
sed -n '1,/^[A-Z]/p' |
#  Print from first line of output
#+ to the first line of the next entry.
sed '$d' | sed '$d'
#  Delete last two lines of output
#+ (blank line and first line of next entry).
# ---------------------------------------------------------

exit $?

# Exercises:
# ---------
# 1)  Modify the script to accept any type of alphabetic input
#   + (uppercase, lowercase, mixed case), and convert it
#   + to an acceptable format for processing.
#
# 2)  Convert the script to a GUI application,
#   + using something like 'gdialog' or 'zenity' . . .
#     The script will then no longer take its argument(s)
#   + from the command-line.
#
# 3)  Modify the script to parse one of the other available
#   + Public Domain Dictionaries, such as the U.S. Census Bureau Gazetteer.

Note

See also Example A-41 for an example of speedy fgrep lookup on a large text file.

agrep (approximate grep) extends the capabilities of grep to approximate matching. The search string may differ by a specified number of characters from the resulting matches. This utility is not part of the core Linux distribution.

Tip

To search compressed files, use zgrep, zegrep, or zfgrep. These also work on non-compressed files, though slower than plain grep, egrep, fgrep. They are handy for searching through a mixed set of files, some compressed, some not.

To search bzipped files, use bzgrep.
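A quick sketch (the file names here are hypothetical):

zgrep -i 'error' /var/log/syslog*   # Searches the current log and its rotated .gz siblings alike.
bzgrep 'panic' messages.bz2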

look

The command look works like grep, but does a lookup on a "dictionary," a sorted word list. By default, look searches for a match in /usr/dict/words, but a different dictionary file may be specified.

Example 16-20. Checking words in a list for validity

#!/bin/bash
# lookup: Does a dictionary lookup on each word in a data file.

file=words.data  # Data file from which to read words to test.

echo
echo "Testing file $file"
echo

while [ "$word" != end ]  # Last word in data file.
do               # ^^^
  read word      # From data file, because of redirection at end of loop.
  look $word > /dev/null  # Don't want to display lines in dictionary file.
  #  Searches for words in the file /usr/share/dict/words
  #+ (usually a link to linux.words).
  lookup=$?      # Exit status of 'look' command.

  if [ "$lookup" -eq 0 ]
  then
    echo "\"$word\" is valid."
  else
    echo "\"$word\" is invalid."
  fi

done <"$file"    # Redirects stdin to $file, so "reads" come from there.

echo

exit 0

# ----------------------------------------------------------------
# Code below line will not execute because of "exit" command above.


# Stephane Chazelas proposes the following, more concise alternative:

while read word && [[ $word != end ]]
do if look "$word" > /dev/null
   then echo "\"$word\" is valid."
   else echo "\"$word\" is invalid."
   fi
done <"$file"

exit 0

sed, awk

Scripting languages especially suited for parsing text files and command output. May be embedded singly or in combination in pipes and shell scripts.

sed

Non-interactive "stream editor", permits using many ex commands in batch mode. It finds many uses in shell scripts.

awk

Programmable file extractor and formatter, good for manipulating and/or extracting fields (columns) in structured text files. Its syntax is similar to C.
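A one-line taste of each, as a sketch (datafile is a hypothetical text file):

sed 's/foo/bar/g' datafile               # Replace every "foo" with "bar".
awk -F: '{ print $1, $6 }' /etc/passwd   # Print the login name and home directory fields.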

wc

wc gives a "word count" on a file or I/O stream:

bash $ wc /usr/share/doc/sed-4.1.2/README
13  70  447 README
[13 lines  70 words  447 characters]

wc -w gives only the word count.

wc -l gives only the line count.

wc -c gives only the byte count.

wc -m gives only the character count.

wc -L gives only the length of the longest line.

Using wc to count how many .txt files are in current working directory:

$ ls *.txt | wc -l
#  Will work as long as none of the "*.txt" files
#+ have a linefeed embedded in their name.

#  Alternative ways of doing this are:
#      find . -maxdepth 1 -name \*.txt -print0 | grep -cz .
#      (shopt -s nullglob; set -- *.txt; echo $#)

#  Thanks, S.C.

Using wc to total up the size of all the files whose names begin with letters in the range d - h

bash$ wc [d-h]* | grep total | awk '{print $3}'
71832

Using wc to count the instances of the word "Linux" in the main source file for this book.

bash$ grep Linux abs-book.sgml | wc -l
138

See also Example 16-39 and Example 20-8.

Certain commands include some of the functionality of wc as options.

... | grep foo | wc -l
# This frequently used construct can be more concisely rendered.

... | grep -c foo
# Just use the "-c" (or "--count") option of grep.

# Thanks, S.C.

tr

character translation filter.

Caution

Must use quoting and/or brackets, as appropriate. Quotes prevent the shell from reinterpreting the special characters in tr command sequences. Brackets should be quoted to prevent expansion by the shell.

Either tr "A-Z" "*" <filename or tr A-Z \* <filename changes all the uppercase letters in filename to asterisks (writes to stdout). On some systems this may not work, but tr A-Z '[**]' will.

The -d option deletes a range of characters.

echo "abcdef"                 # abcdef echo "abcdef" | tr -d b-d     # aef   tr -d 0-ix <filename # Deletes all digits from the file "filename".

The --squeeze-repeats (or -s) option deletes all but the first instance of a string of consecutive characters. This option is useful for removing excess whitespace.

bash$ echo "XXXXX" | tr --squeeze-repeats 'X'
X

The -c "complement" choice inverts the character fix to lucifer. With this option, tr acts just upon those characters non matching the specified set.

bash$ echo "acfdeb123" | tr -c b-d +
+c+d+b++++

Note that tr recognizes POSIX character classes. [1]

bash$ echo "abcd2ef1" | tr '[:alpha:]' -
----2--1

Example 16-21. toupper: Transforms a file to all uppercase.

#!/bin/bash
# Changes a file to all uppercase.

E_BADARGS=85

if [ -z "$1" ]  # Standard check for command-line arg.
then
  echo "Usage: `basename $0` filename"
  exit $E_BADARGS
fi

tr a-z A-Z <"$1"

# Same effect as above, but using POSIX character set notation:
#        tr '[:lower:]' '[:upper:]' <"$1"
# Thanks, S.C.

#     Or even . . .
#     cat "$1" | tr a-z A-Z
#     Or dozens of other ways . . .

exit 0

#  Exercise:
#  Rewrite this script to give the option of changing a file
#+ to *either* uppercase or lowercase.
#  Hint: Use either the "case" or "select" command.

Example 16-22. lowercase: Changes all filenames in working directory to lowercase.

#!/bin/bash
#
#  Changes every filename in working directory to all lowercase.
#
#  Inspired by a script of John Dubois,
#+ which was translated into Bash by Chet Ramey,
#+ and considerably simplified by the author of the ABS Guide.


for filename in *                # Traverse all files in directory.
do
   fname=`basename $filename`
   n=`echo $fname | tr A-Z a-z`  # Change name to lowercase.
   if [ "$fname" != "$n" ]       # Rename only files not already lowercase.
   then
     mv $fname $n
   fi
done

exit $?


# Code below this line will not execute because of "exit".
#--------------------------------------------------------#
# To run it, delete script above line.

# The above script will not work on filenames containing blanks or newlines.
# Stephane Chazelas therefore suggests the following alternative:


for filename in *    # Not necessary to use basename,
                     # since "*" won't return any file containing "/".
do n=`echo "$filename/" | tr '[:upper:]' '[:lower:]'`
#                             POSIX char set notation.
#                    Slash added so that trailing newlines are not
#                    removed by command substitution.
   # Variable substitution:
   n=${n%/}          # Removes trailing slash, added above, from filename.
   [[ $filename == $n ]] || mv "$filename" "$n"
                     # Checks if filename already lowercase.
done

exit $?

Example 16-23. Du: DOS to UNIX text file conversion.

#!/bin/bash
# Du.sh: DOS to UNIX text file converter.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename-to-convert"
  exit $E_WRONGARGS
fi

NEWFILENAME=$1.unx

CR='\015'  # Carriage return.
           # 015 is octal ASCII code for CR.
           # Lines in a DOS text file end in CR-LF.
           # Lines in a UNIX text file end in LF only.

tr -d $CR < $1 > $NEWFILENAME
# Delete CR's and write to new file.

echo "Original DOS text file is \"$1\"."
echo "Converted UNIX text file is \"$NEWFILENAME\"."

exit 0

# Exercise:
# --------
# Change the above script to convert from UNIX to DOS.

Example 16-24. rot13: ultra-weak encryption.

#!/bin/bash
# rot13.sh: Classic rot13 algorithm,
#           encryption that might fool a 3-year old
#           for about 10 minutes.

# Usage: ./rot13.sh filename
# or     ./rot13.sh <filename
# or     ./rot13.sh and supply keyboard input (stdin)

cat "$@" | tr 'a-zA-Z' 'n-za-mN-ZA-M'   # "a" goes to "n", "b" to "o" ...
#  The   cat "$@"   construct
#+ permits input either from stdin or from files.

exit 0

Example 16-25. Generating "Crypto-Quote" Puzzles

#!/bin/bash
# crypto-quote.sh: Encrypt quotes

#  Will encrypt famous quotes in a simple monoalphabetic substitution.
#  The result is similar to the "Crypto Quote" puzzles
#+ seen in the Op Ed pages of the Sunday paper.


key=ETAOINSHRDLUBCFGJMQPVWZYXK
# The "key" is nothing more than a scrambled alphabet.
# Changing the "key" changes the encryption.

# The 'cat "$@"' construction gets input either from stdin or from files.
# If using stdin, terminate input with a Control-D.
# Otherwise, specify filename as command-line parameter.

cat "$@" | tr "a-z" "A-Z" | tr "A-Z" "$key"
#        |  to uppercase  |     encrypt
# Will work on lowercase, uppercase, or mixed-case quotes.
# Passes non-alphabetic characters through unchanged.


# Try this script with something like:
# "Nothing so needs reforming as other people's habits."
# --Mark Twain
#
# Output is:
# "CFPHRCS QF CIIOQ MINFMBRCS EQ FPHIM GIFGUI'Q HETRPQ."
# --BEML PZERC

# To reverse the encryption:
# cat "$@" | tr "$key" "A-Z"


#  This simple-minded cipher can be broken by an average 12-year old
#+ using only pencil and paper.

exit 0

#  Exercise:
#  --------
#  Modify the script so that it will either encrypt or decrypt,
#+ depending on command-line argument(s).

Of course, tr lends itself to code obfuscation.

#!/bin/bash
# jabh.sh

x="wftedskaebjgdBstbdbsmnjgz"
echo $x | tr "a-z" 'oh, turtleneck Phrase Jar!'

# Based on the Wikipedia "Just another Perl hacker" article.

fold

A filter that wraps lines of input to a specified width. This is especially useful with the -s option, which breaks lines at word spaces (see Example 16-26 and Example A-1).

fmt

Simple-minded file formatter, used as a filter in a pipe to "wrap" long lines of text output.

Example 16-26. Formatted file listing.

#!/bin/bash

WIDTH=40                    # 40 columns wide.

b=`ls /usr/local/bin`       # Get a file listing...

echo $b | fmt -w $WIDTH

# Could also have been done by
#    echo $b | fold - -s -w $WIDTH

exit 0

See also Example 16-5.

col

This deceptively named filter removes reverse line feeds from an input stream. It also attempts to replace whitespace with equivalent tabs. The chief use of col is in filtering the output from certain text processing utilities, such as groff and tbl.
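A typical filtering pipe, as a sketch (page.1 is a hypothetical manpage source file):

tbl page.1 | groff -Tascii -man | col -bx > page.txt
# col -b strips backspaces (overstriking), and -x outputs spaces instead of tabs,
# leaving plain text suitable for viewing or further processing.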

column

Column formatter. This filter transforms list-type text output into a "pretty-printed" table by inserting tabs at appropriate places.

Example 16-27. Using column to format a directory listing

#!/bin/bash
# colms.sh
# A minor modification of the example file in the "column" man page.


(printf "PERMISSIONS LINKS OWNER GROUP SIZE MONTH DAY HH:MM PROG-NAME\n" \
; ls -l | sed 1d) | column -t
#         ^^^^^^           ^^

#  The "sed 1d" in the pipe deletes the first line of output,
#+ which would be "total        N",
#+ where "N" is the total number of files found by "ls -l".

# The -t option to "column" pretty-prints a table.

exit 0

colrm

Column removal filter. This removes columns (characters) from a file and writes the file, lacking the range of specified columns, back to stdout. colrm 2 4 <filename removes the second through fourth characters from each line of the text file filename.

Caution

If the file contains tabs or nonprintable characters, this may cause unpredictable behavior. In such cases, consider using expand and unexpand in a pipe preceding colrm.
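A sketch of that precaution (the filenames are hypothetical):

expand tabbed-file | colrm 2 4 | unexpand > trimmed-file
# Tabs become spaces before the columns are removed, then are restored afterward.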

nl

Line numbering filter: nl filename lists filename to stdout, but inserts consecutive numbers at the beginning of each non-blank line. If filename omitted, operates on stdin.

The output of nl is very similar to cat -b, since, by default, nl does not list blank lines.

Example 16-28. nl: A self-numbering script.

#!/bin/bash
# line-number.sh

# This script echoes itself twice to stdout with its lines numbered.

echo "     line number = $LINENO" # 'nl' sees this as line 4
#                                   (nl does not number blank lines).
#                                   'cat -n' sees it correctly as line #6.

nl `basename $0`

echo; echo  # Now, let's try it with 'cat -n'

cat -n `basename $0`
# The difference is that 'cat -n' numbers the blank lines.
# Note that 'nl -ba' will also do so.

exit 0
# -----------------------------------------------------------------

pr

Print formatting filter. This will paginate files (or stdout) into sections suitable for hard copy printing or viewing on screen. Various options permit row and column manipulation, joining lines, setting margins, numbering lines, adding page headers, and merging files, among other things. The pr command combines much of the functionality of nl, paste, fold, column, and expand.

pr -o 5 --width=65 fileZZZ | more gives a nice paginated listing to screen of fileZZZ with margins set at 5 and 65.

A particularly useful option is -d, forcing double-spacing (same effect as sed -G).
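For example (a sketch; file.txt is hypothetical):

pr -d -l 66 file.txt | lpr
# Double-spaced, 66 lines per page, sent to the default printer.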

gettext

The GNU gettext package is a set of utilities for localizing and translating the text output of programs into foreign languages. While originally intended for C programs, it now supports quite a number of programming and scripting languages.

The gettext program works on shell scripts. See the info page.
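A minimal sketch of a localized script (it assumes a compiled message catalog for the hypothetical domain myscript already exists under $TEXTDOMAINDIR):

#!/bin/bash
export TEXTDOMAIN=myscript              # Hypothetical catalog name.
export TEXTDOMAINDIR=/usr/share/locale  # Where the .mo catalogs live.
echo "$(gettext 'Hello, world!')"       # Prints the translation, if one exists;
                                        # otherwise falls back to the original string.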

msgfmt

A program for generating binary message catalogs. It is used for localization.
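Typical usage, as a sketch (myscript.po is a hypothetical portable-object translation file):

msgfmt -o myscript.mo myscript.po
#  Compiles the human-editable .po translations into a binary .mo catalog,
#+ which is then installed under /usr/share/locale/<lang>/LC_MESSAGES/.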

iconv

A utility for converting file(s) to a different encoding (character set). Its chief use is for localization.

# Convert a string from UTF-8 to UTF-16 and print to the BookList
function write_utf8_string {
    STRING=$1
    BOOKLIST=$2
    echo -n "$STRING" | iconv -f UTF8 -t UTF16 | \
    cut -b 3- | tr -d \\n >> "$BOOKLIST"
}

#  From Peter Knowles' "booklistgen.sh" script
#+ for converting files to Sony Librie/PRS-50X format.
#  (http://booklistgensh.peterknowles.com)

recode

Consider this a fancier version of iconv, above. This very versatile utility converts a file to a different encoding scheme. Note that recode is not part of the standard Linux installation.
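For example (a sketch; datafile is hypothetical):

recode ISO-8859-1..UTF-8 datafile   # Converts datafile in place, from Latin-1 to UTF-8.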

TeX, gs

TeX and Postscript are text markup languages used for preparing copy for printing or formatted video display.

TeX is Donald Knuth's elaborate typesetting system. It is often convenient to write a shell script encapsulating all the options and arguments passed to one of these markup languages.

Ghostscript (gs) is a GPL-ed Postscript interpreter.

texexec

Utility for processing TeX and pdf files. Found in /usr/bin on many Linux distros, it is actually a shell wrapper that calls Perl to invoke Tex.

texexec --pdfarrange --result=Concatenated.pdf *pdf

#  Concatenates all the pdf files in the current working directory
#+ into the merged file, Concatenated.pdf . . .
#  (The --pdfarrange option repaginates a pdf file. See also --pdfcombine.)
#  The above command-line could be parameterized and put into a shell script.

enscript

Utility for converting a plain text file to PostScript.

For example, enscript filename.txt -p filename.ps produces the PostScript output file filename.ps.

groff, tbl, eqn

Yet another text markup and display formatting language is groff. This is the enhanced GNU version of the venerable UNIX roff/troff display and typesetting package. Manpages use groff.

The tbl table processing utility is considered part of groff, as its function is to convert table markup into groff commands.

The eqn equation processing utility is also part of groff, and its function is to convert equation markup into groff commands.

Example 16-29. manview: Viewing formatted manpages

#!/bin/bash
# manview.sh: Formats the source of a man page for viewing.

#  This script is useful when writing man page source.
#  It lets you look at the intermediate results on the fly
#+ while working on it.

E_WRONGARGS=85

if [ -z "$1" ]
then
  echo "Usage: `basename $0` filename"
  exit $E_WRONGARGS
fi

# ---------------------------
groff -Tascii -man $1 | less
# From the man page for groff.
# ---------------------------

#  If the man page includes tables and/or equations,
#+ then the above code will barf.
#  The following line can handle such cases.
#
#   gtbl < "$1" | geqn -Tlatin1 | groff -Tlatin1 -mtty-char -man
#
#   Thanks, S.C.

exit $?   # See also the "maned.sh" script.

See also Example A-39.

lex, yacc

The lex lexical analyzer produces programs for pattern matching. This has been replaced by the nonproprietary flex on Linux systems.

The yacc utility creates a parser based on a set of specifications. This has been replaced by the nonproprietary bison on Linux systems.

Source: https://tldp.org/LDP/abs/html/textproc.html
