PARALLEL_TUTORIAL(7)      (2021-08-28)       PARALLEL_TUTORIAL(7)

     GNU Parallel Tutorial
          This tutorial shows off much of GNU parallel's
          functionality. It is meant to teach you the options and
          syntax of GNU parallel; it is not meant to show realistic
          real-world examples.

        Reader's guide
          If you prefer reading a book buy GNU Parallel 2018 at
          https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html
          or download it at: https://doi.org/10.5281/zenodo.1146014

          Otherwise start by watching the intro videos for a quick
          introduction:
          https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

          Then browse through the EXAMPLEs after the list of OPTIONS
          in man parallel (Use LESS=+/EXAMPLE: man parallel). That
          will give you an idea of what GNU parallel is capable of.

          If you want to dive even deeper: spend a couple of hours
          walking through the tutorial (man parallel_tutorial). Your
          command line will love you for it.

          Finally you may want to look at the rest of the manual (man
          parallel) if you have special needs not already covered.

          If you want to know the design decisions behind GNU
          parallel, try: man parallel_design. This is also a good
          intro if you intend to change GNU parallel.

     Prerequisites
          To run this tutorial you must have the following:

          parallel >= version 20160822
                    Install the newest version using your package
                    manager (recommended for security reasons), in
                    the way described in the README file, or with
                    this command:

                     $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
                        fetch -o - http://pi.dk/3 ) > install.sh
                     $ sha1sum install.sh
                     12345678 3374ec53 bacb199b 245af2dd a86df6c9
                     $ md5sum install.sh
                     029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
                     $ sha512sum install.sh
                     40f53af6 9e20dae5 713ba06c f517006d 9897747b ed8a4694 b1acba1b 1464beb4
                     60055629 3f2356f3 3e9c4e3c 76e3f3af a9db4b32 bd33322b 975696fc e6b23cfb
                     $ bash install.sh

                    This will also install the newest version of the
                    tutorial which you can see by running this:

                     man parallel_tutorial

                   Most of the tutorial will work on older versions,
                   too.

          abc-file:
                   The file can be generated by this command:

                     parallel -k echo ::: A B C > abc-file

          def-file:
                   The file can be generated by this command:

                     parallel -k echo ::: D E F > def-file

          abc0-file:
                   The file can be generated by this command:

                     perl -e 'printf "A\0B\0C\0"' > abc0-file

          abc_-file:
                   The file can be generated by this command:

                     perl -e 'printf "A_B_C_"' > abc_-file

          tsv-file.tsv
                   The file can be generated by this command:

                     perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

          num8     The file can be generated by this command:

                     perl -e 'for(1..8){print "$_\n"}' > num8

          num128   The file can be generated by this command:

                     perl -e 'for(1..128){print "$_\n"}' > num128

          num30000 The file can be generated by this command:

                     perl -e 'for(1..30000){print "$_\n"}' > num30000

          num1000000
                   The file can be generated by this command:

                     perl -e 'for(1..1000000){print "$_\n"}' > num1000000

          num_%header
                   The file can be generated by this command:

                     (echo %head1; echo %head2; \
                      perl -e 'for(1..10){print "$_\n"}') > num_%header

          fixedlen The file can be generated by this command:

                     perl -e 'print "HHHHAAABBBCCC"' > fixedlen

     $SERVER1 and $SERVER2 must work
          For remote running: ssh login without a password must work
          on the 2 servers in:

                      SERVER1=server.example.com
                      SERVER2=server2.example.net

                   So you must be able to do this without entering a
                   password:

                     ssh $SERVER1 echo works
                     ssh $SERVER2 echo works

                    It can be set up by running
                    'ssh-keygen -t ed25519; ssh-copy-id $SERVER1'
                    and using an empty passphrase, or you can use
                    ssh-agent.

     Input sources
          GNU parallel reads input from input sources. These can be
          files, the command line, and stdin (standard input or a
          pipe).

        A single input source
          Input can be read from the command line:

            parallel echo ::: A B C

          Output (the order may be different because the jobs are run
          in parallel):

            A
            B
            C

          The input source can be a file:

            parallel -a abc-file echo

          Output: Same as above.

          STDIN (standard input) can be the input source:

            cat abc-file | parallel echo

          Output: Same as above.

        Multiple input sources

          GNU parallel can take multiple input sources given on the
          command line. GNU parallel then generates all combinations
          of the input sources:

            parallel echo ::: A B C ::: D E F

          Output (the order may be different):

            A D
            A E
            A F
            B D
            B E
            B F
            C D
            C E
            C F
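
          As a sketch of what happens (minus the parallelism), the
          same 9 combinations can be generated with plain nested
          shell loops:

```shell
# Sketch (no parallel involved): all combinations of two input
# sources, generated sequentially by nested loops.
for a in A B C; do
  for b in D E F; do
    echo "$a $b"
  done
done
```

          GNU parallel produces the same lines, but runs the jobs in
          parallel, which is why the order may differ.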

          The input sources can be files:

            parallel -a abc-file -a def-file echo

          Output: Same as above.

          STDIN (standard input) can be one of the input sources using
          -:

            cat abc-file | parallel -a - -a def-file echo

          Output: Same as above.

          Instead of -a, files can be given after :::: like this:

            cat abc-file | parallel echo :::: - def-file

          Output: Same as above.

          ::: and :::: can be mixed:

            parallel echo ::: A B C :::: def-file

          Output: Same as above.

          Linking arguments from input sources

          With --link you can link the input sources and get one
          argument from each input source:

            parallel --link echo ::: A B C ::: D E F

          Output (the order may be different):

            A D
            B E
            C F
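
          Linking two equally long input sources resembles what
          paste does with two files; a sketch that does not use
          parallel:

```shell
# Sketch (no parallel involved): pair up lines from two files,
# the way --link pairs arguments from two input sources.
tmp=$(mktemp -d)
printf 'A\nB\nC\n' > "$tmp/abc-file"
printf 'D\nE\nF\n' > "$tmp/def-file"
paste -d ' ' "$tmp/abc-file" "$tmp/def-file"   # A D / B E / C F
rm -r "$tmp"
```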

          If one of the input sources is too short, its values will
          wrap:

            parallel --link echo ::: A B C D E ::: F G

          Output (the order may be different):

            A F
            B G
            C F
            D G
            E F

          For more flexible linking you can use :::+ and ::::+. They
          work like ::: and :::: except they link the previous input
          source to this input source.

          This will link ABC to GHI:

            parallel echo :::: abc-file :::+ G H I :::: def-file

          Output (the order may be different):

            A G D
            A G E
            A G F
            B H D
            B H E
            B H F
            C I D
            C I E
            C I F

          This will link GHI to DEF:

            parallel echo :::: abc-file ::: G H I ::::+ def-file

          Output (the order may be different):

            A G D
            A H E
            A I F
            B G D
            B H E
            B I F
            C G D
            C H E
            C I F

          If one of the input sources is too short when using :::+ or
          ::::+, the rest will be ignored:

            parallel echo ::: A B C D E :::+ F G

          Output (the order may be different):

            A F
            B G

        Changing the argument separator
          GNU parallel can use other separators than ::: or ::::. This
          is typically useful if ::: or :::: is used in the command to
          run:

            parallel --arg-sep ,, echo ,, A B C :::: def-file

          Output (the order may be different):

            A D
            A E
            A F
            B D
            B E
            B F
            C D
            C E
            C F

          Changing the argument file separator:

            parallel --arg-file-sep // echo ::: A B C // def-file

          Output: Same as above.

        Changing the argument delimiter
          GNU parallel will normally treat a full line as a single
          argument: It uses \n as argument delimiter. This can be
          changed with -d:

            parallel -d _ echo :::: abc_-file

          Output (the order may be different):

            A
            B
            C

          NUL can be given as \0:

            parallel -d '\0' echo :::: abc0-file

          Output: Same as above.

          A shorthand for -d '\0' is -0 (this will often be used to
          read files from find ... -print0):

            parallel -0 echo :::: abc0-file

          Output: Same as above.

        End-of-file value for input source
          GNU parallel can stop reading when it encounters a certain
          value:

            parallel -E stop echo ::: A B stop C D

          Output:

            A
            B
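
          The effect of -E resembles stopping at a sentinel line,
          which can be sketched with awk (no parallel involved):

```shell
# Sketch: print input lines until the end-of-file value 'stop'
# is seen, then stop reading.
printf 'A\nB\nstop\nC\nD\n' | awk '$0 == "stop" { exit } { print }'
```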

        Skipping empty lines
          Using --no-run-if-empty GNU parallel will skip empty lines.

            (echo 1; echo; echo 2) | parallel --no-run-if-empty echo

          Output:

            1
            2

     Building the command line
        No command means arguments are commands
          If no command is given after parallel the arguments
          themselves are treated as commands:

            parallel ::: ls 'echo foo' pwd

          Output (the order may be different):

            [list of files in current dir]
            foo
            [/path/to/current/working/dir]

          The command can be a script, a binary or a Bash function if
          the function is exported using export -f:

            # Only works in Bash
            my_func() {
              echo in my_func $1
            }
            export -f my_func
            parallel my_func ::: 1 2 3

          Output (the order may be different):

            in my_func 1
            in my_func 2
            in my_func 3
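
          The reason export -f is needed: it puts the function into
          the environment, so child shells (such as the ones GNU
          parallel starts) can see it. A minimal sketch, Bash only:

```shell
# Only works in Bash: an exported function is visible in a child
# bash, which is the mechanism GNU parallel relies on.
my_func() {
  echo "in my_func $1"
}
export -f my_func
bash -c 'my_func 7'   # prints: in my_func 7
```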

        Replacement strings
          The 7 predefined replacement strings

          GNU parallel has several replacement strings. If no
          replacement strings are used the default is to append {}:

            parallel echo ::: A/B.C

          Output:

            A/B.C

          The default replacement string is {}:

            parallel echo {} ::: A/B.C

          Output:

            A/B.C

          The replacement string {.} removes the extension:

            parallel echo {.} ::: A/B.C

          Output:

            A/B

          The replacement string {/} removes the path:

            parallel echo {/} ::: A/B.C

          Output:

            B.C

          The replacement string {//} keeps only the path:

            parallel echo {//} ::: A/B.C

          Output:

            A

          The replacement string {/.} removes the path and the
          extension:

            parallel echo {/.} ::: A/B.C

          Output:

            B

          The replacement string {#} gives the job number:

            parallel echo {#} ::: A B C

          Output (the order may be different):

            1
            2
            3

          The replacement string {%} gives the job slot number
          (between 1 and the number of jobs to run in parallel):

            parallel -j 2 echo {%} ::: A B C

          Output (the order may be different and 1 and 2 may be
          swapped):

            1
            2
            1

          Changing the replacement strings

          The replacement string {} can be changed with -I:

            parallel -I ,, echo ,, ::: A/B.C

          Output:

            A/B.C

          The replacement string {.} can be changed with
          --extensionreplace:

            parallel --extensionreplace ,, echo ,, ::: A/B.C

          Output:

            A/B

          The replacement string {/} can be changed with
          --basenamereplace:

            parallel --basenamereplace ,, echo ,, ::: A/B.C

          Output:

            B.C

          The replacement string {//} can be changed with
          --dirnamereplace:

            parallel --dirnamereplace ,, echo ,, ::: A/B.C

          Output:

            A

          The replacement string {/.} can be changed with
          --basenameextensionreplace:

            parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

          Output:

            B

          The replacement string {#} can be changed with --seqreplace:

            parallel --seqreplace ,, echo ,, ::: A B C

          Output (the order may be different):

            1
            2
            3

          The replacement string {%} can be changed with
          --slotreplace:

            parallel -j2 --slotreplace ,, echo ,, ::: A B C

          Output (the order may be different and 1 and 2 may be
          swapped):

            1
            2
            1

          Perl expression replacement string

          When the predefined replacement strings are not flexible
          enough, a perl expression can be used instead. One example
          is to remove two extensions: foo.tar.gz becomes foo:

            parallel echo '{= s:\.[^.]+$::;s:\.[^.]+$::; =}' ::: foo.tar.gz

          Output:

            foo

          In {= =} you can access all of GNU parallel's internal
          functions and variables. A few are worth mentioning.

          total_jobs() returns the total number of jobs:

            parallel echo Job {#} of {= '$_=total_jobs()' =} ::: {1..5}

          Output:

            Job 1 of 5
            Job 2 of 5
            Job 3 of 5
            Job 4 of 5
            Job 5 of 5

          Q(...) shell quotes the string:

            parallel echo {} shell quoted is {= '$_=Q($_)' =} ::: '*/!#$'

          Output:

            */!#$ shell quoted is \*/\!\#\$
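
          Bash's printf %q performs comparable shell quoting; the
          exact escaping may differ from Q(), so treat this only as
          a rough analogue:

```shell
# Sketch: quote a string for the shell with printf %q, then verify
# that the quoted form round-trips through eval.
q=$(printf '%q' '*/!#$')
eval "v=$q"
[ "$v" = '*/!#$' ] && echo round-trip ok
```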

          skip() skips the job:

            parallel echo {= 'if($_==3) { skip() }' =} ::: {1..5}

          Output:

            1
            2
            4
            5

          @arg contains the input source variables:

            parallel echo {= 'if($arg[1]==$arg[2]) { skip() }' =} \
              ::: {1..3} ::: {1..3}

          Output:

            1 2
            1 3
            2 1
            2 3
            3 1
            3 2

          If the strings {= and =} cause problems they can be replaced
          with --parens:

            parallel --parens ,,,, echo ',, s:\.[^.]+$::;s:\.[^.]+$::; ,,' \
              ::: foo.tar.gz

          Output:

            foo

          To define a shorthand replacement string use --rpl:

            parallel --rpl '.. s:\.[^.]+$::;s:\.[^.]+$::;' echo '..' \
              ::: foo.tar.gz

          Output: Same as above.

          If the shorthand starts with { it can be used as a
          positional replacement string, too:

            parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{..}' \
              ::: foo.tar.gz

          Output: Same as above.

          If the shorthand contains matching parentheses, the
          replacement string becomes a dynamic replacement string,
          and the string inside the parentheses can be accessed as
          $$1. If there are multiple sets of matching parentheses,
          the matched strings can be accessed using $$2, $$3, and so
          on.

          You can think of this as giving arguments to the replacement
          string. Here we give the argument .tar.gz to the replacement
          string {%string} which removes string:

            parallel --rpl '{%(.+?)} s/$$1$//;' echo {%.tar.gz}.zip ::: foo.tar.gz

          Output:

            foo.zip

          Here we give the two arguments tar.gz and zip to the
          replacement string {/string1/string2} which replaces string1
          with string2:

            parallel --rpl '{/(.+?)/(.*?)} s/$$1/$$2/;' echo {/tar.gz/zip} \
              ::: foo.tar.gz

          Output:

            foo.zip

          GNU parallel's 7 replacement strings are implemented like
          this:

            --rpl '{} '
            --rpl '{#} $_=$job->seq()'
            --rpl '{%} $_=$job->slot()'
            --rpl '{/} s:.*/::'
            --rpl '{//} $Global::use{"File::Basename"} ||=
                     eval "use File::Basename; 1;"; $_ = dirname($_);'
            --rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
            --rpl '{.} s:\.[^/.]+$::'

          Positional replacement strings

          With multiple input sources the argument from the individual
          input sources can be accessed with {number}:

            parallel echo {1} and {2} ::: A B ::: C D

          Output (the order may be different):

            A and C
            A and D
            B and C
            B and D

          The positional replacement strings can also be modified
          using /, //, /., and .:

            parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

          Output (the order may be different):

            /=B.C //=A /.=B .=A/B
            /=E.F //=D /.=E .=D/E

          If a position is negative, it will refer to the input source
          counted from behind:

            parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} \
              ::: A B ::: C D ::: E F

          Output (the order may be different):

            1=A 2=C 3=E -1=E -2=C -3=A
            1=A 2=C 3=F -1=F -2=C -3=A
            1=A 2=D 3=E -1=E -2=D -3=A
            1=A 2=D 3=F -1=F -2=D -3=A
            1=B 2=C 3=E -1=E -2=C -3=B
            1=B 2=C 3=F -1=F -2=C -3=B
            1=B 2=D 3=E -1=E -2=D -3=B
            1=B 2=D 3=F -1=F -2=D -3=B

          Positional perl expression replacement string

          To use a perl expression as a positional replacement
          string, simply prepend the perl expression with the number
          and a space:

            parallel echo '{=2 s:\.[^.]+$::;s:\.[^.]+$::; =} {1}' \
              ::: bar ::: foo.tar.gz

          Output:

            foo bar

          If a shorthand defined using --rpl starts with { it can be
          used as a positional replacement string, too:

            parallel --rpl '{..} s:\.[^.]+$::;s:\.[^.]+$::;' echo '{2..} {1}' \
              ::: bar ::: foo.tar.gz

          Output: Same as above.

          Input from columns

          The columns in a file can be bound to positional replacement
          strings using --colsep. Here the columns are separated by
          TAB (\t):

            parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

          Output (the order may be different):

            1=f1 2=f2
            1=A 2=B
            1=C 2=D

          Header defined replacement strings

          With --header GNU parallel will use the first value of the
          input source as the name of the replacement string. Only the
          non-modified version {} is supported:

            parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

          Output (the order may be different):

            f1=A f2=C
            f1=A f2=D
            f1=B f2=C
            f1=B f2=D

          It is useful with --colsep for processing files with TAB
          separated values:

            parallel --header : --colsep '\t' echo f1={f1} f2={f2} \
              :::: tsv-file.tsv

          Output (the order may be different):

            f1=A f2=B
            f1=C f2=D

          More pre-defined replacement strings with --plus

          --plus adds the replacement strings {+/} {+.} {+..} {+...}
          {..}  {...} {/..} {/...} {##}. The idea being that {+foo}
          matches the opposite of {foo} and {} = {+/}/{/} = {.}.{+.} =
          {+/}/{/.}.{+.} = {..}.{+..} = {+/}/{/..}.{+..} =
          {...}.{+...} = {+/}/{/...}.{+...}.

            parallel --plus echo {} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {+/}/{/} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {+/}/{/.}.{+.} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {+/}/{/..}.{+..} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {...}.{+...} ::: dir/sub/file.ex1.ex2.ex3
            parallel --plus echo {+/}/{/...}.{+...} ::: dir/sub/file.ex1.ex2.ex3

          Output:

            dir/sub/file.ex1.ex2.ex3
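
          The simplest cases of these identities can be checked with
          standard shell parameter expansion (a sketch; parallel's
          own definitions handle more edge cases):

```shell
# Sketch: rebuild the full argument from its parts, the way the
# {+foo}/{foo} identities describe.
f=dir/sub/file.ex1.ex2.ex3
echo "${f%/*}/${f##*/}"   # {+/}/{/} rebuilds {}
echo "${f%.*}.${f##*.}"   # {.}.{+.} rebuilds {}
```

          Both lines print the original dir/sub/file.ex1.ex2.ex3.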

          {##} is simply the number of jobs:

            parallel --plus echo Job {#} of {##} ::: {1..5}

          Output:

            Job 1 of 5
            Job 2 of 5
            Job 3 of 5
            Job 4 of 5
            Job 5 of 5

          Dynamic replacement strings with --plus

          --plus also defines these dynamic replacement strings:

          {:-string}          Default value is string if the
                              argument is empty.

          {:number}           Substring from number till end of
                              string.

          {:number1:number2}  Substring from number1 to number2.

          {#string}           If the argument starts with string,
                              remove it.

          {%string}           If the argument ends with string,
                              remove it.

          {/string1/string2}  Replace string1 with string2.

          {^string}           If the argument starts with string,
                              upper case it. string must be a single
                              letter.

          {^^string}          If the argument contains string, upper
                              case it. string must be a single
                              letter.

          {,string}           If the argument starts with string,
                              lower case it. string must be a single
                              letter.

          {,,string}          If the argument contains string, lower
                              case it. string must be a single
                              letter.

          They are inspired by Bash:

            unset myvar
            echo ${myvar:-myval}
            parallel --plus echo {:-myval} ::: "$myvar"

            myvar=abcAaAdef
            echo ${myvar:2}
            parallel --plus echo {:2} ::: "$myvar"

            echo ${myvar:2:3}
            parallel --plus echo {:2:3} ::: "$myvar"

            echo ${myvar#bc}
            parallel --plus echo {#bc} ::: "$myvar"
            echo ${myvar#abc}
            parallel --plus echo {#abc} ::: "$myvar"

            echo ${myvar%de}
            parallel --plus echo {%de} ::: "$myvar"
            echo ${myvar%def}
            parallel --plus echo {%def} ::: "$myvar"

            echo ${myvar/def/ghi}
            parallel --plus echo {/def/ghi} ::: "$myvar"

            echo ${myvar^a}
            parallel --plus echo {^a} ::: "$myvar"
            echo ${myvar^^a}
            parallel --plus echo {^^a} ::: "$myvar"

            myvar=AbcAaAdef
            echo ${myvar,A}
            parallel --plus echo '{,A}' ::: "$myvar"
            echo ${myvar,,A}
            parallel --plus echo '{,,A}' ::: "$myvar"

          Output:

            myval
            myval
            cAaAdef
            cAaAdef
            cAa
            cAa
            abcAaAdef
            abcAaAdef
            AaAdef
            AaAdef
            abcAaAdef
            abcAaAdef
            abcAaA
            abcAaA
            abcAaAghi
            abcAaAghi
            AbcAaAdef
            AbcAaAdef
            AbcAAAdef
            AbcAAAdef
            abcAaAdef
            abcAaAdef
            abcaaadef
            abcaaadef

        More than one argument
          With --xargs GNU parallel will fit as many arguments as
          possible on a single line:

            cat num30000 | parallel --xargs echo | wc -l

          Output (if you run this under Bash on GNU/Linux):

            2

          The 30000 arguments fit on 2 lines.

          The maximal length of a single line can be set with -s.
          With a maximal line length of 10000 chars, 17 commands
          will be run:

            cat num30000 | parallel --xargs -s 10000 echo | wc -l

          Output:

            17

          For better parallelism GNU parallel can distribute the
          arguments between all the parallel jobs when the end of
          the input is reached.

          Below, GNU parallel is reading the last argument while
          generating the second job. Since 4 parallel jobs are
          requested, it spreads the arguments meant for that second
          job over 4 jobs instead.

          The first job will be the same as the --xargs example
          above, but the second job will be split into 4 evenly
          sized jobs, resulting in a total of 5 jobs:

            cat num30000 | parallel --jobs 4 -m echo | wc -l

          Output (if you run this under Bash on GNU/Linux):

            5

          This is even more visible when running 4 jobs with 10
          arguments. The 10 arguments are being spread over 4 jobs:

            parallel --jobs 4 -m echo ::: 1 2 3 4 5 6 7 8 9 10

          Output:

            1 2 3
            4 5 6
            7 8 9
            10

          A replacement string can be part of a word. -m will not
          repeat the context:

            parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

          Output (the order may be different):

            pre-A B-post
            pre-C D-post
            pre-E F-post
            pre-G-post

          To repeat the context use -X which otherwise works like -m:

            parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

          Output (the order may be different):

            pre-A-post pre-B-post
            pre-C-post pre-D-post
            pre-E-post pre-F-post
            pre-G-post

          To limit the number of arguments use -N:

            parallel -N3 echo ::: A B C D E F G H

          Output (the order may be different):

            A B C
            D E F
            G H
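
          Limiting the number of arguments per command line is
          analogous to the classic xargs -n, shown here without
          parallel:

```shell
# Sketch (no parallel involved): group arguments 3 at a time,
# like parallel -N3, but sequentially.
printf '%s\n' A B C D E F G H | xargs -n 3 echo
```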

          -N also sets the positional replacement strings:

            parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

          Output (the order may be different):

            1=A 2=B 3=C
            1=D 2=E 3=F
            1=G 2=H 3=

          -N0 reads 1 argument but inserts none:

            parallel -N0 echo foo ::: 1 2 3

          Output:

            foo
            foo
            foo

        Quoting
          Command lines that contain special characters may need to be
          protected from the shell.

          The perl program perl -e 'print "@ARGV\n"' basically works
          like echo.

            perl -e 'print "@ARGV\n"' A

          Output:

            A

          To run that in parallel the command needs to be quoted:

            parallel perl -e 'print "@ARGV\n"' ::: This wont work

          Output:

            [Nothing]
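
          The reason nothing is printed: the shell removes the
          quotes before GNU parallel ever sees the command, so the
          quoting is lost when parallel re-assembles the command
          line for a new shell. A tiny function (show is just an
          illustrative name) makes the received arguments visible:

```shell
# Print each argument the caller actually receives, one per line.
# Note how the single quotes typed on the command line are gone.
show() { printf '<%s>\n' "$@"; }
show perl -e 'print "@ARGV\n"'
```

          The last line printed is <print "@ARGV\n"> without the
          surrounding quotes, which is what parallel receives, too.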

          To quote the command use -q:

            parallel -q perl -e 'print "@ARGV\n"' ::: This works

          Output (the order may be different):

            This
            works

          Or you can quote the critical part using \':

            parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

          Output (the order may be different):

            This
            works,
            too

          GNU parallel can also \-quote full lines. Simply run this:

            parallel --shellquote
            Warning: Input is read from the terminal. You either know what you
            Warning: are doing (in which case: YOU ARE AWESOME!) or you forgot
            Warning: ::: or :::: or to pipe data into parallel. If so
            Warning: consider going through the tutorial: man parallel_tutorial
            Warning: Press CTRL-D to exit.
            perl -e 'print "@ARGV\n"'
            [CTRL-D]

          Output:

            perl\ -e\ \'print\ \"@ARGV\\n\"\'

          This can then be used as the command:

            parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

          Output (the order may be different):

            This
            also
            works

        Trimming space
          Space can be trimmed on the arguments using --trim:

            parallel --trim r echo pre-{}-post ::: ' A '

          Output:

            pre- A-post

          To trim on the left side:

            parallel --trim l echo pre-{}-post ::: ' A '

          Output:

            pre-A -post

          To trim on both sides:

            parallel --trim lr echo pre-{}-post ::: ' A '

          Output:

            pre-A-post

        Respecting the shell
          This tutorial uses Bash as the shell. GNU parallel respects
          which shell you are using, so in zsh you can do:

            parallel echo \={} ::: zsh bash ls

          Output:

            /usr/bin/zsh
            /bin/bash
            /bin/ls

          In csh you can do:

            parallel 'set a="{}"; if( { test -d "$a" } ) echo "$a is a dir"' ::: *

          Output:

            [somedir] is a dir

          This also becomes useful if you use GNU parallel in a shell
          script: GNU parallel will use the same shell as the shell
          script.
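
          For example, inside a bash script, bash-only syntax works in
          the command. A minimal sketch (not from the tutorial;
          assumes GNU parallel is installed):

```shell
#!/bin/bash
# Run by bash, so GNU parallel runs each command with bash, and
# bash-only syntax such as [[ ]] pattern matching works:
parallel -k '[[ {} == b* ]] && echo {} starts with b' ::: apple banana berry
```

          Run from a csh script instead, the same command would fail,
          since csh has no [[ ]] construct.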

     Controlling the output
          The output can be prefixed with the argument:

            parallel --tag echo foo-{} ::: A B C

          Output (the order may be different):

            A       foo-A
            B       foo-B
            C       foo-C

          To prefix it with another string use --tagstring:

            parallel --tagstring {}-bar echo foo-{} ::: A B C

          Output (the order may be different):

            A-bar   foo-A
            B-bar   foo-B
            C-bar   foo-C

          To see what commands will be run without running them use
          --dryrun:

            parallel --dryrun echo {} ::: A B C

          Output (the order may be different):

            echo A
            echo B
            echo C

          To print the commands before running them use --verbose:

            parallel --verbose echo {} ::: A B C

          Output (the order may be different):

            echo A
            echo B
            A
            echo C
            B
            C

          GNU parallel will postpone the output until the command
          completes:

            parallel -j2 'printf "%s-start\n%s" {} {};
              sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

          Output:

            2-start
            2-middle
            2-end
            1-start
            1-middle
            1-end
            4-start
            4-middle
            4-end

          To get the output immediately use --ungroup:

            parallel -j2 --ungroup 'printf "%s-start\n%s" {} {};
              sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

          Output:

            4-start
            42-start
            2-middle
            2-end
            1-start
            1-middle
            1-end
            -middle
            4-end

          --ungroup is fast, but can cause half a line from one job to
          be mixed with half a line of another job. That has happened
          in the second line above, where the beginning of '4-middle'
          is mixed with '2-start'.

          To avoid this use --linebuffer:

            parallel -j2 --linebuffer 'printf "%s-start\n%s" {} {};
              sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

          Output:

            4-start
            2-start
            2-middle
            2-end
            1-start
            1-middle
            1-end
            4-middle
            4-end

          To force the output in the same order as the arguments use
          --keep-order/-k:

            parallel -j2 -k 'printf "%s-start\n%s" {} {};
              sleep {};printf "%s\n" -middle;echo {}-end' ::: 4 2 1

          Output:

            4-start
            4-middle
            4-end
            2-start
            2-middle
            2-end
            1-start
            1-middle
            1-end

        Saving output into files
          GNU parallel can save the output of each job into files:

            parallel --files echo ::: A B C

          Output will be similar to this:

            /tmp/pAh6uWuQCg.par
            /tmp/opjhZCzAX4.par
            /tmp/W0AT_Rph2o.par

          By default GNU parallel will cache the output in files in
          /tmp. This can be changed by setting $TMPDIR or --tmpdir:

            parallel --tmpdir /var/tmp --files echo ::: A B C

          Output will be similar to this:

            /var/tmp/N_vk7phQRc.par
            /var/tmp/7zA4Ccf3wZ.par
            /var/tmp/LIuKgF_2LP.par

          Or:

            TMPDIR=/var/tmp parallel --files echo ::: A B C

          Output: Same as above.

          The output files can be saved in a structured way using
          --results:

            parallel --results outdir echo ::: A B C

          Output:

            A
            B
            C

          These files were also generated containing the standard
          output (stdout), standard error (stderr), and the sequence
          number (seq):

            outdir/1/A/seq
            outdir/1/A/stderr
            outdir/1/A/stdout
            outdir/1/B/seq
            outdir/1/B/stderr
            outdir/1/B/stdout
            outdir/1/C/seq
            outdir/1/C/stderr
            outdir/1/C/stdout

          --header : will take the first value as the name and use
          that in the directory structure. This is useful if you are
          using multiple input sources:

            parallel --header : --results outdir echo ::: f1 A B ::: f2 C D

          Generated files:

            outdir/f1/A/f2/C/seq
            outdir/f1/A/f2/C/stderr
            outdir/f1/A/f2/C/stdout
            outdir/f1/A/f2/D/seq
            outdir/f1/A/f2/D/stderr
            outdir/f1/A/f2/D/stdout
            outdir/f1/B/f2/C/seq
            outdir/f1/B/f2/C/stderr
            outdir/f1/B/f2/C/stdout
            outdir/f1/B/f2/D/seq
            outdir/f1/B/f2/D/stderr
            outdir/f1/B/f2/D/stdout

          The directories are named after the variables and their
          values.
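
          Since the layout is predictable, the results can be
          collected with ordinary globbing. A small sketch (not from
          the tutorial):

```shell
# Concatenate the stdout of every job from the structured
# results directory created above:
cat outdir/f1/*/f2/*/stdout
```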

     Controlling the execution
        Number of simultaneous jobs
          The number of concurrent jobs is given with --jobs/-j:

            /usr/bin/time parallel -N0 -j64 sleep 1 :::: num128

          With 64 jobs in parallel the 128 sleeps will take 2-8
          seconds to run, depending on how fast your machine is.

          By default --jobs is the same as the number of CPU cores. So
          this:

            /usr/bin/time parallel -N0 sleep 1 :::: num128

          should take twice the time of running 2 jobs per CPU core:

            /usr/bin/time parallel -N0 --jobs 200% sleep 1 :::: num128

          --jobs 0 will run as many jobs in parallel as possible:

            /usr/bin/time parallel -N0 --jobs 0 sleep 1 :::: num128

          which should take 1-7 seconds depending on how fast your
          machine is.

          --jobs can read from a file which is re-read when a job
          finishes:

            echo 50% > my_jobs
            /usr/bin/time parallel -N0 --jobs my_jobs sleep 1 :::: num128 &
            sleep 1
            echo 0 > my_jobs
            wait

          For the first second only 50% of the CPU cores will run a
          job. Then 0 is put into my_jobs, and the rest of the jobs
          are started in parallel.

          Instead of basing the percentage on the number of CPU cores
          GNU parallel can base it on the number of CPUs:

            parallel --use-cpus-instead-of-cores -N0 sleep 1 :::: num8

        Shuffle job order
          If you have many jobs (e.g. from multiple combinations of
          input sources), it can be handy to shuffle the jobs, so
          different values are run early on. Use --shuf for that:

            parallel --shuf echo ::: 1 2 3 ::: a b c ::: A B C

          Output:

            All combinations, but in a different order for each run.

        Interactivity
          GNU parallel can ask the user if a command should be run
          using --interactive:

            parallel --interactive echo ::: 1 2 3

          Output:

            echo 1 ?...y
            echo 2 ?...n
            1
            echo 3 ?...y
            3

          GNU parallel can be used to put arguments on the command
          line for an interactive command such as emacs to edit one

          file at a time:

            parallel --tty emacs ::: 1 2 3

          Or give multiple arguments in one go to open multiple files:

            parallel -X --tty vi ::: 1 2 3

        A terminal for every job
          Using --tmux GNU parallel can start a terminal for every job
          run:

            seq 10 20 | parallel --tmux 'echo start {}; sleep {}; echo done {}'

          This will tell you to run something similar to:

            tmux -S /tmp/tmsrPrO0 attach

          Using normal tmux keystrokes (CTRL-b n or CTRL-b p) you can
          cycle between windows of the running jobs. When a job is
          finished it will pause for 10 seconds before closing the
          window.

        Timing
          Some jobs do heavy I/O when they start. To avoid a
          thundering herd GNU parallel can delay starting new jobs.
          --delay X will make sure there is at least X seconds between
          each start:

            parallel --delay 2.5 echo Starting {}\;date ::: 1 2 3

          Output:

            Starting 1
            Thu Aug 15 16:24:33 CEST 2013
            Starting 2
            Thu Aug 15 16:24:35 CEST 2013
            Starting 3
            Thu Aug 15 16:24:38 CEST 2013

          If jobs taking more than a certain amount of time are known
          to fail, they can be stopped with --timeout. The accuracy of
          --timeout is 2 seconds:

            parallel --timeout 4.1 sleep {}\; echo {} ::: 2 4 6 8

          Output:

            2
            4

          GNU parallel can compute the median runtime for jobs and

          kill those that take more than 200% of the median runtime:

            parallel --timeout 200% sleep {}\; echo {} ::: 2.1 2.2 3 7 2.3

          Output:

            2.1
            2.2
            3
            2.3

        Progress information
          Based on the runtime of completed jobs GNU parallel can
          estimate the total runtime:

            parallel --eta sleep ::: 1 3 2 2 1 3 3 2 1

          Output:

            Computers / CPU cores / Max jobs to run
            1:local / 2 / 2

            Computer:jobs running/jobs completed/%of started jobs/
              Average seconds to complete
            ETA: 2s 0left 1.11avg  local:0/9/100%/1.1s

          GNU parallel can give progress information with --progress:

            parallel --progress sleep ::: 1 3 2 2 1 3 3 2 1

          Output:

            Computers / CPU cores / Max jobs to run
            1:local / 2 / 2

            Computer:jobs running/jobs completed/%of started jobs/
              Average seconds to complete
            local:0/9/100%/1.1s

          A progress bar can be shown with --bar:

            parallel --bar sleep ::: 1 3 2 2 1 3 3 2 1

          And a graphic bar can be shown with --bar and zenity:

            seq 1000 | parallel -j10 --bar '(echo -n {};sleep 0.1)' \
              2> >(perl -pe 'BEGIN{$/="\r";$|=1};s/\r/\n/g' |
                   zenity --progress --auto-kill --auto-close)

          A logfile of the jobs completed so far can be generated with
          --joblog:

            parallel --joblog /tmp/log exit  ::: 1 2 3 0
            cat /tmp/log

          Output:

            Seq Host Starttime      Runtime Send Receive Exitval Signal Command
            1   :    1376577364.974 0.008   0    0       1       0      exit 1
            2   :    1376577364.982 0.013   0    0       2       0      exit 2
            3   :    1376577364.990 0.013   0    0       3       0      exit 3
            4   :    1376577365.003 0.003   0    0       0       0      exit 0

          The log contains the job sequence, which host the job was
          run on, the start time and run time, how much data was
          transferred, the exit value, the signal that killed the job,
          and finally the command being run.
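
          Since the joblog is plain whitespace-separated text it is
          easy to post-process. A small sketch (the awk one-liner is
          not from the tutorial) summing the Runtime column:

```shell
# /tmp/log is the joblog written above; sum column 4 (Runtime),
# skipping the header line, to get the total time spent in jobs:
awk 'NR>1 {sum+=$4} END {print sum " seconds total"}' /tmp/log
```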

          With a joblog GNU parallel can be stopped and later pick up
          where it left off. It is important that the input of the
          completed jobs is unchanged.

            parallel --joblog /tmp/log exit  ::: 1 2 3 0
            cat /tmp/log
            parallel --resume --joblog /tmp/log exit  ::: 1 2 3 0 0 0
            cat /tmp/log

          Output:

            Seq Host Starttime      Runtime Send Receive Exitval Signal Command
            1   :    1376580069.544 0.008   0    0       1       0      exit 1
            2   :    1376580069.552 0.009   0    0       2       0      exit 2
            3   :    1376580069.560 0.012   0    0       3       0      exit 3
            4   :    1376580069.571 0.005   0    0       0       0      exit 0

            Seq Host Starttime      Runtime Send Receive Exitval Signal Command
            1   :    1376580069.544 0.008   0    0       1       0      exit 1
            2   :    1376580069.552 0.009   0    0       2       0      exit 2
            3   :    1376580069.560 0.012   0    0       3       0      exit 3
            4   :    1376580069.571 0.005   0    0       0       0      exit 0
            5   :    1376580070.028 0.009   0    0       0       0      exit 0
            6   :    1376580070.038 0.007   0    0       0       0      exit 0

          Note how the start time of the last 2 jobs is clearly
          different, because they were run in the second invocation.

          With --resume-failed GNU parallel will re-run the jobs that
          failed:

            parallel --resume-failed --joblog /tmp/log exit  ::: 1 2 3 0 0 0
            cat /tmp/log

          Output:

            Seq Host Starttime      Runtime Send Receive Exitval Signal Command
            1   :    1376580069.544 0.008   0    0       1       0      exit 1
            2   :    1376580069.552 0.009   0    0       2       0      exit 2
            3   :    1376580069.560 0.012   0    0       3       0      exit 3
            4   :    1376580069.571 0.005   0    0       0       0      exit 0
            5   :    1376580070.028 0.009   0    0       0       0      exit 0
            6   :    1376580070.038 0.007   0    0       0       0      exit 0
            1   :    1376580154.433 0.010   0    0       1       0      exit 1
            2   :    1376580154.444 0.022   0    0       2       0      exit 2
            3   :    1376580154.466 0.005   0    0       3       0      exit 3

          Note how seq 1, 2, and 3 have been repeated because they
          had an exit value different from 0.

          --retry-failed does almost the same as --resume-failed.
          Where --resume-failed reads the commands from the command
          line (and ignores the commands in the joblog),
          --retry-failed ignores the command line and reruns the
          commands mentioned in the joblog.

            parallel --retry-failed --joblog /tmp/log
            cat /tmp/log

          Output:

            Seq Host Starttime      Runtime Send Receive Exitval Signal Command
            1   :    1376580069.544 0.008   0    0       1       0      exit 1
            2   :    1376580069.552 0.009   0    0       2       0      exit 2
            3   :    1376580069.560 0.012   0    0       3       0      exit 3
            4   :    1376580069.571 0.005   0    0       0       0      exit 0
            5   :    1376580070.028 0.009   0    0       0       0      exit 0
            6   :    1376580070.038 0.007   0    0       0       0      exit 0
            1   :    1376580154.433 0.010   0    0       1       0      exit 1
            2   :    1376580154.444 0.022   0    0       2       0      exit 2
            3   :    1376580154.466 0.005   0    0       3       0      exit 3
            1   :    1376580164.633 0.010   0    0       1       0      exit 1
            2   :    1376580164.644 0.022   0    0       2       0      exit 2
            3   :    1376580164.666 0.005   0    0       3       0      exit 3

        Termination
          Unconditional termination

          By default GNU parallel will wait for all jobs to finish
          before exiting.

          If you send GNU parallel the TERM signal, GNU parallel will
          stop spawning new jobs and wait for the remaining jobs to
          finish. If you send GNU parallel the TERM signal again, GNU
          parallel will kill all running jobs and exit.
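
          A sketch of this behaviour (not from the tutorial; the
          sleep values are only illustrative):

```shell
# Start GNU parallel in the background, then ask it to wind down:
parallel -j1 'sleep 2; echo {}' ::: a b c d &
PID=$!
sleep 1
kill -TERM $PID    # finish the running job, start no new ones
# kill -TERM $PID  # a second TERM would kill the running job too
wait $PID
```

          Only the job already running prints its output; b, c and d
          are never started.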

          Termination dependent on job status

          For certain jobs there is no need to continue if one of the
          jobs fails and has an exit code different from 0. GNU
          parallel will stop spawning new jobs with --halt
          soon,fail=1:

            parallel -j2 --halt soon,fail=1 echo {}\; exit {} ::: 0 0 1 2 3

          Output:

            0
            0
            1
            parallel: This job failed:
            echo 1; exit 1
            parallel: Starting no more jobs. Waiting for 1 jobs to finish.
            2

          With --halt now,fail=1 the running jobs will be killed
          immediately:

            parallel -j2 --halt now,fail=1 echo {}\; exit {} ::: 0 0 1 2 3

          Output:

            0
            0
            1
            parallel: This job failed:
            echo 1; exit 1

          If --halt is given a percentage this percentage of the jobs
          must fail before GNU parallel stops spawning more jobs:

            parallel -j2 --halt soon,fail=20% echo {}\; exit {} \
              ::: 0 1 2 3 4 5 6 7 8 9

          Output:

            0
            1
            parallel: This job failed:
            echo 1; exit 1
            2
            parallel: This job failed:
            echo 2; exit 2
            parallel: Starting no more jobs. Waiting for 1 jobs to finish.
            3
            parallel: This job failed:
            echo 3; exit 3

          If you are looking for success instead of failures, you can
          use success. This will finish as soon as the first job

          succeeds:

            parallel -j2 --halt now,success=1 echo {}\; exit {} ::: 1 2 3 0 4 5 6

          Output:

            1
            2
            3
            0
            parallel: This job succeeded:
            echo 0; exit 0

          GNU parallel can retry the command with --retries. This is
          useful if a command fails for unknown reasons now and then.

            parallel -k --retries 3 \
              'echo tried {} >>/tmp/runs; echo completed {}; exit {}' ::: 1 2 0
            cat /tmp/runs

          Output:

            completed 1
            completed 2
            completed 0

            tried 1
            tried 2
            tried 1
            tried 2
            tried 1
            tried 2
            tried 0

          Note how jobs 1 and 2 were tried 3 times, but 0 was not
          retried because it had exit code 0.

          Termination signals (advanced)

          Using --termseq you can control which signals are sent when
          killing children. Normally children will be killed by
          sending them SIGTERM, waiting 200 ms, then another SIGTERM,
          waiting 100 ms, then another SIGTERM, waiting 50 ms, then a
          SIGKILL, finally waiting 25 ms before giving up. It looks
          like this:

            show_signals() {
              perl -e 'for(keys %SIG) {
                  $SIG{$_} = eval "sub { print \"Got $_\\n\"; }";
                }
                while(1){sleep 1}'
            }
            export -f show_signals
            echo | parallel --termseq TERM,200,TERM,100,TERM,50,KILL,25 \
              -u --timeout 1 show_signals

          Output:

            Got TERM
            Got TERM
            Got TERM

          Or just:

            echo | parallel -u --timeout 1 show_signals

          Output: Same as above.

          You can change this to SIGINT, SIGTERM, SIGKILL:

            echo | parallel --termseq INT,200,TERM,100,KILL,25 \
              -u --timeout 1 show_signals

          Output:

            Got INT
            Got TERM

          The SIGKILL does not show because it cannot be caught, and
          thus the child dies.

        Limiting the resources
          To avoid overloading systems GNU parallel can look at the
          system load before starting another job:

            parallel --load 100% echo load is less than {} job per cpu ::: 1

          Output:

            [when the load is less than the number of cpu cores]
            load is less than 1 job per cpu

          GNU parallel can also check if the system is swapping.

            parallel --noswap echo the system is not swapping ::: now

          Output:

            [when the system is not swapping]
            the system is not swapping now

          Some jobs need a lot of memory, and should only be started
          when there is enough memory free. Using --memfree GNU
          parallel can check if there is enough memory free.
          Additionally, GNU parallel will kill off the youngest job if
          the free memory falls below 50% of the size. The killed job
          will be put back on the queue and retried later.

            parallel --memfree 1G echo will run if more than 1 GB is ::: free

          GNU parallel can run the jobs with a nice value. This will
          work both locally and remotely.

            parallel --nice 17 echo this is being run with nice -n ::: 17

          Output:

            this is being run with nice -n 17

     Remote execution
          GNU parallel can run jobs on remote servers. It uses ssh to
          communicate with the remote machines.

        Sshlogin
          The most basic sshlogin is -S host:

            parallel -S $SERVER1 echo running on ::: $SERVER1

          Output:

            running on [$SERVER1]

          To use a different username prepend the server with
          username@:

            parallel -S username@$SERVER1 echo running on ::: username@$SERVER1

          Output:

            running on [username@$SERVER1]

          The special sshlogin : is the local machine:

            parallel -S : echo running on ::: the_local_machine

          Output:

            running on the_local_machine

          If ssh is not in $PATH it can be prepended to $SERVER1:

            parallel -S '/usr/bin/ssh '$SERVER1 echo custom ::: ssh

          Output:

            custom ssh

          The ssh command can also be given using --ssh:

            parallel --ssh /usr/bin/ssh -S $SERVER1 echo custom ::: ssh

          or by setting $PARALLEL_SSH:

            export PARALLEL_SSH=/usr/bin/ssh
            parallel -S $SERVER1 echo custom ::: ssh

          Several servers can be given using multiple -S:

            parallel -S $SERVER1 -S $SERVER2 echo ::: running on more hosts

          Output (the order may be different):

            running
            on
            more
            hosts

          Or they can be separated by ,:

            parallel -S $SERVER1,$SERVER2 echo ::: running on more hosts

          Output: Same as above.

          Or newline:

            # This gives a \n between $SERVER1 and $SERVER2
            SERVERS="`echo $SERVER1; echo $SERVER2`"
            parallel -S "$SERVERS" echo ::: running on more hosts

          They can also be read from a file (replace user@ with the
          user on $SERVER2):

            echo $SERVER1 > nodefile
            # Force 4 cores, special ssh-command, username
            echo 4//usr/bin/ssh user@$SERVER2 >> nodefile
            parallel --sshloginfile nodefile echo ::: running on more hosts

          Output: Same as above.

          Every time a job finishes, the --sshloginfile will be re-
          read, so it is possible to both add and remove hosts while
          running.

          The special --sshloginfile .. reads from
          ~/.parallel/sshloginfile.

          To force GNU parallel to treat a server as having a given
          number of CPU cores, prepend the number of cores followed by
          / to the sshlogin:

            parallel -S 4/$SERVER1 echo force {} cpus on server ::: 4

          Output:

            force 4 cpus on server

          Servers can be put into groups by prepending @groupname to
          the server and the group can then be selected by appending
          @groupname to the argument if using --hostgroup:

            parallel --hostgroup -S @grp1/$SERVER1 -S @grp2/$SERVER2 echo {} \
              ::: run_on_grp1@grp1 run_on_grp2@grp2

          Output:

            run_on_grp1
            run_on_grp2

          A host can be in multiple groups by separating the groups
          with +, and you can force GNU parallel to limit the groups
          on which the command can be run with -S @groupname:

            parallel -S @grp1 -S @grp1+grp2/$SERVER1 -S @grp2/$SERVER2 echo {} \
              ::: run_on_grp1 also_grp1

          Output:

            run_on_grp1
            also_grp1

        Transferring files
          GNU parallel can transfer the files to be processed to the
          remote host. It does that using rsync.

            echo This is input_file > input_file
            parallel -S $SERVER1 --transferfile {} cat ::: input_file

          Output:

            This is input_file

          If the files are processed into another file, the resulting
          file can be transferred back:

            echo This is input_file > input_file
            parallel -S $SERVER1 --transferfile {} --return {}.out \
              cat {} ">"{}.out ::: input_file
            cat input_file.out

          Output: Same as above.

          To remove the input and output file on the remote server use
          --cleanup:

            echo This is input_file > input_file
            parallel -S $SERVER1 --transferfile {} --return {}.out --cleanup \
              cat {} ">"{}.out ::: input_file
            cat input_file.out

          Output: Same as above.

          There is a shorthand for --transferfile {} --return
          --cleanup called --trc:

            echo This is input_file > input_file
            parallel -S $SERVER1 --trc {}.out cat {} ">"{}.out ::: input_file
            cat input_file.out

          Output: Same as above.

          Some jobs need a common database for all jobs. GNU parallel
          can transfer that using --basefile which will transfer the
          file before the first job:

            echo common data > common_file
            parallel --basefile common_file -S $SERVER1 \
              cat common_file\; echo {} ::: foo

          Output:

            common data
            foo

          To remove it from the remote host after the last job use
          --cleanup.

        Working dir
          The default working dir on the remote machines is the login
          dir. This can be changed with --workdir mydir.

          Files transferred using --transferfile and --return will be
          relative to mydir on remote computers, and the command will
          be executed in the dir mydir.

          The special mydir value ... will create working dirs under
          ~/.parallel/tmp on the remote computers. If --cleanup is

          given these dirs will be removed.

          The special mydir value . uses the current working dir.  If
          the current working dir is beneath your home dir, the value
          . is treated as the relative path to your home dir. This
          means that if your home dir is different on remote computers
          (e.g. if your login is different) the relative path will
          still be relative to your home dir.

            parallel -S $SERVER1 pwd ::: ""
            parallel --workdir . -S $SERVER1 pwd ::: ""
            parallel --workdir ... -S $SERVER1 pwd ::: ""

          Output:

            [the login dir on $SERVER1]
            [current dir relative on $SERVER1]
            [a dir in ~/.parallel/tmp/...]

        Avoid overloading sshd
          If many jobs are started on the same server, sshd can be
          overloaded. GNU parallel can insert a delay between each job
          run on the same server:

            parallel -S $SERVER1 --sshdelay 0.2 echo ::: 1 2 3

          Output (the order may be different):

            1
            2
            3

          sshd will be less overloaded if using --controlmaster, which
          will multiplex ssh connections:

            parallel --controlmaster -S $SERVER1 echo ::: 1 2 3

          Output: Same as above.

        Ignore hosts that are down
          In clusters with many hosts a few of them are often down.
          GNU parallel can ignore those hosts. In this case the host
          173.194.32.46 is down:

            parallel --filter-hosts -S 173.194.32.46,$SERVER1 echo ::: bar

          Output:

            bar

        Running the same commands on all hosts
          GNU parallel can run the same command on all the hosts:

            parallel --onall -S $SERVER1,$SERVER2 echo ::: foo bar

          Output (the order may be different):

            foo
            bar
            foo
            bar

          Often you will just want to run a single command on all
          hosts without arguments. --nonall is a no-argument --onall:

            parallel --nonall -S $SERVER1,$SERVER2 echo foo bar

          Output:

            foo bar
            foo bar

          When --tag is used with --nonall and --onall the --tagstring
          is the host:

            parallel --nonall --tag -S $SERVER1,$SERVER2 echo foo bar

          Output (the order may be different):

            $SERVER1 foo bar
            $SERVER2 foo bar

          --jobs sets the number of servers to log in to in parallel.

        Transferring environment variables and functions
          env_parallel is a shell function that transfers all aliases,
          functions, variables, and arrays. You activate it by running:

            source `which env_parallel.bash`

          Replace bash with the shell you use.

          Now you can use env_parallel instead of parallel and still
          have your environment:

            alias myecho=echo
            myvar="Joe's var is"
            env_parallel -S $SERVER1 'myecho $myvar' ::: green

          Output:

            Joe's var is green

          The disadvantage is that if your environment is huge
          env_parallel will fail.

          When env_parallel fails, you can still use --env to tell GNU
          parallel to transfer an environment variable to the remote
          system.

            MYVAR='foo bar'
            export MYVAR
            parallel --env MYVAR -S $SERVER1 echo '$MYVAR' ::: baz

          Output:

            foo bar baz

          This works for functions, too, if your shell is Bash:

            # This only works in Bash
            my_func() {
              echo in my_func $1
            }
            export -f my_func
            parallel --env my_func -S $SERVER1 my_func ::: baz

          Output:

            in my_func baz

          GNU parallel can copy all user defined variables and
          functions to the remote system. It just needs to record
          which ones to ignore in ~/.parallel/ignored_vars. Do that by
          running this once:

            parallel --record-env
            cat ~/.parallel/ignored_vars

          Output:

            [list of variables to ignore - including $PATH and $HOME]

          Now all other variables and functions defined will be copied
          when using --env _.

            # The function is only copied if using Bash
            my_func2() {
              echo in my_func2 $VAR $1
            }
            export -f my_func2
            VAR=foo
            export VAR

            parallel --env _ -S $SERVER1 'echo $VAR; my_func2' ::: bar

          Output:

            foo
            in my_func2 foo bar

          If you use env_parallel the variables, functions, and
          aliases do not even need to be exported to be copied:

            NOT='not exported var'
            alias myecho=echo
            not_ex() {
              myecho in not_exported_func $NOT $1
            }
            env_parallel --env _ -S $SERVER1 'echo $NOT; not_ex' ::: bar

          Output:

            not exported var
            in not_exported_func not exported var bar

        Showing what is actually run
          --verbose will show the command that would be run on the
          local machine.

          When using --cat, --pipepart, or when a job is run on a
          remote machine, the command is wrapped with helper scripts.
          -vv shows all of this.

            parallel -vv --pipepart --block 1M wc :::: num30000

          Output:

            <num30000 perl -e 'while(@ARGV) { sysseek(STDIN,shift,0) || die;
            $left = shift; while($read = sysread(STDIN,$buf, ($left > 131072
            ? 131072 : $left))){ $left -= $read; syswrite(STDOUT,$buf); } }'
            0 0 0 168894 | (wc)
              30000   30000  168894

          When the command gets more complex, the output is so hard
          to read that it is only useful for debugging:

            my_func3() {
              echo in my_func $1 > $1.out
            }
            export -f my_func3
            parallel -vv --workdir ... --nice 17 --env _ --trc {}.out \
              -S $SERVER1 my_func3 {} ::: abc-file

          Output will be similar to:

            ( ssh server -- mkdir -p ./.parallel/tmp/aspire-1928520-1;rsync
            --protocol 30 -rlDzR -essh ./abc-file
            server:./.parallel/tmp/aspire-1928520-1 );ssh server -- exec perl -e
            \''@GNU_Parallel=("use","IPC::Open3;","use","MIME::Base64");
            eval"@GNU_Parallel";my$eval=decode_base64(join"",@ARGV);eval$eval;'\'
            c3lzdGVtKCJta2RpciIsIi1wIiwiLS0iLCIucGFyYWxsZWwvdG1wL2FzcGlyZS0xOTI4N
            TsgY2hkaXIgIi5wYXJhbGxlbC90bXAvYXNwaXJlLTE5Mjg1MjAtMSIgfHxwcmludChTVE
            BhcmFsbGVsOiBDYW5ub3QgY2hkaXIgdG8gLnBhcmFsbGVsL3RtcC9hc3BpcmUtMTkyODU
            iKSAmJiBleGl0IDI1NTskRU5WeyJPTERQV0QifT0iL2hvbWUvdGFuZ2UvcHJpdmF0L3Bh
            IjskRU5WeyJQQVJBTExFTF9QSUQifT0iMTkyODUyMCI7JEVOVnsiUEFSQUxMRUxfU0VRI
            0BiYXNoX2Z1bmN0aW9ucz1xdyhteV9mdW5jMyk7IGlmKCRFTlZ7IlNIRUxMIn09fi9jc2
            ByaW50IFNUREVSUiAiQ1NIL1RDU0ggRE8gTk9UIFNVUFBPUlQgbmV3bGluZXMgSU4gVkF
            TL0ZVTkNUSU9OUy4gVW5zZXQgQGJhc2hfZnVuY3Rpb25zXG4iOyBleGVjICJmYWxzZSI7
            YXNoZnVuYyA9ICJteV9mdW5jMygpIHsgIGVjaG8gaW4gbXlfZnVuYyBcJDEgPiBcJDEub
            Xhwb3J0IC1mIG15X2Z1bmMzID4vZGV2L251bGw7IjtAQVJHVj0ibXlfZnVuYzMgYWJjLW
            RzaGVsbD0iJEVOVntTSEVMTH0iOyR0bXBkaXI9Ii90bXAiOyRuaWNlPTE3O2RveyRFTlZ
            MRUxfVE1QfT0kdG1wZGlyLiIvcGFyIi5qb2luIiIsbWFweygwLi45LCJhIi4uInoiLCJB
            KVtyYW5kKDYyKV19KDEuLjUpO313aGlsZSgtZSRFTlZ7UEFSQUxMRUxfVE1QfSk7JFNJ
            fT1zdWJ7JGRvbmU9MTt9OyRwaWQ9Zm9yazt1bmxlc3MoJHBpZCl7c2V0cGdycDtldmFse
            W9yaXR5KDAsMCwkbmljZSl9O2V4ZWMkc2hlbGwsIi1jIiwoJGJhc2hmdW5jLiJAQVJHVi
            JleGVjOiQhXG4iO31kb3skcz0kczwxPzAuMDAxKyRzKjEuMDM6JHM7c2VsZWN0KHVuZGV
            mLHVuZGVmLCRzKTt9dW50aWwoJGRvbmV8fGdldHBwaWQ9PTEpO2tpbGwoU0lHSFVQLC0k
            dW5sZXNzJGRvbmU7d2FpdDtleGl0KCQ/JjEyNz8xMjgrKCQ/JjEyNyk6MSskPz4+OCk=;
            _EXIT_status=$?; mkdir -p ./.; rsync --protocol 30 --rsync-path=cd\
            ./.parallel/tmp/aspire-1928520-1/./.\;\ rsync -rlDzR -essh
            server:./abc-file.out ./.;ssh server -- \(rm\ -f\
            ./.parallel/tmp/aspire-1928520-1/abc-file\;\ sh\ -c\ \'rmdir\
            ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\ ./.parallel/\
            2\>/dev/null\'\;rm\ -rf\ ./.parallel/tmp/aspire-1928520-1\;\);ssh
            server -- \(rm\ -f\ ./.parallel/tmp/aspire-1928520-1/abc-file.out\;\
            sh\ -c\ \'rmdir\ ./.parallel/tmp/aspire-1928520-1/\ ./.parallel/tmp/\
            ./.parallel/\ 2\>/dev/null\'\;rm\ -rf\
            ./.parallel/tmp/aspire-1928520-1\;\);ssh server -- rm -rf
            .parallel/tmp/aspire-1928520-1; exit $_EXIT_status;

     Saving output to shell variables (advanced)
          GNU parset will set shell variables to the output of GNU
          parallel. GNU parset has one important limitation: It cannot
          be part of a pipe. In particular this means it cannot read
          anything from standard input (stdin) or pipe output to
          another program.

          To use GNU parset, prepend the command with the
          destination variables:

            parset myvar1,myvar2 echo ::: a b
            echo $myvar1
            echo $myvar2

          Output:

            a
            b

          If you only give a single variable, it will be treated as an
          array:

            parset myarray seq {} 5 ::: 1 2 3
            echo "${myarray[1]}"

          Output:

            2
            3
            4
            5

          The commands to run can be an array:

            cmd=("echo '<<joe  \"double  space\"  cartoon>>'" "pwd")
            parset data ::: "${cmd[@]}"
            echo "${data[0]}"
            echo "${data[1]}"

          Output:

            <<joe  "double  space"  cartoon>>
            [current dir]

     Saving to an SQL base (advanced)
          GNU parallel can save into an SQL base. Point GNU parallel
          to a table and it will put the joblog there together with
          the variables and the output, each in its own column.

        CSV as SQL base
          The simplest is to use a CSV file as the storage table:

            parallel --sqlandworker csv:///%2Ftmp/log.csv \
              seq ::: 10 ::: 12 13 14
            cat /tmp/log.csv

          Note how '/' in the path must be written as %2F.

          Output will be similar to:

            Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
              Command,V1,V2,Stdout,Stderr
            1,:,1458254498.254,0.069,0,9,0,0,"seq 10 12",10,12,"10
            11
            12
            ",
            2,:,1458254498.278,0.080,0,12,0,0,"seq 10 13",10,13,"10
            11
            12
            13
            ",
            3,:,1458254498.301,0.083,0,15,0,0,"seq 10 14",10,14,"10
            11
            12
            13
            14
            ",

          A proper CSV reader (like LibreOffice or R's read.csv) will
          read this format correctly - even with fields containing
          newlines as above.

          If the output is big you may want to put it into files using
          --results:

            parallel --results outdir --sqlandworker csv:///%2Ftmp/log2.csv \
              seq ::: 10 ::: 12 13 14
            cat /tmp/log2.csv

          Output will be similar to:

            Seq,Host,Starttime,JobRuntime,Send,Receive,Exitval,_Signal,
              Command,V1,V2,Stdout,Stderr
            1,:,1458824738.287,0.029,0,9,0,0,
              "seq 10 12",10,12,outdir/1/10/2/12/stdout,outdir/1/10/2/12/stderr
            2,:,1458824738.298,0.025,0,12,0,0,
              "seq 10 13",10,13,outdir/1/10/2/13/stdout,outdir/1/10/2/13/stderr
            3,:,1458824738.309,0.026,0,15,0,0,
              "seq 10 14",10,14,outdir/1/10/2/14/stdout,outdir/1/10/2/14/stderr

        DBURL as table
          The CSV file is an example of a DBURL.

          GNU parallel uses a DBURL to address the table. A DBURL has
          this format:

            vendor://[[user][:password]@][host][:port]/[database[/table]]

          Example:

            mysql://scott:tiger@my.example.com/mydatabase/mytable
            postgresql://scott:tiger@pg.example.com/mydatabase/mytable
            sqlite3:///%2Ftmp%2Fmydatabase/mytable
            csv:///%2Ftmp/log.csv

          To refer to /tmp/mydatabase with sqlite or csv you need to
          encode the / as %2F.

          Run a job using sqlite on mytable in /tmp/mydatabase:

            DBURL=sqlite3:///%2Ftmp%2Fmydatabase
            DBURLTABLE=$DBURL/mytable
            parallel --sqlandworker $DBURLTABLE echo ::: foo bar ::: baz quuz

          To see the result:

            sql $DBURL 'SELECT * FROM mytable ORDER BY Seq;'

          Output will be similar to:

            Seq|Host|Starttime|JobRuntime|Send|Receive|Exitval|_Signal|
              Command|V1|V2|Stdout|Stderr
            1|:|1451619638.903|0.806||8|0|0|echo foo baz|foo|baz|foo baz
            |
            2|:|1451619639.265|1.54||9|0|0|echo foo quuz|foo|quuz|foo quuz
            |
            3|:|1451619640.378|1.43||8|0|0|echo bar baz|bar|baz|bar baz
            |
            4|:|1451619641.473|0.958||9|0|0|echo bar quuz|bar|quuz|bar quuz
            |

          The first columns are well known from --joblog. V1 and V2
          are data from the input sources. Stdout and Stderr are
          standard output and standard error, respectively.

        Using multiple workers
          Using an SQL base as storage costs overhead on the order
          of 1 second per job.

          One of the situations where it makes sense is if you have
          multiple workers.

          You can then have a single master machine that submits jobs
          to the SQL base (but does not do any of the work):

            parallel --sqlmaster $DBURLTABLE echo ::: foo bar ::: baz quuz

          On the worker machines you run exactly the same command
          except you replace --sqlmaster with --sqlworker.

            parallel --sqlworker $DBURLTABLE echo ::: foo bar ::: baz quuz

          To run a master and a worker on the same machine use
          --sqlandworker as shown earlier.

     --pipe
          The --pipe functionality puts GNU parallel in a different
          mode: Instead of treating the data on stdin (standard input)
          as arguments for a command to run, the data will be sent to
          stdin (standard input) of the command.

          The typical situation is:

            command_A | command_B | command_C

          where command_B is slow, and you want to speed up command_B.

        Chunk size
          By default GNU parallel will start an instance of command_B,
          read a chunk of 1 MB, and pass that to the instance. Then
          start another instance, read another chunk, and pass that to
          the second instance.

            cat num1000000 | parallel --pipe wc

          Output (the order may be different):

            165668  165668 1048571
            149797  149797 1048579
            149796  149796 1048572
            149797  149797 1048579
            149797  149797 1048579
            149796  149796 1048572
             85349   85349  597444

          The size of the chunk is not exactly 1 MB because GNU
          parallel only passes full lines - never half a line; thus
          the block size is only 1 MB on average. You can change the
          block size to 2 MB with --block:

            cat num1000000 | parallel --pipe --block 2M wc

          Output (the order may be different):

            315465  315465 2097150
            299593  299593 2097151
            299593  299593 2097151
             85349   85349  597444

          GNU parallel treats each line as a record. If the order of
          records is unimportant (e.g. you need all lines processed,
          but you do not care which is processed first), then you can
          use --roundrobin. Without --roundrobin GNU parallel will
          start a command per block; with --roundrobin only the
          requested number of jobs will be started (--jobs). The
          records will then be distributed between the running jobs:

            cat num1000000 | parallel --pipe -j4 --roundrobin wc

          Output will be similar to:

            149797  149797 1048579
            299593  299593 2097151
            315465  315465 2097150
            235145  235145 1646016

          One of the 4 instances got a single record, 2 instances got
          2 full records each, and one instance got 1 full and 1
          partial record.

        Records
          GNU parallel sees the input as records. The default record
          is a single line.

          Using -N140000 GNU parallel will read 140000 records at a
          time:

            cat num1000000 | parallel --pipe -N140000 wc

          Output (the order may be different):

            140000  140000  868895
            140000  140000  980000
            140000  140000  980000
            140000  140000  980000
            140000  140000  980000
            140000  140000  980000
            140000  140000  980000
             20000   20000  140001

          Note how the last job could not get the full 140000
          lines, but only 20000 lines.

          If a record is 75 lines -L can be used:

            cat num1000000 | parallel --pipe -L75 wc

          Output (the order may be different):

            165600  165600 1048095
            149850  149850 1048950
            149775  149775 1048425
            149775  149775 1048425
            149850  149850 1048950
            149775  149775 1048425
             85350   85350  597450
                25      25     176

          Note how GNU parallel still reads a block of around 1 MB,
          but instead of passing single lines to wc it passes 75 full
          lines at a time. This of course does not hold for the last
          job (which in this case got 25 lines).

        Fixed length records
          Fixed length records can be processed by setting --recend
          '' and --block recordsize. A header of size n can be
          processed with --header .{n}.

          Here is how to process a file with a 4-byte header and a
          3-byte record size:

            cat fixedlen | parallel --pipe --header .{4} --block 3 --recend '' \
              'echo start; cat; echo'

          Output:

            start
            HHHHAAA
            start
            HHHHCCC
            start
            HHHHBBB

          It may be more efficient to increase --block to a
          multiple of the record size.

        Record separators
          GNU parallel uses separators to determine where two records
          split.

          --recstart gives the string that starts a record; --recend
          gives the string that ends a record. The default is --recend
          '\n' (newline).

          If both --recend and --recstart are given, then the record
          will only split if the recend string is immediately followed
          by the recstart string.

          Here the --recend is set to ', ':

            echo /foo, bar/, /baz, qux/, | \
              parallel -kN1 --recend ', ' --pipe echo JOB{#}\;cat\;echo END

          Output:

            JOB1
            /foo, END
            JOB2
            bar/, END
            JOB3
            /baz, END
            JOB4
            qux/,
            END

          Here the --recstart is set to /:

            echo /foo, bar/, /baz, qux/, | \
              parallel -kN1 --recstart / --pipe echo JOB{#}\;cat\;echo END

          Output:

            JOB1
            /foo, barEND
            JOB2
            /, END
            JOB3
            /baz, quxEND
            JOB4
            /,
            END

          Here both --recend and --recstart are set:

            echo /foo, bar/, /baz, qux/, | \
              parallel -kN1 --recend ', ' --recstart / --pipe \
              echo JOB{#}\;cat\;echo END

          Output:

            JOB1
            /foo, bar/, END
            JOB2
            /baz, qux/,
            END

          Note the difference between setting one string and setting
          both strings.

          With --regexp the --recend and --recstart will be treated as
          a regular expression:

            echo foo,bar,_baz,__qux, | \
              parallel -kN1 --regexp --recend ,_+ --pipe \
              echo JOB{#}\;cat\;echo END

          Output:

            JOB1
            foo,bar,_END
            JOB2
            baz,__END
            JOB3
            qux,
            END

          GNU parallel can remove the record separators with
          --remove-rec-sep/--rrs:

            echo foo,bar,_baz,__qux, | \
              parallel -kN1 --rrs --regexp --recend ,_+ --pipe \
              echo JOB{#}\;cat\;echo END

          Output:

            JOB1
            foo,barEND
            JOB2
            bazEND
            JOB3
            qux,
            END

        Header
          If the input data has a header, the header can be repeated
          for each job by matching the header with --header. If
          headers start with % you can do this:

            cat num_%header | \
              parallel --header '(%.*\n)*' --pipe -N3 echo JOB{#}\;cat

          Output (the order may be different):

            JOB1
            %head1
            %head2
            1
            2
            3
            JOB2
            %head1
            %head2
            4
            5
            6
            JOB3
            %head1
            %head2
            7
            8
            9
            JOB4
            %head1
            %head2
            10

          If the header is 2 lines, --header 2 will work:

            cat num_%header | parallel --header 2 --pipe -N3 echo JOB{#}\;cat

          Output: Same as above.

        --pipepart
          --pipe is not very efficient. It maxes out at around 500
          MB/s. --pipepart can easily deliver 5 GB/s. But there are a
          few limitations. The input has to be a normal file (not a
          pipe) given by -a or ::::, and -L/-l/-N do not work.
          --recend and --recstart, however, do work, and records can
          often be split on that alone.

            parallel --pipepart -a num1000000 --block 3m wc

          Output (the order may be different):

           444443  444444 3000002
           428572  428572 3000004
           126985  126984  888890

     Shebang
        Input data and parallel command in the same file
          GNU parallel is often called as this:

            cat input_file | parallel command

          With --shebang the input_file and parallel can be combined
          into the same script.

          UNIX shell scripts start with a shebang line like this:

            #!/bin/bash

          GNU parallel can do that, too. With --shebang the arguments
          can be listed in the file. The parallel command is the first
          line of the script:

            #!/usr/bin/parallel --shebang -r echo

            foo
            bar
            baz

          Output (the order may be different):

            foo
            bar
            baz

        Parallelizing existing scripts
          GNU parallel is often called as this:

            cat input_file | parallel command
            parallel command ::: foo bar

          If command is a script, parallel can be combined into a
          single file so this will run the script in parallel:

            cat input_file | command
            command foo bar

          This Perl script perl_echo works like echo:

            #!/usr/bin/perl

            print "@ARGV\n"

          It can be called as this:

            parallel perl_echo ::: foo bar

          By changing the #!-line it can be run in parallel:

            #!/usr/bin/parallel --shebang-wrap /usr/bin/perl

            print "@ARGV\n"

          Thus this will work:

            perl_echo foo bar

          Output (the order may be different):

            foo
            bar

          This technique can be used for:

          Perl:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/perl

                     print "Arguments @ARGV\n";

          Python:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/python

                     import sys
                     print('Arguments', str(sys.argv))

          Bash/sh/zsh/Korn shell:
                     #!/usr/bin/parallel --shebang-wrap /bin/bash

                     echo Arguments "$@"

          csh:
                     #!/usr/bin/parallel --shebang-wrap /bin/csh

                     echo Arguments "$argv"

          Tcl:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/tclsh

                     puts "Arguments $argv"

          R:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave

                     args <- commandArgs(trailingOnly = TRUE)
                     print(paste("Arguments ",args))

          GNUplot:
                     #!/usr/bin/parallel --shebang-wrap ARG={} /usr/bin/gnuplot

                     print "Arguments ", system('echo $ARG')

          Ruby:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/ruby

                     print "Arguments "
                     puts ARGV

          Octave:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/octave

                     printf ("Arguments");
                     arg_list = argv ();
                     for i = 1:nargin
                       printf (" %s", arg_list{i});
                     endfor
                     printf ("\n");

          Common LISP:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/clisp

                     (format t "~&~S~&" 'Arguments)
                     (format t "~&~S~&" *args*)

          PHP:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/php
                     <?php
                     echo "Arguments";
                     foreach(array_slice($argv,1) as $v)
                     {
                       echo " $v";
                     }
                     echo "\n";
                     ?>

          Node.js:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/node

                     var myArgs = process.argv.slice(2);
                     console.log('Arguments ', myArgs);

          LUA:
                     #!/usr/bin/parallel --shebang-wrap /usr/bin/lua

                     io.write "Arguments"
                     for a = 1, #arg do
                       io.write(" ")
                       io.write(arg[a])
                     end
                     print("")

          C#:
                     #!/usr/bin/parallel --shebang-wrap ARGV={} /usr/bin/csharp

                     var argv = Environment.GetEnvironmentVariable("ARGV");
                     print("Arguments "+argv);

     Semaphore
          GNU parallel can work as a counting semaphore. This is
          slower and less efficient than its normal mode.

          A counting semaphore is like a row of toilets. People
          needing a toilet can use any toilet, but if there are more
          people than toilets, they will have to wait for one of the
          toilets to become available.

          An alias for parallel --semaphore is sem.

          sem will follow a person to the toilets, wait until a toilet
          is available, leave the person in the toilet and exit.

          sem --fg will follow a person to the toilets, wait until a
          toilet is available, stay with the person in the toilet and
          exit when the person exits.

          sem --wait will wait for all persons to leave the toilets.

          sem does not have a queue discipline, so the next person is
          chosen randomly.

          -j sets the number of toilets.

        Mutex
          The default is to have only one toilet (this is called a
          mutex). The program is started in the background and sem
          exits immediately. Use --wait to wait for all sems to
          finish:

            sem 'sleep 1; echo The first finished' &&
              echo The first is now running in the background &&
              sem 'sleep 1; echo The second finished' &&
              echo The second is now running in the background
            sem --wait

          Output:

            The first is now running in the background
            The first finished
            The second is now running in the background
            The second finished

          The command can be run in the foreground with --fg, which
          will only exit when the command completes:

            sem --fg 'sleep 1; echo The first finished' &&
              echo The first finished running in the foreground &&
              sem --fg 'sleep 1; echo The second finished' &&
              echo The second finished running in the foreground
            sem --wait

          The difference between this and just running the command
          is that a mutex is set, so if other sems were running in
          the background only one would run at a time.

          To control which semaphore is used, use
          --semaphorename/--id. Run this in one terminal:

            sem --id my_id -u 'echo First started; sleep 10; echo First done'

          and simultaneously this in another terminal:

            sem --id my_id -u 'echo Second started; sleep 10; echo Second done'

          Note how the second will only be started when the first has
          finished.

        Counting semaphore
          A mutex is like having a single toilet: When it is in use
          everyone else will have to wait. A counting semaphore is
          like having multiple toilets: Several people can use the
          toilets, but when they all are in use, everyone else will
          have to wait.

          sem can emulate a counting semaphore. Use --jobs to set the
          number of toilets like this:

            sem --jobs 3 --id my_id -u 'echo Start 1; sleep 5; echo 1 done' &&
            sem --jobs 3 --id my_id -u 'echo Start 2; sleep 6; echo 2 done' &&
            sem --jobs 3 --id my_id -u 'echo Start 3; sleep 7; echo 3 done' &&
            sem --jobs 3 --id my_id -u 'echo Start 4; sleep 8; echo 4 done' &&
            sem --wait --id my_id

          Output:

            Start 1
            Start 2
            Start 3
            1 done
            Start 4
            2 done
            3 done
            4 done

        Timeout
          With --semaphoretimeout you can force running the command
          anyway after a period (positive number) or give up (negative
          number):

            sem --id foo -u 'echo Slow started; sleep 5; echo Slow ended' &&
            sem --id foo --semaphoretimeout 1 'echo Forced running after 1 sec' &&
            sem --id foo --semaphoretimeout -2 'echo Give up after 2 secs'
            sem --id foo --wait

          Output:

            Slow started
            parallel: Warning: Semaphore timed out. Stealing the semaphore.
            Forced running after 1 sec
            parallel: Warning: Semaphore timed out. Exiting.
            Slow ended

          Note how the 'Give up' was not run.

     Informational
          GNU parallel has some options to give short information
          about the configuration.

          --help will print a summary of the most important options:

            parallel --help

          Output:

            Usage:

            parallel [options] [command [arguments]] < list_of_arguments
            parallel [options] [command [arguments]] (::: arguments|:::: argfile(s))...
            cat ... | parallel --pipe [options] [command [arguments]]

            -j n            Run n jobs in parallel
            -k              Keep same order
            -X              Multiple arguments with context replace
            --colsep regexp Split input on regexp for positional replacements
            {} {.} {/} {/.} {#} {%} {= perl code =} Replacement strings
            {3} {3.} {3/} {3/.} {=3 perl code =}    Positional replacement strings
            With --plus:    {} = {+/}/{/} = {.}.{+.} = {+/}/{/.}.{+.} = {..}.{+..} =
                            {+/}/{/..}.{+..} = {...}.{+...} = {+/}/{/...}.{+...}

            -S sshlogin     Example: foo@server.example.com
            --slf ..        Use ~/.parallel/sshloginfile as the list of sshlogins
            --trc {}.bar    Shorthand for --transfer --return {}.bar --cleanup
            --onall         Run the given command with argument on all sshlogins
            --nonall        Run the given command with no arguments on all sshlogins

            --pipe          Split stdin (standard input) to multiple jobs.
            --recend str    Record end separator for --pipe.
            --recstart str  Record start separator for --pipe.

            See 'man parallel' for details

            Academic tradition requires you to cite works you base your article on.
            When using programs that use GNU Parallel to process data for publication
            please cite:

              O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
              ;login: The USENIX Magazine, February 2011:42-47.

          When asking for help, always report the full output of this:

            parallel --version

          Output:

            GNU parallel 20210122
            Copyright (C) 2007-2021 Ole Tange, http://ole.tange.dk and Free Software
            Foundation, Inc.
            License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
            This is free software: you are free to change and redistribute it.
            GNU parallel comes with no warranty.

            Web site: https://www.gnu.org/software/parallel

            When using programs that use GNU Parallel to process data for publication
            please cite as described in the manpage.

          In scripts --minversion can be used to ensure the user has
          at least a given version:

            parallel --minversion 20130722 && \
              echo Your version is at least 20130722.

          Output:

            20160322
            Your version is at least 20130722.

          With --max-line-length-allowed GNU parallel will report the
          maximum size of the command line:

            parallel --max-line-length-allowed

          Output (may vary on different systems):

            131071
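
          For comparison, getconf prints the kernel's total limit for
          arguments plus environment. On Linux that total is usually
          far larger than the value GNU parallel reports, which
          appears to be capped by a separate per-string limit
          (MAX_ARG_STRLEN, commonly 32 pages = 131072 bytes; this
          explanation is an assumption, not taken from the manual):

```shell
# Total space for argv + environment allowed by the kernel:
getconf ARG_MAX
# A typical Linux value is 2097152 -- much larger than 131071,
# because a single string passed to exec is capped separately.
```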

          --number-of-cpus and --number-of-cores run system specific
          code to determine the number of CPUs and CPU cores on the
          system. On unsupported platforms they will return 1:

            parallel --number-of-cpus
            parallel --number-of-cores

          Output (may vary on different systems):

            4
            64
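
          On GNU systems these numbers can be cross-checked against
          nproc from coreutils (a sketch; the values may legitimately
          differ, since sockets, physical cores, and hardware threads
          are different counts, and nproc also honours CPU affinity):

```shell
# nproc counts the processing units available to this process; it
# need not match either of GNU parallel's two numbers.
nproc
parallel --number-of-cores
```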

     Profiles
          The defaults for GNU parallel can be changed systemwide by
          putting the command line options in /etc/parallel/config.
          They can be changed for a user by putting them in
          ~/.parallel/config.

          Profiles work the same way, but have to be referred to with
          --profile:

            echo '--nice 17' > ~/.parallel/nicetimeout
            echo '--timeout 300%' >> ~/.parallel/nicetimeout
            parallel --profile nicetimeout echo ::: A B C

          Output:

            A
            B
            C

          Profiles can be combined:

            echo '-vv --dry-run' > ~/.parallel/dryverbose
            parallel --profile dryverbose --profile nicetimeout echo ::: A B C

          Output:

            echo A
            echo B
            echo C

     Spread the word
          I hope you have learned something from this tutorial.

          If you like GNU parallel:

          +o (Re-)walk through the tutorial if you have not done so in
            the past year
            (https://www.gnu.org/software/parallel/parallel_tutorial.html)

          +o Give a demo at your local user group/your team/your
            colleagues

          +o Post the intro videos and the tutorial on Reddit,
            Mastodon, Diaspora*, forums, blogs, Identi.ca, Google+,
            Twitter, Facebook, Linkedin, and mailing lists

          +o Request or write a review for your favourite blog or
            magazine (especially if you do something cool with GNU
            parallel)

          +o Invite me for your next conference

          If you use GNU parallel for research:

          +o Please cite GNU parallel in your publications

          If GNU parallel saves you money:

          +o (Have your company) donate to FSF or become a member
            https://my.fsf.org/donate/

          (C) 2013-2021 Ole Tange, GFDLv1.3+ (See
          LICENSES/GFDL-1.3-or-later.txt)
