Shell Script - compress one file at a time inside the Current Folder

It's the following, folks ... I will explain it in a simple way, with just enough detail to illustrate the idea.

Example

for N in `ls $HOME`
do
    tar zcvf `basename`.tgz $N
done

Note that I want to compress the source directory together with one file at a time, ending up like this:

/home/user/0.tgz

/home/user/1.tgz

/home/user/2.tgz

etc. ...

For this to happen, each file needs to be compressed along with its source path

  • The purpose of a tar file is to be a collection of files; gzip compression is then applied on top, turning it into a .tar.gz, or tgz for short. The ideal for your case would be to run gzip over the files only

  • @Jeffersonquesado If you can help me with a practical example with gz or zip, thanks in advance

  • I’m not gonna give you a full answer, but try for f in ./*; do gzip "$f"; done

  • Gzip reference I used https://www.lifewire.com/example-uses-of-the-linux-gzip-command-4078675

  • @Jeffersonquesado All right! By the way, I've been trying for an hour. hehe

  • shell script is just like that; one day it will stop tripping us up. Tag me here when you have the result of the suggestion I gave

  • @Jeffersonquesado Almost there! With this attempt: for N in `ls $HOME/*`; do gzip -f "$N"; done. What is missing is compressing the folder with its file included, doing this successively for each individual file. But I am grateful for your comment. :D

  • Subfolders too? That's a job for find pasta/... I have to go now, hope I've helped

  • @Jeffersonquesado So, Jefferson Quesado, as requested [...] "Tag me here when you have the result of the suggestion I gave" - I'm letting you know that I got what I wanted. Hugs.


3 answers

3


I think the biggest problem here is files/directories with white space. The command below searches for every file/directory under the current directory and executes tar with the desired parameters, handling the blanks:

find . -name \* -print0 -exec tar -zcvf '{}'.tgz '{}' \;
  • Wow! I will test your tip. I actually finished a solution somewhat similar to yours today, although it uses 3 lines; I was already preparing to post it. In any case I will make it available here as an alternative and for the sake of the programming logic.

  • Perfect! Accept my upvote, followed by the answer acceptance. Thanks!

  • It’s always good to be able to help! Thanks, [ ].

  • Take a look at what I had done before; I put the script I wrote in an answer to add value to the post and contribute the logic to the community's experience.

  • I just saw it. I think there's an error in find /* . Shouldn't it be find $HOME ?

  • But try running it the way you mentioned [...] "I think there is an error in find /* . Shouldn't it be find $HOME ?" - you will see that this combination does not fulfill the purpose. Hence the use of the command ls $HOME | as implemented.


1

Here is an alternative, which I had finished before member cemdorst's reply. I am posting it only to serve as a study; it can be adapted for various purposes:

 ls $HOME | while read LINE; do find /* -name "$LINE"; done | while read N; do tar -zcf ${N##*/}.tgz "$N" &>/dev/null; done

Explanation

"$LINE" - Note the double quotes surrounding the variable LINE, and this in turn is receiving the output stream from the command ls where I am pointing to the file(s) and/or directory(s). If in this(s) file(s) and directory(s) had blank characters (blank space), then double quotes "" will treat so that the shell does not misunderstand what we want.

Example:

  • /home/user/Downloads/Music/Mais_que_palavras_-_More_than_words_(Brazilian Version).ogg

then the name of the package would be like this ...

  • Mais_que_palavras_-_More_than_words_(Brazilian

Why did this happen?

Because there was a blank space at this point -> (Brazilian Version)

That is, the white space spoiled our joy (but not the job)! The package was still created successfully; to correct this, the -print0 would be needed, as our dear colleague cemdorst did in his reply.
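
To make the effect concrete, here is a minimal sketch (with a hypothetical file name) of the word splitting described above:

f="Brazilian Version.ogg"
printf '<%s>\n' $f     # unquoted: the shell splits it into two arguments, <Brazilian> and <Version.ogg>
printf '<%s>\n' "$f"   # quoted: a single argument, <Brazilian Version.ogg>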

${N##*/} - At this other point, I'm just doing a little shell parameter expansion so that the package name is the same as the name of the file in question, that is, the file name that comes after the last slash /.

Example:

  • /home/user/Downloads/Music/Mais_que_palavras_-_More_than_words_(Brazilian Version).ogg

then the name of the package would be like this ...

  • Mais_que_palavras_-_More_than_words_(Brazilian Version).ogg.tgz

Note that this expression serves exactly for that: capturing the file name after the last slash.
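
A quick sketch of this expansion, using a hypothetical path:

N="/home/user/Downloads/Music/song.ogg"
echo "${N##*/}"        # prints: song.ogg (everything after the last /)
echo "${N##*/}.tgz"    # prints: song.ogg.tgz (the package name used above)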

  • shell script: no matter how long you have known it, no matter how many courses you have attended or taught, there will always be one more detail that is new to you

  • @Jeffersonquesado Hence the existence of several languages like Perl, Tcl, and others that build on the shell. With such extensive use of commands it is impossible to store everything in our failing memory, or, on the other hand, to know them all completely. I even confused Bash's function syntax with JavaScript's, since they are somewhat alike: I was developing a web page and implemented a JavaScript function using Bash constructs. haha

1

Let’s talk about Unix

The current directory is also known as the working directory.

In the Unix world there is the pwd command, which means print working directory. It is basically a built-in available in almost all shells. We also have the variable $PWD, which basically holds this value. This Unix.SE answer deals with more nuances.

In addition, there is a relative identifier for the current directory: ./. All paths that start with ./ are relative to the current directory; for example, ./a.out is the file a.out located in the current directory, while ./foo/bar is the file bar inside the folder foo, which is in the current directory. It is also possible to leave the path implicit by omitting ./: a.out and foo/bar in the examples above.

When you compile a C file with GCC, it produces the file a.out. To run this file, people usually tell you to call ./a.out, but they don't say why. Why not simply a.out, since it is already in the current directory? Because the first word of a shell command needs to be:

  1. A built-in
  2. A function
  3. The complete path (either relative or absolute) of an executable
  4. The name of an executable that is in $PATH

The variable $PATH exists for other reasons, but it is possible to put . in it.
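
A minimal sketch of the difference, assuming a hypothetical hello.c in the current directory:

gcc hello.c             # produces ./a.out
a.out                   # fails: not a built-in, not a function, not a path, and . is not in $PATH
./a.out                 # works: a relative path to the executable
export PATH="$PATH:."   # putting . in $PATH (generally discouraged) ...
a.out                   # ... makes the bare name work as well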

Shell expansion

An expansion, in the shell script context, is the replacement of one piece of text by another. For example, echo $PWD will not simply print the current path on its own. A variable expansion happens here: the text $PWD is replaced by the contents of the variable $PWD (/home/jeff, for example), and this result is passed to the built-in command echo, as if the user had written echo /home/jeff.

Another widely used expansion is glob expansion. You use it when you write ls foo/*.tar.* or rm *.log. This expansion is very useful for listing sets of files that follow a pattern. But wait, isn't that a regular expression thing?

Well, yes. Regular expressions recognize text patterns. Glob expressions, in turn, recognize patterns in file paths.

A quick list of what is recognized in glob expansions:

  • * : anything of any size; equivalent to .* in regular expressions; note: it ignores files in other directories, taking into account only the directory level where the * appears in the path being matched
  • ? : any single character; equivalent to . in regular expressions; note: this character cannot be a directory delimiter
  • [abc] : identical to the character list of regular expressions; note: as with the others, it cannot represent a directory change

For example, suppose we want to add to the git index the pom.xml files that are inside any subdirectory of the directories that start with java:

git add java*/*/pom.xml

Imagine that this is the folder structure:

jeff/
javado/
  -> jsp/
    -> pom.xml
  -> escovado/
javali/
  -> namesa/
    -> pom.xml
  -> janta/
    -> pom.xml
  -> muito/
javax/

Explaining it piece by piece:

  • git add : command to add files to the git index
  • java*/ : all directories under the current directory that start with java, such as javado, javali, javax
  • java*/*/ : all subdirectories whose parent directory matched the pattern above; for example: javado/jsp, javado/escovado, javali/namesa, javali/janta, javali/muito; in this example, javax/ has no subdirectories
  • java*/*/pom.xml : the pom.xml files that are inside a directory of the previous match; for example: javado/jsp/pom.xml, javali/namesa/pom.xml and javali/janta/pom.xml

Given all of the above, the shell will interpret the command line as

git add javado/jsp/pom.xml javali/namesa/pom.xml javali/janta/pom.xml

and this will be the executed command.

Another very common shell expansion is command substitution. In short, this expansion executes an inner command, takes that program's output and substitutes it into the line being processed. This expansion is indicated by the presence of backticks ` or of $( (dollar parenthesis); in the latter case, a matching closing ) is also required.
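
Both forms can be seen in this small sketch (the date command is just an illustration):

echo "today is `date +%Y-%m-%d`"      # backtick form
echo "today is $(date +%Y-%m-%d)"     # $( ) form, which needs the closing ) and nests more easily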

Let's go back to the example above, and let's assume that the pom.xml files are the only .xml files in the entire project.

To fetch all files ending in .xml, I can use the find command this way:

find ./ -name '*.xml'

Piece by piece:

  • find : the program name
  • ./ : the directory find will search, in this case the current directory
  • -name : an option of the find command; it indicates that the next argument is the glob-like name of the desired file
  • '*.xml' : first of all, this is the argument to -name; second, by using single quotes I avoided any and all shell expansion (see the sketch right after this list); third, since find is recursive, *.xml will match any .xml file no matter how deep it is
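
As a sketch of why the single quotes matter (assuming the shell happens to sit in a directory that itself contains .xml files):

find ./ -name *.xml     # risky: the shell may expand *.xml before find even runs
find ./ -name '*.xml'   # safe: the literal pattern *.xml is what find receives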

The output of that command would be:

javado/jsp/pom.xml
javali/janta/pom.xml
javali/namesa/pom.xml

From there, we can use command substitution for git.

git add `find ./ -name '*.xml'`

After expansion, the command is:

git add javado/jsp/pom.xml javali/janta/pom.xml javali/namesa/pom.xml

The find command allows many more actions than just printing files that match the given expressions; see the command's man page on Linux Die. Here I'm just taking advantage of its default behavior of printing the files found.

Compression / Aggregation of files

There are two common tasks in file management: compression and aggregation. But what is each of them?

  • aggregation: aggregating files puts multiple files into a single place, so that you don't need to send each file individually; the tar format aggregates files without compression, whereas the zip format aggregates compressed files (each file can even have its own compression)
  • compression: compression is about decreasing the number of bytes of a file; the gz format is a very standard compression in the Unix world, and bz2 is also a popular compression format; the entries of a zip are already compressed, so it is difficult to compress the zip file much further

It is worth noting that tar.gz is gzip compression applied to a tar aggregate file; the extension tgz is a short way of writing tar.gz. The extension tar.bz2 is analogous to tar.gz, using bz2 compression.

The tar command serves to create tar aggregate files, as well as to manipulate them. To create a tar file, we can use the following command:

tar -cf agregado.tar arquivoA arquivoB

Piece by piece:

  • tar : command to create/manipulate aggregate files
  • -cf : equivalent to -c -f, in that order; it is very common for short command-line options to be grouped after a single dash -
  • -c : flag indicating that a new file will be created
  • -f : flag that requires an argument; it indicates which file will be manipulated (in this example, the file to be created)
  • agregado.tar : the argument of the -f flag; the name of the file being manipulated
  • arquivoA arquivoB : every remaining argument that is not a flag argument is a path of a file/directory to be aggregated; in this case only two files, arquivoA and arquivoB (a quick check of the result is sketched right after this list)
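
To check the result, the -t flag lists the contents of an archive; a quick sketch:

tar -tf agregado.tar
# arquivoA
# arquivoB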

In the beginning, the tar command served only to manipulate tar files, but due to the popularity of compressed tar files it gained compression options as well. The following example creates the tar.gz of the previous example in a single command:

tar -czf compact.tar.gz arquivoA arquivoB

The only new thing here is the -z flag, indicating that gzip compression should be applied on top of the generated tar.
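
Just to make the relationship described earlier explicit, the same result can be sketched in two steps:

tar -cf compact.tar arquivoA arquivoB   # step 1: aggregation only
gzip compact.tar                        # step 2: compression; replaces it with compact.tar.gz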

Compressing all files from a directory individually

Since each file will be compressed individually, we don't need to aggregate them. So we just call the compressor, gzip. Knowing which command will be called, we now need to decide how to iterate over the files.

I will put the command to be executed in a function, compactfile, so another compression command can be used.

Creation of compactfile

A priori, we just need to call the command gzip with the desired file.

compactfile() {
    arquivo_original="$1"

    gzip "$arquivo_original"
}

And what if I want a dry run? Well, we could emulate a dry run like this:

compactfile_dryrun() {
    arquivo_original="$1"

    echo gzip "$arquivo_original"
}

Well, we can try to unify these two commands into one... let's leave for a later moment how the dry-run detection is done; for now, whether or not it is a dry run will be hard-coded:

compactfile() {
    dryrun=true # if it is not a dry run, just put false here
    base_cmd='gzip'

    arquivo_original="$1"

    cmd_args="$base_cmd \"$arquivo_original\""

    $dryrun && cmd_final="echo $cmd_args" || cmd_final="$cmd_args"

    $cmd_final
}

Well, now about detecting the dry run... I normally see two alternatives used for this: -n and --dry-run. Shall we write this detection with a case?

compactfile() {
    dryrun=false

    case "$1" in
         -n|--dry-run)
             dryrun=true # detected that I want to do a dry run
             shift # removes the first argument and shifts the remaining ones down one position
             ;;
    esac

    base_cmd='gzip'

    arquivo_original="$1"

    cmd_args="$base_cmd \"$arquivo_original\""

    $dryrun && cmd_final="echo $cmd_args" || cmd_final="$cmd_args"

    $cmd_final
}
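
A hypothetical usage sketch, assuming the function above is already defined in the current shell and that a file called notes.log exists in the current directory:

compactfile -n notes.log   # dry run: prints the gzip command instead of executing it
compactfile notes.log      # real run: gzip produces notes.log.gz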

Iteration with for, list from a glob

Iterating over a pure glob has the disadvantage of not reaching files recursively. We can only catch those that are at the level of the current directory (I will use the dry run just to show what the effect would be at each point):

compactdir=.
for arq in $compactdir/*; do
    if [ -f "$arq" ]; then
        compactfile -n "$arq"
    fi
done

Well, now I’ve shown two new shell structures:

  • if : syntactic structure of the form if CMD; then CMD_IF; else CMD_ELSE; fi; the else part is optional; if CMD returns true (exit status 0), the CMD_IF command block runs; if it returns false (exit status other than 0) and an else exists, the CMD_ELSE command block runs
  • the command test conditions, or the syntax [ conditions ] : the syntactic structure that begins with [ has the same effect as the test command, except that it must be closed with a ]; here, -f is a unary operator that checks for the existence of a regular file (directories do not count), returning true if it exists and false otherwise; you can read more about the test command in the documentation

Note about the syntax [ conditions ]:
The spaces are MANDATORY, otherwise the shell will understand something else
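
A small sketch of that, with a hypothetical file name:

[ -f notes.log ] && echo "regular file"   # OK: the spaces separate [ from its arguments
[-f notes.log ] && echo "regular file"    # error: the shell looks for a command literally named "[-f"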

Iteration with for, using command substitution on find

compactdir=.
for arq in `find $compactdir/`; do
    if [ -f "$arq" ]; then
        compactfile -n "$arq"
    fi
done

Iterating with find itself

@cemdorst put this in his answer; I must admit I didn't know it until then. Adapting what he used to the custom compression function compactfile, we have the code below.

I must also admit that it took me a long time to answer

compactdir=.
export -f compactfile  # -exec starts a new process, so the shell function must be exported (bash)
find "$compactdir" -type f -exec bash -c 'compactfile -n "$1"' _ '{}' \;
