
grep and print how many times my pattern in file 1 is present in file2

I have a file1 (a list of my patterns) like this:

file1
Fatty_acid_degradation
Aminobenzoate_degradation
Amino_sugar_and_nucleotide_sugar_metabolism
Amoebiasis

and I have a file2 (a list of all the patterns):

file2
Fatty_acid_degradation
Fatty_acid_degradation
Fatty_acid_degradation
Bacterial_invasion_of_epithelial_cells
Bacterial_invasion_of_epithelial_cells
Bacterial_invasion_of_epithelial_cells
Bacterial_invasion_of_epithelial_cells

I would like to grep and count how many times each of my patterns in file1 is present in file2 and obtain a table (tab separated) like this:

Fatty_acid_degradation 3

The simplest approach would be to `grep` each of the patterns and then count them:


$ grep -Fwf file1 file2 | sort | uniq -c
3 Fatty_acid_degradation


The `grep` options are `-f` to give a file as a list of patterns to search for, `-F` to specify that each pattern should be treated as a fixed string and not a regular expression, and `-w` to ensure that a pattern is matched only against entire words (so that `regulation_of_expression` is not matched against `upregulation_of_expression`, for example).
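To see what `-w` changes, here is a small sketch. The scratch directory and the file names `patterns` and `data` are made up for the illustration and are not part of the question:

```shell
# Hypothetical scratch files, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 'regulation_of_expression' > "$tmp/patterns"
printf '%s\n' 'regulation_of_expression' 'upregulation_of_expression' > "$tmp/data"

# Without -w the pattern also matches as a substring of the longer name.
without_w=$(grep -c -Ff "$tmp/patterns" "$tmp/data")

# With -w only the whole-word occurrence is counted.
with_w=$(grep -c -Fwf "$tmp/patterns" "$tmp/data")

echo "without -w: $without_w, with -w: $with_w"
rm -r "$tmp"
```

With `-w`, the second line is skipped because the match would start in the middle of a word (`_`, letters and digits all count as word characters).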

Then, you can use whatever tool you prefer to change the format:


$ grep -Fwf file1 file2 | sort | uniq -c | sed -r 's/^ *([0-9]+) +(.*)/\2\t\1/'
$ grep -Fwf file1 file2 | sort | uniq -c | perl -lane 'print "$F[1]\t$F[0]"'
$ grep -Fwf file1 file2 | sort | uniq -c | awk -vOFS="\t" '{print $2,$1}'


All of the above return


Fatty_acid_degradation 3
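If you want to try this end to end, the following sketch recreates the two sample files from the question in a temporary directory (the directory and the `result` variable are only there for the example) and runs the `awk` variant of the pipeline:

```shell
# Recreate the sample input from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' Fatty_acid_degradation Aminobenzoate_degradation \
    Amino_sugar_and_nucleotide_sugar_metabolism Amoebiasis > "$tmp/file1"
printf '%s\n' Fatty_acid_degradation Fatty_acid_degradation Fatty_acid_degradation \
    Bacterial_invasion_of_epithelial_cells Bacterial_invasion_of_epithelial_cells \
    Bacterial_invasion_of_epithelial_cells Bacterial_invasion_of_epithelial_cells > "$tmp/file2"

# Keep only the lines of file2 that match a pattern from file1, count the
# duplicates, then swap the columns and join them with a tab.
result=$(grep -Fwf "$tmp/file1" "$tmp/file2" | sort | uniq -c |
    awk -v OFS='\t' '{print $2, $1}')

printf '%s\n' "$result"
rm -r "$tmp"
```

Note that patterns from file1 that never occur in file2 (such as `Amoebiasis` here) produce no line at all, rather than a line with a count of 0.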
