
Extract only rows with duplicated strings in tab-delimited table

I have a long list of data with 10 tab-delimited columns. The first two columns are the IDs. I would like to retrieve the rows of selected IDs. I started by renaming the selected IDs so that each of them is prepended with `comp-`. Then I tried to extract the rows where selected IDs are present in both column 1 and column 2.

file:

    comp-AA11232.1	GR55896.1
    AB55887.1	comp-FR87559.1
    comp-AC11232.1	comp-AE55888.1
    comp-AC66742.1	comp-AD87559.1

Desired output:

    comp-AC11232.1	comp-AE55888.1
    comp-AC66742.1	comp-AD87559.1

I was using `sed -n '/comp\-.\tcomp\-./p' file`. The rows in the output files all met the criteria, but unfortunately some rows that met the same criteria were missing from the output files. I'm not sure what is happening here. Any idea? Or is there a better approach with grep/awk/sed in this case?

awk -F'\t' '$1 ~/^comp-/ && $2 ~/^comp-/' infile
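
To check the command against the sample from the question, here is a small script. It is a sketch that assumes a POSIX-like shell with `mktemp` available (not POSIX itself, but present on Linux, macOS and the BSDs), and it rebuilds only the two columns shown in the question, since the other eight are not given. It should print exactly the two rows listed as the desired output.

```shell
#!/bin/sh
# Rebuild the sample from the question (only the two columns that were shown).
# printf reuses its format for each pair of arguments, giving one row per pair.
infile=$(mktemp)
printf '%s\t%s\n' \
  comp-AA11232.1 GR55896.1 \
  AB55887.1 comp-FR87559.1 \
  comp-AC11232.1 comp-AE55888.1 \
  comp-AC66742.1 comp-AD87559.1 > "$infile"

# Keep a row only when BOTH column 1 and column 2 start with "comp-".
result=$(awk -F'\t' '$1 ~ /^comp-/ && $2 ~ /^comp-/' "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

Setting `-F'\t'` makes awk split on tabs only, so a column that happens to contain spaces is still treated as a single field, and the `^` anchors ensure `comp-` must be at the start of each column rather than anywhere in the line.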

The same, but passing the pattern in as a variable:

awk -F'\t' -v pat='comp-' '$1 ~"^" pat && $2 ~"^" pat' infile
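
Passing the prefix with `-v` is what lets it come from a shell variable without editing the awk program. A minimal sketch wrapping the command in a shell function; the name `both_cols_start_with` is hypothetical, and the sample is the same two-column excerpt from the question. The parentheses around `"^" pat` only make the grouping explicit; concatenation already binds tighter than `~`.

```shell
#!/bin/sh
# Hypothetical helper: print the rows of file $2 whose first two tab-separated
# columns both start with prefix $1. Inside the single-quoted awk program,
# $1 and $2 are awk fields, not the shell function's arguments.
both_cols_start_with() {
  awk -F'\t' -v pat="$1" '$1 ~ ("^" pat) && $2 ~ ("^" pat)' "$2"
}

infile=$(mktemp)
printf '%s\t%s\n' \
  comp-AA11232.1 GR55896.1 \
  AB55887.1 comp-FR87559.1 \
  comp-AC11232.1 comp-AE55888.1 \
  comp-AC66742.1 comp-AD87559.1 > "$infile"

# The prefix lives in a shell variable; only this value changes between runs.
prefix='comp-'
result=$(both_cols_start_with "$prefix" "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

Because `pat` is used as a regular expression, characters such as `.` or `+` in the prefix act as regex metacharacters rather than literal characters.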

Or compare as a literal string with `index()`, still passing the string in as a variable:

awk -F'\t' -v str='comp-' 'index($1, str)==1 && index($2, str)==1' infile
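
The practical difference between the regex version and the `index()` version shows up when the prefix contains a regex metacharacter. A sketch with made-up rows (hypothetical data, not from the question) and the prefix `comp.`: the regex version treats `.` as "any character" and keeps both rows, while the `index()` version keeps only the row that literally starts with `comp.` in both columns.

```shell
#!/bin/sh
# Hypothetical rows: one with "compX", one with a literal "comp." prefix.
infile=$(mktemp)
printf '%s\t%s\n' \
  compXA1 compXB2 \
  comp.A1 comp.B2 > "$infile"

# Regex match: '.' in pat matches any character, so both rows pass.
regex_out=$(awk -F'\t' -v pat='comp.' '$1 ~ ("^" pat) && $2 ~ ("^" pat)' "$infile")

# Literal match: index() returns 1 only when str appears at position 1.
literal_out=$(awk -F'\t' -v str='comp.' 'index($1, str)==1 && index($2, str)==1' "$infile")

printf 'regex:\n%s\nliteral:\n%s\n' "$regex_out" "$literal_out"

rm -f "$infile"
```

With a plain prefix like `comp-` both versions give the same result; `index()` is the safer choice when the prefix is arbitrary text that should not be interpreted as a pattern.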

See also "How do I find the text that matches a pattern?" for other matching options.