Finding and deleting duplicated records

I want to find records in which the same character is repeated consecutively. For example, the patterns I want to find are 'AA' or 'AAAAA'. I tried to use the grep command to find them, but it doesn't work well. Here is the example input that I tried:

ATCTAGCGATCGATAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
TATCTATCTATCTATCTCATACTTCGCATCGCTAGCTCGACTGCATAGGACTAGCATAAAAAGCATCAGCTACCGCCTCAGCATCGACTACGATACG
TAGTCGATCGACAGCTACGCATGCATCCGACTACGATCGACTAGCTAGCGCTAGACTACGTACCGATAAGCACTACGTCAGCCTAGACTCACGACT
GATCGATCGATCGACTACGCAGCTACGAGATCGATCGATCGATCGTAGCTAGCTCATACTACACACGCATATACGTGTCGATgctAGTAACTACAT
ACGCTAGCTAGCTACGATCAATCGAGCTATCGATCAGCTACGATCTAGAGATCGATCGATGCTGATAGCTACGATCagcactgatGCATCGCTGAT

The question is somewhat unclear. Assuming you want to find every substring that consists of the same nucleotide repeated two or more times, GNU sed can help:

sed -r 's:([ACGTacgt])\1+:\
>&\
:g;s:^[^>]+$::mg;s:\
+>?:\
:g' INPUT

**Output:**

TT
GG
AAAAA
CC
CC
CC
CC
AA
CC
AA
AA

To match only one specific nucleotide, replace `[ACGTacgt]` at the start of the script with, for example, `[Aa]`.
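
Since the question mentions grep, the same kind of result can probably be obtained with it as well. The following is only a minimal sketch, assuming GNU grep: `-o` prints just the matched text, and the back-reference `\1` inside an extended regex is a GNU extension. The function name `find_runs` is made up for illustration and is not part of the original answer.

```shell
#!/bin/sh
# find_runs (hypothetical helper): print every run of two or more identical
# nucleotides, one run per line. Reads the files given as arguments, or
# standard input when no file is given.
# Assumes GNU grep: -o outputs only the matching part of each line, and \1
# (a back-reference in an -E pattern) repeats the character captured by
# the group ([ACGTacgt]).
find_runs() {
    grep -oE '([ACGTacgt])\1+' "$@"
}

# Example: feed one sequence fragment on standard input.
printf '%s\n' 'CATAAAAAGCATCAGCTACCGCC' | find_runs
```

As with the sed script, the match is case-sensitive, so a run such as `aA` is not reported. Replacing `[ACGTacgt]` with `[Aa]` restricts the search to one nucleotide.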