
Finding and deleting duplicated records

I want to find records in which the same character is repeated consecutively. For example, the patterns I want to find are 'AA' or 'AAAAA'. I tried to use the grep command to find them, but it doesn't work well. Here is the example that I tried:

ATCTAGCGATCGATAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
TATCTATCTATCTATCTCATACTTCGCATCGCTAGCTCGACTGCATAGGACTAGCATAAAAAGCATCAGCTACCGCCTCAGCATCGACTACGATACG
TAGTCGATCGACAGCTACGCATGCATCCGACTACGATCGACTAGCTAGCGCTAGACTACGTACCGATAAGCACTACGTCAGCCTAGACTCACGACT
GATCGATCGATCGACTACGCAGCTACGAGATCGATCGATCGATCGTAGCTAGCTCATACTACACACGCATATACGTGTCGATgctAGTAACTACAT
ACGCTAGCTAGCTACGATCAATCGAGCTATCGATCAGCTACGATCTAGAGATCGATCGATGCTGATAGCTACGATCagcactgatGCATCGCTGAT

The question is somewhat unclear. Assuming you want to find all substrings consisting of the same nucleotide repeated two or more times, GNU sed can help:


sed -r 's:([ACGTacgt])\1+:\
>&\
:g;s:^[^>]+$::mg;s:\
+>?:\
:g' INPUT


**Output:**


TT
GG
AAAAA
CC
CC
CC
CC
AA
CC
AA
AA


To search for a single nucleotide only, replace `[ACGTacgt]` at the start of the script with, for example, `[Aa]`.
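Since the question mentions grep, here is a minimal sketch of the same search with `grep -o`. It assumes GNU grep, where back-references such as `\1` are accepted in extended regular expressions. The function names `find_runs` and `find_a_runs` are hypothetical helpers introduced only for illustration, and the sample string is a fragment of the second sequence from the question.

```shell
# find_runs: print each run of two or more identical nucleotides, one per line.
# Assumes GNU grep: -o prints only the matched text, -E enables extended regexes,
# and the back-reference \1 requires the same character (same case) to repeat.
find_runs() {
    grep -oE '([ACGTacgt])\1+' "$@"
}

# find_a_runs: the same idea restricted to one nucleotide (here A or a).
find_a_runs() {
    grep -oE '([Aa])\1+' "$@"
}

# Usage on a fragment of the second sequence from the question.
printf '%s\n' 'CATAAAAAGCATCAGCTACCGCCTCAG' | find_runs
```

Both helpers also accept file names, so `find_runs INPUT` should list the runs found in the file. As with the sed script, a run must repeat the same case, so `Aa` is not reported as a run.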
