# Remove duplicated pattern/entries within each field in CSV file

How do I remove duplicated entries within each separate field, with the sample below as data?

    0x,9.4,,,#0,#UNIX#unix,#cli#L#فا#0#فا#0#L#SE#Cli#SE,#فارسی#فارسی#۱#1#١#1,bsh,#V & v

Expected output:

    0x,9.4,,,#0,#unix,#cli#L#فا#0#SE,#فارسی#١#۱#1,bsh,#V & v

Duplicates should be matched case-insensitively. Characters that differ in Unicode, such as Persian `#۱` and Arabic `#١`, count as different entries. The order of the entries, and which case variant of a duplicate is kept, doesn't matter here.

The pattern has the format `#x`, where `x` is any sequence of one or more characters.

Reference: Unicode table for the differences between the Persian and Arabic alphabets/numbers.

Using a Perl one-liner in the shell (just a few lines) with a proper CSV parser:

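perl -CS -Mopen=":std,IN,OUT,IO,:encoding(utf8)" -MText::CSV -lne '
    BEGIN {
        # binary => 1 is what the Text::CSV docs recommend for non-ASCII data
        our $csv = Text::CSV->new({ sep_char => ",", binary => 1 });
        # keep the first occurrence of each entry, compared case-insensitively
        sub uniq { my %seen; grep { !$seen{lc $_}++ } @_ }
    }
    $csv->parse($_) or die "parse error";
    # split each field on "#", drop duplicates, then rebuild the field and the line
    print join ",", map { join "#", uniq split /#/ } $csv->fields();
' file.csv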

## Output:

0x,9.4,,,#0,#UNIX,#cli#L#فا#0#SE,#فارسی#۱#1#١,bsh,#V & v

## Note:

* This requires the `Text::CSV` Perl module. On Debian and derivatives, install it with `sudo apt-get install libtext-csv-perl`.