
On-line tally of unique lines

`uniq -c` is very useful for counting the number of times the same line appears consecutively:

    $ seq 1 1000 | awk '{ if ($1 > 100 && $1 <= 200) { print "hi" } else {print "bye"} }' | uniq -c
        100 bye
        100 hi
        800 bye

However, in order to get a tally for each unique line I have to sort the input first, which seems a bit inefficient.

    $ seq 1 1000 | awk '{ if ($1 > 100 && $1 <= 200) { print "hi" } else {print "bye"} }' | sort | uniq -c

Is there an idiomatic way to tally all occurrences of unique lines using an on-line algorithm?

Sorting the input first is about as efficient as it gets.

You can do it with an awk one-liner:


    awk '{++count[$0]} END {for (line in count) printf "%7d %s\n", count[line], line}'


Which one is more efficient (in memory and CPU time) depends on the data and on the implementation. `sort` is less efficient in theory because it does extra work (it orders every line, which the tally does not need), but on the other hand it has one job and does it well, whereas awk is a general-purpose tool. If there are a lot of duplicates, awk uses less memory, since it only stores each distinct line once, and is probably a little faster. On the other hand, many implementations of `sort` can cope with huge data sets that don't fit in RAM, whereas awk will just thrash.
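As a rough sketch, here is the awk tally applied to the example from the question. The function name `tally` is only an illustrative wrapper, not a standard command. Because awk does not specify the order in which `for (line in count)` visits keys, the sketch sorts the tally afterwards; that sort only sees one line per distinct value, not the whole input.

```shell
# tally: count every distinct input line in one pass, without pre-sorting.
# "tally" is a hypothetical helper name wrapping the awk one-liner.
tally() {
    awk '{ ++count[$0] } END { for (line in count) printf "%7d %s\n", count[line], line }' "$@"
}

# The traversal order of an awk array is unspecified, so sort the (small)
# result by count, highest first, when a predictable order is wanted.
seq 1 1000 |
    awk '{ if ($1 > 100 && $1 <= 200) { print "hi" } else { print "bye" } }' |
    tally |
    sort -rn
```

With the input from the question this should report 900 for `bye` and 100 for `hi`, the same totals that `sort | uniq -c` gives.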
