Topic: [CHART] Press Section Statistics (Read 4177 times)

sr. member
Activity: 453
Merit: 250
April 28, 2013, 12:42:31 PM
#5
It's hard to get a good impression with so many long thin lines. I suggest plotting weekly sums or weekly moving average instead.
Version with curve-fitted bezier trendline:
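A trendline along these lines can be drawn with gnuplot's smooth bezier option. A rough sketch, assuming the press_articles.csv layout from the first post (count,date); the function name trendline_commands and the output file press_hits_trend.png are made up for illustration, and this is not necessarily how the chart in this post was produced:

```shell
#!/bin/bash
# Print gnuplot commands that overlay a Bezier-smoothed curve on the daily boxes.
# Hypothetical helper: the function name and output filename are illustrative only.
trendline_commands() {
  cat <<'EOF'
set xdata time
set timefmt "%Y-%m-%d"
set format x "%Y-%m-%d"
set datafile separator ","
set style fill solid noborder
set boxwidth 43200 absolute
set term pngcairo size 1200,600
set output "press_hits_trend.png"
plot "press_articles.csv" using 2:1 with boxes title "Press hits/day", \
     "" using 2:1 smooth bezier with lines linewidth 2 title "Bezier trend"
EOF
}
```

Usage would be something like: trendline_commands | gnuplot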
legendary
Activity: 1284
Merit: 1001
April 28, 2013, 11:20:20 AM
#4
It's hard to get a good impression with so many long thin lines. I suggest plotting weekly sums or weekly moving average instead.
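Weekly sums could be computed from the daily CSV before plotting. A minimal sketch, assuming GNU date (for -d and @epoch), bash 4+ (associative arrays), and input lines of the form count,YYYY-MM-DD; the function name weekly_sums is made up for illustration. Each week is labelled with the date of its Monday so the existing gnuplot timefmt still applies:

```shell
#!/bin/bash
# weekly_sums: read "count,YYYY-MM-DD" lines on stdin and print
# "total,YYYY-MM-DD" lines, where the date is the Monday of that week.
# Assumes GNU date and bash 4+. Hypothetical helper, not from the original script.
weekly_sums() {
  local count day secs dow week_start
  local -A totals=()
  while IFS=, read -r count day; do
    # Epoch seconds and ISO weekday (1=Monday .. 7=Sunday), computed in UTC.
    read -r secs dow < <(date -u -d "$day" '+%s %u' 2>/dev/null) || continue
    # Step back to the Monday of the same week.
    week_start=$(date -u -d "@$(( secs - (dow - 1) * 86400 ))" +%F)
    totals[$week_start]=$(( ${totals[$week_start]:-0} + count ))
  done
  for week_start in "${!totals[@]}"; do
    printf '%s,%s\n' "${totals[$week_start]}" "$week_start"
  done | sort -t, -k2
}
```

For example: weekly_sums < press_articles.csv > press_articles_weekly.csv, then point the plot command at the weekly file and widen the boxes to suit.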
sr. member
Activity: 453
Merit: 250
April 28, 2013, 08:00:02 AM
#3
Cool, thanks for the tips.
My /tmp is just a ramdisk so I tend to abuse it.
legendary
Activity: 1288
Merit: 1076
April 28, 2013, 07:09:03 AM
#2
Nice script.

My first advice: don't use tempfiles. They clutter your directory because you always forget to remove them.

Just use proper Unix pipes: read from stdin, write to stdout.

Code:
#!/bin/bash

total_articles=1760
decrement=40

function scrape {
  # Print the date found on each line; \1 is the pattern's only capture group.
  curl -s "$1" | sed -rn 's#.*([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*#\1#p'
}

{
    for ((x=total_articles; x>0; x-=decrement))  # x>0 so the page at offset 40 is not skipped
    do
        scrape "https://bitcointalk.org/index.php?board=77.$x"
        sleep 5 # This is here just to be kind to the server, remove for speedup.
    done

    scrape "https://bitcointalk.org/index.php?board=77"
} |
sort |
uniq -c |
sed -r 's/^ *([0-9]+) (.*)/\1,\2/'

Not tested yet but this should work as well as your initial code.
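One way to check the pipeline without touching the server is to feed it canned input. A small sketch: the sample HTML lines are invented, and the function names extract_dates and collate are made up for illustration. Note that it uses \1 in the sed replacement, since the pattern has only one capture group:

```shell
#!/bin/bash
# Hypothetical offline check of the two text-processing stages.

# Pull the YYYY-MM-DD date out of each input line that contains one.
extract_dates() {
  sed -rn 's#.*([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*#\1#p'
}

# Count occurrences of each date and emit "count,date" CSV lines.
collate() {
  sort | uniq -c | sed -r 's/^ *([0-9]+) (.*)/\1,\2/'
}

# Invented sample input standing in for a scraped board page.
sample_page() {
  printf '%s\n' \
    '<td>2013-04-01 Some article</td>' \
    '<td>no date on this line</td>' \
    '<td>2013-04-02 Another article</td>' \
    '<td>2013-04-01 A third article</td>'
}
```

Running sample_page | extract_dates | collate should give one CSV line per date, with the count first.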

Update: second piece of advice: pass your parameters as arguments to the script, with default values.

Code:
total_articles="${1:-1760}"
decrement="${2:-40}"
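To see how the ${N:-default} expansion behaves, here is a tiny sketch. The function show_params is made up for illustration; inside a function, $1 and $2 are the function's own arguments, which mirrors how the script would see its command-line arguments:

```shell
#!/bin/bash
# Hypothetical demo of positional parameters with default values.
show_params() {
  local total_articles="${1:-1760}"  # first argument, or 1760 if missing or empty
  local decrement="${2:-40}"         # second argument, or 40 if missing or empty
  echo "total_articles=$total_articles decrement=$decrement"
}
```

So the script could be run with no arguments to get the defaults, or as ./press_scraper.sh 2000 to override only the article count.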
sr. member
Activity: 453
Merit: 250
April 28, 2013, 05:28:12 AM
#1
Getting to grips with gnuplot has been on my to-do list for a while, so I pulled down some data from this very section.

Here are the results.


You can see four clear media cycles in the last year. My only question is: how big is the next one going to be?

For any Linux geeks who want to play:

press_scraper.sh
Code:
#!/bin/bash

#  press_scraper.sh - scrape and collate bitcoin press articles, output csv.
#  usage            - ./press_scraper.sh

# This program is free software. It comes without any warranty, to
# the extent permitted by applicable law. You can redistribute it
# and/or modify it under the terms of the Do What The Fuck You Want
# To Public License, Version 2, as published by Sam Hocevar. See
# http://sam.zoy.org/wtfpl/COPYING for more details.

total_articles=1760
decrement=40
tempfile=$(mktemp)
outfile=press_articles.csv

[ -f "$tempfile" ] || { echo "Error: Could not make temporary file. Exiting..." >&2; exit 1; }
trap 'rm -f "$tempfile"' EXIT  # delete the temporary file when the script exits

function scrape {
  # Print the date found on each line; \1 is the pattern's only capture group.
  curl -s "$1" | sed -rn 's#.*([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*#\1#p'
}

for ((x=total_articles; x>0; x-=decrement))  # x>0 so the page at offset 40 is not skipped
do
  scrape "https://bitcointalk.org/index.php?board=77.$x" >> $tempfile
  sleep 5 # This is here just to be kind to the server, remove for speedup.
done

scrape "https://bitcointalk.org/index.php?board=77" >> $tempfile

sort $tempfile | uniq -c | sed -r 's/^ *([0-9]+) (.*)/\1,\2/' >$outfile

gnuplot_commands
Code:
reset
clear
set xdata time
set format x "%Y-%m-%d"
set timefmt "%Y-%m-%d"
set datafile separator ","
set style fill solid noborder
set xtics rotate by -90 out nomirror 604800
set ytics out nomirror
set grid ytics
set ylabel "Press hits/day"
set xrange ["2012-04-07":"2013-04-26"]
set yrange [0:*]
set boxwidth 43200 absolute
set term pngcairo truecolor font "Arial,11" size 1200,1200
set output "press_hits.png"
plot "press_articles.csv" using 2:1 with boxes ti "Press Article Frequency" lt 1 linecolor rgb "#FF0000"