[CHART] Press Section Statistics | Bitcointalksearch.org

kiko

sr. member

Activity: 453

Merit: 250

Quote from: Grinder on April 28, 2013, 10:20:20 AM

It's hard to get a good impression with so many long thin lines. I suggest plotting weekly sums or weekly moving average instead.

Version with curve-fitted bezier trendline:

Grinder

legendary

Activity: 1284

Merit: 1001

It's hard to get a good impression with so many long thin lines. I suggest plotting weekly sums or weekly moving average instead.
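
A minimal sketch of the weekly-sums idea, assuming the `count,date` CSV layout that press_scraper.sh writes and GNU date (for `-d` and `@epoch` input). The name `weekly_sums` is made up for illustration. Each week is labelled with the date of its Monday, so gnuplot's `%Y-%m-%d` timefmt should still parse it.

```shell
#!/bin/bash
# weekly_sums: hypothetical helper, not part of the original script.
# Reads "count,YYYY-MM-DD" lines on stdin and writes "total,YYYY-MM-DD"
# lines on stdout, where the date is the Monday of that week.
# Assumes GNU date (-d and @epoch input).
weekly_sums() {
  local count day secs dow
  while IFS=, read -r count day; do
    secs=$(date -u -d "$day" +%s)   # midnight UTC of that day, in epoch seconds
    dow=$(date -u -d "$day" +%u)    # ISO weekday: 1 = Monday ... 7 = Sunday
    # Step back to the Monday of the same week and tag the count with it.
    printf '%s %s\n' "$(date -u -d "@$((secs - (dow - 1) * 86400))" +%F)" "$count"
  done |
  awk '{ sum[$1] += $2 } END { for (w in sum) print sum[w] "," w }' |
  sort -t, -k2
}
```

Usage would be something like `weekly_sums < press_articles.csv > press_weekly.csv` (the output file name is just an example). When plotting the result you would probably also want a wider boxwidth, for instance 302400 (half a week in seconds); that value is my guess, not something from the thread.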

kiko

sr. member

Activity: 453

Merit: 250

Cool, thanks for the tips.
My /tmp is just a ramdisk so I tend to abuse it.

grondilu

legendary

Activity: 1288

Merit: 1080

Nice script.

My first advice: don't use tempfiles. They always mess up your directory as you always forget to remove them.

Just use proper Unix pipes: read from stdin, write to stdout.

Code:

#!/bin/bash

total_articles=1760
decrement=40

function scrape {
  curl "$1" | sed -rn 's#.*([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*#\1#p' ;
}

{
   for ((x=total_articles; x>0; x-=decrement))
   do
   scrape "https://bitcointalk.org/index.php?board=77.$x"
   sleep 5 # This is here just to be kind to the server, remove for speedup.
   done

   scrape "https://bitcointalk.org/index.php?board=77"
} |
sort |
uniq -c |
sed -r 's/^ *([0-9]+) (.*)/\1,\2/'

Not tested yet but this should work as well as your initial code.

Update: second advice: pass your parameters as arguments to the script, with default values:

Code:

total_articles="${1:-1760}"
decrement="${2:-40}"
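
To show how the `${N:-default}` expansion behaves, here is a small sketch. The function `show_params` is hypothetical and only exists for the demonstration; inside a function `$1` and `$2` are the function's arguments, but the expansion works the same way for script arguments.

```shell
#!/bin/bash
# show_params: hypothetical function used only to demonstrate the expansion.
show_params() {
  local total_articles="${1:-1760}"   # first argument, or 1760 if unset or empty
  local decrement="${2:-40}"          # second argument, or 40 if unset or empty
  echo "total_articles=$total_articles decrement=$decrement"
}

show_params            # prints: total_articles=1760 decrement=40
show_params 2000       # prints: total_articles=2000 decrement=40
show_params "" 20      # prints: total_articles=1760 decrement=20
```

Note that `:-` also falls back to the default when the argument is an empty string, which is why the last call keeps 1760.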

kiko

sr. member

Activity: 453

Merit: 250

Getting to grips with gnuplot has been on my to-do list for a while, so I pulled down some data from this very section.

Here are the results.

You can see 4 clear media cycles in the last year. My only question is: how big is the next one going to be?

For any Linux geeks who want to play:

press_scraper.sh

Code:

#!/bin/bash

# press_scraper.sh - scrape and collate bitcoin press articles, output csv.
# usage - ./press_scraper.sh

# This program is free software. It comes without any warranty, to
# the extent permitted by applicable law. You can redistribute it
# and/or modify it under the terms of the Do What The Fuck You Want
# To Public License, Version 2, as published by Sam Hocevar. See
# http://sam.zoy.org/wtfpl/COPYING for more details.

total_articles=1760
decrement=40
tempfile=$(mktemp)
outfile=press_articles.csv

[ -f "$tempfile" ] || { echo "Error: Could not make temporary file. Exiting..." >&2; exit 1; }

function scrape {
curl "$1" | sed -rn 's#.*([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*#\1#p' ;
}

for ((x=total_articles; x>0; x-=decrement))
do
scrape "https://bitcointalk.org/index.php?board=77.$x" >> $tempfile
sleep 5 # This is here just to be kind to the server, remove for speedup.
done

scrape "https://bitcointalk.org/index.php?board=77" >> $tempfile

sort $tempfile | uniq -c | sed -r 's/^ *([0-9]+) (.*)/\1,\2/' >$outfile
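
The extraction and counting stages can be tried offline, without hitting the server. The sketch below assumes GNU sed (for `-r`), as the script does. The function names and the sample lines are made up for illustration and are not real board HTML.

```shell
#!/bin/bash
# Both functions are illustrative names, not part of press_scraper.sh.
# Assumes GNU sed (for -r), as the original script does.

# Pull a YYYY-MM-DD date out of each input line; lines without one are dropped.
# The pattern has a single capture group, so the replacement refers to \1.
extract_dates() {
  sed -rn 's#.*([0-9]{4}-[0-1][0-9]-[0-3][0-9]).*#\1#p'
}

# Turn a list of dates into "count,date" CSV rows, one row per distinct date.
collate() {
  sort | uniq -c | sed -r 's/^ *([0-9]+) (.*)/\1,\2/'
}

# Made-up sample lines standing in for the board's HTML.
sample() {
  printf '%s\n' \
    '<a href="#">2013-04-25 Example Daily - first headline</a>' \
    '<td>no date on this line</td>' \
    '<a href="#">2013-04-24 Example Weekly - second headline</a>' \
    '<a href="#">2013-04-25 Example Post - third headline</a>'
}

sample | extract_dates | collate
```

With the sample above this prints `1,2013-04-24` and `2,2013-04-25`, which is the same `count,date` layout the gnuplot commands read with `using 2:1`.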

gnuplot_commands

Code:

reset
clear
set xdata time
set format x "%Y-%m-%d"
set timefmt "%Y-%m-%d"
set datafile separator ","
set style fill solid noborder
set xtics rotate by -90 out nomirror 604800
set ytics out nomirror
set grid ytics
set ylabel "Press hits/day"
set xrange ["2012-04-07":"2013-04-26"]
set yrange [0:*]
set boxwidth 43200 absolute
set term pngcairo truecolor font "Arial,11" size 1200,1200
set output "press_hits.png"
plot "press_articles.csv" using 2:1 with boxes ti "Press Article Frequency" lt 1 linecolor rgb "#FF0000"
