Handling week numbers in R and Stata
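I finally seem to have tracked down something that had been bugging me in a dataset I've been working with for a long time, so I'm writing it down as a note to self.
The data appears to be daily (or even finer-grained) records aggregated by week. To check whether an observation falls before or after a certain event, I need to know whether it is before or after a given date, so I converted dates to week numbers in R. For example, January 17, 2018 is week 3 of 2018.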
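As far as I know, there are two ways to get a week number in R: strftime() and lubridate::week(). With strftime() I looked at two formats, %W and %V (there is also %U, which is Sunday-based). The reference I looked at didn't make the difference clear to me, so I tried them out myself.
I threw together a function that produces the week number for each day of a month, and printed January 2008.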
    cal_fun = function(yr, mon){
      # Days 1..31 of the month (assumes a 31-day month such as January)
      dates = as.Date(paste(yr, mon, 1:31, sep = "-"), format = "%Y-%m-%d")
      # Week number with %W
      week_W = strftime(dates, "%W")
      # Week number with %V
      week_V = strftime(dates, "%V")
      # Week number with lubridate::week
      week_lub = lubridate::week(dates)
      # Day of the week
      week_day = weekdays(dates)
      cal_temp = cbind(week_W, week_V, week_lub, week_day)
      cal_temp
    }
    cal_fun(2008, 1)
    > cal_fun(2008,1)
          week_W week_V week_lub week_day
     [1,] "00"   "01"   "1"      "Tuesday"
     [2,] "00"   "01"   "1"      "Wednesday"
     [3,] "00"   "01"   "1"      "Thursday"
     [4,] "00"   "01"   "1"      "Friday"
     [5,] "00"   "01"   "1"      "Saturday"
     [6,] "00"   "01"   "1"      "Sunday"
     [7,] "01"   "02"   "1"      "Monday"
     [8,] "01"   "02"   "2"      "Tuesday"
     [9,] "01"   "02"   "2"      "Wednesday"
    [10,] "01"   "02"   "2"      "Thursday"
    [11,] "01"   "02"   "2"      "Friday"
    [12,] "01"   "02"   "2"      "Saturday"
    [13,] "01"   "02"   "2"      "Sunday"
    [14,] "02"   "03"   "2"      "Monday"
    [15,] "02"   "03"   "3"      "Tuesday"
    [16,] "02"   "03"   "3"      "Wednesday"
    [17,] "02"   "03"   "3"      "Thursday"
    [18,] "02"   "03"   "3"      "Friday"
    [19,] "02"   "03"   "3"      "Saturday"
    [20,] "02"   "03"   "3"      "Sunday"
    [21,] "03"   "04"   "3"      "Monday"
    [22,] "03"   "04"   "4"      "Tuesday"
    [23,] "03"   "04"   "4"      "Wednesday"
    [24,] "03"   "04"   "4"      "Thursday"
    [25,] "03"   "04"   "4"      "Friday"
    [26,] "03"   "04"   "4"      "Saturday"
    [27,] "03"   "04"   "4"      "Sunday"
    [28,] "04"   "05"   "4"      "Monday"
    [29,] "04"   "05"   "5"      "Tuesday"
    [30,] "04"   "05"   "5"      "Wednesday"
    [31,] "04"   "05"   "5"      "Thursday"
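From this output, the rules look like:
- %W and %V both start weeks on Monday.
- lubridate::week() looks like it starts weeks on Tuesday, but that is only because January 1, 2008 was a Tuesday. It counts 7-day blocks from January 1, so the "start day" is whatever weekday January 1 falls on.
- %V has week 1 from the very first day, while %W starts week 1 at the first Monday (days before it are week 00).
Or so I thought, but %V can also carry over the last week of the previous year.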
    > cal_fun(2006,1)
          week_W week_V week_lub week_day
     [1,] "00"   "52"   "1"      "Sunday"
     [2,] "01"   "01"   "1"      "Monday"
     [3,] "01"   "01"   "1"      "Tuesday"
     [4,] "01"   "01"   "1"      "Wednesday"
     [5,] "01"   "01"   "1"      "Thursday"
     [6,] "01"   "01"   "1"      "Friday"
     [7,] "01"   "01"   "1"      "Saturday"
     [8,] "01"   "01"   "2"      "Sunday"
     [9,] "02"   "02"   "2"      "Monday"
    [10,] "02"   "02"   "2"      "Tuesday"
    [11,] "02"   "02"   "2"      "Wednesday"
    [12,] "02"   "02"   "2"      "Thursday"
    [13,] "02"   "02"   "2"      "Friday"
    [14,] "02"   "02"   "2"      "Saturday"
    [15,] "02"   "02"   "3"      "Sunday"
    [16,] "03"   "03"   "3"      "Monday"
    [17,] "03"   "03"   "3"      "Tuesday"
    [18,] "03"   "03"   "3"      "Wednesday"
    [19,] "03"   "03"   "3"      "Thursday"
    [20,] "03"   "03"   "3"      "Friday"
    [21,] "03"   "03"   "3"      "Saturday"
    [22,] "03"   "03"   "4"      "Sunday"
    [23,] "04"   "04"   "4"      "Monday"
    [24,] "04"   "04"   "4"      "Tuesday"
    [25,] "04"   "04"   "4"      "Wednesday"
    [26,] "04"   "04"   "4"      "Thursday"
    [27,] "04"   "04"   "4"      "Friday"
    [28,] "04"   "04"   "4"      "Saturday"
    [29,] "04"   "04"   "5"      "Sunday"
    [30,] "05"   "05"   "5"      "Monday"
    [31,] "05"   "05"   "5"      "Tuesday"
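Weeks still start on Monday, so why?? It turns out that %V is the ISO 8601 week number: week 1 is the Monday-to-Sunday week that has at least four days in the new year (equivalently, the week containing the first Thursday). January 1, 2006 was a Sunday, so it still belongs to week 52 of 2005.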
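The %V column is consistent with the ISO 8601 rule that a week belongs to the year containing its Thursday. Here is a minimal base-R sketch of that rule, for illustration only; the helper name iso_week is made up and is not part of R or lubridate.

```r
# Hypothetical helper: ISO 8601 week number computed by hand.
iso_week = function(date){
  d = as.Date(date)
  # ISO weekday: 1 = Monday, ..., 7 = Sunday
  wd = as.integer(format(d, "%u"))
  # Thursday of the same Monday-to-Sunday week
  thursday = d + (4 - wd)
  # Week number = which 7-day block of its own year that Thursday falls in
  (as.integer(format(thursday, "%j")) - 1) %/% 7 + 1
}

iso_week("2006-01-01")  # a Sunday, still in the last week of 2005
iso_week("2008-01-01")  # a Tuesday, already in week 1 of 2008
```

On platforms where %V is supported, the result should agree with as.integer(strftime(d, "%V")).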
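The original data seems to have been created in Stata, so I checked Stata as well.
In Stata, the week() function returns the week number (di just displays the result).
For every year, January 1 seems to be treated as week 1.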
    . di week(mdy(1,1,2008))
    1
    . di week(mdy(1,1,2007))
    1
    . di week(mdy(1,1,2006))
    1
    . di week(mdy(1,1,2005))
    1
    . di week(mdy(1,1,2004))
    1
    . di week(mdy(1,1,2003))
    1
    . di week(mdy(1,1,2002))
    1
And in 2008 the week number changes on Tuesday (January 7, 2008 is a Monday).
    . di week(mdy(1,1,2008))
    1
    . di week(mdy(1,6,2008))
    1
    . di week(mdy(1,7,2008))
    1
    . di week(mdy(1,8,2008))
    2
    . di week(mdy(1,9,2008))
    2
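Monday I can understand, but what on earth is a Tuesday start? It is not really a Tuesday start: Stata's week() treats the first 7-day period of the year as week 1, so weeks begin on whatever weekday January 1 falls on, the same as lubridate::week(). January 1, 2008 just happened to be a Tuesday.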
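Both the lubridate::week() column and the Stata outputs above are consistent with simply counting 7-day blocks from January 1. Here is a base-R sketch of that rule, with a made-up name block_week. It is an assumption that this matches Stata at the very end of the year: as I understand Stata's documentation, week() never returns more than 52, and this sketch does not apply that cap.

```r
# Hypothetical helper: week number as 7-day blocks counted from January 1.
block_week = function(date){
  d = as.Date(date)
  # Day of the year, 1..366
  yday = as.integer(format(d, "%j"))
  # Days 1-7 are week 1, days 8-14 are week 2, and so on
  (yday - 1) %/% 7 + 1
}

block_week(c("2008-01-07", "2008-01-08"))  # Monday and Tuesday from the 2008 example
```

With this rule the weekday on which the number changes depends only on the weekday of January 1, which explains Tuesday in 2008 and Sunday in 2006.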
Personally, I want New Year's Day to always count as week 1 and weeks to start on Sunday, so I wrote the following function.
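    sun_start = function(date){
      # Week number with %W, after shifting one day so Monday-based weeks act as Sunday-based
      date_fm = as.Date(date, format = "%Y-%m-%d") + 1
      week_W2 = as.numeric(strftime(date_fm, "%W"))
      # If the first day is week 00, add 1
      jan1_shifted = as.Date(paste(lubridate::year(date_fm), "01", "01", sep = "-")) + 1
      week_W2 = ifelse(strftime(jan1_shifted, "%W") == "00", week_W2 + 1, week_W2)
      # Caveat: December 31 shifts into the next year, so it is not handled correctly
      # return result
      week_W2
    }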
It's a bit rough, but it seems good enough for my purposes.
    cal_fun2 = function(yr, mon){
      dates = as.Date(paste(yr, mon, 1:31, sep = "-"), format = "%Y-%m-%d")
      # Week number with %W
      week_W = strftime(dates, "%W")
      week_W2 = sun_start(dates)
      week_day = weekdays(dates)
      cal_temp = cbind(week_W, week_W2, week_day)
      cal_temp
    }
    cal_fun2(2005, 1)
    > cal_fun2(2005,1)
          week_W week_W2 week_day
     [1,] "00"   "1"     "Saturday"
     [2,] "00"   "2"     "Sunday"
     [3,] "01"   "2"     "Monday"
     [4,] "01"   "2"     "Tuesday"
     [5,] "01"   "2"     "Wednesday"
     [6,] "01"   "2"     "Thursday"
     [7,] "01"   "2"     "Friday"
     [8,] "01"   "2"     "Saturday"
     [9,] "01"   "3"     "Sunday"
    [10,] "02"   "3"     "Monday"
    [11,] "02"   "3"     "Tuesday"
    [12,] "02"   "3"     "Wednesday"
    [13,] "02"   "3"     "Thursday"
    [14,] "02"   "3"     "Friday"
    [15,] "02"   "3"     "Saturday"
    [16,] "02"   "4"     "Sunday"
    [17,] "03"   "4"     "Monday"
    [18,] "03"   "4"     "Tuesday"
    [19,] "03"   "4"     "Wednesday"
    [20,] "03"   "4"     "Thursday"
    [21,] "03"   "4"     "Friday"
    [22,] "03"   "4"     "Saturday"
    [23,] "03"   "5"     "Sunday"
    [24,] "04"   "5"     "Monday"
    [25,] "04"   "5"     "Tuesday"
    [26,] "04"   "5"     "Wednesday"
    [27,] "04"   "5"     "Thursday"
    [28,] "04"   "5"     "Friday"
    [29,] "04"   "5"     "Saturday"
    [30,] "04"   "6"     "Sunday"
    [31,] "05"   "6"     "Monday"
January 1 is treated as week 1 regardless of its weekday, and after that the week number changes every Sunday.
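One thing to watch for: because sun_start() shifts the date by one day, December 31 ends up being evaluated as January 1 of the following year. A possible alternative avoids the shift by using strftime()'s %U format (Sunday-based weeks, with days before the first Sunday in week 00). This is only a sketch, assuming %U is available on your platform; the name sun_start_U is made up for illustration.

```r
# Hypothetical alternative to sun_start(): Sunday-start weeks, January 1 always in week 1.
sun_start_U = function(date){
  d = as.Date(date)
  # %U: week of the year with Sunday as the first day (00 before the first Sunday)
  week_U = as.numeric(strftime(d, "%U"))
  # January 1 of each date's own year
  jan1 = as.Date(paste0(format(d, "%Y"), "-01-01"))
  # If January 1 falls in week 00, shift every week of that year up by one
  ifelse(strftime(jan1, "%U") == "00", week_U + 1, week_U)
}

sun_start_U(c("2005-01-01", "2005-01-02", "2005-12-31"))
```

For January 2005 this should give the same week_W2 values as the table above, while December 31 stays in its own year.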