
The spRingsteen package provides several data frames describing
the songs, albums, tours, and setlists of Bruce Springsteen’s career.
The data (collected from Brucebase) is provided in a
tidy form that is easy to analyze in R. The scripts used to scrape
the data in their entirety, alongside a SQLite representation of the data,
can be found in a second repository, springsteen_db.
You can install the released version of spRingsteen from CRAN with:

install.packages("spRingsteen")

Alternatively, you can install the development version of spRingsteen from GitHub:

remotes::install_github("obrienjoey/spRingsteen")

While the CRAN version of spRingsteen is updated every few months, the GitHub
(development) version is updated daily. The update_data function bridges this
gap by bringing the installed package up to date with the most recent data
available on GitHub:

library(spRingsteen)
update_data()

Note: you must restart the R session for the updates to become available.

The package includes datasets covering the career of Bruce Springsteen.
For example, his touring history with his various bands is stored
in concerts:
library(spRingsteen)
library(dplyr)
concerts
#> # A tibble: 2,930 x 6
#>    gig_key                                       date       location        state city  country
#>    <chr>                                         <date>     <chr>           <chr> <chr> <chr>  
#>  1 /gig:1973-01-03-main-point-bryn-mawr-pa-early 1973-01-03 THE MAIN POINT~ PA    <NA>  USA    
#>  2 /gig:1973-01-03-main-point-bryn-mawr-pa-late  1973-01-03 THE MAIN POINT~ PA    <NA>  USA    
#>  3 /gig:1973-01-04-main-point-bryn-mawr-pa-early 1973-01-04 THE MAIN POINT~ PA    <NA>  USA    
#>  4 /gig:1973-01-04-main-point-bryn-mawr-pa-late  1973-01-04 THE MAIN POINT~ PA    <NA>  USA    
#>  5 /gig:1973-01-05-main-point-bryn-mawr-pa-early 1973-01-05 THE MAIN POINT~ PA    <NA>  USA    
#>  6 /gig:1973-01-05-main-point-bryn-mawr-pa-late  1973-01-05 THE MAIN POINT~ PA    <NA>  USA    
#>  7 /gig:1973-01-06-main-point-bryn-mawr-pa-early 1973-01-06 THE MAIN POINT~ PA    <NA>  USA    
#>  8 /gig:1973-01-06-main-point-bryn-mawr-pa-late  1973-01-06 THE MAIN POINT~ PA    <NA>  USA    
#>  9 /gig:1973-01-08-paul-s-mall-boston-ma-early   1973-01-08 PAUL'S MALL, B~ MA    <NA>  USA    
#> 10 /gig:1973-01-08-paul-s-mall-boston-ma-late    1973-01-08 PAUL'S MALL, B~ MA    <NA>  USA    
#> # ... with 2,920 more rows
# how many concerts have occurred in each country?
concerts %>% 
  count(country, sort = TRUE)
#> # A tibble: 39 x 2
#>    country       n
#>    <chr>     <int>
#>  1 USA        2261
#>  2 Canada       96
#>  3 England      88
#>  4 Australia    56
#>  5 Germany      52
#>  6 Spain        51
#>  7 Italy        50
#>  8 France       43
#>  9 Sweden       37
#> 10 Ireland      26
#> # ... with 29 more rows

It also includes the setlists performed at these shows,
which are stored in setlists:

setlists
#> # A tibble: 52,100 x 4
#>    gig_key                                       song_key                     song  song_number
#>    <chr>                                         <chr>                        <chr>       <int>
#>  1 /gig:1973-01-03-main-point-bryn-mawr-pa-early /song:it-s-hard-to-be-a-sai~ It's~           1
#>  2 /gig:1973-01-03-main-point-bryn-mawr-pa-early /song:santa-ana              Sant~           2
#>  3 /gig:1973-01-03-main-point-bryn-mawr-pa-early /song:secret-to-the-blues    Secr~           3
#>  4 /gig:1973-01-03-main-point-bryn-mawr-pa-early /song:new-york-song          New ~           4
#>  5 /gig:1973-01-08-paul-s-mall-boston-ma-early   /song:growin-up              Grow~           1
#>  6 /gig:1973-01-09-wbcn-studio-boston-ma         /song:satin-doll             Sati~           1
#>  7 /gig:1973-01-09-wbcn-studio-boston-ma         /song:bishop-danced          Bish~           2
#>  8 /gig:1973-01-09-wbcn-studio-boston-ma         /song:wild-billy-s-circus-s~ Circ~           3
#>  9 /gig:1973-01-09-wbcn-studio-boston-ma         /song:song-for-orphans       Song~           4
#> 10 /gig:1973-01-09-wbcn-studio-boston-ma         /song:does-this-bus-stop-at~ Does~           5
#> # ... with 52,090 more rows
# what song has been played most by Springsteen?
setlists %>%
  count(song, sort = TRUE)
#> # A tibble: 994 x 2
#>    song                            n
#>    <chr>                       <int>
#>  1 Born To Run                  1710
#>  2 Thunder Road                 1440
#>  3 The Promised Land            1387
#>  4 Badlands                     1195
#>  5 Tenth Avenue Freeze-Out      1107
#>  6 Dancing In The Dark          1050
#>  7 Born In The U.s.a.           1011
#>  8 The Rising                    881
#>  9 Rosalita (Come Out Tonight)   812
#> 10 Hungry Heart                  737
#> # ... with 984 more rows
# which song has most frequently opened a show?
setlists %>%
  filter(song_number == 1) %>%
  count(song, sort = TRUE) %>%
  slice(1)
#> # A tibble: 1 x 2
#>   song           n
#>   <chr>      <int>
#> 1 Growin' Up   272

Further details of the songs themselves are available in
songs, including the album on which each song appears and, in some cases,
the full lyrics. This allows for text mining or sentiment
analysis using a package like tidytext:

library(tidytext)
# which words appear most frequently on the Born In The U.S.A. album?
songs %>% 
  filter(album == "Born In The U.S.A.") %>% 
  select(title, lyrics) %>% 
  unnest_tokens(word, lyrics) %>% 
  count(word, sort = TRUE) %>% 
  anti_join(stop_words, by = 'word')
#> # A tibble: 513 x 2
#>    word        n
#>    <chr>   <int>
#>  1 la        158
#>  2 yeah       47
#>  3 alright    41
#>  4 sha        40
#>  5 glory      37
#>  6 days       35
#>  7 u.s.a      32
#>  8 born       30
#>  9 hoo        27
#> 10 baby       26
#> # ... with 503 more rows

Lastly, the tours table lists the tour associated
with each concert:

tours %>% 
  count(tour, sort = TRUE)
#> # A tibble: 24 x 2
#>    tour                                                   n
#>    <chr>                                              <int>
#>  1 Non-tour Shows                                       575
#>  2 Springsteen On Broadway                              268
#>  3 The River Tour                                       213
#>  4 The Wild, The Innocent & The E Street Shuffle Tour   197
#>  5 Born In The U.S.A. Tour                              156
#>  6 Greetings From Asbury Park Tour                      147
#>  7 Wrecking Ball Tour                                   134
#>  8 The Reunion Tour                                     132
#>  9 The Ghost Of Tom Joad Tour                           128
#> 10 The Rising Tour                                      120
#> # ... with 14 more rows

Of course, the real advantage of this package lies in combining the different data frames to extract useful information:

# what was the most played song on each tour?
setlists %>% 
  left_join(tours, by = 'gig_key') %>%
  count(song, tour) %>%
  group_by(tour) %>%
  filter(n == max(n)) %>%
  arrange(desc(tour))
#> # A tibble: 95 x 3
#> # Groups:   tour [25]
#>    song                       tour                            n
#>    <chr>                      <chr>                       <int>
#>  1 Death To My Hometown       Wrecking Ball Tour            134
#>  2 Leap Of Faith              World Tour 1992-93            103
#>  3 American Land              Working On A Dream Tour        83
#>  4 Born To Run                Working On A Dream Tour        83
#>  5 The Promised Land          Vote For Change                22
#>  6 Adam Raised A Cain         Tunnel Of Love Express Tour    67
#>  7 All That Heaven Will Allow Tunnel Of Love Express Tour    67
#>  8 Born In The U.s.a.         Tunnel Of Love Express Tour    67
#>  9 Born To Run                Tunnel Of Love Express Tour    67
#> 10 Brilliant Disguise         Tunnel Of Love Express Tour    67
#> # ... with 85 more rows
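
The result above has 95 rows for 25 groups because filter(n == max(n)) keeps every song tied for first place, and the extra group beyond the 24 tours listed earlier is presumably an NA tour produced by left_join for concerts without a matching tour entry. If exactly one row per tour is wanted, the following is a minimal sketch of one way to get it. It runs on small made-up tables (toy_setlists and toy_tours are hypothetical stand-ins, not package data) so that it works without the package installed, and it assumes dplyr 1.0.0 or later for slice_max.

```r
library(dplyr)

# Hypothetical toy tables mirroring the gig_key/song and gig_key/tour columns
# of the package's setlists and tours data frames; the rows are made up.
toy_setlists <- tibble(
  gig_key = c("g1", "g1", "g2", "g2", "g3", "g4"),
  song    = c("Badlands", "Born To Run", "Badlands", "Born To Run",
              "The Rising", "Badlands")
)
toy_tours <- tibble(
  gig_key = c("g1", "g2", "g3"),
  tour    = c("Tour A", "Tour A", "Tour B")
)

top_songs <- toy_setlists %>%
  # inner_join drops gigs with no tour entry (g4 here), so no NA tour group appears
  inner_join(toy_tours, by = "gig_key") %>%
  count(tour, song) %>%
  group_by(tour) %>%
  # with_ties = FALSE returns a single row per tour even when counts are tied
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

top_songs
```

With the real data, the same pipeline should apply after replacing toy_setlists and toy_tours with setlists and tours. Note that with_ties = FALSE picks one of the tied songs arbitrarily, so the original filter() approach remains preferable when the ties themselves are of interest.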