Connecting R to PostgreSQL on Linux

Share Tweet

Connecting to databases is a critical piece of data anlaysis in R. In most analytic roles the data we consume is going to be found in databases. Of these some of the most common are SQL databases like MS SQL Server, PostgreSQL, and Oracle in addition to many others. In this how-to blog, I’ll walk you through the major steps of configuring your machine and R to be able to connect to a PostgreSQL Server database from R on Ubuntu using the RPostgreSQL, odbc, and RJDBC packages in R. Similar steps can be followed to set up connections to other databases, however, driver installation and configuration will likely be slightly different.

1 – RPostgreSQL Package Setup

The first step in setting up a connection to a PostgreSQL database is to first download the PostgreSQL header files and static library, libpq-dev. In order to do this on Ubuntu open the terminal and install it using the following command:

sudo apt-get install libpq-dev

Once the libpq-dev package is installed the next step is to install the RPostgreSQL package in R. If you need to authenticate, I highly recommend the getPass package which will prompt you for your password. RStudio also has a .rs.askForPassword() function that works similar to the getPass() function, but it relies on using RStudio. I’ve confirmed that getPass works in bash, emacs, RStudio, and when knitting your Rmd files. So however you submit your R code it will work the same.

# Install the package in R
install.packages("RPostgreSQL")
library(RPostgreSQL)
## Loading required package: DBI
library(getPass)
pgdrv <- dbDriver(drvName = "PostgreSQL")

db <-DBI::dbConnect(pgdrv,
                    dbname="postgres",
                    host="localhost", port=5432,
                    user = 'postgres',
                    password = getPass("Enter Password:"))
## Please enter password in TK window (Alt+Tab)
# Write to database
DBI::dbWriteTable(db, "mtcars", mtcars)
## [1] TRUE
DBI::dbDisconnect(db)
## [1] TRUE

Perfect! Your database connection should be working simply by adding the proper arguments in your dbConnect() function. You may need to tweak the host, port and user based on your PostgreSQL server setup.

2 – odbc Package Setup

In case you are a fan of odbc, the next section will walk you through the steps of creating your database connection via odbc.

In the past I have used the RODBC package but recently I have found that the odbc package plays much nicer with other database tools like DBI and dbplyr. Plus it has very similar syntax to the RJDBC package and for consistency sake I’ve made the switch.

Once again, the first step is to install the necessary debian packages. In this case we need to install the unixodbc and unixodbc-dev packages and the odbc-postgresql driver.

# Install the unixODBC library
apt-get install unixodbc unixodbc-dev

# PostgreSQL ODBC Drivers
apt-get install odbc-postgresql

Set up connection with connection string

Okay we are now ready to connect via odbc. Note the slight difference in the names of the arguments of the dbConnect() function.

db <- DBI::dbConnect(odbc::odbc(),
                     Driver = "PostgreSQL Unicode",
                     Database = "postgres",
                     UserName = "postgres",
                     Password = getPass("Enter Password:"),
                     Servername = "localhost",
                     Port = 5432)
## Please enter password in TK window (Alt+Tab)

Set up connection with DSN

If you don’t want to have to worry about defining each of these arguments each time you connect to PostgreSQL via odbc, you can define the configuration in your odbcinst.ini file. The following steps walk you through the process:

  1. Make sure the /etc/odbcinst.ini has the drivers set up. This should have been configured automatically when installing odbc-postgresql with apt-get. This is what it would look like:

    [PostgreSQL Unicode]
    Driver = psqlodbca.so
    Setup = libodbcpsqlS.so
    Debug = 0
    CommLog = 1
    UsageCount = 1
  2. Now define your DSN by modifying the odbc.ini file:

    [PostgreSQL]
    Driver = PostgreSQL Unicode
    Database = postgres
    Servername = localhost
    UserName = postgres
    Password = postgres
  3. Connect to your database by referencing your DSN name specified in the square brackets of the odbc.ini file:

# Connect to the database
db <- dbConnect(odbc::odbc(), "PostgreSQL")

# Pull the Data into an R dataframe
DBI::dbGetQuery(db,"SELECT * FROM MTCARS")
##              row.names  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7           Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8            Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9             Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10            Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 11           Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 12          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 13          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 14         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 15  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 16 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 17   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 18            Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 19         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 20      Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 21       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 22    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 23         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 24          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 25    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 26           Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 27       Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 28        Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 29      Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 30        Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 31       Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 32          Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Close the Connection
DBI::dbDisconnect(db)

Now you are ready to begin your analysis with your data!

3 – RJDBC Package Setup

Finally, the last way to configure a connection to the PostgreSQL database can be done via the RJDBC package. The first step in this configuration is to download the jdbc jar file from here. I’ve put this in my home directory, ~, and will reference this file in the JDBC() function below. Once you have the jar file you can install the RJDBC package in R.

install.packages('RJDBC')

Now you are ready to connect. Once again, notice the slight tweaks to the arguments of the dbConnect() function. Because I’m defining the url argument with the host, port and database name, there is no need for these additional arguments.

library(RJDBC)
## Loading required package: rJava
db <- DBI::dbConnect(RJDBC::JDBC("org.postgresql.Driver","~/postgresql-42.2.2.jar"),
               url = "jdbc:postgresql://localhost:5432/postgres",
               user = "postgres",
               password = getPass("Enter Password:"))
## Please enter password in TK window (Alt+Tab)
# Pull the Data into an R dataframe
DBI::dbGetQuery(db,"SELECT * FROM MTCARS")
##              row.names  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7           Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8            Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9             Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10            Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 11           Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 12          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 13          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 14         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 15  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 16 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 17   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 18            Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 19         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 20      Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 21       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 22    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 23         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 24          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 25    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 26           Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 27       Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 28        Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 29      Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 30        Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 31       Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 32          Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
# Close the Connection
DBI::dbDisconnect(db)
## [1] TRUE

Alright! We’ve walked through several different configurations in connecting to a PostgreSQL database on Ubuntu. You’ll only need one of these setups, but I think it’s nice to understand each of your options so you can create the best setup that works for you and/or your organization.

Share Tweet
__data