=== modified file 'src/docbkx/en/dhis2_r.xml' --- src/docbkx/en/dhis2_r.xml 2012-03-04 05:25:33 +0000 +++ src/docbkx/en/dhis2_r.xml 2012-03-05 08:13:55 +0000 @@ -2,10 +2,13 @@ DHIS2 and R integration -
- DHIS2 and R +
+ Introduction R is freely available, open source statistical computing environment. R refers to both the computer programming language, as well as the software which can be used to create and run R scripts. There are numerous sources on the web which describe the extensive set of features of R. R is a natural extension to DHIS2, as it provides powerful statistical routines, data manipulation functions, and visualization tools. This chapter will describe how to setup R and DHIS2 on the same server, and will provide a simple example of how to retrieve data from the DHIS2 database into an R data frame and perform some basic calculations. +
+
+ Using ODBC to retrieve data from DHIS2 into R In this example, we will use a system-wide ODBC connector which will be used to retrieve data from the DHIS2 database. There are some disadvantages with this approach, as ODBC is slower than other methods and it does raise some security concerns by providing a system-wide connector to all users. However, it is a convenient method to provide a connection to multiple users. The use of the R package RODBC will be used in this case. Other alternatives would be the use of the RPostgreSQL package, which can interface directly through the Postgresql driver. First, we will install R and some other required and useful packages. Invoke the following command: apt-get install r-base r-cran-odbc r-cran-lattice odbc-postgresql @@ -133,6 +136,65 @@ 11 2011-11-01 51.42526 48.57474 12 2011-12-01 50.68933 49.31067 We can see that the male and female attendances are very similar for each month of the year, with seemingly higher male attendance relative to female attendance in the month of December. - In this example, we show how to retreive data from the DHIS2 database and manipulate in with some simple R commands. The basic pattern for using DHIS2 and R together, will be the retrieval of data from the DHIS2 database with an SQL query into an R data frame, followed by whatever routines (statistical analysis, plotting, etc) which may be required. + In this example, we showed how to retreive data from the DHIS2 database and manipulate in with some simple R commands. The basic pattern for using DHIS2 and R together, will be the retrieval of data from the DHIS2 database with an SQL query into an R data frame, followed by whatever routines (statistical analysis, plotting, etc) which may be required. +
+
+ Using R with MyDatamart + MyDatamart provides useful interface to the DHIS2 database by making a local copy of the database available on a users desktop. This means that the user does not need direct access to the database and the data can be worked with offline on the users local machine. In this example, we will have used the demo database. Data was downloaded at the district level for Jan 2011-Dec 201l. Consult the MyDatamart section in this manual for more detailed information. + + First, lets load some required packages. If you do not have these packages already installed in your version of R, you will need to do so before proceeding with the example. + library("DBI") +library("RSQLite") +library("lattice") +library("latticeExtra") + Next, we are going to connect to the local copy of the MyDatamart database. In this case, it was located at C:\dhis2\sl.dmart. + dbPath<-"C:\\dhis2\\sl.dmart" +drv<-dbDriver("SQLite") +db<-dbConnect(drv,dbPath) + Let suppose we have been asked to compare ANC 1, 2, 3 coverage rates for each district for 2011. We can define an SQL query to retrieve data from the MyDatamart database into an R data frame as follows. + #An SQL query which will retreive all indicators +#at OU2 le +sql<-"SELECT * FROM pivotsource_indicator_ou2_m +WHERE year = '2011'" +#Execute the query into a new result set +rs<-dbSendQuery(db,sql) +#Put the entire result set into a new data frame +Inds<-fetch(rs,n=-1) +#Clean up a bit +dbClearResult(rs) +dbDisconnect(db) + We used one of the pre-existing Pivot Source queries in the database to get all of the indicator values. Of course, we could have retrieved only the ANC indicators, but we did not exactly know how the data was structured, or how the columns were named, so lets take a closer look. + #Get the name of the columns +colnames(Inds) +#output not shown for brevity +levels(as.factor(Inds$indshort)) + We see from the colnames commnand that there is an column called "indshort" which looks like it contains some indicator names. We can see the names using the second command. After we have determined which ones we need (ANC 1, 2, and 3), lets further subset the data so that we only have these. + #Subset the data for ANC +ANC<-Inds[grep("ANC (1|2|3) Coverage",as.factor(Inds$indshort)),] + We just used R's grep function to retrieve all the rows and columns of the Inds data frame which matched the regular expression "ANC (1|2|3) Coverage" and put this into a new data frame called "ANC". + By looking at the data with the str(ANC) command, we will notice that the time periods are not ordered correctly, so lets fix this before we try and create a plot of the data. + #Lets reorder the months +MonthOrder<-c('Jan','Feb','Mar','Apr', +'May','Jun','Jul','Aug','Sep','Oct','Nov','Dec') +ANC$month<-factor(ANC$month,levels=MonthOrder) + Next, we need to actually calculate the indicator value from the numerator, factor and denominator. + #Calculate the indicator value +ANC$value<-ANC$numxfactor/ANC$denominatorvalue + Finally, lets create a simple trellis plot which compares ANC 1, 2, 3 for each district by month and save it to our local working directory in a file called "District_ANC.png". + png(filename="District_ANC.png",width=1024,height=768) +plot.new() + xyplot(value ~ month | ou2, data=ANC, type="a", main="District ANC Comparison Sierra Leone 2011", + groups=indshort,xlab="Month",ylab="ANC Coverage", + scales = list(x = list(rot=90)), + key = simpleKey(levels(factor(ANC$indshort)), + points=FALSE,lines=TRUE,corner=c(1,1))) + mtext(date(), side=1, line=3, outer=F, adj=0, cex=0.7) +dev.off() + The results of which are displayed below. + + + + +
=== modified file 'src/docbkx/en/dhis2_user_manual_en.xml' --- src/docbkx/en/dhis2_user_manual_en.xml 2012-03-03 09:23:21 +0000 +++ src/docbkx/en/dhis2_user_manual_en.xml 2012-03-05 08:13:55 +0000 @@ -52,7 +52,9 @@ R and DHIS2 Integration - + + + DHIS Technical Architecture Guide === added directory 'src/docbkx/en/resources/images/r' === added file 'src/docbkx/en/resources/images/r/District_ANC.png' Binary files src/docbkx/en/resources/images/r/District_ANC.png 1970-01-01 00:00:00 +0000 and src/docbkx/en/resources/images/r/District_ANC.png 2012-03-05 08:13:55 +0000 differ