Reading Matrix Rows and Columns From a CSV File in R

Working with data in a matrix

Loading data

Our example data is quality measurements (particle size) on PVC plastic production, using eight different resin batches and three different machine operators.

The data set is stored in comma-separated value (CSV) format. Each row is a resin batch, and each column is an operator. In RStudio, open pvc.csv and have a look at what it contains.

    read.csv("data/intro-r/pvc.csv", row.names=1)

We have called read.csv with two arguments: the name of the file we want to read, and which column contains the row names. The filename needs to be a character string, so we put it in quotes. Assigning the second argument, row.names, to be 1 indicates that the data file has row names, and which column number they are stored in. If we don't specify row.names the result will not have row names.

    dat <- read.csv("data/intro-r/pvc.csv", row.names=1)
    dat
    ##        Alice   Bob  Carl
    ## Resin1 36.25 35.40 35.30
    ## Resin2 35.15 35.35 33.35
    ## Resin3 30.70 29.65 29.20
    ## Resin4 29.70 30.05 28.65
    ## Resin5 31.85 31.40 29.30
    ## Resin6 30.20 30.65 29.75
    ## Resin7 32.90 32.50 32.80
    ## Resin8 36.80 36.45 33.15
    class(dat)
    ## [1] "data.frame"
    str(dat)
    ## 'data.frame':    8 obs. of  3 variables:
    ##  $ Alice: num  36.2 35.1 30.7 29.7 31.9 ...
    ##  $ Bob  : num  35.4 35.4 29.6 30.1 31.4 ...
    ##  $ Carl : num  35.3 33.4 29.2 28.6 29.3 ...

read.csv has loaded the data as a data frame. A data frame contains a collection of "things" (rows) each with a set of properties (columns) of different types.
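To see what row.names changes, here is a minimal sketch that does not need pvc.csv. It writes a two-row CSV to a temporary file (the file and the column header "resin" are made up for illustration) and reads it back with and without row.names=1.

```r
# Hypothetical stand-in for pvc.csv: a tiny CSV written to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
    "resin,Alice,Bob,Carl",
    "Resin1,36.25,35.40,35.30",
    "Resin2,35.15,35.35,33.35"
), csv_path)

# With row.names=1, the first column becomes the row names.
with_names <- read.csv(csv_path, row.names = 1)
rownames(with_names)
ncol(with_names)

# Without row.names, the first column is ordinary data and rows are numbered.
without_names <- read.csv(csv_path)
rownames(without_names)
ncol(without_names)
```

In the first call the resin labels are row names and three data columns remain; in the second they are just another column, so there are four.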

Actually this data is better thought of as a matrix. In a data frame the columns contain different types of data, but in a matrix all the elements are the same type of data. A matrix in R is like a mathematical matrix, containing all the same type of thing (usually numbers).

R often but not always lets these be used interchangeably. It's also helpful when thinking about data to distinguish between a data frame and a matrix. Different operations make sense for data frames and matrices.

Data frames are very central to R, and mastering R is very much about thinking in data frames. However when we get to RNA-Seq we will be using matrices of read counts, so it will be worth our time to learn to use matrices as well.

Let us insist to R that what we have is a matrix. as.matrix "casts" our data to have matrix type.

    mat <- as.matrix(dat)
    class(mat)
    ## [1] "matrix"
    str(mat)
    ##  num [1:8, 1:3] 36.2 35.1 30.7 29.7 31.9 ...
    ##  - attr(*, "dimnames")=List of 2
    ##   ..$ : chr [1:8] "Resin1" "Resin2" "Resin3" "Resin4" ...
    ##   ..$ : chr [1:3] "Alice" "Bob" "Carl"

Much better.
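The single-type rule is worth seeing once. The sketch below uses small made-up data frames (not the PVC file) to show what as.matrix does when the columns are not all numeric.

```r
# A data frame with one numeric and one text column (made-up values).
df <- data.frame(size = c(36.25, 35.15), operator = c("Alice", "Bob"))

# A matrix can hold only one type, so the numbers are converted to text.
m <- as.matrix(df)
typeof(m)

# An all-numeric data frame converts to a numeric matrix.
num_df <- data.frame(Alice = c(36.25, 35.15), Bob = c(35.40, 35.35))
typeof(as.matrix(num_df))
```

This is why the cast works cleanly for our data: every column of dat is numeric.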

Indexing matrices

We can check the size of the matrix with the functions nrow and ncol:

    nrow(mat)
    ## [1] 8
    ncol(mat)
    ## [1] 3

This tells us that our matrix, mat, has 8 rows and 3 columns.

If we want to get a single value from the matrix, we can provide a row and column index in square brackets:

    # first value in mat
    mat[1, 1]
    ## [1] 36.25
    # a middle value in mat
    mat[4, 2]
    ## [1] 30.05

If our matrix has row names and column names, we can also refer to rows and columns by name.

    mat["Resin4", "Bob"]
    ## [1] 30.05

An index like [4, 2] selects a single element of a matrix, but we can select whole sections as well. For example, we can select the first two operators (columns) of values for the first four resins (rows) like this:

    mat[1:4, 1:2]
    ##        Alice   Bob
    ## Resin1 36.25 35.40
    ## Resin2 35.15 35.35
    ## Resin3 30.70 29.65
    ## Resin4 29.70 30.05

The slice 1:4 means the numbers from 1 to 4. It's the same as c(1,2,3,4), and doesn't need to be used inside [ ].

    1:4
    ## [1] 1 2 3 4

The slice does not need to start at 1, e.g. the line below selects rows 5 through 8:

    mat[5:8, 1:2]
    ##        Alice   Bob
    ## Resin5 31.85 31.40
    ## Resin6 30.20 30.65
    ## Resin7 32.90 32.50
    ## Resin8 36.80 36.45

We can use vectors created with c to select non-contiguous values:

    mat[c(1,3,5), c(1,3)]
    ##        Alice Carl
    ## Resin1 36.25 35.3
    ## Resin3 30.70 29.2
    ## Resin5 31.85 29.3

We also don't have to provide an index for either the rows or the columns. If we don't include an index for the rows, R returns all the rows; if we don't include an index for the columns, R returns all the columns. If we don't provide an index for either rows or columns, e.g. mat[, ], R returns the full matrix.

    # All columns from row 5
    mat[5, ]
    ## Alice   Bob  Carl
    ## 31.85 31.40 29.30
    # All rows from column 2
    mat[, 2]
    ## Resin1 Resin2 Resin3 Resin4 Resin5 Resin6 Resin7 Resin8
    ##  35.40  35.35  29.65  30.05  31.40  30.65  32.50  36.45
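Names and positions can be mixed with vectors in the same way. The sketch below rebuilds mat from the values printed earlier, so it runs without pvc.csv; by_name and by_position are names introduced here for illustration.

```r
# Rebuild the matrix from the printed table (one row per resin).
mat <- matrix(
    c(36.25, 35.40, 35.30,
      35.15, 35.35, 33.35,
      30.70, 29.65, 29.20,
      29.70, 30.05, 28.65,
      31.85, 31.40, 29.30,
      30.20, 30.65, 29.75,
      32.90, 32.50, 32.80,
      36.80, 36.45, 33.15),
    ncol = 3, byrow = TRUE,
    dimnames = list(paste0("Resin", 1:8), c("Alice", "Bob", "Carl"))
)

# Select Carl's values for resins 1 and 3, first by name and then by position.
by_name <- mat[c("Resin1", "Resin3"), "Carl"]
by_position <- mat[c(1, 3), 3]

# Both selections return the same named vector.
identical(by_name, by_position)
```

Indexing by name is often easier to read, and it keeps working if the rows or columns are later reordered.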

Summary functions

Now let's perform some common mathematical operations to learn about our data. When analyzing data we often want to look at partial statistics, such as the maximum value per resin or the average value per operator. One way to do this is to select the data we want to create a new temporary vector (or matrix, or data frame), and then perform the calculation on this subset:

    # first row, all of the columns
    resin_1 <- mat[1, ]
    # max particle size for resin 1
    max(resin_1)
    ## [1] 36.25

We don't actually need to store the row in a variable of its own. Instead, we can combine the selection and the function call:

    # max particle size for resin 2
    max(mat[2, ])
    ## [1] 35.35

R also has functions for other common calculations, e.g. finding the minimum, mean, median, and standard deviation of the data:

    # minimum particle size for operator 3
    min(mat[, 3])
    ## [1] 28.65
    # mean for operator 3
    mean(mat[, 3])
    ## [1] 31.4375
    # median for operator 3
    median(mat[, 3])
    ## [1] 31.275
    # standard deviation for operator 3
    sd(mat[, 3])
    ## [1] 2.49453

Summarizing matrices

What if we need the maximum particle size for all resins, or the average for each operator? As the diagram below shows, we want to perform the operation across a margin of the matrix:

[Figure: Operations Across Axes]

To support this, we can use the apply function.

apply allows us to repeat a function on all of the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix. We can think of apply as collapsing the matrix down to just the dimension specified by MARGIN, with rows being dimension 1 and columns dimension 2 (recall that when indexing the matrix we give the row first and the column second).

Thus, to obtain the average particle size of each resin we will need to calculate the mean of all of the rows (MARGIN = 1) of the matrix.

    avg_resin <- apply(mat, 1, mean)

And to obtain the average particle size for each operator we will need to calculate the mean of all of the columns (MARGIN = 2) of the matrix.

    avg_operator <- apply(mat, 2, mean)

Since the second argument to apply is MARGIN, the above command is equivalent to apply(mat, MARGIN = 2, mean).
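As a cross-check, the sketch below rebuilds mat from the values printed earlier (so it runs without pvc.csv) and compares apply against R's built-in rowMeans and colMeans, which compute the same row and column averages.

```r
# Rebuild the matrix from the printed table (one row per resin).
mat <- matrix(
    c(36.25, 35.40, 35.30,
      35.15, 35.35, 33.35,
      30.70, 29.65, 29.20,
      29.70, 30.05, 28.65,
      31.85, 31.40, 29.30,
      30.20, 30.65, 29.75,
      32.90, 32.50, 32.80,
      36.80, 36.45, 33.15),
    ncol = 3, byrow = TRUE,
    dimnames = list(paste0("Resin", 1:8), c("Alice", "Bob", "Carl"))
)

# MARGIN = 1 keeps the rows: one mean per resin (8 values).
avg_resin <- apply(mat, 1, mean)
length(avg_resin)

# MARGIN = 2 keeps the columns: one mean per operator (3 values).
avg_operator <- apply(mat, 2, mean)
length(avg_operator)

# rowMeans and colMeans give the same answers for the special case of means.
all.equal(avg_resin, rowMeans(mat))
all.equal(avg_operator, colMeans(mat))
```

The length of each result matches the dimension named by MARGIN, which is a quick way to confirm we collapsed the matrix in the intended direction. apply remains the general tool, since it accepts any function, not only mean.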

Challenge - Slicing (subsetting) data

We can take slices of character vectors as well:

    phrase <- c("I", "don't", "know", "I", "know")
    # first three words
    phrase[1:3]
    ## [1] "I"     "don't" "know"
    # last three words
    phrase[3:5]
    ## [1] "know" "I"    "know"

  1. If the first four words are selected using the slice phrase[1:4], how can we obtain the first four words in reverse order?

  2. What is phrase[-2]? What is phrase[-5]? Given those answers, explain what phrase[-1:-3] does.

  3. Use a slice of phrase to create a new character vector that forms the phrase "I know I don't", i.e. c("I", "know", "I", "don't").

Challenge - Subsetting data 2

Suppose you want to determine the maximum particle size for resin 5 across operators 2 and 3. To do this you would extract the relevant slice from the matrix and calculate the maximum value. Which of the following lines of R code gives the correct answer?

  1. max(dat[5, ])
  2. max(dat[2:3, 5])
  3. max(dat[5, 2:3])
  4. max(dat[5, 2, 3])

t examination

R has many statistical tests built in. One of the most commonly used tests is the t test. Do the means of two vectors differ significantly?

    mat[1,]
    ## Alice   Bob  Carl
    ## 36.25 35.40 35.30
    mat[2,]
    ## Alice   Bob  Carl
    ## 35.15 35.35 33.35
    t.test(mat[1,], mat[2,])
    ##
    ##  Welch Two Sample t-test
    ##
    ## data:  mat[1, ] and mat[2, ]
    ## t = 1.4683, df = 2.8552, p-value = 0.2427
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -1.271985  3.338652
    ## sample estimates:
    ## mean of x mean of y
    ##  35.65000  34.61667

Actually, this can be considered a paired sample t-test, since the values can be paired up by operator. By default t.test performs an unpaired t test. We see in the documentation (?t.test) that we can give paired=TRUE as an argument in order to perform a paired t-test.

    t.test(mat[1,], mat[2,], paired=TRUE)
    ##
    ##  Paired t-test
    ##
    ## data:  mat[1, ] and mat[2, ]
    ## t = 1.8805, df = 2, p-value = 0.2008
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -1.330952  3.397618
    ## sample estimates:
    ## mean of the differences
    ##                1.033333

Challenge - using t.test

Can you find a significant difference between any two resins?

When we call t.test it returns an object that behaves like a list. Recall that in R a list is a miscellaneous collection of values.

    result <- t.test(mat[1,], mat[2,], paired=TRUE)
    names(result)
    ## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
    ## [6] "null.value"  "alternative" "method"      "data.name"
    result$p.value
    ## [1] 0.2007814

This means we can write software that uses the various results from t.test, for example performing a whole series of t tests and reporting the significant results.
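One possible way to do that is sketched below. It rebuilds mat from the values printed earlier, runs a paired t test for every pair of resins using combn, and collects the p-values into a data frame. The names pairs, p_values and results, and the 0.05 cutoff, are our own choices for illustration.

```r
# Rebuild the matrix from the printed table (one row per resin).
mat <- matrix(
    c(36.25, 35.40, 35.30,
      35.15, 35.35, 33.35,
      30.70, 29.65, 29.20,
      29.70, 30.05, 28.65,
      31.85, 31.40, 29.30,
      30.20, 30.65, 29.75,
      32.90, 32.50, 32.80,
      36.80, 36.45, 33.15),
    ncol = 3, byrow = TRUE,
    dimnames = list(paste0("Resin", 1:8), c("Alice", "Bob", "Carl"))
)

# All pairs of resin names: a 2-row character matrix, one column per pair.
pairs <- combn(rownames(mat), 2)

# Run a paired t test for each pair and keep only the p-value.
p_values <- apply(pairs, 2, function(pair) {
    t.test(mat[pair[1], ], mat[pair[2], ], paired = TRUE)$p.value
})

# Collect the results and report the pairs below the chosen cutoff.
results <- data.frame(resin_a = pairs[1, ], resin_b = pairs[2, ],
                      p_value = p_values)
results[results$p_value < 0.05, ]
```

The first entry of p_values is the Resin1 versus Resin2 comparison, so it should match the value of result$p.value shown above.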

Plotting

The mathematician Richard Hamming once said, "The purpose of computing is insight, not numbers," and the best way to develop insight is often to visualize data. Visualization deserves an entire lecture (or course) of its own, but we can explore a few of R's plotting features.

Let's take a look at the average particle size per resin. Recall that we already calculated these values above using apply(mat, 1, mean) and saved them in the variable avg_resin. Plotting the values is done with the function plot.

    plot(avg_resin)

[Figure: scatter plot of avg_resin against its index]

Above, we gave the function plot a vector of numbers corresponding to the average per resin across all operators. plot created a scatter plot where the y-axis is the average particle size and the x-axis is the order, or index, of the values in the vector, which in this case correspond to the 8 resins.

plot can accept many different arguments to change the appearance of the output. Here is a plot with some extra arguments:

    plot(avg_resin, xlab="Resin", ylab="Particle size", main="Average particle size per resin", type="b")

[Figure: "Average particle size per resin", x-axis "Resin", y-axis "Particle size"]

Let's have a look at two other statistics: the maximum and minimum particle size per resin. Additional points or lines can be added to a plot with points or lines.

    max_resin <- apply(mat, 1, max)
    min_resin <- apply(mat, 1, min)
    plot(avg_resin, type="b", ylim=c(25,40))
    lines(max_resin)
    lines(min_resin)

[Figure: average particle size per resin with lines for the maximum and minimum]

R doesn't know to adjust the y limits if we add new data outside the original limits, so we needed to specify ylim manually. This is R's base graphics system. If there is time today, we will look at a more advanced graphics package called "ggplot2" that handles this kind of issue more intelligently.
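One way to avoid choosing the limits by eye is to compute them from the data with range before plotting. This is a sketch that rebuilds mat from the values printed earlier; y_limits is a name introduced here.

```r
# Rebuild the matrix from the printed table (one row per resin).
mat <- matrix(
    c(36.25, 35.40, 35.30,
      35.15, 35.35, 33.35,
      30.70, 29.65, 29.20,
      29.70, 30.05, 28.65,
      31.85, 31.40, 29.30,
      30.20, 30.65, 29.75,
      32.90, 32.50, 32.80,
      36.80, 36.45, 33.15),
    ncol = 3, byrow = TRUE,
    dimnames = list(paste0("Resin", 1:8), c("Alice", "Bob", "Carl"))
)

avg_resin <- apply(mat, 1, mean)
max_resin <- apply(mat, 1, max)
min_resin <- apply(mat, 1, min)

# range returns c(smallest, largest) over everything we intend to draw.
y_limits <- range(min_resin, max_resin)

plot(avg_resin, type = "b", ylim = y_limits)
lines(max_resin)
lines(min_resin)
```

Because the limits come from the minimum and maximum lines themselves, every value drawn afterwards fits inside the plot area.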

Challenge - Plotting data

Create a plot showing the standard deviation for each resin.

Advanced: Create a plot showing +/- 2 standard deviations about the mean.

Extension: Create similar plots for operator. Which dimension (resin or operator) is the major source of variation in this data?

Saving plots

It's possible to save a plot as a .PNG or .PDF from the RStudio interface with the "Export" button. However if we want to keep a complete record of exactly how we create each plot, we prefer to do this with R code.

Plotting in R is sent to a "device". By default, this device is RStudio. However we can temporarily send plots to a different device, such as a .PNG file (png("filename.png")) or .PDF file (pdf("filename.pdf")).

    pdf("test.pdf")
    plot(avg_resin)
    dev.off()

dev.off() is very important. It tells R to stop outputting to the pdf device and return to using the default device. If you forget, your interactive plots will stop appearing as expected!
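Here is the same pattern as a self-contained sketch. The output path is a hypothetical one built with tempdir(), and the plotted vector is Carl's column from the table above. file.exists confirms the file was written once the device is closed.

```r
# Carl's measurements for the eight resins, copied from the printed table.
carl <- c(35.30, 33.35, 29.20, 28.65, 29.30, 29.75, 32.80, 33.15)

# Hypothetical output location inside R's temporary directory.
out_file <- file.path(tempdir(), "carl.pdf")

pdf(out_file)            # open the PDF device; plots now go to the file
plot(carl, type = "b")   # nothing appears on screen
dev.off()                # close the device and finish writing the file

file.exists(out_file)
```

Keeping the pdf and dev.off calls visually close together, as here, makes it harder to forget the closing call.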

The file you created should appear in the file manager pane of RStudio. You can view it by clicking on it.


Source: http://monashbioinformaticsplatform.github.io/2015-11-30-intro-r/matrices.html
