Recently I needed to grab some text values from a number of PDF files. Rather than manually opening each and every file, I just knew there had to be an easier way.

After a quick search, I found the solution: [iTextSharp](http://sourceforge.net/projects/itextsharp/), an open-source C# library that lets you do a host of awesome stuff with PDF files. It is a port of iText, which is a Java library. You can find more info about iText on their website at [www.itextpdf.com](http://www.itextpdf.com). I just knew this library was something else when I saw it had an entire book dedicated to it.

Manipulating and reading PDF files is no trivial task, but luckily for me the files I needed to read were fairly straightforward, and I used the following code to return the text of the first page as one big string:

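```csharp
using System.Text;
using iTextSharp.text.pdf;

private string ParsePdf(string filePath)
{
    // StringBuilder avoids allocating a new string on every append.
    StringBuilder text = new StringBuilder();

    PdfReader reader = new PdfReader(filePath);
    try
    {
        // Raw content stream of the first page only (page numbers are 1-based).
        byte[] streamBytes = reader.GetPageContent(1);
        PRTokeniser tokenizer = new PRTokeniser(streamBytes);

        // Keep only the string tokens, which carry the text shown on the page.
        while (tokenizer.NextToken())
        {
            if (tokenizer.TokenType == PRTokeniser.TokType.STRING)
            {
                text.Append(tokenizer.StringValue);
            }
        }
    }
    finally
    {
        // Release the underlying file handle even if parsing throws.
        reader.Close();
    }
    return text.ToString();
}
```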
From there I used some string manipulation to grab the values I needed and perform some additional logic. Easy!
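
The string manipulation depends entirely on what your PDFs contain, so treat the following as a rough sketch, not the code I actually used. It assumes the value you want sits directly after a known label in the extracted text; the `PdfTextFields` class, the `ExtractValue` helper and the "Invoice No:" label are all made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PdfTextFields
{
    // Hypothetical helper: returns the run of non-whitespace characters that
    // follows `label` in `text`, or null when the label does not occur.
    public static string ExtractValue(string text, string label)
    {
        // Regex.Escape keeps characters such as ':' or '.' in the label literal.
        Match match = Regex.Match(text, Regex.Escape(label) + @"\s*(\S+)");
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

You could then call something like `PdfTextFields.ExtractValue(ParsePdf(path), "Invoice No:")`. Keep in mind that the tokenizer approach joins the string tokens without adding spaces, so the extracted text may not be spaced the way it looks on the page, and the pattern will probably need adjusting for your files.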

**Links from this post:**

*   [iTextSharp](http://sourceforge.net/projects/itextsharp/)
*   [iText](http://itextpdf.com/)

Reading PDF files with C#

Hi, I’m Pieter van der Westhuizen. I'm a professional freelance web & mobile developer from South Africa that has been code slinging for more than 23 years.
https://youtube.com/shorts/aCCKAnDNrzM

