Reddit reviews Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

We found 27 Reddit comments about Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Here are the top ones, ranked by their Reddit score.

O'Reilly Media

27 Reddit comments about Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython:

u/grandzooby · 11 pointsr/learnpython

This book, "Python for Data Analysis", is coming out in October on Amazon, but PDFs might be available directly from O'Reilly if you pre-order. It's by Wes McKinney, who was apparently involved with pandas and has a blog about doing quant analysis with Python:
http://blog.wesmckinney.com/

You might find what you're looking for in some of his stuff.

u/uhkhu · 11 pointsr/learnpython

Pandas is a well-known library for data analysis. Very good tutorial.

Good book on Pandas

Good Udemy Course for Python
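A minimal sketch of the kind of pandas workflow the book teaches (building a DataFrame, filtering rows, aggregating with groupby); the column names and values here are invented for illustration:

```python
import pandas as pd

# Build a small DataFrame from a dict of columns
df = pd.DataFrame({
    "city": ["NYC", "NYC", "Chicago", "Chicago"],
    "sales": [10, 20, 30, 40],
})

# Boolean filtering keeps only the rows matching a condition
big = df[df["sales"] > 15]
print(len(big))           # 3

# groupby + sum aggregates per group
totals = df.groupby("city")["sales"].sum()
print(totals["NYC"])      # 30
print(totals["Chicago"])  # 70
```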

u/olifante · 10 pointsr/Python

"Python for Data Analysis" is pretty good. It's written by Wes McKinney, the creator of Pandas, so its focus is using Pandas for data analysis, but it does include sections on basic and advanced NumPy features: http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

Alternatively, the prolific Ivan Idris has written four books covering different aspects of NumPy, all published by Packt Publishing. I haven't read any of them, but the Amazon reviews seem OK:

u/Lakerfan1994 · 8 pointsr/statistics

I would suggest getting some basic computing skills first. This book gives you a great grasp on data analysis in Python with statistical applications explored in the later part of the book. Read the whole thing through.

http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

u/astrokeat · 6 pointsr/learnpython

I recommend Python for Data Analysis (Holy shit! That's the title of your post!). It's written by the author of Pandas and I have found it incredibly straightforward and helpful.

u/uwjames · 5 pointsr/datascience

There is a LOT you can learn. It can be very bewildering. Here are some links that should help you get started. There are a lot of other posts in this sub with good tips so you should browse a bit.

https://www.reddit.com/r/datascience/comments/7ou6qq/career_data_science_learning_path/

https://www.dataquest.io/blog/why-sql-is-the-most-important-language-to-learn/

https://www.becomingadatascientist.com/2016/08/13/podcast-episodes-0-3/

https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

Sooner or later you'll want to start tackling some projects. That's basically where I am now in the process. I'm at the point where I know enough about Python, Statistics, and SQL to integrate some skills and hopefully do something interesting.

Best advice I can give you is

  1. Keep moving forward even if the task is daunting.

  2. Try to code for at least an hour every day
u/elitelimfish · 5 pointsr/FinancialCareers
  1. WSO is a great place to see other people's questions on this stuff so you might want to check that out.

  2. Starting pay at an okay shop should land you at least $150k, but good shops will be north of $200k (Citadel, DE Shaw, Two Sigma, etc.). Quant salaries vary greatly; however, the upside is practically unlimited. Not sure about other firms, but at Citadel they generally don't go above 60 hours/week.

  3. Spend some time looking at the applications for places/roles you're interested in as they will be rather specific on qualifications and background.

  4. I'm assuming you are looking to be a Quant Researcher which is where the real work is done. Many places will look at your thesis and go hardcore on poking holes in it so be ready to defend it. Your ability to research and possibly implement solutions is what they're looking for here.

  5. HFT quant work generally utilizes C++ for execution of a strategy as it runs fastest. Python and R are useful for research and analysis. In this area I'd recommend reading This Book written by a former AQR quant.

    Also I've heard good things about this book This Book. But haven't gone through it myself.

  6. Jobs are pretty stable as long as you are good at what you do. Good quant divisions will have phenomenal returns and the employees will have a good work/life balance.

  7. Location-wise NYC is naturally the best place, however Chicago would be your #2 bet.
u/ccc31807 · 4 pointsr/OMSA

I was in the same boat, with a history undergraduate major and limited math (although I picked up an MS in CS), working full time. My first semester I registered for 6040 and 8803 and had to drop 8803 because of the workload. Second semester I registered for 6501 and 6242 and had to drop 6242 because of the workload. You might be able to handle two courses, but GT has a lenient drop policy so the only downside is that you lose your money.

Standard advice: do your best to work through the following two books before you start:

http://faculty.marshall.usc.edu/gareth-james/ISL/

https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

u/justphysics · 3 pointsr/Python

This question or a variant comes up nearly weekly.

I always try to respond, if one doesn't exist already, with a plug for the module 'Pandas'.

Pandas is a data analysis module for Python with built-in support for reading Excel files. Pandas is perfect for database-style work where you are reading CSV files, Excel files, etc., and creating table-like data sets.

If you have used the 'R' language the pandas DataFrame may look familiar.

Specifically look at the method read_excel: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.excel.read_excel.html
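The call pattern is just a path (plus an optional sheet name) in, a DataFrame out. To keep the sketch self-contained it uses `read_csv` on an in-memory buffer; `read_excel` works the same way given an `.xlsx` path (the file name in the comment is hypothetical):

```python
import io
import pandas as pd

# With a real spreadsheet you would write something like:
#   df = pd.read_excel("report.xlsx", sheet_name="Sheet1")  # hypothetical file
# Here we parse an in-memory CSV instead so the example runs anywhere:
csv_data = io.StringIO("name,score\nalice,90\nbob,85\n")
df = pd.read_csv(csv_data)

print(len(df))            # 2
print(df["score"].max())  # 90
```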

main website: http://pandas.pydata.org/

book that I use frequently for a reference and examples: http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

u/mapImbibery · 3 pointsr/learnpython

I wanna say that Wes mentions R in his book but I'm not sure. I know numpy and pandas are pretty dang fast, though; it's the statistics that Python isn't so great with.

u/1istening · 3 pointsr/opendata

There's a great book about this! It goes over python basics and then goes in depth on Pandas, which is a python library used for data analysis.

I think if you've never used Python before it couldn't hurt to also find some general intro-to-python online tutorial to supplement it.

u/blue6249 · 2 pointsr/LinuxActionShow

>Like the concept of piping info between applications is just starting to make sense (even though I have no clue how it works).

Coming from a programming background it might be easier for you to think of each of the little unix core programs as a function. They all have options and generally do one thing really well. "grep" searches for things. "sed" does regex matching/replacement. "cut"... well it cuts out parts of files. The easiest way to figure out what something does is probably through the man page. (run "man grep" at the terminal). That being said some programs have -really- goddamn big man pages and are much harder to navigate. Bash, for instance, has an enormous man page.

The concept of piping makes more sense in the context of functions. In python you might write something like this:

"hello".upper()

Which would give you:

"HELLO"

In bash you could write that as:

echo "hello" | tr '[a-z]' '[A-Z]'

That first command just prints out the string, but instead of printing it out at your terminal the pipe will send all of its output to the "tr" command. ("man tr" will help you understand what it's doing there). Because tr does not have its output being redirected it just gets printed back to the terminal.
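The same wiring can be built by hand from Python with the standard-library subprocess module, which makes the "output of one program becomes input of the next" idea explicit (this assumes the usual unix `echo` and `tr` binaries are on the PATH):

```python
import subprocess

# Rough Python equivalent of: echo "hello" | tr '[a-z]' '[A-Z]'
# echo's stdout is connected to tr's stdin, just like the shell pipe.
echo = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
tr = subprocess.Popen(["tr", "[a-z]", "[A-Z]"],
                      stdin=echo.stdout, stdout=subprocess.PIPE)
echo.stdout.close()  # so tr sees EOF when echo exits
out, _ = tr.communicate()
print(out.decode().strip())  # HELLO
```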

>Question 1, should I stick with zsh or learn the basics of bash first?

I don't think you would have much of a problem learning either just so long as you understand that there will be minor differences between different shell languages. Those differences tend to be syntax rather than functionality, and when it is a difference in functionality it tends to be much less commonly used features. If you have to choose one I would recommend bash for scripting solely because it is somewhat more portable. "sh" is even more portable than bash, though it can be more painful to use since it doesn't have some of the nice features in modern shells. Remember that you don't have to use the same language for your shell and for your scripts. You just have to define a different shebang on the first line of the script.

>2. what are some things I can use scripting for (what do you use it for)?

I don't find myself scripting much at home. At work though I spend a TON of time writing various scripts. What I -do- use bash for a ton is one-liners. Once you get used to the syntax you can write some very useful code in just a couple lines. One example that I use frequently is "Run this command every 10 seconds forever" which can be written as

while sleep 10; do
    {command}
done

The "watch" program does more-or-less the same thing, but I find it unwieldy once the commands inside get more complex.

An example of a somewhat longer, and arguably poorly written script for backups using tarsnap is here.

>Any explination for common commands would be awesome.

As I mentioned earlier "man" is your friend. The other option is "command --help". You can generally google for some examples, which can be really useful for some of the less easily grok'd programs (awk, for example).

>And I do know a bit of python and have heard of iPython. Could that be a replacement for bash or zsh or is that something completely different and I'm in over my head (very likely). Much thanks.

ipython is not going to be a good replacement for your standard shell. It's cool, and I use it frequently when coding in python, but it simply lacks the powerful integration with the system that bash/zsh has. What it is extremely useful for though is exploratory programming. What really opened my eyes on the subject was the book Python for Data Analysis.

Edit: Syntax

Also, for any shell junkies please don't complain about the non-necessary "echo" up there. I know you could use a here string, but I think it would defeat the purpose of an easily digested example.

u/rhiever · 2 pointsr/Python
u/tidier · 2 pointsr/Python
u/shaggorama · 2 pointsr/learnpython

I also do a fair amount of NLP and anomaly detection in my work and use python for both. The reason I suggested starting with numpy is because, as I suggested, it is the basis on which everything else is built.

I learned python before R, then used R for my scientific computing needs, then learned the scientific computing stack in python after building out my data science chops in R. I've found the numpy array datatype much less intuitive to work with than R vectors/matrices. I think it's really important to understand how numpy.ndarrays work (in particular, memory views, fancy indexing and broadcasting) if you're going to use them with any regularity.

It doesn't take a ton of time to learn the basics, and to this day the most pernicious bugs I wrestle with in my scientific (python) code relate to mistakes in how I use numpy.ndarrays.
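The view-vs-copy distinction behind many of those bugs can be shown in a few lines: basic slicing returns a view into the same memory, fancy indexing returns a copy, and broadcasting can silently blow a shape up:

```python
import numpy as np

a = np.arange(5)   # [0, 1, 2, 3, 4]
view = a[1:3]      # basic slicing returns a VIEW of a's memory...
view[0] = 99       # ...so writing through it mutates a
print(a[1])        # 99

b = np.arange(5)
copy = b[[1, 2]]   # fancy indexing returns a COPY...
copy[0] = 99       # ...so b is untouched
print(b[1])        # 1

# Broadcasting: a (3,1) array combined with a (3,) array yields (3,3),
# a classic source of silent shape bugs.
print((np.zeros((3, 1)) + np.zeros(3)).shape)  # (3, 3)
```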

Maybe you don't think it's that important to learn scipy. I think it's useful to at least know what's available in that library, but whatever. But I definitely believe people should start with numpy before jumping into the rest of the stack. Even the book Python for Data Analysis (which is really about using pandas) starts with numpy.

Also, I strongly suspect you use "out of the box" numpy more often than you're giving it credit.

u/Nessnah · 2 pointsr/UIUC

Wish I had seen this post sooner, not sure if you'll still see this but I was pretty much in the same situation as you this past year. Statistics student trying to get into data analytics (insurance/finance). Most of these tips have already been mentioned but they are definitely valuable if you are trying to get an internship and don't have any other experience.

  • Go to career fairs. Career fair is a MUST if you don't have anything that stands out in your resume. If you don't have a perfect GPA, any internship/research experience, or noticeable personal projects then your resume won't stand out much against the hundreds of others that are submitted online. Going to a career fair gives you the chance to stand out or at least be memorable for recruiters. I applied to probably over 50 companies as well for my internship but the majority of my interviews came from career fairs. Also make sure to not limit yourself to just the Statistics/Actuarial career fair since this one is fairly small compared to others and options are much more limited. I researched some of the companies that were at the Business and Engineering fair and the positions that would be relevant for a statistic student; I got more interviews from these fairs than the Statistic one that happens later on in the year.

  • Update/Review your resume. You mentioned you only got one interview (that wasn't even relevant) out of ~50 applications, which is pretty low even for someone without prior experience. Make sure your resume is formatted well and have others review it. I'm on r/cscareerquestions a lot and they have some good daily resume threads every Tuesday (even if you're not cs the formatting can be similar in that you should list languages/technology along with personal projects). There's also an LAS resume review office on campus available for students. When I went the professional reviewing my resume didn't know much about STEM related careers but he was able to give me some general resume tips (e.g. consistent spacing, action words, typos/grammar, eliminating white space). Also make sure your resume is always in PDF format when you submit it; resumes submitted as DOC files were usually the worst resumes at my previous job.

  • Learn and apply languages/statistical packages related to your field. Earlier you said that you are interested in learning Python and R, which are very popular in most data analytic roles. Depending on how far you are in your STAT courses I wouldn't worry too much about R since it'll be used in a lot of STAT 400+ classes. Codecademy would be a good start as an introduction to Python along with the other resources people have mentioned. After going through some of those online resources I'd also recommend you to take a look at Python for Data Analysis; it can be a difficult read but you will learn a lot about important packages that are used in the industry (NumPy, Pandas, Requests, SciPy, etc).

  • Work on your soft skills. I'm not sure if this applies to you but make sure you've practiced ways of approaching recruiters and interviews in a confident and professional manner. Many of the recruiters at the career fair are employees that work in whichever field they are trying to recruit in. On top of finding students that are qualified they also want interns that will be a good fit to their work culture. Being genuine and professional seems to go a long way for interviewers/recruiters.

    All this being said, this should be taken with a grain of salt. I'm not a recruiter or a full time at a fortune 500, but these are some of the steps I took to get some internship offers this summer.
u/millsGT49 · 1 pointr/gatech

I was ISYE so I'm not sure how much you are allowed to cross over being CS but I would absolutely recommend taking a regression course. ISYE also has some data analysis electives, but to me learning and mastering regression is a must.

BBUUTT my biggest recommendation is to start playing with data yourself. I am a "Data Scientist" and graduated from the MS Analytics program at Tech and still to this day I learn the most just from playing around with data sets and trying new techniques or learning new coding tools. Don't wait to take classes to jump in, just go.

Here are some great books to get started doing "data science" in R and Python.

R: Introduction to Statistical Learning (free!!)

Python: Python for Data Analysis

u/atmontague · 1 pointr/learnprogramming

This might work for you.

u/nbitting · 1 pointr/learnpython

This book is by Wes McKinney, the author of Pandas. It's a great resource. https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793

u/its_joao · 1 pointr/learnpython

You see, python is a very simple language that doesn't require you to annotate everything line by line. You might be better off brushing up your general python knowledge before jumping into projects. This will save you time having to read or looking for comments to understand the code. Also, consider looking at the requirements.txt file for the imports of a particular repo. It'll tell you what packages are being used and you can then Google their documentation.

I'd definitely recommend you to read a book about python first. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython https://www.amazon.co.uk/dp/1449319793/ref=cm_sw_r_cp_apa_i_U38gDb4RE5933

u/josiahstevenson · 1 pointr/BigDataJobs

Sorry for the misunderstanding --

>Factset, Bloomberg, Dimensional, AQR

Are not so much resources for dealing with data as employers of data wranglers. I mean Factset and Bloomberg are data providers, but...again, I was suggesting you look for employment with them, not have them teach you.

As for learning:

  • Sounds like you're still in school. Take as many stats and econometrics (especially "time series" anything) classes as you can, if you want to do data stuff in finance. Or...data stuff at all, really.

  • Python for Data Analysis is a guide to using a particular programming language (Python) to analyze data. The author developed the main library he showcases (pandas) while he was working for AQR, one of the biggest quant hedge fund managers, and open-sourced it when he left. Some of the examples in the book have to do with finance because of this.

  • You might like Quantopian especially if you like Python.
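The finance-flavored examples in the book are mostly this style of time-series work: a DatetimeIndex plus resampling to a coarser frequency. A small sketch with invented daily values (Jan 1 through the end of February, so exactly two monthly buckets):

```python
import pandas as pd

# Hypothetical daily series: 59 days covering Jan and Feb 2023
idx = pd.date_range("2023-01-01", periods=59, freq="D")
prices = pd.Series(range(59), index=idx, dtype=float)

# Downsample to month-start frequency, averaging within each month
monthly = prices.resample("MS").mean()

print(len(monthly))      # 2
print(monthly.iloc[0])   # 15.0  (mean of 0..30 over January's 31 days)
print(monthly.iloc[1])   # 44.5  (mean of 31..58 over February's 28 days)
```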

u/[deleted] · 0 pointsr/mysql

The Oracle training is outdated and irrelevant. The Percona training is up to date and very good. But both are aimed at DBAs, sysadmins, and application developers.

For your needs, you need to learn SQL, and learn to get useful information out of alien data sets.

Start with the basics: