
Machine Learning with Python Lecture 3 (3)_Data Type Conversion, Data Exploration

디지털랫드 2024. 3. 4. 12:24

Data Type Conversion

In this session, I'll explain how to convert data types.

Summary of this session

1. Checking types

df.info()

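2. Converting object (string) to int64 (integer)

df['column_name'] = df['column_name'].str.replace('text_to_remove', '')
str.replace swaps one substring for another; passing an empty string ('') as the replacement deletes the matched text.
df['column_name'] = df['column_name'].astype(int)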
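A minimal sketch of the two steps above, assuming a small hypothetical 호수 column whose values carry the '호' suffix (the same kind of values that trigger the error later in this post). regex=False is added so that '호' is treated as a plain substring.

```python
import pandas as pd

# Hypothetical sample: size numbers stored as strings with a '호' suffix
df = pd.DataFrame({"호수": ["11호", "12호", "9호", "10호"]})

# Step 1: delete the suffix by replacing it with an empty string
df["호수"] = df["호수"].str.replace("호", "", regex=False)

# Step 2: the values are now digit-only strings, so astype(int) succeeds
df["호수"] = df["호수"].astype(int)

print(df["호수"].tolist())  # [11, 12, 9, 10]
print(df["호수"].dtype)     # an integer dtype (int64 on most platforms)
```

The order matters: calling astype(int) before removing '호' raises a ValueError.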

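[Reminder] replace

DataFrame.replace swaps one value for another. It matches whole cell values exactly, unlike .str.replace, which works on substrings inside each value.
df = df.replace('old_menu_name', 'new_menu_name')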
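To make the whole-value versus substring difference concrete, here is a small sketch using menu names from this post; the new name '[신메뉴]간장치킨' is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({"메뉴": ["황금후라이드", "간장치킨", "마늘치킨"]})

# DataFrame.replace swaps whole cell values that match exactly
renamed = df.replace("간장치킨", "[신메뉴]간장치킨")  # hypothetical new menu name
print(renamed["메뉴"].tolist())
# ['황금후라이드', '[신메뉴]간장치킨', '마늘치킨']

# A partial string matches no whole value, so nothing changes
unchanged = df.replace("치킨", "")
print(unchanged["메뉴"].tolist())
# ['황금후라이드', '간장치킨', '마늘치킨']

# .str.replace works on substrings inside each value
stripped = df["메뉴"].str.replace("치킨", "", regex=False)
print(stripped.tolist())
# ['황금후라이드', '간장', '마늘']
```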

Learn more

※ Comparing pandas and Python data types (type)

pandas
-object (string): e.g. '호수'
-int64 (integer): e.g. 10
-float64 (floating point): e.g. 12.12
Python
-str (string): e.g. '호수'
-int (integer): e.g. 10
-float (floating point): e.g. 12.12
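A quick way to see this mapping yourself, using a few values from this post's table. Note that a single element of an int64 or float64 column comes back as a NumPy scalar, not a plain Python int or float; .item() converts it. Newer pandas versions may show a dedicated string dtype instead of object for text columns, so the sketch checks the element type rather than relying on the dtype name.

```python
import pandas as pd

df = pd.DataFrame({
    "호수": ["11호", "12호"],       # text (object in the pandas version used here)
    "가격": [16000, 15000],         # whole numbers -> int64
    "칼로리": [1200.0, 1500.0],     # decimals -> float64
})

print(df.dtypes)

# A single element comes back as the matching Python / NumPy scalar
print(type(df["호수"].iloc[0]))    # <class 'str'>
print(type(df["가격"].iloc[0]))    # a NumPy integer scalar (numpy.int64)
print(type(df["칼로리"].iloc[0]))  # a NumPy float scalar (numpy.float64)

# .item() converts a NumPy scalar to the plain Python type
price = df["가격"].iloc[0].item()
print(type(price), price)          # <class 'int'> 16000
```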

Below are the results from running the practice code.


[34]:

	Unnamed: 0	메뉴	가격	호수	칼로리	할인율	할인가	원산지
7	new	[인기]아이펠치킨	16000	11	1200.0	0.5	8000.0	국내산
5	5	닭강정	15000	12	1500.0	0.2	12000.0	브라질
2	2	간장치킨	14000	9	1600.0	0.2	11200.0	국내산
3	3	마늘치킨	14000	9	1800.0	0.2	11200.0	국내산
4	4	파닭	14000	11	1300.0	0.2	11200.0	브라질
1	1	승일양념치킨	13000	10	1400.0	0.2	10400.0	국내산
6	6	양념반후라이드반	13000	10	1300.0	0.2	10400.0	국내산
0	0	황금후라이드	12000	10	1000.0	0.2	9600.0	국내산
8	10	[베스트]풀잎치킨	9900	10	1000.0	NaN	NaN	국내산


<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 7 to 8
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  9 non-null      object 
 1   메뉴          9 non-null      object 
 2   가격          9 non-null      int64  
 3   호수          9 non-null      object 
 4   칼로리         9 non-null      float64
 5   할인율         8 non-null      float64
 6   할인가         8 non-null      float64
 7   원산지         9 non-null      object 
dtypes: float64(3), int64(1), object(4)
memory usage: 648.0+ bytes


[32]:

	Unnamed: 0	메뉴	가격	호수	칼로리	할인율	할인가	원산지
7	new	[인기]아이펠치킨	16000	11	1200.0	0.5	8000.0	국내산
5	5	닭강정	15000	12	1500.0	0.2	12000.0	브라질
2	2	간장치킨	14000	9	1600.0	0.2	11200.0	국내산
3	3	마늘치킨	14000	9	1800.0	0.2	11200.0	국내산
4	4	파닭	14000	11	1300.0	0.2	11200.0	브라질
1	1	승일양념치킨	13000	10	1400.0	0.2	10400.0	국내산
6	6	양념반후라이드반	13000	10	1300.0	0.2	10400.0	국내산
0	0	황금후라이드	12000	10	1000.0	0.2	9600.0	국내산
8	10	[베스트]풀잎치킨	9900	10	1000.0	NaN	NaN	국내산

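Running the conversion while the 호수 values still contain the '호' suffix fails, because astype(int) cannot parse strings such as '11호':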

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_13/1355878569.py in <module>
      1 # 자료형 변환 / astype /  object -> int
----> 2 df['호수'] = df['호수'].astype(int)
      3 df['호수']

/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors)
   5813         else:
   5814             # else, only a single dtype is given
-> 5815             new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
   5816             return self._constructor(new_data).__finalize__(self, method="astype")
   5817 

/opt/conda/lib/python3.9/site-packages/pandas/core/internals/managers.py in astype(self, dtype, copy, errors)
    416 
    417     def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
--> 418         return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
    419 
    420     def convert(

/opt/conda/lib/python3.9/site-packages/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
    325                     applied = b.apply(f, **kwargs)
    326                 else:
--> 327                     applied = getattr(b, f)(**kwargs)
    328             except (TypeError, NotImplementedError):
    329                 if not ignore_failures:

/opt/conda/lib/python3.9/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors)
    590         values = self.values
    591 
--> 592         new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
    593 
    594         new_values = maybe_coerce_values(new_values)

/opt/conda/lib/python3.9/site-packages/pandas/core/dtypes/cast.py in astype_array_safe(values, dtype, copy, errors)
   1307 
   1308     try:
-> 1309         new_values = astype_array(values, dtype, copy=copy)
   1310     except (ValueError, TypeError):
   1311         # e.g. astype_nansafe can fail on object-dtype of strings

/opt/conda/lib/python3.9/site-packages/pandas/core/dtypes/cast.py in astype_array(values, dtype, copy)
   1255 
   1256     else:
-> 1257         values = astype_nansafe(values, dtype, copy=copy)
   1258 
   1259     # in pandas we don't store numpy str dtypes, so convert to object

/opt/conda/lib/python3.9/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
   1172         # work around NumPy brokenness, #1987
   1173         if np.issubdtype(dtype.type, np.integer):
-> 1174             return lib.astype_intsafe(arr, dtype)
   1175 
   1176         # if we have a datetime/timedelta array of objects

/opt/conda/lib/python3.9/site-packages/pandas/_libs/lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: '11호'
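The traceback boils down to one point: astype(int) tries to read each string as an integer, and '11호' is not a valid integer literal. Below is a small sketch that reproduces the failure on hypothetical sample values. It also uses pd.to_numeric(..., errors="coerce"), an extra tool not covered in this lesson, to list which values cannot be parsed.

```python
import pandas as pd

s = pd.Series(["11호", "12호", "9호"])

failed = False
try:
    s.astype(int)
except (ValueError, TypeError) as err:
    failed = True
    print("astype(int) failed:", err)  # the '호' suffix makes the strings non-numeric

# errors="coerce" turns values that cannot be parsed into NaN instead of raising,
# which makes it easy to list the problem values
parsed = pd.to_numeric(s, errors="coerce")
print(s[parsed.isna()].tolist())  # ['11호', '12호', '9호']
```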


<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 7 to 8
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  9 non-null      object 
 1   메뉴          9 non-null      object 
 2   가격          9 non-null      int64  
 3   호수          9 non-null      object 
 4   칼로리         9 non-null      float64
 5   할인율         8 non-null      float64
 6   할인가         8 non-null      float64
 7   원산지         9 non-null      object 
dtypes: float64(3), int64(1), object(4)
memory usage: 648.0+ bytes

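Calculating the mean of the 호수 column while it is still object dtype also fails: pandas first sums the strings, which concatenates them into '11호12호9호…', and that result cannot be converted to a number.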

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in _ensure_numeric(x)
   1601         try:
-> 1602             x = float(x)
   1603         except (TypeError, ValueError):

ValueError: could not convert string to float: '11호12호9호9호11호10호10호10호10호'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in _ensure_numeric(x)
   1605             try:
-> 1606                 x = complex(x)
   1607             except ValueError as err:

ValueError: complex() arg is a malformed string

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_13/4120119945.py in <module>
      1 # 호수 평균
----> 2 df['호수'].mean()

/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py in mean(self, axis, skipna, level, numeric_only, **kwargs)
  10749         )
  10750         def mean(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
> 10751             return NDFrame.mean(self, axis, skipna, level, numeric_only, **kwargs)
  10752 
  10753         setattr(cls, "mean", mean)

/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py in mean(self, axis, skipna, level, numeric_only, **kwargs)
  10367 
  10368     def mean(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs):
> 10369         return self._stat_function(
  10370             "mean", nanops.nanmean, axis, skipna, level, numeric_only, **kwargs
  10371         )

/opt/conda/lib/python3.9/site-packages/pandas/core/generic.py in _stat_function(self, name, func, axis, skipna, level, numeric_only, **kwargs)
  10352                 name, axis=axis, level=level, skipna=skipna, numeric_only=numeric_only
  10353             )
> 10354         return self._reduce(
  10355             func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  10356         )

/opt/conda/lib/python3.9/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4390                 )
   4391             with np.errstate(all="ignore"):
-> 4392                 return op(delegate, skipna=skipna, **kwds)
   4393 
   4394     def _reindex_indexer(

/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
     91             try:
     92                 with np.errstate(invalid="ignore"):
---> 93                     return f(*args, **kwargs)
     94             except ValueError as e:
     95                 # we want to transform an object array

/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    153                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    154             else:
--> 155                 result = alt(values, axis=axis, skipna=skipna, **kwds)
    156 
    157             return result

/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in new_func(values, axis, skipna, mask, **kwargs)
    408             mask = isna(values)
    409 
--> 410         result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
    411 
    412         if datetimelike:

/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in nanmean(values, axis, skipna, mask)
    663 
    664     count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 665     the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
    666 
    667     if axis is not None and getattr(the_sum, "ndim", False):

/opt/conda/lib/python3.9/site-packages/pandas/core/nanops.py in _ensure_numeric(x)
   1607             except ValueError as err:
   1608                 # e.g. "foo"
-> 1609                 raise TypeError(f"Could not convert {x} to numeric") from err
   1610     return x
   1611 

TypeError: Could not convert 11호12호9호9호11호10호10호10호10호 to numeric
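A short sketch of the same situation, using the 호수 values from the table above: mean() fails on the suffixed strings and works once the column is converted to integers.

```python
import pandas as pd

# 호수 values from the table above, still carrying the '호' suffix
sizes = pd.Series(["11호", "12호", "9호", "9호", "11호", "10호", "10호", "10호", "10호"])

mean_failed = False
try:
    sizes.mean()  # on object dtype the strings are summed (concatenated) first
except (TypeError, ValueError) as err:
    mean_failed = True
    print("mean() failed:", err)

# Strip the suffix, convert to integers, and the mean works
numeric_sizes = sizes.str.replace("호", "", regex=False).astype(int)
print(numeric_sizes.mean())  # 92 / 9, about 10.22
```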


3-7. Data Exploration

Data Exploration

In this session, we'll explore data using the built-in functions that pandas provides.

Summary of this session

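1. DataFrame size (rows, columns)
df.shape

2. Viewing a sample of the data (head, first 5 rows by default)
df.head()

3. Column types (type)
df.info()

4. Basic statistics

Covers numeric columns only.
df.describe()

5. Basic statistics (object (string))

Covers string (object) columns only.
df.describe(include='O')

6. Correlation
df.corr()
In pandas 2.0 and later, string columns make df.corr() raise an error; use df.corr(numeric_only=True) there.

7. Number of distinct values per column
df.nunique()

8. Distinct values
df['column_name'].unique()

9. Count of each value

Recommended, since it shows each value together with its count.
df['column_name'].value_counts()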
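To try items 1 to 3 without the course notebook, here is a sketch that rebuilds the sample data from the tables in this post. The 'Unnamed: 0' column is left out, so the shape is (9, 7) here instead of the (9, 8) shown in the output.

```python
import numpy as np
import pandas as pd

# Sample rebuilt from the tables in this post ('Unnamed: 0' column left out)
df = pd.DataFrame({
    "메뉴": ["[인기]아이펠치킨", "닭강정", "간장치킨", "마늘치킨", "파닭",
           "승일양념치킨", "양념반후라이드반", "황금후라이드", "[베스트]풀잎치킨"],
    "가격": [16000, 15000, 14000, 14000, 14000, 13000, 13000, 12000, 9900],
    "호수": ["11", "12", "9", "9", "11", "10", "10", "10", "10"],
    "칼로리": [1200.0, 1500.0, 1600.0, 1800.0, 1300.0, 1400.0, 1300.0, 1000.0, 1000.0],
    "할인율": [0.5, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, np.nan],
    "할인가": [8000.0, 12000.0, 11200.0, 11200.0, 11200.0, 10400.0, 10400.0, 9600.0, np.nan],
    "원산지": ["국내산", "브라질", "국내산", "국내산", "브라질",
            "국내산", "국내산", "국내산", "국내산"],
})

rows, cols = df.shape  # shape is a (rows, columns) tuple
print(rows, cols)      # 9 7

print(df.head())       # first 5 rows by default
print(df.head(3))      # or pass the number of rows you want

df.info()              # dtype and non-null count for each column
```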

Below are the results from running the practice code.


[35]:

	Unnamed: 0	메뉴	가격	호수	칼로리	할인율	할인가	원산지
7	new	[인기]아이펠치킨	16000	11	1200.0	0.5	8000.0	국내산
5	5	닭강정	15000	12	1500.0	0.2	12000.0	브라질
2	2	간장치킨	14000	9	1600.0	0.2	11200.0	국내산
3	3	마늘치킨	14000	9	1800.0	0.2	11200.0	국내산
4	4	파닭	14000	11	1300.0	0.2	11200.0	브라질
1	1	승일양념치킨	13000	10	1400.0	0.2	10400.0	국내산
6	6	양념반후라이드반	13000	10	1300.0	0.2	10400.0	국내산
0	0	황금후라이드	12000	10	1000.0	0.2	9600.0	국내산
8	10	[베스트]풀잎치킨	9900	10	1000.0	NaN	NaN	국내산


[36]:

(9, 8)


[37]:

	Unnamed: 0	메뉴	가격	호수	칼로리	할인율	할인가	원산지
7	new	[인기]아이펠치킨	16000	11	1200.0	0.5	8000.0	국내산
5	5	닭강정	15000	12	1500.0	0.2	12000.0	브라질
2	2	간장치킨	14000	9	1600.0	0.2	11200.0	국내산
3	3	마늘치킨	14000	9	1800.0	0.2	11200.0	국내산
4	4	파닭	14000	11	1300.0	0.2	11200.0	브라질


<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 7 to 8
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  9 non-null      object 
 1   메뉴          9 non-null      object 
 2   가격          9 non-null      int64  
 3   호수          9 non-null      object 
 4   칼로리         9 non-null      float64
 5   할인율         8 non-null      float64
 6   할인가         8 non-null      float64
 7   원산지         9 non-null      object 
dtypes: float64(3), int64(1), object(4)
memory usage: 648.0+ bytes


[39]:

	가격	칼로리	할인율	할인가
count	9.000000	9.000000	8.000000	8.000000
mean	13433.333333	1344.444444	0.237500	10500.000000
std	1764.936259	265.099562	0.106066	1242.118007
min	9900.000000	1000.000000	0.200000	8000.000000
25%	13000.000000	1200.000000	0.200000	10200.000000
50%	14000.000000	1300.000000	0.200000	10800.000000
75%	14000.000000	1500.000000	0.200000	11200.000000
max	16000.000000	1800.000000	0.500000	12000.000000


[40]:

	Unnamed: 0	메뉴	호수	원산지
count	9	9	9	9
unique	9	9	4	2
top	new	[인기]아이펠치킨	10	국내산
freq	1	1	4	7
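A sketch of both describe calls on a subset of the sample data from the tables above. It explicitly casts the text columns to object dtype, assuming the older default used in this post (newer pandas versions may store text in a dedicated string dtype, which include='O' might not select).

```python
import numpy as np
import pandas as pd

# Subset of the sample from the tables above
df = pd.DataFrame({
    "가격": [16000, 15000, 14000, 14000, 14000, 13000, 13000, 12000, 9900],
    "할인율": [0.5, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, np.nan],
    "호수": ["11", "12", "9", "9", "11", "10", "10", "10", "10"],
    "원산지": ["국내산", "브라질", "국내산", "국내산", "브라질",
            "국내산", "국내산", "국내산", "국내산"],
})
# Keep text columns as object dtype, as in the post's pandas version
df = df.astype({"호수": object, "원산지": object})

num_stats = df.describe()                # numeric columns only: 가격, 할인율
print(num_stats)
print(num_stats.loc["mean", "가격"])     # about 13433.33
print(num_stats.loc["count", "할인율"])  # 8.0, because NaN is not counted

obj_stats = df.describe(include="O")     # object (string) columns only
print(obj_stats)
print(obj_stats.loc["top", "원산지"], obj_stats.loc["freq", "원산지"])  # 국내산 7
```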


[41]:

	가격	칼로리	할인율	할인가
가격	1.000000	0.522744	0.688875	-0.138409
칼로리	0.522744	1.000000	-0.306122	0.636659
할인율	0.688875	-0.306122	1.000000	-0.813250
할인가	-0.138409	0.636659	-0.813250	1.000000
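A sketch that recomputes this matrix from the sample data in the tables, assuming pandas 1.5 or later so that corr accepts numeric_only. The text column 원산지 is included on purpose to show that numeric_only=True skips it. Rows with NaN are dropped pairwise, only for the column pairs that involve 할인율 or 할인가.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "가격": [16000, 15000, 14000, 14000, 14000, 13000, 13000, 12000, 9900],
    "칼로리": [1200.0, 1500.0, 1600.0, 1800.0, 1300.0, 1400.0, 1300.0, 1000.0, 1000.0],
    "할인율": [0.5, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, np.nan],
    "할인가": [8000.0, 12000.0, 11200.0, 11200.0, 11200.0, 10400.0, 10400.0, 9600.0, np.nan],
    "원산지": ["국내산", "브라질", "국내산", "국내산", "브라질",
            "국내산", "국내산", "국내산", "국내산"],  # a text column
})

# numeric_only=True skips the text column 원산지;
# without it, pandas 2.0+ raises an error on string columns
corr = df.corr(numeric_only=True)
print(corr.round(6))

print(round(corr.loc["가격", "칼로리"], 6))  # 0.522744, as in the table above
```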


[42]:

Unnamed: 0    9
메뉴            9
가격            6
호수            4
칼로리           7
할인율           2
할인가           5
원산지           2
dtype: int64


[43]:

array(['11', '12', '9', '10'], dtype=object)


[44]:

10    4
11    2
9     2
12    1
Name: 호수, dtype: int64
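A sketch of nunique, unique, and value_counts side by side, using the 호수 and 원산지 values from the tables above (호수 kept as strings, as in the output). value_counts(normalize=True) is an extra option not used in the lesson; it returns shares instead of counts.

```python
import pandas as pd

# 호수 values as strings (after removing '호'), plus 원산지, from the tables above
df = pd.DataFrame({
    "호수": ["11", "12", "9", "9", "11", "10", "10", "10", "10"],
    "원산지": ["국내산", "브라질", "국내산", "국내산", "브라질",
            "국내산", "국내산", "국내산", "국내산"],
})

print(df.nunique())              # distinct values per column: 호수 4, 원산지 2
print(df["호수"].unique())        # values in order of first appearance

counts = df["호수"].value_counts()  # most frequent first
print(counts)                      # 10 appears 4 times

shares = df["원산지"].value_counts(normalize=True)  # fractions instead of counts
print(shares)                      # 국내산 is 7 of 9 rows
```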

AI+ism(에이아이즘)
